Implement PCRE2_SUBSTITUTE_MATCHED.
parent 777582d4de
commit d170829b26
@ -26,6 +26,8 @@ now correctly backtracked, so this unnecessary restriction has been removed.

6. Avoid some VS compiler warnings.

7. Added PCRE2_SUBSTITUTE_MATCHED.


Version 10.34 21-November-2019
------------------------------
@ -48,8 +48,8 @@ Its arguments are:
<i>outlengthptr</i> Points to the length of the output buffer
</pre>
A match data block is needed only if you want to inspect the data from the
match that is returned in that block. A match context is needed only if you
want to:
match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
match context is needed only if you want to:
<pre>
Set up a callout function
Set a matching offset limit
@ -75,16 +75,17 @@ zero-terminated strings. The options are:
PCRE2_SUBSTITUTE_EXTENDED         Do extended replacement processing
PCRE2_SUBSTITUTE_GLOBAL           Replace all occurrences in the subject
PCRE2_SUBSTITUTE_LITERAL          The replacement string is literal
PCRE2_SUBSTITUTE_MATCHED          Use pre-existing match data for 1st match
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
PCRE2_SUBSTITUTE_UNKNOWN_UNSET    Treat unknown group as unset
PCRE2_SUBSTITUTE_UNSET_EMPTY      Simple unset insert = empty string
</pre>
PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED,
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
</P>
<P>
The function returns the number of substitutions, which may be zero if there
were no matches. The result can be greater than one only when
are no matches. The result may be greater than one only when
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
is returned.
</P>
@ -3302,12 +3302,19 @@ same number causes an error at compile time.
|
|||
<b> PCRE2_SIZE *<i>outlengthptr</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
This function calls <b>pcre2_match()</b> and then makes a copy of the subject
|
||||
string in <i>outputbuffer</i>, replacing one or more parts that were matched
|
||||
with the <i>replacement</i> string, whose length is supplied in <b>rlength</b>.
|
||||
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
||||
The default is to perform just one replacement, but there is an option that
|
||||
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
|
||||
This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
|
||||
subject string in <i>outputbuffer</i>, replacing parts that were matched with
|
||||
the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
|
||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
|
||||
is to perform just one replacement if the pattern matches, but there is an
|
||||
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
||||
for details).
|
||||
</P>
|
||||
<P>
|
||||
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
|
||||
that were carried out. This may be zero if no match was found, and is never
|
||||
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
|
||||
returned if an error is detected (see below for details).
|
||||
</P>
|
||||
<P>
|
||||
Matches in which a \K item in a lookahead in the pattern causes the match to
|
||||
|
@ -3327,14 +3334,31 @@ allocate memory for the compiled code.
|
|||
<P>
|
||||
If an external <i>match_data</i> block is provided, its contents afterwards
|
||||
are those set by the final call to <b>pcre2_match()</b>. For global changes,
|
||||
this will have ended in a matching error. The contents of the ovector within
|
||||
this will have ended in a no-match error. The contents of the ovector within
|
||||
the match data block may or may not have been changed.
|
||||
</P>
|
||||
<P>
|
||||
The <i>outlengthptr</i> argument must point to a variable that contains the
|
||||
length, in code units, of the output buffer. If the function is successful, the
|
||||
value is updated to contain the length of the new string, excluding the
|
||||
trailing zero that is automatically added.
|
||||
As well as the usual options for <b>pcre2_match()</b>, a number of additional
|
||||
options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
|
||||
One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
|
||||
<i>match_data</i> block must be provided, and it must have been used for an
|
||||
external call to <b>pcre2_match()</b>. The data in the <i>match_data</i> block
|
||||
(return code, offset vector) is used for the first substitution instead of
|
||||
calling <b>pcre2_match()</b> from within <b>pcre2_substitute()</b>. This allows
|
||||
an application to check for a match before choosing to substitute, without
|
||||
having to repeat the match.
|
||||
</P>
|
||||
<P>
|
||||
The <i>code</i> argument is not used for the first substitution, but if
|
||||
PCRE2_SUBSTITUTE_GLOBAL is set, <b>pcre2_match()</b> will be called after the
|
||||
first substitution to check for further matches, and the contents of the
|
||||
<i>match_data</i> block will be changed.
|
||||
</P>
|
||||
<P>
|
||||
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
|
||||
variable that contains the length, in code units, of the output buffer. If the
|
||||
function is successful, the value is updated to contain the length of the new
|
||||
string, excluding the trailing zero that is automatically added.
|
||||
</P>
|
||||
<P>
|
||||
If the function is not successful, the value set via <i>outlengthptr</i> depends
|
||||
|
@ -3353,7 +3377,7 @@ The replacement string, which is interpreted as a UTF string in UTF mode,
|
|||
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
|
||||
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
|
||||
default, however, a dollar character is an escape character that can specify
|
||||
the insertion of characters from capture groups or names from (*MARK) or other
|
||||
the insertion of characters from capture groups and names from (*MARK) or other
|
||||
control verbs in the pattern. The following forms are always recognized:
|
||||
<pre>
|
||||
$$ insert a dollar character
|
||||
|
@ -3378,16 +3402,6 @@ facility can be used to perform simple simultaneous substitutions, as this
|
|||
apple lemon
|
||||
2: pear orange
|
||||
</pre>
|
||||
As well as the usual options for <b>pcre2_match()</b>, a number of additional
|
||||
options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
|
||||
</P>
|
||||
<P>
|
||||
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to
|
||||
be treated as a literal, with no interpretation. If this option is set,
|
||||
PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
|
||||
</P>
|
||||
<P>
|
||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
|
||||
replacing every matching substring. If this option is not set, only the first
|
||||
matching substring is replaced. The search for matches takes place in the
|
||||
|
@ -3501,14 +3515,17 @@ substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
|
|||
groups in the extended syntax forms to be treated as unset.
|
||||
</P>
|
||||
<P>
|
||||
If successful, <b>pcre2_substitute()</b> returns the number of successful
|
||||
matches. This may be zero if no matches were found, and is never greater than 1
|
||||
unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
||||
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
|
||||
are ignored.
|
||||
</P>
|
||||
<br><b>
|
||||
Substitution errors
|
||||
</b><br>
|
||||
<P>
|
||||
In the event of an error, a negative error code is returned. Except for
|
||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from <b>pcre2_match()</b>
|
||||
are passed straight back.
|
||||
In the event of an error, <b>pcre2_substitute()</b> returns a negative error
|
||||
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
||||
<b>pcre2_match()</b> are passed straight back.
|
||||
</P>
|
||||
<P>
|
||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
|
||||
|
@ -3526,6 +3543,10 @@ needed is returned via <i>outlengthptr</i>. Note that this does not happen by
|
|||
default.
|
||||
</P>
|
||||
<P>
|
||||
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
||||
<i>match_data</i> argument is NULL.
|
||||
</P>
|
||||
<P>
|
||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
|
||||
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
|
||||
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
|
||||
|
@ -3876,7 +3897,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 December 2019
Last updated: 27 December 2019
<br>
Copyright © 1997-2019 University of Cambridge.
<br>
224
doc/pcre2.txt
|
@ -3193,97 +3193,110 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
|||
PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
|
||||
PCRE2_SIZE *outlengthptr);
|
||||
|
||||
This function calls pcre2_match() and then makes a copy of the subject
|
||||
string in outputbuffer, replacing one or more parts that were matched
|
||||
This function optionally calls pcre2_match() and then makes a copy of
|
||||
the subject string in outputbuffer, replacing parts that were matched
|
||||
with the replacement string, whose length is supplied in rlength. This
|
||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
||||
The default is to perform just one replacement, but there is an option
|
||||
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
||||
for details).
|
||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The
|
||||
default is to perform just one replacement if the pattern matches, but
|
||||
there is an option that requests multiple replacements (see PCRE2_SUB-
|
||||
STITUTE_GLOBAL below for details).
|
||||
|
||||
Matches in which a \K item in a lookahead in the pattern causes the
|
||||
match to end before it starts are not supported, and give rise to an
|
||||
If successful, pcre2_substitute() returns the number of substitutions
|
||||
that were carried out. This may be zero if no match was found, and is
|
||||
never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega-
|
||||
tive value is returned if an error is detected (see below for details).
|
||||
|
||||
Matches in which a \K item in a lookahead in the pattern causes the
|
||||
match to end before it starts are not supported, and give rise to an
|
||||
error return. For global replacements, matches in which \K in a lookbe-
|
||||
hind causes the match to start earlier than the point that was reached
|
||||
hind causes the match to start earlier than the point that was reached
|
||||
in the previous iteration are also not supported.
|
||||
|
||||
The first seven arguments of pcre2_substitute() are the same as for
|
||||
The first seven arguments of pcre2_substitute() are the same as for
|
||||
pcre2_match(), except that the partial matching options are not permit-
|
||||
ted, and match_data may be passed as NULL, in which case a match data
|
||||
block is obtained and freed within this function, using memory manage-
|
||||
ment functions from the match context, if provided, or else those that
|
||||
ted, and match_data may be passed as NULL, in which case a match data
|
||||
block is obtained and freed within this function, using memory manage-
|
||||
ment functions from the match context, if provided, or else those that
|
||||
were used to allocate memory for the compiled code.
|
||||
|
||||
If an external match_data block is provided, its contents afterwards
|
||||
are those set by the final call to pcre2_match(). For global changes,
|
||||
this will have ended in a matching error. The contents of the ovector
|
||||
If an external match_data block is provided, its contents afterwards
|
||||
are those set by the final call to pcre2_match(). For global changes,
|
||||
this will have ended in a no-match error. The contents of the ovector
|
||||
within the match data block may or may not have been changed.
|
||||
|
||||
The outlengthptr argument must point to a variable that contains the
|
||||
length, in code units, of the output buffer. If the function is suc-
|
||||
cessful, the value is updated to contain the length of the new string,
|
||||
excluding the trailing zero that is automatically added.
|
||||
As well as the usual options for pcre2_match(), a number of additional
|
||||
options can be set in the options argument of pcre2_substitute(). One
|
||||
such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
|
||||
match_data block must be provided, and it must have been used for an
|
||||
external call to pcre2_match(). The data in the match_data block (re-
|
||||
turn code, offset vector) is used for the first substitution instead of
|
||||
calling pcre2_match() from within pcre2_substitute(). This allows an
|
||||
application to check for a match before choosing to substitute, without
|
||||
having to repeat the match.
|
||||
|
||||
If the function is not successful, the value set via outlengthptr de-
|
||||
pends on the type of error. For syntax errors in the replacement
|
||||
The code argument is not used for the first substitution, but if
|
||||
PCRE2_SUBSTITUTE_GLOBAL is set, pcre2_match() will be called after the
|
||||
first substitution to check for further matches, and the contents of
|
||||
the match_data block will be changed.
|
||||
|
||||
The outlengthptr argument of pcre2_substitute() must point to a vari-
|
||||
able that contains the length, in code units, of the output buffer. If
|
||||
the function is successful, the value is updated to contain the length
|
||||
of the new string, excluding the trailing zero that is automatically
|
||||
added.
|
||||
|
||||
If the function is not successful, the value set via outlengthptr de-
|
||||
pends on the type of error. For syntax errors in the replacement
|
||||
string, the value is the offset in the replacement string where the er-
|
||||
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
||||
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
||||
fault. This includes the case of the output buffer being too small, un-
|
||||
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
|
||||
the value is the minimum length needed, including space for the trail-
|
||||
the value is the minimum length needed, including space for the trail-
|
||||
ing zero. Note that in order to compute the required length, pcre2_sub-
|
||||
stitute() has to simulate all the matching and copying, instead of giv-
|
||||
ing an error return as soon as the buffer overflows. Note also that the
|
||||
length is in code units, not bytes.
|
||||
|
||||
The replacement string, which is interpreted as a UTF string in UTF
|
||||
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
|
||||
The replacement string, which is interpreted as a UTF string in UTF
|
||||
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
|
||||
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
|
||||
preted in any way. By default, however, a dollar character is an escape
|
||||
character that can specify the insertion of characters from capture
|
||||
groups or names from (*MARK) or other control verbs in the pattern. The
|
||||
following forms are always recognized:
|
||||
character that can specify the insertion of characters from capture
|
||||
groups and names from (*MARK) or other control verbs in the pattern.
|
||||
The following forms are always recognized:
|
||||
|
||||
$$ insert a dollar character
|
||||
$<n> or ${<n>} insert the contents of group <n>
|
||||
$*MARK or ${*MARK} insert a control verb name
|
||||
|
||||
Either a group number or a group name can be given for <n>. Curly
|
||||
brackets are required only if the following character would be inter-
|
||||
Either a group number or a group name can be given for <n>. Curly
|
||||
brackets are required only if the following character would be inter-
|
||||
preted as part of the number or name. The number may be zero to include
|
||||
the entire matched string. For example, if the pattern a(b)c is
|
||||
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
||||
the entire matched string. For example, if the pattern a(b)c is
|
||||
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
||||
is "=+babcb+=".
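As a rough illustration of the simple insertion forms described above,
the following C sketch expands $$, $<n> and ${<n>} against an array of
group strings that have already been captured. It is not PCRE2 code:
expand_simple() and its parameters are hypothetical names chosen for
this note, group names and $*MARK are not handled, and every group is
assumed to be set. A real application would call pcre2_substitute().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2. groups[i] is the text of
   capture group i (group 0 is the whole match); all groups are assumed
   to be set. Returns 0 on success, or -1 for a syntax error, an unknown
   group number, or an output buffer that is too small. */
int expand_simple(const char *rep, const char *const *groups, size_t ngroups,
                  char *out, size_t outsize)
{
    size_t o = 0;

    assert(rep != NULL && groups != NULL && out != NULL);
    if (outsize == 0) return -1;

    while (*rep != '\0') {
        char one[2] = { '\0', '\0' };
        const char *text = one;
        size_t len;

        if (*rep != '$') {                  /* ordinary character */
            one[0] = *rep++;
        } else if (rep[1] == '$') {         /* $$ inserts one dollar */
            one[0] = '$';
            rep += 2;
        } else {                            /* $<n> or ${<n>} */
            int braced = (rep[1] == '{');
            size_t n = 0;

            rep += braced ? 2 : 1;
            if (!isdigit((unsigned char)*rep)) return -1;
            while (isdigit((unsigned char)*rep)) {
                n = n * 10 + (size_t)(*rep++ - '0');
                if (n >= ngroups) return -1;    /* unknown group */
            }
            if (braced && *rep++ != '}') return -1;
            text = groups[n];
        }

        len = strlen(text);
        if (o + len + 1 > outsize) return -1;   /* keep room for the zero */
        memcpy(out + o, text, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```

With groups "abc" (group 0) and "b" (group 1), the replacement
"+$1$0$1+" expands to "+babcb+". Spliced into the subject "=abc=" in
place of the match, that gives the "=+babcb+=" quoted above.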
|
||||
|
||||
$*MARK inserts the name from the last encountered backtracking control
|
||||
verb on the matching path that has a name. (*MARK) must always include
|
||||
a name, but the other verbs need not. For example, in the case of
|
||||
$*MARK inserts the name from the last encountered backtracking control
|
||||
verb on the matching path that has a name. (*MARK) must always include
|
||||
a name, but the other verbs need not. For example, in the case of
|
||||
(*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
|
||||
the relevant name is "B". This facility can be used to perform simple
|
||||
the relevant name is "B". This facility can be used to perform simple
|
||||
simultaneous substitutions, as this pcre2test example shows:
|
||||
|
||||
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
||||
apple lemon
|
||||
2: pear orange
|
||||
|
||||
As well as the usual options for pcre2_match(), a number of additional
|
||||
options can be set in the options argument of pcre2_substitute().
|
||||
|
||||
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement
|
||||
string to be treated as a literal, with no interpretation. If this op-
|
||||
tion is set, PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||
and PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
|
||||
|
||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
|
||||
string, replacing every matching substring. If this option is not set,
|
||||
only the first matching substring is replaced. The search for matches
|
||||
takes place in the original subject string (that is, previous replace-
|
||||
ments do not affect it). Iteration is implemented by advancing the
|
||||
startoffset value for each search, which is always passed the entire
|
||||
string, replacing every matching substring. If this option is not set,
|
||||
only the first matching substring is replaced. The search for matches
|
||||
takes place in the original subject string (that is, previous replace-
|
||||
ments do not affect it). Iteration is implemented by advancing the
|
||||
startoffset value for each search, which is always passed the entire
|
||||
subject string. If an offset limit is set in the match context, search-
|
||||
ing stops when that limit is reached.
|
||||
|
||||
You can restrict the effect of a global substitution to a portion of
|
||||
You can restrict the effect of a global substitution to a portion of
|
||||
the subject string by setting either or both of startoffset and an off-
|
||||
set limit. Here is a pcre2test example:
|
||||
|
||||
|
@ -3291,87 +3304,87 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
|||
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
||||
2: ABC A!C A!C ABC
|
||||
|
||||
When continuing with global substitutions after matching a substring
|
||||
When continuing with global substitutions after matching a substring
|
||||
with zero length, an attempt to find a non-empty match at the same off-
|
||||
set is performed. If this is not successful, the offset is advanced by
|
||||
one character except when CRLF is a valid newline sequence and the next
|
||||
two characters are CR, LF. In this case, the offset is advanced by two
|
||||
two characters are CR, LF. In this case, the offset is advanced by two
|
||||
characters.
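The advance rule for empty matches can be sketched as below. This is an
illustration only, assuming an 8-bit, non-UTF subject in which one
character is one code unit; advance_after_empty_match() and the
crlf_is_newline flag are hypothetical names and do not appear in PCRE2.
The rule applies only after the retry for a non-empty match at the same
offset has failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2. Returns the offset at which
   the next search should start after an empty match at 'offset' that
   could not be extended to a non-empty one. Assumes offset < length
   and one code unit per character. */
size_t advance_after_empty_match(const char *subject, size_t length,
                                 size_t offset, int crlf_is_newline)
{
    assert(subject != NULL && offset < length);

    /* Step over CR LF as a pair when CRLF is a valid newline sequence. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;

    return offset + 1;      /* otherwise advance by one character */
}
```

Stepping over CR LF as a unit keeps the next search from starting
between the two characters of a single newline.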
|
||||
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
||||
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
||||
ORY immediately. If this option is set, however, pcre2_substitute()
|
||||
ORY immediately. If this option is set, however, pcre2_substitute()
|
||||
continues to go through the motions of matching and substituting (with-
|
||||
out, of course, writing anything) in order to compute the size of buf-
|
||||
fer that is needed. This value is passed back via the outlengthptr
|
||||
variable, with the result of the function still being PCRE2_ER-
|
||||
out, of course, writing anything) in order to compute the size of buf-
|
||||
fer that is needed. This value is passed back via the outlengthptr
|
||||
variable, with the result of the function still being PCRE2_ER-
|
||||
ROR_NOMEMORY.
|
||||
|
||||
Passing a buffer size of zero is a permitted way of finding out how
|
||||
much memory is needed for given substitution. However, this does mean
|
||||
Passing a buffer size of zero is a permitted way of finding out how
|
||||
much memory is needed for given substitution. However, this does mean
|
||||
that the entire operation is carried out twice. Depending on the appli-
|
||||
cation, it may be more efficient to allocate a large buffer and free
|
||||
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
||||
cation, it may be more efficient to allocate a large buffer and free
|
||||
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
||||
FLOW_LENGTH.
|
||||
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
|
||||
do not appear in the pattern to be treated as unset groups. This option
|
||||
should be used with care, because it means that a typo in a group name
|
||||
should be used with care, because it means that a typo in a group name
|
||||
or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
||||
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
|
||||
known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
|
||||
as empty strings when inserted as described above. If this option is
|
||||
known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
|
||||
as empty strings when inserted as described above. If this option is
|
||||
not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
|
||||
SET error. This option does not influence the extended substitution
|
||||
SET error. This option does not influence the extended substitution
|
||||
syntax described below.
|
||||
|
||||
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
||||
replacement string. Without this option, only the dollar character is
|
||||
special, and only the group insertion forms listed above are valid.
|
||||
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
||||
replacement string. Without this option, only the dollar character is
|
||||
special, and only the group insertion forms listed above are valid.
|
||||
When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
|
||||
|
||||
Firstly, backslash in a replacement string is interpreted as an escape
|
||||
Firstly, backslash in a replacement string is interpreted as an escape
|
||||
character. The usual forms such as \n or \x{ddd} can be used to specify
|
||||
particular character codes, and backslash followed by any non-alphanu-
|
||||
meric character quotes that character. Extended quoting can be coded
|
||||
particular character codes, and backslash followed by any non-alphanu-
|
||||
meric character quotes that character. Extended quoting can be coded
|
||||
using \Q...\E, exactly as in pattern strings.
|
||||
|
||||
There are also four escape sequences for forcing the case of inserted
|
||||
letters. The insertion mechanism has three states: no case forcing,
|
||||
There are also four escape sequences for forcing the case of inserted
|
||||
letters. The insertion mechanism has three states: no case forcing,
|
||||
force upper case, and force lower case. The escape sequences change the
|
||||
current state: \U and \L change to upper or lower case forcing, respec-
|
||||
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
||||
no case forcing. The sequences \u and \l force the next character (if
|
||||
it is a letter) to upper or lower case, respectively, and then the
|
||||
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
||||
no case forcing. The sequences \u and \l force the next character (if
|
||||
it is a letter) to upper or lower case, respectively, and then the
|
||||
state automatically reverts to no case forcing. Case forcing applies to
|
||||
all inserted characters, including those from capture groups and let-
|
||||
all inserted characters, including those from capture groups and let-
|
||||
ters within \Q...\E quoted sequences.
|
||||
|
||||
Note that case forcing sequences such as \U...\E do not nest. For exam-
|
||||
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
||||
\E has no effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EX-
|
||||
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
||||
\E has no effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EX-
|
||||
TRA_ALT_BSUX options do not apply to replacement strings.
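The case forcing states described above can be modelled with a small
state machine. The C sketch below is an assumption-laden illustration,
not the PCRE2 implementation: apply_case_forcing() is a hypothetical
name, only \U, \L, \E, \u and \l are recognized (no \Q...\E, no group
insertion, no other escapes), and characters are treated as single
bytes. The one-shot handling of \u and \l follows the wording "then the
state automatically reverts to no case forcing".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum case_force { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/* Hypothetical helper, not part of PCRE2. Copies rep to out, applying
   \U \L \E \u \l; any other character is copied unchanged apart from
   case forcing. Returns 0 on success, -1 if out is too small. */
int apply_case_forcing(const char *rep, char *out, size_t outsize)
{
    enum case_force state = FORCE_NONE;
    int one_shot = 0;       /* set by \u and \l: revert after one char */
    size_t o = 0;

    assert(rep != NULL && out != NULL);
    if (outsize == 0) return -1;

    for (; *rep != '\0'; rep++) {
        unsigned char c = (unsigned char)*rep;

        if (c == '\\' && rep[1] != '\0' && strchr("ULEul", rep[1]) != NULL) {
            rep++;          /* move to the escape letter */
            switch (*rep) {
            case 'U': state = FORCE_UPPER; one_shot = 0; break;
            case 'L': state = FORCE_LOWER; one_shot = 0; break;
            case 'E': state = FORCE_NONE;  one_shot = 0; break;
            case 'u': state = FORCE_UPPER; one_shot = 1; break;
            default:  state = FORCE_LOWER; one_shot = 1; break; /* 'l' */
            }
            continue;       /* the escape itself emits nothing */
        }

        if (state == FORCE_UPPER) c = (unsigned char)toupper(c);
        else if (state == FORCE_LOWER) c = (unsigned char)tolower(c);
        if (one_shot) { state = FORCE_NONE; one_shot = 0; }

        if (o + 2 > outsize) return -1;     /* char plus trailing zero */
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return 0;
}
```

Because the state is a single variable rather than a stack, \L simply
overwrites \U, which is why the sequences do not nest and why the final
\E in "\Uaa\LBB\Ecc\E" has nothing left to undo.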
|
||||
|
||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||
flexibility to capture group substitution. The syntax is similar to
|
||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||
flexibility to capture group substitution. The syntax is similar to
|
||||
that used by Bash:
|
||||
|
||||
${<n>:-<string>}
|
||||
${<n>:+<string1>:<string2>}
|
||||
|
||||
As before, <n> may be a group number or a name. The first form speci-
|
||||
fies a default value. If group <n> is set, its value is inserted; if
|
||||
not, <string> is expanded and the result inserted. The second form
|
||||
specifies strings that are expanded and inserted when group <n> is set
|
||||
or unset, respectively. The first form is just a convenient shorthand
|
||||
As before, <n> may be a group number or a name. The first form speci-
|
||||
fies a default value. If group <n> is set, its value is inserted; if
|
||||
not, <string> is expanded and the result inserted. The second form
|
||||
specifies strings that are expanded and inserted when group <n> is set
|
||||
or unset, respectively. The first form is just a convenient shorthand
|
||||
for
|
||||
|
||||
${<n>:+${<n>}:<string>}
|
||||
|
||||
Backslash can be used to escape colons and closing curly brackets in
|
||||
the replacement strings. A change of the case forcing state within a
|
||||
replacement string remains in force afterwards, as shown in this
|
||||
Backslash can be used to escape colons and closing curly brackets in
|
||||
the replacement strings. A change of the case forcing state within a
|
||||
replacement string remains in force afterwards, as shown in this
|
||||
pcre2test example:
|
||||
|
||||
/(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
|
||||
|
@ -3380,31 +3393,36 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
|||
somebody
|
||||
1: HELLO
|
||||
|
||||
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
||||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
|
||||
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
||||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
|
||||
known groups in the extended syntax forms to be treated as unset.
|
||||
|
||||
If successful, pcre2_substitute() returns the number of successful
|
||||
matches. This may be zero if no matches were found, and is never
|
||||
greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
||||
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrele-
|
||||
vant and are ignored.
|
||||
|
||||
In the event of an error, a negative error code is returned. Except for
|
||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
||||
pcre2_match() are passed straight back.
|
||||
Substitution errors
|
||||
|
||||
In the event of an error, pcre2_substitute() returns a negative error
|
||||
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors
|
||||
from pcre2_match() are passed straight back.
|
||||
|
||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
|
||||
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
||||
|
||||
PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
|
||||
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
||||
when the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
|
||||
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
||||
when the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
|
||||
SET_EMPTY is not set.
|
||||
|
||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
||||
enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
|
||||
of buffer that is needed is returned via outlengthptr. Note that this
|
||||
of buffer that is needed is returned via outlengthptr. Note that this
|
||||
does not happen by default.
|
||||
|
||||
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
||||
match_data argument is NULL.
|
||||
|
||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
|
||||
the replacement string, with more particular errors being PCRE2_ER-
|
||||
ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
|
||||
|
@ -3727,7 +3745,7 @@ AUTHOR

REVISION

Last updated: 26 December 2019
Last updated: 27 December 2019
Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2_SUBSTITUTE 3 "26 December 2019" "PCRE2 10.35"
.TH PCRE2_SUBSTITUTE 3 "27 December 2019" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -36,8 +36,8 @@ Its arguments are:
|
|||
\fIoutlengthptr\fP Points to the length of the output buffer
|
||||
.sp
|
||||
A match data block is needed only if you want to inspect the data from the
|
||||
match that is returned in that block. A match context is needed only if you
|
||||
want to:
|
||||
match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
|
||||
match context is needed only if you want to:
|
||||
.sp
|
||||
Set up a callout function
|
||||
Set a matching offset limit
|
||||
|
@ -67,15 +67,16 @@ zero-terminated strings. The options are:
|
|||
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
|
||||
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
|
||||
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
||||
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
||||
.sp
|
||||
PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED,
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
|
||||
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
|
||||
.P
|
||||
The function returns the number of substitutions, which may be zero if there
|
||||
were no matches. The result can be greater than one only when
|
||||
are no matches. The result may be greater than one only when
|
||||
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
|
||||
is returned.
|
||||
.P
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "26 December 2019" "PCRE2 10.35"
|
||||
.TH PCRE2API 3 "27 December 2019" "PCRE2 10.35"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -3321,12 +3321,18 @@ same number causes an error at compile time.
|
|||
.B " PCRE2_SIZE *\fIoutlengthptr\fP);"
|
||||
.fi
|
||||
.P
|
||||
This function calls \fBpcre2_match()\fP and then makes a copy of the subject
|
||||
string in \fIoutputbuffer\fP, replacing one or more parts that were matched
|
||||
with the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP.
|
||||
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
||||
The default is to perform just one replacement, but there is an option that
|
||||
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
|
||||
This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
|
||||
subject string in \fIoutputbuffer\fP, replacing parts that were matched with
|
||||
the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
|
||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
|
||||
is to perform just one replacement if the pattern matches, but there is an
|
||||
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
||||
for details).
|
||||
.P
|
||||
If successful, \fBpcre2_substitute()\fP returns the number of substitutions
|
||||
that were carried out. This may be zero if no match was found, and is never
|
||||
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
|
||||
returned if an error is detected (see below for details).
|
||||
.P
|
||||
Matches in which a \eK item in a lookahead in the pattern causes the match to
|
||||
end before it starts are not supported, and give rise to an error return. For
|
||||
|
@ -3343,13 +3349,28 @@ allocate memory for the compiled code.
|
|||
.P
|
||||
If an external \fImatch_data\fP block is provided, its contents afterwards
|
||||
are those set by the final call to \fBpcre2_match()\fP. For global changes,
|
||||
this will have ended in a matching error. The contents of the ovector within
|
||||
this will have ended in a no-match error. The contents of the ovector within
|
||||
the match data block may or may not have been changed.
|
||||
.P
|
||||
The \fIoutlengthptr\fP argument must point to a variable that contains the
|
||||
length, in code units, of the output buffer. If the function is successful, the
|
||||
value is updated to contain the length of the new string, excluding the
|
||||
trailing zero that is automatically added.
|
||||
As well as the usual options for \fBpcre2_match()\fP, a number of additional
|
||||
options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
|
||||
One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
|
||||
\fImatch_data\fP block must be provided, and it must have been used for an
|
||||
external call to \fBpcre2_match()\fP. The data in the \fImatch_data\fP block
|
||||
(return code, offset vector) is used for the first substitution instead of
|
||||
calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
|
||||
an application to check for a match before choosing to substitute, without
|
||||
having to repeat the match.
|
||||
.P
|
||||
The \fIcode\fP argument is not used for the first substitution, but if
|
||||
PCRE2_SUBSTITUTE_GLOBAL is set, \fBpcre2_match()\fP will be called after the
|
||||
first substitution to check for further matches, and the contents of the
|
||||
\fImatch_data\fP block will be changed.
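
The control flow described in the two paragraphs above can be sketched without PCRE2 itself. The following is a minimal toy model, not the library's code: the matcher is a plain strstr() search, and every name in it (toy_match, toy_find, toy_substitute, TOY_GLOBAL, TOY_MATCHED, and the TOY_ERROR_* values) is hypothetical and chosen only for illustration. It shows the first iteration taking its result from a caller-supplied match block, later iterations calling the matcher again, and a NULL block being rejected when the "matched" option is set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option bits and error codes for this toy model only; they are
   not the PCRE2 values. */
#define TOY_GLOBAL          0x1u
#define TOY_MATCHED         0x2u
#define TOY_ERROR_NULL      (-2)
#define TOY_ERROR_NOMEMORY  (-3)

/* Stand-in for a match data block: a return code plus one offset pair. */
typedef struct { int rc; size_t start; size_t end; } toy_match;

/* Toy matcher: literal search from a given offset. Returns 1 on a match and
   -1 on no match, and records the result in md. */
static int toy_find(const char *needle, const char *subject, size_t offset,
  toy_match *md)
{
const char *p;
assert(needle[0] != 0);               /* Empty needles are not modelled */
p = strstr(subject + offset, needle);
if (p == NULL) { md->rc = -1; return md->rc; }
md->start = (size_t)(p - subject);
md->end = md->start + strlen(needle);
md->rc = 1;
return md->rc;
}

/* Toy substitute: returns the number of replacements or a negative error. */
static int toy_substitute(const char *needle, const char *subject,
  unsigned options, toy_match *md, const char *replacement, char *out,
  size_t outsize)
{
toy_match local;
int use_existing = (options & TOY_MATCHED) != 0;
int subs = 0;
size_t offset = 0, used = 0, tail;
size_t rlen = strlen(replacement);

/* With the "matched" option the caller must supply the block. */
if (use_existing) { if (md == NULL) return TOY_ERROR_NULL; }
else if (md == NULL) md = &local;

for (;;)
  {
  int rc;
  size_t gap;

  /* First time round, reuse the caller's result; afterwards match again. */
  if (use_existing) { rc = md->rc; use_existing = 0; }
  else rc = toy_find(needle, subject, offset, md);
  if (rc < 0) break;

  /* Copy the text before the match, then the replacement. */
  gap = md->start - offset;
  if (used + gap + rlen >= outsize) return TOY_ERROR_NOMEMORY;
  memcpy(out + used, subject + offset, gap); used += gap;
  memcpy(out + used, replacement, rlen); used += rlen;
  offset = md->end;
  subs++;
  if ((options & TOY_GLOBAL) == 0) break;
  }

/* Copy the rest of the subject, including its terminating zero. */
tail = strlen(subject + offset);
if (used + tail >= outsize) return TOY_ERROR_NOMEMORY;
memcpy(out + used, subject + offset, tail + 1);
return subs;
}
```

In this sketch a caller would run toy_find() once, inspect the result, and then pass the same block to toy_substitute() with TOY_MATCHED set. As in the description above, the pattern argument is not consulted for the first replacement, and in global mode the block is overwritten by the later searches.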
|
||||
.P
|
||||
The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
|
||||
variable that contains the length, in code units, of the output buffer. If the
|
||||
function is successful, the value is updated to contain the length of the new
|
||||
string, excluding the trailing zero that is automatically added.
|
||||
.P
|
||||
If the function is not successful, the value set via \fIoutlengthptr\fP depends
|
||||
on the type of error. For syntax errors in the replacement string, the value is
|
||||
|
@ -3366,7 +3387,7 @@ The replacement string, which is interpreted as a UTF string in UTF mode,
|
|||
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
|
||||
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
|
||||
default, however, a dollar character is an escape character that can specify
|
||||
the insertion of characters from capture groups or names from (*MARK) or other
|
||||
the insertion of characters from capture groups and names from (*MARK) or other
|
||||
control verbs in the pattern. The following forms are always recognized:
|
||||
.sp
|
||||
$$ insert a dollar character
|
||||
|
@ -3390,14 +3411,6 @@ facility can be used to perform simple simultaneous substitutions, as this
|
|||
apple lemon
|
||||
2: pear orange
|
||||
.sp
|
||||
As well as the usual options for \fBpcre2_match()\fP, a number of additional
|
||||
options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
|
||||
.P
|
||||
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to
|
||||
be treated as a literal, with no interpretation. If this option is set,
|
||||
PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
|
||||
replacing every matching substring. If this option is not set, only the first
|
||||
matching substring is replaced. The search for matches takes place in the
|
||||
|
@ -3500,13 +3513,17 @@ The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
|||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
|
||||
groups in the extended syntax forms to be treated as unset.
|
||||
.P
|
||||
If successful, \fBpcre2_substitute()\fP returns the number of successful
|
||||
matches. This may be zero if no matches were found, and is never greater than 1
|
||||
unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
||||
.P
|
||||
In the event of an error, a negative error code is returned. Except for
|
||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
|
||||
are passed straight back.
|
||||
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
|
||||
are ignored.
|
||||
.
|
||||
.
|
||||
.SS "Substitution errors"
|
||||
.rs
|
||||
.sp
|
||||
In the event of an error, \fBpcre2_substitute()\fP returns a negative error
|
||||
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
||||
\fBpcre2_match()\fP are passed straight back.
|
||||
.P
|
||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
|
||||
unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
||||
|
@ -3520,6 +3537,9 @@ PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
|
|||
needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
|
||||
default.
|
||||
.P
|
||||
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
||||
\fImatch_data\fP argument is NULL.
|
||||
.P
|
||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
|
||||
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
|
||||
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
|
||||
|
@ -3884,6 +3904,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 26 December 2019
|
||||
Last updated: 27 December 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -182,6 +182,7 @@ pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
|
|||
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
|
||||
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
|
||||
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
|
||||
#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */
|
||||
|
||||
/* Options for pcre2_pattern_convert(). */
|
||||
|
||||
|
|
|
@ -49,8 +49,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
|||
|
||||
#define SUBSTITUTE_OPTIONS \
|
||||
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
|
||||
PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_OVERFLOW_LENGTH| \
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY)
|
||||
PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
|
||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY)
|
||||
|
||||
|
||||
|
||||
|
@ -229,6 +230,7 @@ uint32_t suboptions;
|
|||
BOOL match_data_created = FALSE;
|
||||
BOOL escaped_literal = FALSE;
|
||||
BOOL overflowed = FALSE;
|
||||
BOOL use_existing_match;
|
||||
#ifdef SUPPORT_UNICODE
|
||||
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
|
||||
#endif
|
||||
|
@ -248,15 +250,25 @@ lengthleft = buff_length = *blength;
|
|||
*blength = PCRE2_UNSET;
|
||||
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
|
||||
|
||||
/* Partial matching is not valid. This must come after setting *blength to
|
||||
PCRE2_UNSET, so as not to imply an offset in the replacement. */
|
||||
|
||||
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
|
||||
return PCRE2_ERROR_BADOPTION;
|
||||
|
||||
/* If no match data block is provided, create one. */
|
||||
/* Check for using a match that has already happened. Note that the subject
|
||||
pointer in the match data may be NULL after a no-match. */
|
||||
|
||||
if (match_data == NULL)
|
||||
use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
|
||||
|
||||
if (use_existing_match)
|
||||
{
|
||||
if (match_data == NULL) return PCRE2_ERROR_NULL;
|
||||
}
|
||||
|
||||
/* Otherwise, if no match data block is provided, create one. */
|
||||
|
||||
else if (match_data == NULL)
|
||||
{
|
||||
pcre2_general_context *gcontext = (mcontext == NULL)?
|
||||
(pcre2_general_context *)code :
|
||||
|
@ -310,7 +322,8 @@ if (start_offset > length)
|
|||
}
|
||||
CHECKMEMCPY(subject, start_offset);
|
||||
|
||||
/* Loop for global substituting. */
|
||||
/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
|
||||
match is taken from the match_data that was passed in. */
|
||||
|
||||
subs = 0;
|
||||
do
|
||||
|
@ -318,8 +331,13 @@ do
|
|||
PCRE2_SPTR ptrstack[PTR_STACK_SIZE];
|
||||
uint32_t ptrstackptr = 0;
|
||||
|
||||
rc = pcre2_match(code, subject, length, start_offset, options|goptions,
|
||||
match_data, mcontext);
|
||||
if (use_existing_match)
|
||||
{
|
||||
rc = match_data->rc;
|
||||
use_existing_match = FALSE;
|
||||
}
|
||||
else rc = pcre2_match(code, subject, length, start_offset, options|goptions,
|
||||
match_data, mcontext);
|
||||
|
||||
#ifdef SUPPORT_UNICODE
|
||||
if (utf) options |= PCRE2_NO_UTF_CHECK; /* Only need to check once */
|
||||
|
@ -375,33 +393,33 @@ do
|
|||
|
||||
/* Handle a successful match. Matches that use \K to end before they start
|
||||
or start before the current point in the subject are not supported. */
|
||||
|
||||
|
||||
if (ovector[1] < ovector[0] || ovector[0] < start_offset)
|
||||
{
|
||||
rc = PCRE2_ERROR_BADSUBSPATTERN;
|
||||
goto EXIT;
|
||||
}
|
||||
|
||||
/* Check for the same match as previous. This is legitimate after matching an
|
||||
|
||||
/* Check for the same match as previous. This is legitimate after matching an
|
||||
empty string that starts after the initial match offset. We have tried again
|
||||
at the match point in case the pattern is one like /(?<=\G.)/ which can never
|
||||
match at its starting point, so running the match achieves the bumpalong. If
|
||||
we do get the same (null) match at the original match point, it isn't such a
|
||||
pattern, so we now do the empty string magic. In all other cases, a repeat
|
||||
match should never occur. */
|
||||
|
||||
|
||||
if (ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
|
||||
{
|
||||
if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)
|
||||
{
|
||||
goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
|
||||
ovecsave[2] = start_offset;
|
||||
continue; /* Back to the top of the loop */
|
||||
{
|
||||
if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)
|
||||
{
|
||||
goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
|
||||
ovecsave[2] = start_offset;
|
||||
continue; /* Back to the top of the loop */
|
||||
}
|
||||
rc = PCRE2_ERROR_INTERNAL_DUPMATCH;
|
||||
goto EXIT;
|
||||
}
|
||||
|
||||
goto EXIT;
|
||||
}
|
||||
|
||||
/* Count substitutions with a paranoid check for integer overflow; surely no
|
||||
real call to this function would ever hit this! */
|
||||
|
||||
|
@ -421,20 +439,20 @@ do
|
|||
scb.output_offsets[0] = buff_offset;
|
||||
scb.oveccount = rc;
|
||||
|
||||
/* Process the replacement string. If the entire replacement is literal, just
|
||||
copy it with length check. */
|
||||
|
||||
|
||||
ptr = replacement;
|
||||
if ((suboptions & PCRE2_SUBSTITUTE_LITERAL) != 0)
|
||||
{
|
||||
CHECKMEMCPY(ptr, rlength);
|
||||
CHECKMEMCPY(ptr, rlength);
|
||||
}
|
||||
|
||||
/* Within a non-literal replacement, which must be scanned character by
|
||||
/* Within a non-literal replacement, which must be scanned character by
|
||||
character, local literal mode can be set by \Q, but only in extended mode
|
||||
when backslashes are being interpreted. In extended mode we must handle
|
||||
nested substrings that are to be reprocessed. */
|
||||
|
||||
|
||||
else for (;;)
|
||||
{
|
||||
uint32_t ch;
|
||||
|
@ -844,42 +862,42 @@ do
|
|||
} /* End handling a literal code unit */
|
||||
} /* End of loop for scanning the replacement. */
|
||||
|
||||
/* The replacement has been copied to the output, or its size has been
|
||||
remembered. Do the callout if there is one and we have done an actual
|
||||
/* The replacement has been copied to the output, or its size has been
|
||||
remembered. Do the callout if there is one and we have done an actual
|
||||
replacement. */
|
||||
|
||||
|
||||
if (!overflowed && mcontext != NULL && mcontext->substitute_callout != NULL)
|
||||
{
|
||||
scb.subscount = subs;
|
||||
scb.subscount = subs;
|
||||
scb.output_offsets[1] = buff_offset;
|
||||
rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
|
||||
rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
|
||||
|
||||
/* A non-zero return means cancel this substitution. Instead, copy the
|
||||
/* A non-zero return means cancel this substitution. Instead, copy the
|
||||
matched string fragment. */
|
||||
|
||||
if (rc != 0)
|
||||
{
|
||||
PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0];
|
||||
PCRE2_SIZE oldlength = ovector[1] - ovector[0];
|
||||
|
||||
|
||||
buff_offset -= newlength;
|
||||
lengthleft += newlength;
|
||||
CHECKMEMCPY(subject + ovector[0], oldlength);
|
||||
|
||||
CHECKMEMCPY(subject + ovector[0], oldlength);
|
||||
|
||||
/* A negative return means do not do any more. */
|
||||
|
||||
|
||||
if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL);
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
/* Save the details of this match. See above for how this data is used. If we
|
||||
matched an empty string, do the magic for global matches. Finally, update the
|
||||
start offset to point to the rest of the subject string. */
|
||||
|
||||
ovecsave[0] = ovector[0];
|
||||
ovecsave[1] = ovector[1];
|
||||
|
||||
ovecsave[0] = ovector[0];
|
||||
ovecsave[1] = ovector[1];
|
||||
ovecsave[2] = start_offset;
|
||||
|
||||
|
||||
goptions = (ovector[0] != ovector[1] || ovector[0] > start_offset)? 0 :
|
||||
PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
|
||||
start_offset = ovector[1];
|
||||
|
|
|
@ -503,13 +503,14 @@ so many of them that they are split into two fields. */
|
|||
#define CTL2_SUBSTITUTE_CALLOUT 0x00000001u
|
||||
#define CTL2_SUBSTITUTE_EXTENDED 0x00000002u
|
||||
#define CTL2_SUBSTITUTE_LITERAL 0x00000004u
|
||||
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000008u
|
||||
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000010u
|
||||
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000020u
|
||||
#define CTL2_SUBJECT_LITERAL 0x00000040u
|
||||
#define CTL2_CALLOUT_NO_WHERE 0x00000080u
|
||||
#define CTL2_CALLOUT_EXTRA 0x00000100u
|
||||
#define CTL2_ALLVECTOR 0x00000200u
|
||||
#define CTL2_SUBSTITUTE_MATCHED 0x00000008u
|
||||
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000010u
|
||||
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000020u
|
||||
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000040u
|
||||
#define CTL2_SUBJECT_LITERAL 0x00000080u
|
||||
#define CTL2_CALLOUT_NO_WHERE 0x00000100u
|
||||
#define CTL2_CALLOUT_EXTRA 0x00000200u
|
||||
#define CTL2_ALLVECTOR 0x00000400u
|
||||
|
||||
#define CTL2_NL_SET 0x40000000u /* Informational */
|
||||
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
||||
|
@ -532,6 +533,7 @@ different things in the two cases. */
|
|||
#define CTL2_ALLPD (CTL2_SUBSTITUTE_CALLOUT|\
|
||||
CTL2_SUBSTITUTE_EXTENDED|\
|
||||
CTL2_SUBSTITUTE_LITERAL|\
|
||||
CTL2_SUBSTITUTE_MATCHED|\
|
||||
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
|
||||
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
|
||||
CTL2_SUBSTITUTE_UNSET_EMPTY|\
|
||||
|
@ -721,6 +723,7 @@ static modstruct modlist[] = {
|
|||
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
|
||||
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
|
||||
{ "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) },
|
||||
{ "substitute_matched", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_MATCHED, PO(control2) },
|
||||
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
|
||||
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
|
||||
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
|
||||
|
@ -4088,7 +4091,7 @@ Returns: nothing
|
|||
static void
|
||||
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
||||
{
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||
before,
|
||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||
|
@ -4127,6 +4130,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
|
|||
((controls2 & CTL2_SUBSTITUTE_CALLOUT) != 0)? " substitute_callout" : "",
|
||||
((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
|
||||
((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
|
||||
((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
|
||||
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
|
||||
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
|
||||
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
|
||||
|
@ -7232,6 +7236,7 @@ if (dat_datctl.replacement[0] != 0)
|
|||
uint8_t rbuffer[REPLACE_BUFFSIZE];
|
||||
uint8_t nbuffer[REPLACE_BUFFSIZE];
|
||||
uint32_t xoptions;
|
||||
uint32_t emoption; /* External match option */
|
||||
PCRE2_SIZE j, rlen, nsize, erroroffset;
|
||||
BOOL badutf = FALSE;
|
||||
|
||||
|
@ -7252,11 +7257,25 @@ if (dat_datctl.replacement[0] != 0)
|
|||
|
||||
if (timeitm)
|
||||
fprintf(outfile, "** Timing is not supported with replace: ignored\n");
|
||||
|
||||
|
||||
if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
|
||||
fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");
|
||||
|
||||
xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
|
||||
/* Check for a test that does substitution after an initial external match.
|
||||
If this is set, we run the external match, but leave the interpretation of
|
||||
its output to pcre2_substitute(). */
|
||||
|
||||
emoption = ((dat_datctl.control2 & CTL2_SUBSTITUTE_MATCHED) == 0)? 0 :
|
||||
PCRE2_SUBSTITUTE_MATCHED;
|
||||
|
||||
if (emoption != 0)
|
||||
{
|
||||
PCRE2_MATCH(rc, compiled_code, pp, arg_ulen, dat_datctl.offset,
|
||||
dat_datctl.options, match_data, use_dat_context);
|
||||
}
|
||||
|
||||
xoptions = emoption |
|
||||
(((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
|
||||
PCRE2_SUBSTITUTE_GLOBAL) |
|
||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
|
||||
PCRE2_SUBSTITUTE_EXTENDED) |
|
||||
|
@ -7268,7 +7287,7 @@ if (dat_datctl.replacement[0] != 0)
|
|||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
|
||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY);
|
||||
|
||||
|
||||
SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
|
||||
pr = dat_datctl.replacement;
|
||||
|
||||
|
|
|
@ -4640,6 +4640,13 @@ B)x/alt_verbnames,mark
|
|||
|
||||
/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
|
||||
aaBB
|
||||
|
||||
/abcd/replace=wxyz,substitute_matched
|
||||
abcd
|
||||
pqrs
|
||||
|
||||
/abcd/g
|
||||
>abcd1234abcd5678<\=replace=wxyz,substitute_matched
|
||||
|
||||
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
|
||||
|
||||
|
|
|
@ -14859,6 +14859,16 @@ Failed: error -55 at offset 3 in replacement: requested value is not set
|
|||
/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
|
||||
aaBB
|
||||
1: AAbbaa..AAbBaa
|
||||
|
||||
/abcd/replace=wxyz,substitute_matched
|
||||
abcd
|
||||
1: wxyz
|
||||
pqrs
|
||||
0: pqrs
|
||||
|
||||
/abcd/g
|
||||
>abcd1234abcd5678<\=replace=wxyz,substitute_matched
|
||||
2: >wxyz1234wxyz5678<
|
||||
|
||||
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
|
||||
Capture group count = 2
|
||||
|
|