Implement PCRE2_SUBSTITUTE_MATCHED.
parent
777582d4de
commit
d170829b26
|
@ -26,6 +26,8 @@ now correctly backtracked, so this unnecessary restriction has been removed.
|
||||||
|
|
||||||
6. Avoid some VS compiler warnings.
|
6. Avoid some VS compiler warnings.
|
||||||
|
|
||||||
|
7. Added PCRE2_SUBSTITUTE_MATCHED.
|
||||||
|
|
||||||
|
|
||||||
Version 10.34 21-November-2019
|
Version 10.34 21-November-2019
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
|
@ -48,8 +48,8 @@ Its arguments are:
|
||||||
<i>outlengthptr</i> Points to the length of the output buffer
|
<i>outlengthptr</i> Points to the length of the output buffer
|
||||||
</pre>
|
</pre>
|
||||||
A match data block is needed only if you want to inspect the data from the
|
A match data block is needed only if you want to inspect the data from the
|
||||||
match that is returned in that block. A match context is needed only if you
|
match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
|
||||||
want to:
|
match context is needed only if you want to:
|
||||||
<pre>
|
<pre>
|
||||||
Set up a callout function
|
Set up a callout function
|
||||||
Set a matching offset limit
|
Set a matching offset limit
|
||||||
|
@ -75,16 +75,17 @@ zero-terminated strings. The options are:
|
||||||
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
|
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
|
||||||
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
|
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
|
||||||
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
||||||
|
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
||||||
</pre>
|
</pre>
|
||||||
PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED,
|
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The function returns the number of substitutions, which may be zero if there
|
The function returns the number of substitutions, which may be zero if there
|
||||||
were no matches. The result can be greater than one only when
|
are no matches. The result may be greater than one only when
|
||||||
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
|
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
|
||||||
is returned.
|
is returned.
|
||||||
</P>
|
</P>
|
||||||
|
|
|
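Review note: the option table above describes PCRE2_SUBSTITUTE_MATCHED as using pre-existing match data for the first match. The flow can be sketched with a toy model that needs no PCRE2 installation. Everything named toy_* or TOY_* below is hypothetical and exists only for this illustration: a strstr() search stands in for the regex matcher, and the error values are not PCRE2's. The sketch assumes a caller that matches once, inspects the result, and then substitutes without running the matcher again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SUBSTITUTE_MATCHED 0x1u   /* hypothetical flag for this sketch */

/* Hypothetical error codes; these are not PCRE2's values. */
enum { TOY_ERROR_NOMATCH = -1, TOY_ERROR_NULL = -2, TOY_ERROR_NOMEMORY = -3 };

typedef struct {
    int    rc;          /* 1 on a match, TOY_ERROR_NOMATCH otherwise */
    size_t start, end;  /* offsets of the match in the subject */
} toy_match_data;

static int toy_match_calls = 0;  /* counts matcher runs, for demonstration */

/* Stand-in for a regex matcher: finds a literal needle with strstr(). */
int toy_match(const char *needle, const char *subject, toy_match_data *md)
{
    const char *hit;

    assert(md != NULL);
    toy_match_calls++;
    hit = strstr(subject, needle);
    if (hit == NULL) return md->rc = TOY_ERROR_NOMATCH;
    md->start = (size_t)(hit - subject);
    md->end = md->start + strlen(needle);
    return md->rc = 1;
}

/* Replace the first match. With TOY_SUBSTITUTE_MATCHED, the result already
   stored in md is used and the matcher is not run again. On entry *outlen is
   the capacity of out; on success it becomes the length of the new string,
   excluding the trailing zero. Returns the number of substitutions (0 or 1)
   or a negative error code. */
int toy_substitute(const char *needle, const char *subject, unsigned options,
                   toy_match_data *md, const char *replacement,
                   char *out, size_t *outlen)
{
    toy_match_data local;
    size_t sublen = strlen(subject), replen = strlen(replacement), need;

    if (options & TOY_SUBSTITUTE_MATCHED) {
        if (md == NULL) return TOY_ERROR_NULL;  /* caller must supply data */
    } else {
        if (md == NULL) md = &local;            /* internal block if none given */
        toy_match(needle, subject, md);
    }

    if (md->rc < 0) {                           /* no match: plain copy */
        if (sublen + 1 > *outlen) return TOY_ERROR_NOMEMORY;
        memcpy(out, subject, sublen + 1);
        *outlen = sublen;
        return 0;
    }

    need = md->start + replen + (sublen - md->end);
    if (need + 1 > *outlen) return TOY_ERROR_NOMEMORY;
    memcpy(out, subject, md->start);                      /* text before match */
    memcpy(out + md->start, replacement, replen);         /* the replacement */
    memcpy(out + md->start + replen, subject + md->end,
           sublen - md->end + 1);                         /* tail plus zero */
    *outlen = need;
    return 1;
}
```

A caller would run toy_match() once, decide from md.rc whether to go ahead, and then pass the same md to toy_substitute() with TOY_SUBSTITUTE_MATCHED; the counter stays at one because the stored offsets are reused. In PCRE2 itself the corresponding sequence, per this commit's documentation, is an external pcre2_match() call followed by pcre2_substitute() with PCRE2_SUBSTITUTE_MATCHED and the same match_data block.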
@ -3302,12 +3302,19 @@ same number causes an error at compile time.
|
||||||
<b> PCRE2_SIZE *<i>outlengthptr</i>);</b>
|
<b> PCRE2_SIZE *<i>outlengthptr</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
This function calls <b>pcre2_match()</b> and then makes a copy of the subject
|
This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
|
||||||
string in <i>outputbuffer</i>, replacing one or more parts that were matched
|
subject string in <i>outputbuffer</i>, replacing parts that were matched with
|
||||||
with the <i>replacement</i> string, whose length is supplied in <b>rlength</b>.
|
the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
|
||||||
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
|
||||||
The default is to perform just one replacement, but there is an option that
|
is to perform just one replacement if the pattern matches, but there is an
|
||||||
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
|
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
||||||
|
for details).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
|
||||||
|
that were carried out. This may be zero if no match was found, and is never
|
||||||
|
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
|
||||||
|
returned if an error is detected (see below for details).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Matches in which a \K item in a lookahead in the pattern causes the match to
|
Matches in which a \K item in a lookahead in the pattern causes the match to
|
||||||
|
@ -3327,14 +3334,31 @@ allocate memory for the compiled code.
|
||||||
<P>
|
<P>
|
||||||
If an external <i>match_data</i> block is provided, its contents afterwards
|
If an external <i>match_data</i> block is provided, its contents afterwards
|
||||||
are those set by the final call to <b>pcre2_match()</b>. For global changes,
|
are those set by the final call to <b>pcre2_match()</b>. For global changes,
|
||||||
this will have ended in a matching error. The contents of the ovector within
|
this will have ended in a no-match error. The contents of the ovector within
|
||||||
the match data block may or may not have been changed.
|
the match data block may or may not have been changed.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>outlengthptr</i> argument must point to a variable that contains the
|
As well as the usual options for <b>pcre2_match()</b>, a number of additional
|
||||||
length, in code units, of the output buffer. If the function is successful, the
|
options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
|
||||||
value is updated to contain the length of the new string, excluding the
|
One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
|
||||||
trailing zero that is automatically added.
|
<i>match_data</i> block must be provided, and it must have been used for an
|
||||||
|
external call to <b>pcre2_match()</b>. The data in the <i>match_data</i> block
|
||||||
|
(return code, offset vector) is used for the first substitution instead of
|
||||||
|
calling <b>pcre2_match()</b> from within <b>pcre2_substitute()</b>. This allows
|
||||||
|
an application to check for a match before choosing to substitute, without
|
||||||
|
having to repeat the match.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The <i>code</i> argument is not used for the first substitution, but if
|
||||||
|
PCRE2_SUBSTITUTE_GLOBAL is set, <b>pcre2_match()</b> will be called after the
|
||||||
|
first substitution to check for further matches, and the contents of the
|
||||||
|
<i>match_data</i> block will be changed.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
|
||||||
|
variable that contains the length, in code units, of the output buffer. If the
|
||||||
|
function is successful, the value is updated to contain the length of the new
|
||||||
|
string, excluding the trailing zero that is automatically added.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the function is not successful, the value set via <i>outlengthptr</i> depends
|
If the function is not successful, the value set via <i>outlengthptr</i> depends
|
||||||
|
@ -3353,7 +3377,7 @@ The replacement string, which is interpreted as a UTF string in UTF mode,
|
||||||
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
|
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
|
||||||
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
|
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
|
||||||
default, however, a dollar character is an escape character that can specify
|
default, however, a dollar character is an escape character that can specify
|
||||||
the insertion of characters from capture groups or names from (*MARK) or other
|
the insertion of characters from capture groups and names from (*MARK) or other
|
||||||
control verbs in the pattern. The following forms are always recognized:
|
control verbs in the pattern. The following forms are always recognized:
|
||||||
<pre>
|
<pre>
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
|
@ -3378,16 +3402,6 @@ facility can be used to perform simple simultaneous substitutions, as this
|
||||||
apple lemon
|
apple lemon
|
||||||
2: pear orange
|
2: pear orange
|
||||||
</pre>
|
</pre>
|
||||||
As well as the usual options for <b>pcre2_match()</b>, a number of additional
|
|
||||||
options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to
|
|
||||||
be treated as a literal, with no interpretation. If this option is set,
|
|
||||||
PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and
|
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
|
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
|
||||||
replacing every matching substring. If this option is not set, only the first
|
replacing every matching substring. If this option is not set, only the first
|
||||||
matching substring is replaced. The search for matches takes place in the
|
matching substring is replaced. The search for matches takes place in the
|
||||||
|
@ -3501,14 +3515,17 @@ substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
|
||||||
groups in the extended syntax forms to be treated as unset.
|
groups in the extended syntax forms to be treated as unset.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If successful, <b>pcre2_substitute()</b> returns the number of successful
|
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||||
matches. This may be zero if no matches were found, and is never greater than 1
|
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
|
||||||
unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
are ignored.
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Substitution errors
|
||||||
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
In the event of an error, a negative error code is returned. Except for
|
In the event of an error, <b>pcre2_substitute()</b> returns a negative error
|
||||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from <b>pcre2_match()</b>
|
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
||||||
are passed straight back.
|
<b>pcre2_match()</b> are passed straight back.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
|
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
|
||||||
|
@ -3526,6 +3543,10 @@ needed is returned via <i>outlengthptr</i>. Note that this does not happen by
|
||||||
default.
|
default.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
||||||
|
<i>match_data</i> argument is NULL.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
|
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
|
||||||
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
|
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
|
||||||
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
|
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
|
||||||
|
@ -3876,7 +3897,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 26 December 2019
|
Last updated: 27 December 2019
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2019 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
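Review note: the documentation touched by this commit says that, by default, a dollar character in the replacement string can insert text from capture groups, with $$ inserting a literal dollar. The sketch below is a toy model of the numeric forms only ($$, $<n>, ${<n>}), driven by an offset vector laid out as start/end pairs with pair 0 covering the whole match. It is not PCRE2's implementation: the function name, the error codes, and the omission of group names, $*MARK and the extended syntax are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes for this toy; they are not PCRE2's values. */
enum { TOY_OK = 0, TOY_ERR_NOMEMORY = -1, TOY_ERR_BADREPLACEMENT = -2,
       TOY_ERR_NOSUBSTRING = -3 };

/* Append n bytes to out, always keeping room for the trailing zero. */
static int toy_append(char *out, size_t cap, size_t *used,
                      const char *src, size_t n)
{
    if (*used + n + 1 > cap) return TOY_ERR_NOMEMORY;
    memcpy(out + *used, src, n);
    *used += n;
    return TOY_OK;
}

/* Replace the single match described by ovector with the expanded
   replacement string. On entry *outlen is the capacity of out; on success
   it becomes the length of the result, excluding the trailing zero. */
int toy_substitute_once(const char *subject, const size_t *ovector,
                        size_t npairs, const char *replacement,
                        char *out, size_t *outlen)
{
    size_t cap = *outlen, used = 0;
    int rc;

    assert(subject != NULL && ovector != NULL && replacement != NULL);

    /* Text before the match is copied unchanged. */
    rc = toy_append(out, cap, &used, subject, ovector[0]);
    if (rc != TOY_OK) return rc;

    for (const char *p = replacement; *p != '\0'; ) {
        if (*p != '$') {                      /* ordinary character */
            rc = toy_append(out, cap, &used, p++, 1);
        } else if (p[1] == '$') {             /* $$ inserts one dollar */
            rc = toy_append(out, cap, &used, "$", 1);
            p += 2;
        } else {                              /* $<n> or ${<n>} */
            int braced = (p[1] == '{');
            const char *q = p + 1 + braced;
            size_t n = 0;
            if (!isdigit((unsigned char)*q)) return TOY_ERR_BADREPLACEMENT;
            while (isdigit((unsigned char)*q))
                n = n * 10 + (size_t)(*q++ - '0');
            if (braced && *q++ != '}') return TOY_ERR_BADREPLACEMENT;
            if (n >= npairs) return TOY_ERR_NOSUBSTRING;
            rc = toy_append(out, cap, &used, subject + ovector[2 * n],
                            ovector[2 * n + 1] - ovector[2 * n]);
            p = q;
        }
        if (rc != TOY_OK) return rc;
    }

    /* Text after the match is copied unchanged. */
    rc = toy_append(out, cap, &used, subject + ovector[1],
                    strlen(subject + ovector[1]));
    if (rc != TOY_OK) return rc;

    out[used] = '\0';
    *outlen = used;
    return TOY_OK;
}
```

Called with the subject "=abc=", the offset pairs {1,4} and {2,3} (what the pattern a(b)c would capture), and the replacement "+$1$0$1+", the sketch produces "=+babcb+=", which is the result the PCRE2 documentation gives for that example.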
224
doc/pcre2.txt
|
@ -3193,97 +3193,110 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
|
PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
|
||||||
PCRE2_SIZE *outlengthptr);
|
PCRE2_SIZE *outlengthptr);
|
||||||
|
|
||||||
This function calls pcre2_match() and then makes a copy of the subject
|
This function optionally calls pcre2_match() and then makes a copy of
|
||||||
string in outputbuffer, replacing one or more parts that were matched
|
the subject string in outputbuffer, replacing parts that were matched
|
||||||
with the replacement string, whose length is supplied in rlength. This
|
with the replacement string, whose length is supplied in rlength. This
|
||||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The
|
||||||
The default is to perform just one replacement, but there is an option
|
default is to perform just one replacement if the pattern matches, but
|
||||||
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
there is an option that requests multiple replacements (see PCRE2_SUB-
|
||||||
for details).
|
STITUTE_GLOBAL below for details).
|
||||||
|
|
||||||
Matches in which a \K item in a lookahead in the pattern causes the
|
If successful, pcre2_substitute() returns the number of substitutions
|
||||||
match to end before it starts are not supported, and give rise to an
|
that were carried out. This may be zero if no match was found, and is
|
||||||
|
never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega-
|
||||||
|
tive value is returned if an error is detected (see below for details).
|
||||||
|
|
||||||
|
Matches in which a \K item in a lookahead in the pattern causes the
|
||||||
|
match to end before it starts are not supported, and give rise to an
|
||||||
error return. For global replacements, matches in which \K in a lookbe-
|
error return. For global replacements, matches in which \K in a lookbe-
|
||||||
hind causes the match to start earlier than the point that was reached
|
hind causes the match to start earlier than the point that was reached
|
||||||
in the previous iteration are also not supported.
|
in the previous iteration are also not supported.
|
||||||
|
|
||||||
The first seven arguments of pcre2_substitute() are the same as for
|
The first seven arguments of pcre2_substitute() are the same as for
|
||||||
pcre2_match(), except that the partial matching options are not permit-
|
pcre2_match(), except that the partial matching options are not permit-
|
||||||
ted, and match_data may be passed as NULL, in which case a match data
|
ted, and match_data may be passed as NULL, in which case a match data
|
||||||
block is obtained and freed within this function, using memory manage-
|
block is obtained and freed within this function, using memory manage-
|
||||||
ment functions from the match context, if provided, or else those that
|
ment functions from the match context, if provided, or else those that
|
||||||
were used to allocate memory for the compiled code.
|
were used to allocate memory for the compiled code.
|
||||||
|
|
||||||
If an external match_data block is provided, its contents afterwards
|
If an external match_data block is provided, its contents afterwards
|
||||||
are those set by the final call to pcre2_match(). For global changes,
|
are those set by the final call to pcre2_match(). For global changes,
|
||||||
this will have ended in a matching error. The contents of the ovector
|
this will have ended in a no-match error. The contents of the ovector
|
||||||
within the match data block may or may not have been changed.
|
within the match data block may or may not have been changed.
|
||||||
|
|
||||||
The outlengthptr argument must point to a variable that contains the
|
As well as the usual options for pcre2_match(), a number of additional
|
||||||
length, in code units, of the output buffer. If the function is suc-
|
options can be set in the options argument of pcre2_substitute(). One
|
||||||
cessful, the value is updated to contain the length of the new string,
|
such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
|
||||||
excluding the trailing zero that is automatically added.
|
match_data block must be provided, and it must have been used for an
|
||||||
|
external call to pcre2_match(). The data in the match_data block (re-
|
||||||
|
turn code, offset vector) is used for the first substitution instead of
|
||||||
|
calling pcre2_match() from within pcre2_substitute(). This allows an
|
||||||
|
application to check for a match before choosing to substitute, without
|
||||||
|
having to repeat the match.
|
||||||
|
|
||||||
If the function is not successful, the value set via outlengthptr de-
|
The code argument is not used for the first substitution, but if
|
||||||
pends on the type of error. For syntax errors in the replacement
|
PCRE2_SUBSTITUTE_GLOBAL is set, pcre2_match() will be called after the
|
||||||
|
first substitution to check for further matches, and the contents of
|
||||||
|
the match_data block will be changed.
|
||||||
|
|
||||||
|
The outlengthptr argument of pcre2_substitute() must point to a vari-
|
||||||
|
able that contains the length, in code units, of the output buffer. If
|
||||||
|
the function is successful, the value is updated to contain the length
|
||||||
|
of the new string, excluding the trailing zero that is automatically
|
||||||
|
added.
|
||||||
|
|
||||||
|
If the function is not successful, the value set via outlengthptr de-
|
||||||
|
pends on the type of error. For syntax errors in the replacement
|
||||||
string, the value is the offset in the replacement string where the er-
|
string, the value is the offset in the replacement string where the er-
|
||||||
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
||||||
fault. This includes the case of the output buffer being too small, un-
|
fault. This includes the case of the output buffer being too small, un-
|
||||||
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
|
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
|
||||||
the value is the minimum length needed, including space for the trail-
|
the value is the minimum length needed, including space for the trail-
|
||||||
ing zero. Note that in order to compute the required length, pcre2_sub-
|
ing zero. Note that in order to compute the required length, pcre2_sub-
|
||||||
stitute() has to simulate all the matching and copying, instead of giv-
|
stitute() has to simulate all the matching and copying, instead of giv-
|
||||||
ing an error return as soon as the buffer overflows. Note also that the
|
ing an error return as soon as the buffer overflows. Note also that the
|
||||||
length is in code units, not bytes.
|
length is in code units, not bytes.
|
||||||
|
|
||||||
The replacement string, which is interpreted as a UTF string in UTF
|
The replacement string, which is interpreted as a UTF string in UTF
|
||||||
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
|
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
|
||||||
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
|
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
|
||||||
preted in any way. By default, however, a dollar character is an escape
|
preted in any way. By default, however, a dollar character is an escape
|
||||||
character that can specify the insertion of characters from capture
|
character that can specify the insertion of characters from capture
|
||||||
groups or names from (*MARK) or other control verbs in the pattern. The
|
groups and names from (*MARK) or other control verbs in the pattern.
|
||||||
following forms are always recognized:
|
The following forms are always recognized:
|
||||||
|
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
$<n> or ${<n>} insert the contents of group <n>
|
$<n> or ${<n>} insert the contents of group <n>
|
||||||
$*MARK or ${*MARK} insert a control verb name
|
$*MARK or ${*MARK} insert a control verb name
|
||||||
|
|
||||||
Either a group number or a group name can be given for <n>. Curly
|
Either a group number or a group name can be given for <n>. Curly
|
||||||
brackets are required only if the following character would be inter-
|
brackets are required only if the following character would be inter-
|
||||||
preted as part of the number or name. The number may be zero to include
|
preted as part of the number or name. The number may be zero to include
|
||||||
the entire matched string. For example, if the pattern a(b)c is
|
the entire matched string. For example, if the pattern a(b)c is
|
||||||
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
||||||
is "=+babcb+=".
|
is "=+babcb+=".
|
||||||
|
|
||||||
$*MARK inserts the name from the last encountered backtracking control
|
$*MARK inserts the name from the last encountered backtracking control
|
||||||
verb on the matching path that has a name. (*MARK) must always include
|
verb on the matching path that has a name. (*MARK) must always include
|
||||||
a name, but the other verbs need not. For example, in the case of
|
a name, but the other verbs need not. For example, in the case of
|
||||||
(*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
|
(*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
|
||||||
the relevant name is "B". This facility can be used to perform simple
|
the relevant name is "B". This facility can be used to perform simple
|
||||||
simultaneous substitutions, as this pcre2test example shows:
|
simultaneous substitutions, as this pcre2test example shows:
|
||||||
|
|
||||||
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
||||||
apple lemon
|
apple lemon
|
||||||
2: pear orange
|
2: pear orange
|
||||||
|
|
||||||
As well as the usual options for pcre2_match(), a number of additional
|
|
||||||
options can be set in the options argument of pcre2_substitute().
|
|
||||||
|
|
||||||
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement
|
|
||||||
string to be treated as a literal, with no interpretation. If this op-
|
|
||||||
tion is set, PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
|
||||||
and PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
|
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
|
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
|
||||||
string, replacing every matching substring. If this option is not set,
|
string, replacing every matching substring. If this option is not set,
|
||||||
only the first matching substring is replaced. The search for matches
|
only the first matching substring is replaced. The search for matches
|
||||||
takes place in the original subject string (that is, previous replace-
|
takes place in the original subject string (that is, previous replace-
|
||||||
ments do not affect it). Iteration is implemented by advancing the
|
ments do not affect it). Iteration is implemented by advancing the
|
||||||
startoffset value for each search, which is always passed the entire
|
startoffset value for each search, which is always passed the entire
|
||||||
subject string. If an offset limit is set in the match context, search-
|
subject string. If an offset limit is set in the match context, search-
|
||||||
ing stops when that limit is reached.
|
ing stops when that limit is reached.
|
||||||
|
|
||||||
You can restrict the effect of a global substitution to a portion of
|
You can restrict the effect of a global substitution to a portion of
|
||||||
the subject string by setting either or both of startoffset and an off-
|
the subject string by setting either or both of startoffset and an off-
|
||||||
set limit. Here is a pcre2test example:
|
set limit. Here is a pcre2test example:
|
||||||
|
|
||||||
|
@ -3291,87 +3304,87 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
||||||
2: ABC A!C A!C ABC
|
2: ABC A!C A!C ABC
|
||||||
|
|
||||||
When continuing with global substitutions after matching a substring
|
When continuing with global substitutions after matching a substring
|
||||||
with zero length, an attempt to find a non-empty match at the same off-
|
with zero length, an attempt to find a non-empty match at the same off-
|
||||||
set is performed. If this is not successful, the offset is advanced by
|
set is performed. If this is not successful, the offset is advanced by
|
||||||
one character except when CRLF is a valid newline sequence and the next
|
one character except when CRLF is a valid newline sequence and the next
|
||||||
two characters are CR, LF. In this case, the offset is advanced by two
|
two characters are CR, LF. In this case, the offset is advanced by two
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
||||||
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
||||||
ORY immediately. If this option is set, however, pcre2_substitute()
|
ORY immediately. If this option is set, however, pcre2_substitute()
|
||||||
continues to go through the motions of matching and substituting (with-
|
continues to go through the motions of matching and substituting (with-
|
||||||
out, of course, writing anything) in order to compute the size of buf-
|
out, of course, writing anything) in order to compute the size of buf-
|
||||||
fer that is needed. This value is passed back via the outlengthptr
|
fer that is needed. This value is passed back via the outlengthptr
|
||||||
variable, with the result of the function still being PCRE2_ER-
|
variable, with the result of the function still being PCRE2_ER-
|
||||||
ROR_NOMEMORY.
|
ROR_NOMEMORY.
|
||||||
|
|
||||||
Passing a buffer size of zero is a permitted way of finding out how
|
Passing a buffer size of zero is a permitted way of finding out how
|
||||||
much memory is needed for given substitution. However, this does mean
|
much memory is needed for given substitution. However, this does mean
|
||||||
that the entire operation is carried out twice. Depending on the appli-
|
that the entire operation is carried out twice. Depending on the appli-
|
||||||
cation, it may be more efficient to allocate a large buffer and free
|
cation, it may be more efficient to allocate a large buffer and free
|
||||||
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
||||||
FLOW_LENGTH.
|
FLOW_LENGTH.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
|
||||||
do not appear in the pattern to be treated as unset groups. This option
|
do not appear in the pattern to be treated as unset groups. This option
|
||||||
should be used with care, because it means that a typo in a group name
|
should be used with care, because it means that a typo in a group name
|
||||||
or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
|
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
|
||||||
known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
|
known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
|
||||||
as empty strings when inserted as described above. If this option is
|
as empty strings when inserted as described above. If this option is
|
||||||
not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
|
not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
|
||||||
SET error. This option does not influence the extended substitution
|
SET error. This option does not influence the extended substitution
|
||||||
syntax described below.
|
syntax described below.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
||||||
replacement string. Without this option, only the dollar character is
|
replacement string. Without this option, only the dollar character is
|
||||||
special, and only the group insertion forms listed above are valid.
|
special, and only the group insertion forms listed above are valid.
|
||||||
When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
|
When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
|
||||||
|
|
||||||
Firstly, backslash in a replacement string is interpreted as an escape
|
Firstly, backslash in a replacement string is interpreted as an escape
|
||||||
character. The usual forms such as \n or \x{ddd} can be used to specify
|
character. The usual forms such as \n or \x{ddd} can be used to specify
|
||||||
particular character codes, and backslash followed by any non-alphanu-
|
particular character codes, and backslash followed by any non-alphanu-
|
||||||
meric character quotes that character. Extended quoting can be coded
|
meric character quotes that character. Extended quoting can be coded
|
||||||
using \Q...\E, exactly as in pattern strings.
|
using \Q...\E, exactly as in pattern strings.
|
||||||
|
|
||||||
There are also four escape sequences for forcing the case of inserted
|
There are also four escape sequences for forcing the case of inserted
|
||||||
letters. The insertion mechanism has three states: no case forcing,
|
letters. The insertion mechanism has three states: no case forcing,
|
||||||
force upper case, and force lower case. The escape sequences change the
|
force upper case, and force lower case. The escape sequences change the
|
||||||
current state: \U and \L change to upper or lower case forcing, respec-
|
current state: \U and \L change to upper or lower case forcing, respec-
|
||||||
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
||||||
no case forcing. The sequences \u and \l force the next character (if
|
no case forcing. The sequences \u and \l force the next character (if
|
||||||
it is a letter) to upper or lower case, respectively, and then the
|
it is a letter) to upper or lower case, respectively, and then the
|
||||||
state automatically reverts to no case forcing. Case forcing applies to
|
state automatically reverts to no case forcing. Case forcing applies to
|
||||||
all inserted characters, including those from capture groups and let-
|
all inserted characters, including those from capture groups and let-
|
||||||
ters within \Q...\E quoted sequences.
|
ters within \Q...\E quoted sequences.
|
||||||
|
|
||||||
Note that case forcing sequences such as \U...\E do not nest. For exam-
|
Note that case forcing sequences such as \U...\E do not nest. For exam-
|
||||||
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
||||||
\E has no effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EX-
|
\E has no effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EX-
|
||||||
TRA_ALT_BSUX options do not apply to replacement strings.
|
TRA_ALT_BSUX options do not apply to replacement strings.
|
||||||
|
|
||||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||||
flexibility to capture group substitution. The syntax is similar to
|
flexibility to capture group substitution. The syntax is similar to
|
||||||
that used by Bash:
|
that used by Bash:
|
||||||
|
|
||||||
${<n>:-<string>}
|
${<n>:-<string>}
|
||||||
${<n>:+<string1>:<string2>}
|
${<n>:+<string1>:<string2>}
|
||||||
|
|
||||||
As before, <n> may be a group number or a name. The first form speci-
|
As before, <n> may be a group number or a name. The first form speci-
|
||||||
fies a default value. If group <n> is set, its value is inserted; if
|
fies a default value. If group <n> is set, its value is inserted; if
|
||||||
not, <string> is expanded and the result inserted. The second form
|
not, <string> is expanded and the result inserted. The second form
|
||||||
specifies strings that are expanded and inserted when group <n> is set
|
specifies strings that are expanded and inserted when group <n> is set
|
||||||
or unset, respectively. The first form is just a convenient shorthand
|
or unset, respectively. The first form is just a convenient shorthand
|
||||||
for
|
for
|
||||||
|
|
||||||
${<n>:+${<n>}:<string>}
|
${<n>:+${<n>}:<string>}
|
||||||
|
|
||||||
Backslash can be used to escape colons and closing curly brackets in
|
Backslash can be used to escape colons and closing curly brackets in
|
||||||
the replacement strings. A change of the case forcing state within a
|
the replacement strings. A change of the case forcing state within a
|
||||||
replacement string remains in force afterwards, as shown in this
|
replacement string remains in force afterwards, as shown in this
|
||||||
pcre2test example:
|
pcre2test example:
|
||||||
|
|
||||||
/(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
|
/(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
|
||||||
|
@ -3380,31 +3393,36 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
somebody
|
somebody
|
||||||
1: HELLO
|
1: HELLO
|
||||||
|
|
||||||
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
||||||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
|
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
|
||||||
known groups in the extended syntax forms to be treated as unset.
|
known groups in the extended syntax forms to be treated as unset.
|
||||||
|
|
||||||
If successful, pcre2_substitute() returns the number of successful
|
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||||
matches. This may be zero if no matches were found, and is never
|
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrele-
|
||||||
greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
vant and are ignored.
|
||||||
|
|
||||||
In the event of an error, a negative error code is returned. Except for
|
Substitution errors
|
||||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
|
||||||
pcre2_match() are passed straight back.
|
In the event of an error, pcre2_substitute() returns a negative error
|
||||||
|
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors
|
||||||
|
from pcre2_match() are passed straight back.
|
||||||
|
|
||||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
|
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
|
||||||
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
||||||
|
|
||||||
PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
|
PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
|
||||||
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
||||||
when the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
|
when the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
|
||||||
SET_EMPTY is not set.
|
SET_EMPTY is not set.
|
||||||
|
|
||||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
||||||
enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
|
enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
|
||||||
of buffer that is needed is returned via outlengthptr. Note that this
|
of buffer that is needed is returned via outlengthptr. Note that this
|
||||||
does not happen by default.
|
does not happen by default.
|
||||||
|
|
||||||
|
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
||||||
|
match_data argument is NULL.
|
||||||
|
|
||||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
|
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
|
||||||
the replacement string, with more particular errors being PCRE2_ER-
|
the replacement string, with more particular errors being PCRE2_ER-
|
||||||
ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
|
ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
|
||||||
|
@ -3727,7 +3745,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 26 December 2019
|
Last updated: 27 December 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
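Reviewer note on the case-forcing text in the pcre2api.txt hunks above: the documented state machine (\U, \L, \E, \u, \l) can be modelled with two small integers. The sketch below is a toy model only. `demo_force_case` is a hypothetical helper, not a PCRE2 function, and it assumes plain ASCII input and ignores every other escape. It applies a one-shot \u or \l to the next character whatever it is, which has no visible effect on non-letters.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE2): copies `in` to `out`, applying
   only the case-forcing escapes \U, \L, \E, \u and \l. At most outsize-1
   characters are written, followed by a terminating zero. */
void demo_force_case(const char *in, char *out, size_t outsize)
{
  int force = 0;   /* +1 = upper, -1 = lower, 0 = none; applies to next char */
  int reset = 0;   /* value that `force` takes after each copied character */
  size_t n = 0;

  if (outsize == 0) return;

  while (*in != '\0' && n + 1 < outsize)
    {
    int c;

    if (in[0] == '\\')
      {
      int handled = 1;
      switch (in[1])
        {
        case 'U': force = reset = 1;  break;    /* upper until \E or \L */
        case 'L': force = reset = -1; break;    /* lower until \E or \U */
        case 'E': force = reset = 0;  break;    /* back to no forcing */
        case 'u': force = 1;  reset = 0; break; /* next character only */
        case 'l': force = -1; reset = 0; break; /* next character only */
        default:  handled = 0; break;           /* not a case escape */
        }
      if (handled) { in += 2; continue; }
      }

    c = (unsigned char)*in++;
    if (force > 0) c = toupper(c);
    else if (force < 0) c = tolower(c);
    out[n++] = (char)c;
    force = reset;   /* one-shot forcing reverts to no forcing here */
    }

  out[n] = '\0';
}
```

Because \U and \L simply overwrite the current state, the sequences do not nest, which is why a trailing \E after an earlier \E changes nothing.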
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SUBSTITUTE 3 "26 December 2019" "PCRE2 10.35"
|
.TH PCRE2_SUBSTITUTE 3 "27 December 2019" "PCRE2 10.35"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -36,8 +36,8 @@ Its arguments are:
|
||||||
\fIoutlengthptr\fP Points to the length of the output buffer
|
\fIoutlengthptr\fP Points to the length of the output buffer
|
||||||
.sp
|
.sp
|
||||||
A match data block is needed only if you want to inspect the data from the
|
A match data block is needed only if you want to inspect the data from the
|
||||||
match that is returned in that block. A match context is needed only if you
|
match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
|
||||||
want to:
|
match context is needed only if you want to:
|
||||||
.sp
|
.sp
|
||||||
Set up a callout function
|
Set up a callout function
|
||||||
Set a matching offset limit
|
Set a matching offset limit
|
||||||
|
@ -67,15 +67,16 @@ zero-terminated strings. The options are:
|
||||||
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
|
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
|
||||||
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
|
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
|
||||||
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
||||||
|
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
||||||
.sp
|
.sp
|
||||||
PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED,
|
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
|
||||||
.P
|
.P
|
||||||
The function returns the number of substitutions, which may be zero if there
|
The function returns the number of substitutions, which may be zero if there
|
||||||
were no matches. The result can be greater than one only when
|
are no matches. The result may be greater than one only when
|
||||||
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
|
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
|
||||||
is returned.
|
is returned.
|
||||||
.P
|
.P
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "26 December 2019" "PCRE2 10.35"
|
.TH PCRE2API 3 "27 December 2019" "PCRE2 10.35"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3321,12 +3321,18 @@ same number causes an error at compile time.
|
||||||
.B " PCRE2_SIZE *\fIoutlengthptr\fP);"
|
.B " PCRE2_SIZE *\fIoutlengthptr\fP);"
|
||||||
.fi
|
.fi
|
||||||
.P
|
.P
|
||||||
This function calls \fBpcre2_match()\fP and then makes a copy of the subject
|
This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
|
||||||
string in \fIoutputbuffer\fP, replacing one or more parts that were matched
|
subject string in \fIoutputbuffer\fP, replacing parts that were matched with
|
||||||
with the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP.
|
the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
|
||||||
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
|
||||||
The default is to perform just one replacement, but there is an option that
|
is to perform just one replacement if the pattern matches, but there is an
|
||||||
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
|
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
||||||
|
for details).
|
||||||
|
.P
|
||||||
|
If successful, \fBpcre2_substitute()\fP returns the number of substitutions
|
||||||
|
that were carried out. This may be zero if no match was found, and is never
|
||||||
|
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
|
||||||
|
returned if an error is detected (see below for details).
|
||||||
.P
|
.P
|
||||||
Matches in which a \eK item in a lookahead in the pattern causes the match to
|
Matches in which a \eK item in a lookahead in the pattern causes the match to
|
||||||
end before it starts are not supported, and give rise to an error return. For
|
end before it starts are not supported, and give rise to an error return. For
|
||||||
|
@ -3343,13 +3349,28 @@ allocate memory for the compiled code.
|
||||||
.P
|
.P
|
||||||
If an external \fImatch_data\fP block is provided, its contents afterwards
|
If an external \fImatch_data\fP block is provided, its contents afterwards
|
||||||
are those set by the final call to \fBpcre2_match()\fP. For global changes,
|
are those set by the final call to \fBpcre2_match()\fP. For global changes,
|
||||||
this will have ended in a matching error. The contents of the ovector within
|
this will have ended in a no-match error. The contents of the ovector within
|
||||||
the match data block may or may not have been changed.
|
the match data block may or may not have been changed.
|
||||||
.P
|
.P
|
||||||
The \fIoutlengthptr\fP argument must point to a variable that contains the
|
As well as the usual options for \fBpcre2_match()\fP, a number of additional
|
||||||
length, in code units, of the output buffer. If the function is successful, the
|
options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
|
||||||
value is updated to contain the length of the new string, excluding the
|
One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
|
||||||
trailing zero that is automatically added.
|
\fImatch_data\fP block must be provided, and it must have been used for an
|
||||||
|
external call to \fBpcre2_match()\fP. The data in the \fImatch_data\fP block
|
||||||
|
(return code, offset vector) is used for the first substitution instead of
|
||||||
|
calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
|
||||||
|
an application to check for a match before choosing to substitute, without
|
||||||
|
having to repeat the match.
|
||||||
|
.P
|
||||||
|
The \fIcode\fP argument is not used for the first substitution, but if
|
||||||
|
PCRE2_SUBSTITUTE_GLOBAL is set, \fBpcre2_match()\fP will be called after the
|
||||||
|
first substitution to check for further matches, and the contents of the
|
||||||
|
\fImatch_data\fP block will be changed.
|
||||||
|
.P
|
||||||
|
The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
|
||||||
|
variable that contains the length, in code units, of the output buffer. If the
|
||||||
|
function is successful, the value is updated to contain the length of the new
|
||||||
|
string, excluding the trailing zero that is automatically added.
|
||||||
.P
|
.P
|
||||||
If the function is not successful, the value set via \fIoutlengthptr\fP depends
|
If the function is not successful, the value set via \fIoutlengthptr\fP depends
|
||||||
on the type of error. For syntax errors in the replacement string, the value is
|
on the type of error. For syntax errors in the replacement string, the value is
|
||||||
|
@ -3366,7 +3387,7 @@ The replacement string, which is interpreted as a UTF string in UTF mode,
|
||||||
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
|
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
|
||||||
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
|
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
|
||||||
default, however, a dollar character is an escape character that can specify
|
default, however, a dollar character is an escape character that can specify
|
||||||
the insertion of characters from capture groups or names from (*MARK) or other
|
the insertion of characters from capture groups and names from (*MARK) or other
|
||||||
control verbs in the pattern. The following forms are always recognized:
|
control verbs in the pattern. The following forms are always recognized:
|
||||||
.sp
|
.sp
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
|
@ -3390,14 +3411,6 @@ facility can be used to perform simple simultaneous substitutions, as this
|
||||||
apple lemon
|
apple lemon
|
||||||
2: pear orange
|
2: pear orange
|
||||||
.sp
|
.sp
|
||||||
As well as the usual options for \fBpcre2_match()\fP, a number of additional
|
|
||||||
options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
|
|
||||||
.P
|
|
||||||
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to
|
|
||||||
be treated as a literal, with no interpretation. If this option is set,
|
|
||||||
PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and
|
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
|
|
||||||
.P
|
|
||||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
|
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
|
||||||
replacing every matching substring. If this option is not set, only the first
|
replacing every matching substring. If this option is not set, only the first
|
||||||
matching substring is replaced. The search for matches takes place in the
|
matching substring is replaced. The search for matches takes place in the
|
||||||
|
@ -3500,13 +3513,17 @@ The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
||||||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
|
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
|
||||||
groups in the extended syntax forms to be treated as unset.
|
groups in the extended syntax forms to be treated as unset.
|
||||||
.P
|
.P
|
||||||
If successful, \fBpcre2_substitute()\fP returns the number of successful
|
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||||
matches. This may be zero if no matches were found, and is never greater than 1
|
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
|
||||||
unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
are ignored.
|
||||||
.P
|
.
|
||||||
In the event of an error, a negative error code is returned. Except for
|
.
|
||||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
|
.SS "Substitution errors"
|
||||||
are passed straight back.
|
.rs
|
||||||
|
.sp
|
||||||
|
In the event of an error, \fBpcre2_substitute()\fP returns a negative error
|
||||||
|
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
||||||
|
\fBpcre2_match()\fP are passed straight back.
|
||||||
.P
|
.P
|
||||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
|
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
|
||||||
unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
||||||
|
@ -3520,6 +3537,9 @@ PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
|
||||||
needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
|
needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
|
||||||
default.
|
default.
|
||||||
.P
|
.P
|
||||||
|
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
||||||
|
\fImatch_data\fP argument is NULL.
|
||||||
|
.P
|
||||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
|
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
|
||||||
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
|
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
|
||||||
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
|
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
|
||||||
|
@ -3884,6 +3904,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 26 December 2019
|
Last updated: 27 December 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -182,6 +182,7 @@ pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
|
||||||
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
|
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
|
||||||
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
|
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
|
||||||
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
|
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
|
||||||
|
#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */
|
||||||
|
|
||||||
/* Options for pcre2_pattern_convert(). */
|
/* Options for pcre2_pattern_convert(). */
|
||||||
|
|
||||||
|
|
|
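Reviewer note on the pcre2.h.in hunk above: the new option takes the next free bit after PCRE2_SUBSTITUTE_LITERAL. The sketch below shows how such a bit is typically tested and masked. The `DEMO_` names are hypothetical stand-ins that copy the three values visible in the hunk. The idea that substitute-only bits are stripped before options reach the matcher is an assumption based on the SUBSTITUTE_OPTIONS mask in pcre2_substitute.c, not something this diff shows directly.

```c
#include <stdint.h>

/* Stand-ins copying the values shown in the pcre2.h.in hunk. */
#define DEMO_NO_JIT              0x00002000u
#define DEMO_SUBSTITUTE_LITERAL  0x00008000u
#define DEMO_SUBSTITUTE_MATCHED  0x00010000u

/* Assumption: a reduced version of a "substitute-only" mask. */
#define DEMO_SUBSTITUTE_ONLY \
  (DEMO_SUBSTITUTE_LITERAL | DEMO_SUBSTITUTE_MATCHED)

/* Non-zero when the caller asked to reuse an earlier match. */
int demo_wants_existing_match(uint32_t options)
{
  return (options & DEMO_SUBSTITUTE_MATCHED) != 0;
}

/* Options with the substitute-only bits removed, as a matcher that does
   not know about them would need. */
uint32_t demo_match_options(uint32_t options)
{
  return options & ~DEMO_SUBSTITUTE_ONLY;
}
```

For example, `demo_match_options(DEMO_SUBSTITUTE_MATCHED | DEMO_NO_JIT)` leaves only the `DEMO_NO_JIT` bit set.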
@ -49,8 +49,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
|
||||||
#define SUBSTITUTE_OPTIONS \
|
#define SUBSTITUTE_OPTIONS \
|
||||||
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
|
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
|
||||||
PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_OVERFLOW_LENGTH| \
|
PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY)
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
|
||||||
|
PCRE2_SUBSTITUTE_UNSET_EMPTY)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -229,6 +230,7 @@ uint32_t suboptions;
|
||||||
BOOL match_data_created = FALSE;
|
BOOL match_data_created = FALSE;
|
||||||
BOOL escaped_literal = FALSE;
|
BOOL escaped_literal = FALSE;
|
||||||
BOOL overflowed = FALSE;
|
BOOL overflowed = FALSE;
|
||||||
|
BOOL use_existing_match;
|
||||||
#ifdef SUPPORT_UNICODE
|
#ifdef SUPPORT_UNICODE
|
||||||
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
|
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
|
||||||
#endif
|
#endif
|
||||||
|
@ -248,15 +250,25 @@ lengthleft = buff_length = *blength;
|
||||||
*blength = PCRE2_UNSET;
|
*blength = PCRE2_UNSET;
|
||||||
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
|
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
|
||||||
|
|
||||||
/* Partial matching is not valid. This must come after setting *blength to
|
/* Partial matching is not valid. This must come after setting *blength to
|
||||||
PCRE2_UNSET, so as not to imply an offset in the replacement. */
|
PCRE2_UNSET, so as not to imply an offset in the replacement. */
|
||||||
|
|
||||||
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
|
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
|
||||||
return PCRE2_ERROR_BADOPTION;
|
return PCRE2_ERROR_BADOPTION;
|
||||||
|
|
||||||
/* If no match data block is provided, create one. */
|
/* Check for using a match that has already happened. Note that the subject
|
||||||
|
pointer in the match data may be NULL after a no-match. */
|
||||||
|
|
||||||
if (match_data == NULL)
|
use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
|
||||||
|
|
||||||
|
if (use_existing_match)
|
||||||
|
{
|
||||||
|
if (match_data == NULL) return PCRE2_ERROR_NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Otherwise, if no match data block is provided, create one. */
|
||||||
|
|
||||||
|
else if (match_data == NULL)
|
||||||
{
|
{
|
||||||
pcre2_general_context *gcontext = (mcontext == NULL)?
|
pcre2_general_context *gcontext = (mcontext == NULL)?
|
||||||
(pcre2_general_context *)code :
|
(pcre2_general_context *)code :
|
||||||
|
@ -310,7 +322,8 @@ if (start_offset > length)
|
||||||
}
|
}
|
||||||
CHECKMEMCPY(subject, start_offset);
|
CHECKMEMCPY(subject, start_offset);
|
||||||
|
|
||||||
/* Loop for global substituting. */
|
/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
|
||||||
|
match is taken from the match_data that was passed in. */
|
||||||
|
|
||||||
subs = 0;
|
subs = 0;
|
||||||
do
|
do
|
||||||
|
@ -318,8 +331,13 @@ do
|
||||||
PCRE2_SPTR ptrstack[PTR_STACK_SIZE];
|
PCRE2_SPTR ptrstack[PTR_STACK_SIZE];
|
||||||
uint32_t ptrstackptr = 0;
|
uint32_t ptrstackptr = 0;
|
||||||
|
|
||||||
rc = pcre2_match(code, subject, length, start_offset, options|goptions,
|
if (use_existing_match)
|
||||||
match_data, mcontext);
|
{
|
||||||
|
rc = match_data->rc;
|
||||||
|
use_existing_match = FALSE;
|
||||||
|
}
|
||||||
|
else rc = pcre2_match(code, subject, length, start_offset, options|goptions,
|
||||||
|
match_data, mcontext);
|
||||||
|
|
||||||
#ifdef SUPPORT_UNICODE
|
#ifdef SUPPORT_UNICODE
|
||||||
if (utf) options |= PCRE2_NO_UTF_CHECK; /* Only need to check once */
|
if (utf) options |= PCRE2_NO_UTF_CHECK; /* Only need to check once */
|
||||||
|
@ -375,33 +393,33 @@ do
|
||||||
|
|
||||||
/* Handle a successful match. Matches that use \K to end before they start
|
/* Handle a successful match. Matches that use \K to end before they start
|
||||||
or start before the current point in the subject are not supported. */
|
or start before the current point in the subject are not supported. */
|
||||||
|
|
||||||
if (ovector[1] < ovector[0] || ovector[0] < start_offset)
|
if (ovector[1] < ovector[0] || ovector[0] < start_offset)
|
||||||
{
|
{
|
||||||
rc = PCRE2_ERROR_BADSUBSPATTERN;
|
rc = PCRE2_ERROR_BADSUBSPATTERN;
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Check for the same match as previous. This is legitimate after matching an
|
/* Check for the same match as previous. This is legitimate after matching an
|
||||||
empty string that starts after the initial match offset. We have tried again
|
empty string that starts after the initial match offset. We have tried again
|
||||||
at the match point in case the pattern is one like /(?<=\G.)/ which can never
|
at the match point in case the pattern is one like /(?<=\G.)/ which can never
|
||||||
match at its starting point, so running the match achieves the bumpalong. If
|
match at its starting point, so running the match achieves the bumpalong. If
|
||||||
we do get the same (null) match at the original match point, it isn't such a
|
we do get the same (null) match at the original match point, it isn't such a
|
||||||
pattern, so we now do the empty string magic. In all other cases, a repeat
|
pattern, so we now do the empty string magic. In all other cases, a repeat
|
||||||
match should never occur. */
|
match should never occur. */
|
||||||
|
|
||||||
if (ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
|
if (ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
|
||||||
{
|
{
|
||||||
if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)
|
if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)
|
||||||
{
|
{
|
||||||
goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
|
goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
|
||||||
ovecsave[2] = start_offset;
|
ovecsave[2] = start_offset;
|
||||||
continue; /* Back to the top of the loop */
|
continue; /* Back to the top of the loop */
|
||||||
}
|
}
|
||||||
rc = PCRE2_ERROR_INTERNAL_DUPMATCH;
|
rc = PCRE2_ERROR_INTERNAL_DUPMATCH;
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Count substitutions with a paranoid check for integer overflow; surely no
|
/* Count substitutions with a paranoid check for integer overflow; surely no
|
||||||
real call to this function would ever hit this! */
|
real call to this function would ever hit this! */
|
||||||
|
|
||||||
|
@ -421,20 +439,20 @@ do
|
||||||
scb.output_offsets[0] = buff_offset;
|
scb.output_offsets[0] = buff_offset;
|
||||||
scb.oveccount = rc;
|
scb.oveccount = rc;
|
||||||
|
|
||||||
/* Process the replacement string. If the entire replacement is literal, just
|
/* Process the replacement string. If the entire replacement is literal, just
|
||||||
copy it with length check. */
|
copy it with length check. */
|
||||||
|
|
||||||
ptr = replacement;
|
ptr = replacement;
|
||||||
if ((suboptions & PCRE2_SUBSTITUTE_LITERAL) != 0)
|
if ((suboptions & PCRE2_SUBSTITUTE_LITERAL) != 0)
|
||||||
{
|
{
|
||||||
CHECKMEMCPY(ptr, rlength);
|
CHECKMEMCPY(ptr, rlength);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Within a non-literal replacement, which must be scanned character by
|
/* Within a non-literal replacement, which must be scanned character by
|
||||||
character, local literal mode can be set by \Q, but only in extended mode
|
character, local literal mode can be set by \Q, but only in extended mode
|
||||||
when backslashes are being interpreted. In extended mode we must handle
|
when backslashes are being interpreted. In extended mode we must handle
|
||||||
nested substrings that are to be reprocessed. */
|
nested substrings that are to be reprocessed. */
|
||||||
|
|
||||||
else for (;;)
|
else for (;;)
|
||||||
{
|
{
|
||||||
uint32_t ch;
|
uint32_t ch;
|
||||||
|
@ -844,42 +862,42 @@ do
|
||||||
} /* End handling a literal code unit */
|
} /* End handling a literal code unit */
|
||||||
} /* End of loop for scanning the replacement. */
|
} /* End of loop for scanning the replacement. */
|
||||||
|
|
||||||
/* The replacement has been copied to the output, or its size has been
|
/* The replacement has been copied to the output, or its size has been
|
||||||
remembered. Do the callout if there is one and we have done an actual
|
remembered. Do the callout if there is one and we have done an actual
|
||||||
replacement. */
|
replacement. */
|
||||||
|
|
||||||
if (!overflowed && mcontext != NULL && mcontext->substitute_callout != NULL)
|
if (!overflowed && mcontext != NULL && mcontext->substitute_callout != NULL)
|
||||||
{
|
{
|
||||||
scb.subscount = subs;
|
scb.subscount = subs;
|
||||||
scb.output_offsets[1] = buff_offset;
|
scb.output_offsets[1] = buff_offset;
|
||||||
rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
|
rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
|
||||||
|
|
||||||
/* A non-zero return means cancel this substitution. Instead, copy the
|
/* A non-zero return means cancel this substitution. Instead, copy the
|
||||||
matched string fragment. */
|
matched string fragment. */
|
||||||
|
|
||||||
if (rc != 0)
|
if (rc != 0)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0];
|
PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0];
|
||||||
PCRE2_SIZE oldlength = ovector[1] - ovector[0];
|
PCRE2_SIZE oldlength = ovector[1] - ovector[0];
|
||||||
|
|
||||||
buff_offset -= newlength;
|
buff_offset -= newlength;
|
||||||
lengthleft += newlength;
|
lengthleft += newlength;
|
||||||
CHECKMEMCPY(subject + ovector[0], oldlength);
|
CHECKMEMCPY(subject + ovector[0], oldlength);
|
||||||
|
|
||||||
/* A negative return means do not do any more. */
|
/* A negative return means do not do any more. */
|
||||||
|
|
||||||
if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL);
|
if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Save the details of this match. See above for how this data is used. If we
|
/* Save the details of this match. See above for how this data is used. If we
|
||||||
matched an empty string, do the magic for global matches. Finally, update the
|
matched an empty string, do the magic for global matches. Finally, update the
|
||||||
start offset to point to the rest of the subject string. */
|
start offset to point to the rest of the subject string. */
|
||||||
|
|
||||||
ovecsave[0] = ovector[0];
|
ovecsave[0] = ovector[0];
|
||||||
ovecsave[1] = ovector[1];
|
ovecsave[1] = ovector[1];
|
||||||
ovecsave[2] = start_offset;
|
ovecsave[2] = start_offset;
|
||||||
|
|
||||||
goptions = (ovector[0] != ovector[1] || ovector[0] > start_offset)? 0 :
|
goptions = (ovector[0] != ovector[1] || ovector[0] > start_offset)? 0 :
|
||||||
PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
|
PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
|
||||||
start_offset = ovector[1];
|
start_offset = ovector[1];
|
||||||
|
|
|
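Reviewer note on the pcre2_substitute.c hunks above: the change boils down to "take the first result from the caller's match data, then fall back to calling the matcher". The toy model below isolates that control flow so it can be checked without PCRE2. Everything in it is a hypothetical stand-in: `demo_match_data` carries only an `rc` field, the error values are placeholders, and replacement processing is reduced to counting.

```c
#include <stddef.h>

#define DEMO_ERROR_NOMATCH (-1)    /* placeholder for a no-match result */
#define DEMO_ERROR_NULL    (-51)   /* placeholder for a NULL-argument error */

struct demo_match_data { int rc; };   /* stand-in: only the return code */

/* A matcher writes its result to md and returns it: >0 match, <0 error. */
typedef int (*demo_matcher)(struct demo_match_data *md, void *ctx);

/* Test matcher: succeeds matches_left times, then reports no match. */
struct demo_counter { int calls; int matches_left; };

int demo_counting_matcher(struct demo_match_data *md, void *ctx)
{
  struct demo_counter *c = ctx;
  c->calls++;
  md->rc = (c->matches_left-- > 0) ? 1 : DEMO_ERROR_NOMATCH;
  return md->rc;
}

/* Returns the number of substitutions, or a negative error code. */
int demo_substitute(struct demo_match_data *md, int use_existing_match,
                    int global, demo_matcher match, void *ctx)
{
  struct demo_match_data local;
  int subs = 0;

  if (use_existing_match)
    {
    if (md == NULL) return DEMO_ERROR_NULL;   /* nothing to reuse */
    }
  else if (md == NULL) md = &local;           /* create a private block */

  do
    {
    int rc;
    if (use_existing_match)
      {
      rc = md->rc;                 /* first pass: reuse the caller's result */
      use_existing_match = 0;      /* later passes must really match */
      }
    else rc = match(md, ctx);

    if (rc == DEMO_ERROR_NOMATCH) break;      /* not an error: stop looping */
    if (rc < 0) return rc;                    /* pass other errors back */
    subs++;
    }
  while (global);

  return subs;
}
```

With a stored `rc` of 1 and `global` unset, the model returns 1 without calling the matcher at all. With `global` set, the matcher runs for every later pass and the caller's block ends up holding the final no-match result, in line with the note in the pcre2api.3 hunk that the match data contents will be changed.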
@ -503,13 +503,14 @@ so many of them that they are split into two fields. */
|
||||||
#define CTL2_SUBSTITUTE_CALLOUT 0x00000001u
|
#define CTL2_SUBSTITUTE_CALLOUT 0x00000001u
|
||||||
#define CTL2_SUBSTITUTE_EXTENDED 0x00000002u
|
#define CTL2_SUBSTITUTE_EXTENDED 0x00000002u
|
||||||
#define CTL2_SUBSTITUTE_LITERAL 0x00000004u
|
#define CTL2_SUBSTITUTE_LITERAL 0x00000004u
|
||||||
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000008u
|
#define CTL2_SUBSTITUTE_MATCHED 0x00000008u
|
||||||
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000010u
|
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000010u
|
||||||
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000020u
|
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000020u
|
||||||
#define CTL2_SUBJECT_LITERAL 0x00000040u
|
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000040u
|
||||||
#define CTL2_CALLOUT_NO_WHERE 0x00000080u
|
#define CTL2_SUBJECT_LITERAL 0x00000080u
|
||||||
#define CTL2_CALLOUT_EXTRA 0x00000100u
|
#define CTL2_CALLOUT_NO_WHERE 0x00000100u
|
||||||
#define CTL2_ALLVECTOR 0x00000200u
|
#define CTL2_CALLOUT_EXTRA 0x00000200u
|
||||||
|
#define CTL2_ALLVECTOR 0x00000400u
|
||||||
|
|
||||||
#define CTL2_NL_SET 0x40000000u /* Informational */
|
#define CTL2_NL_SET 0x40000000u /* Informational */
|
||||||
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
||||||
|
@ -532,6 +533,7 @@ different things in the two cases. */
|
||||||
#define CTL2_ALLPD (CTL2_SUBSTITUTE_CALLOUT|\
|
#define CTL2_ALLPD (CTL2_SUBSTITUTE_CALLOUT|\
|
||||||
CTL2_SUBSTITUTE_EXTENDED|\
|
CTL2_SUBSTITUTE_EXTENDED|\
|
||||||
CTL2_SUBSTITUTE_LITERAL|\
|
CTL2_SUBSTITUTE_LITERAL|\
|
||||||
|
CTL2_SUBSTITUTE_MATCHED|\
|
||||||
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
|
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
|
||||||
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
|
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
|
||||||
CTL2_SUBSTITUTE_UNSET_EMPTY|\
|
CTL2_SUBSTITUTE_UNSET_EMPTY|\
|
||||||
|
@ -721,6 +723,7 @@ static modstruct modlist[] = {
|
||||||
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
|
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
|
||||||
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
|
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
|
||||||
{ "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) },
|
{ "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) },
|
||||||
|
{ "substitute_matched", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_MATCHED, PO(control2) },
|
||||||
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
|
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
|
||||||
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
|
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
|
||||||
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
|
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
|
||||||
|
@ -4088,7 +4091,7 @@ Returns: nothing
|
||||||
static void
|
static void
|
||||||
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||||
before,
|
before,
|
||||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||||
|
@ -4127,6 +4130,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
|
||||||
((controls2 & CTL2_SUBSTITUTE_CALLOUT) != 0)? " substitute_callout" : "",
|
((controls2 & CTL2_SUBSTITUTE_CALLOUT) != 0)? " substitute_callout" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
|
((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
|
((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
|
||||||
|
((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
|
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
|
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
|
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
|
||||||
|
@ -7232,6 +7236,7 @@ if (dat_datctl.replacement[0] != 0)
|
||||||
uint8_t rbuffer[REPLACE_BUFFSIZE];
|
uint8_t rbuffer[REPLACE_BUFFSIZE];
|
||||||
uint8_t nbuffer[REPLACE_BUFFSIZE];
|
uint8_t nbuffer[REPLACE_BUFFSIZE];
|
||||||
uint32_t xoptions;
|
uint32_t xoptions;
|
||||||
|
uint32_t emoption; /* External match option */
|
||||||
PCRE2_SIZE j, rlen, nsize, erroroffset;
|
PCRE2_SIZE j, rlen, nsize, erroroffset;
|
||||||
BOOL badutf = FALSE;
|
BOOL badutf = FALSE;
|
||||||
|
|
||||||
|
@ -7252,11 +7257,25 @@ if (dat_datctl.replacement[0] != 0)
|
||||||
|
|
||||||
if (timeitm)
|
if (timeitm)
|
||||||
fprintf(outfile, "** Timing is not supported with replace: ignored\n");
|
fprintf(outfile, "** Timing is not supported with replace: ignored\n");
|
||||||
|
|
||||||
if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
|
if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
|
||||||
fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");
|
fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");
|
||||||
|
|
||||||
xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
|
/* Check for a test that does substitution after an initial external match.
|
||||||
|
If this is set, we run the external match, but leave the interpretation of
|
||||||
|
its output to pcre2_substitute(). */
|
||||||
|
|
||||||
|
emoption = ((dat_datctl.control2 & CTL2_SUBSTITUTE_MATCHED) == 0)? 0 :
|
||||||
|
PCRE2_SUBSTITUTE_MATCHED;
|
||||||
|
|
||||||
|
if (emoption != 0)
|
||||||
|
{
|
||||||
|
PCRE2_MATCH(rc, compiled_code, pp, arg_ulen, dat_datctl.offset,
|
||||||
|
dat_datctl.options, match_data, use_dat_context);
|
||||||
|
}
|
||||||
|
|
||||||
|
xoptions = emoption |
|
||||||
|
(((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
|
||||||
PCRE2_SUBSTITUTE_GLOBAL) |
|
PCRE2_SUBSTITUTE_GLOBAL) |
|
||||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
|
(((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
|
||||||
PCRE2_SUBSTITUTE_EXTENDED) |
|
PCRE2_SUBSTITUTE_EXTENDED) |
|
||||||
|
@ -7268,7 +7287,7 @@ if (dat_datctl.replacement[0] != 0)
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
|
||||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
|
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY);
|
PCRE2_SUBSTITUTE_UNSET_EMPTY);
|
||||||
|
|
||||||
SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
|
SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
|
||||||
pr = dat_datctl.replacement;
|
pr = dat_datctl.replacement;
|
||||||
|
|
||||||
|
|
|
@ -4640,6 +4640,13 @@ B)x/alt_verbnames,mark
|
||||||
|
|
||||||
/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
|
/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
|
||||||
aaBB
|
aaBB
|
||||||
|
|
||||||
|
/abcd/replace=wxyz,substitute_matched
|
||||||
|
abcd
|
||||||
|
pqrs
|
||||||
|
|
||||||
|
/abcd/g
|
||||||
|
>abcd1234abcd5678<\=replace=wxyz,substitute_matched
|
||||||
|
|
||||||
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
|
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
|
||||||
|
|
||||||
|
|
|
@ -14859,6 +14859,16 @@ Failed: error -55 at offset 3 in replacement: requested value is not set
|
||||||
/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
|
/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
|
||||||
aaBB
|
aaBB
|
||||||
1: AAbbaa..AAbBaa
|
1: AAbbaa..AAbBaa
|
||||||
|
|
||||||
|
/abcd/replace=wxyz,substitute_matched
|
||||||
|
abcd
|
||||||
|
1: wxyz
|
||||||
|
pqrs
|
||||||
|
0: pqrs
|
||||||
|
|
||||||
|
/abcd/g
|
||||||
|
>abcd1234abcd5678<\=replace=wxyz,substitute_matched
|
||||||
|
2: >wxyz1234wxyz5678<
|
||||||
|
|
||||||
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
|
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
|
||||||
Capture group count = 2
|
Capture group count = 2
|
||||||
|
|