Implement PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.
parent
7171d86587
commit
e8d70e2459
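As a rough illustration of the behaviour this commit documents, here is a toy model in plain C. It is not PCRE2 code: literal strstr() matching stands in for pcre2_match(), the replacement is copied literally, and model_substitute() and all of its parameters are hypothetical names invented for this sketch. It only shows how the output differs when the subject text between matches is left out, and that multiple replacements are concatenated in the global case.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only, NOT the PCRE2 implementation.
 * - Matching is a literal strstr() search (assumption: no regex engine).
 * - The replacement is copied as-is, with no $-expansion.
 * - global models PCRE2_SUBSTITUTE_GLOBAL; replacement_only models
 *   PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.
 * Returns the number of substitutions, or -1 if the output buffer is too
 * small or the needle is empty (empty matches are not modelled). */
int model_substitute(const char *subject, const char *needle,
                     const char *replacement, int global,
                     int replacement_only, char *out, size_t outsize)
{
    size_t nlen = strlen(needle);
    size_t rlen = strlen(replacement);
    size_t used = 0;
    const char *p = subject;
    int count = 0;

    if (nlen == 0) return -1;

    while (count == 0 || global) {
        const char *m = strstr(p, needle);
        if (m == NULL) break;

        if (!replacement_only) {
            /* Default: copy the unmatched text that precedes the match. */
            size_t gap = (size_t)(m - p);
            if (used + gap >= outsize) return -1;
            memcpy(out + used, p, gap);
            used += gap;
        }

        /* Both modes: append the replacement string. */
        if (used + rlen >= outsize) return -1;
        memcpy(out + used, replacement, rlen);
        used += rlen;

        p = m + nlen;   /* continue searching after the match */
        count++;
    }

    if (!replacement_only) {
        /* Default: copy the remainder of the subject after the last match. */
        size_t tail = strlen(p);
        if (used + tail >= outsize) return -1;
        memcpy(out + used, p, tail);
        used += tail;
    }

    if (used >= outsize) return -1;   /* room for the trailing zero */
    out[used] = '\0';
    return count;
}
```

With the subject "=abc=abc=", the needle "abc" and the replacement "X", the default global call in this model yields "=X=X=", while the replacement-only global call yields "XX"; both report two substitutions. A real application would call pcre2_substitute() with the corresponding option bits instead.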
|
@ -41,6 +41,8 @@ the minimum.
|
||||||
|
|
||||||
10. Fix *THEN verbs in lookahead assertions in JIT.
|
10. Fix *THEN verbs in lookahead assertions in JIT.
|
||||||
|
|
||||||
|
11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.
|
||||||
|
|
||||||
|
|
||||||
Version 10.34 21-November-2019
|
Version 10.34 21-November-2019
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
|
@ -82,6 +82,7 @@ zero-terminated strings. The options are:
|
||||||
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
||||||
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
|
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
||||||
|
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s)
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
||||||
</pre>
|
</pre>
|
||||||
|
|
|
@ -3305,10 +3305,11 @@ same number causes an error at compile time.
|
||||||
This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
|
This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
|
||||||
subject string in <i>outputbuffer</i>, replacing parts that were matched with
|
subject string in <i>outputbuffer</i>, replacing parts that were matched with
|
||||||
the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
|
the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
|
||||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
|
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
|
||||||
is to perform just one replacement if the pattern matches, but there is an
|
option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
|
||||||
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
replacement string(s). The default action is to perform just one replacement if
|
||||||
for details).
|
the pattern matches, but there is an option that requests multiple replacements
|
||||||
|
(see PCRE2_SUBSTITUTE_GLOBAL below for details).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
|
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
|
||||||
|
@ -3349,10 +3350,19 @@ an application to check for a match before choosing to substitute, without
|
||||||
having to repeat the match.
|
having to repeat the match.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>code</i> argument is not used for the first substitution, but if
|
The <i>code</i> argument is not used for the first substitution when
|
||||||
PCRE2_SUBSTITUTE_GLOBAL is set, <b>pcre2_match()</b> will be called after the
|
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
|
||||||
first substitution to check for further matches, and the contents of the
|
<b>pcre2_match()</b> will be called after the first substitution to check for
|
||||||
<i>match_data</i> block will be changed.
|
further matches, and the contents of the <i>match_data</i> block will be
|
||||||
|
changed.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The default is to return a copy of the subject string with matched substrings
|
||||||
|
replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
|
||||||
|
replacement substrings are returned. In the global case, multiple replacements
|
||||||
|
are concatenated in the output buffer. Substitution callouts (see
|
||||||
|
<a href="#subcallouts">below)</a>
|
||||||
|
can be used to separate them if necessary.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
|
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
|
||||||
|
@ -3560,7 +3570,7 @@ As for all PCRE2 errors, a text message that describes the error can be
|
||||||
obtained by calling the <b>pcre2_get_error_message()</b> function (see
|
obtained by calling the <b>pcre2_get_error_message()</b> function (see
|
||||||
"Obtaining a textual error message"
|
"Obtaining a textual error message"
|
||||||
<a href="#geterrormessage">above).</a>
|
<a href="#geterrormessage">above).</a>
|
||||||
</P>
|
<a name="subcallouts"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Substitution callouts
|
Substitution callouts
|
||||||
</b><br>
|
</b><br>
|
||||||
|
@ -3897,9 +3907,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 27 December 2019
|
Last updated: 22 January 2020
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2020 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -1050,25 +1050,27 @@ modifier list, in which case they are applied to every subject line that is
|
||||||
processed with that pattern. These modifiers do not affect the compilation
|
processed with that pattern. These modifiers do not affect the compilation
|
||||||
process.
|
process.
|
||||||
<pre>
|
<pre>
|
||||||
aftertext show text after match
|
aftertext show text after match
|
||||||
allaftertext show text after captures
|
allaftertext show text after captures
|
||||||
allcaptures show all captures
|
allcaptures show all captures
|
||||||
allvector show the entire ovector
|
allvector show the entire ovector
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
altglobal alternative global matching
|
altglobal alternative global matching
|
||||||
/g global global matching
|
/g global global matching
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
replace=<string> specify a replacement string
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
substitute_callout use substitution callouts
|
substitute_callout use substitution callouts
|
||||||
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
||||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_skip=<n> skip substitution number n
|
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
substitute_stop=<n> skip substitution number n and greater
|
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_skip=<n> skip substitution <n>
|
||||||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
substitute_stop=<n> skip substitution <n> and following
|
||||||
|
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
|
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||||
</pre>
|
</pre>
|
||||||
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
These modifiers may not appear in a <b>#pattern</b> command. If you want them as
|
||||||
defaults, set them in a <b>#subject</b> command.
|
defaults, set them in a <b>#subject</b> command.
|
||||||
|
@ -1235,7 +1237,9 @@ pattern.
|
||||||
substitute_callout use substitution callouts
|
substitute_callout use substitution callouts
|
||||||
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
||||||
|
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
|
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_skip=<n> skip substitution number n
|
substitute_skip=<n> skip substitution number n
|
||||||
substitute_stop=<n> skip substitution number n and greater
|
substitute_stop=<n> skip substitution number n and greater
|
||||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
|
@ -1397,9 +1401,10 @@ Testing the substitution function
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
|
If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
|
||||||
called instead of one of the matching functions. Note that replacement strings
|
called instead of one of the matching functions (or after one call of
|
||||||
cannot contain commas, because a comma signifies the end of a modifier. This is
|
<b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
|
||||||
not thought to be an issue in a test program.
|
replacement strings cannot contain commas, because a comma signifies the end of
|
||||||
|
a modifier. This is not thought to be an issue in a test program.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Unlike subject strings, <b>pcre2test</b> does not process replacement strings
|
Unlike subject strings, <b>pcre2test</b> does not process replacement strings
|
||||||
|
@ -1416,11 +1421,15 @@ for <b>pcre2_substitute()</b>:
|
||||||
global PCRE2_SUBSTITUTE_GLOBAL
|
global PCRE2_SUBSTITUTE_GLOBAL
|
||||||
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal PCRE2_SUBSTITUTE_LITERAL
|
||||||
|
substitute_matched PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
|
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
|
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||||
|
</pre>
|
||||||
</PRE>
|
See the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
documentation for details of these options.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
After a successful substitution, the modified string is output, preceded by the
|
After a successful substitution, the modified string is output, preceded by the
|
||||||
|
@ -2096,9 +2105,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 26 December 2019
|
Last updated: 22 January 2020
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2020 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
397
doc/pcre2.txt
|
@ -3196,10 +3196,12 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
This function optionally calls pcre2_match() and then makes a copy of
|
This function optionally calls pcre2_match() and then makes a copy of
|
||||||
the subject string in outputbuffer, replacing parts that were matched
|
the subject string in outputbuffer, replacing parts that were matched
|
||||||
with the replacement string, whose length is supplied in rlength. This
|
with the replacement string, whose length is supplied in rlength. This
|
||||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The
|
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
||||||
default is to perform just one replacement if the pattern matches, but
|
There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
|
||||||
there is an option that requests multiple replacements (see PCRE2_SUB-
|
turn just the replacement string(s). The default action is to perform
|
||||||
STITUTE_GLOBAL below for details).
|
just one replacement if the pattern matches, but there is an option
|
||||||
|
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
||||||
|
for details).
|
||||||
|
|
||||||
If successful, pcre2_substitute() returns the number of substitutions
|
If successful, pcre2_substitute() returns the number of substitutions
|
||||||
that were carried out. This may be zero if no match was found, and is
|
that were carried out. This may be zero if no match was found, and is
|
||||||
|
@ -3234,53 +3236,60 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
application to check for a match before choosing to substitute, without
|
application to check for a match before choosing to substitute, without
|
||||||
having to repeat the match.
|
having to repeat the match.
|
||||||
|
|
||||||
The code argument is not used for the first substitution, but if
|
The code argument is not used for the first substitution when
|
||||||
PCRE2_SUBSTITUTE_GLOBAL is set, pcre2_match() will be called after the
|
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also
|
||||||
first substitution to check for further matches, and the contents of
|
set, pcre2_match() will be called after the first substitution to check
|
||||||
the match_data block will be changed.
|
for further matches, and the contents of the match_data block will be
|
||||||
|
changed.
|
||||||
|
|
||||||
The outlengthptr argument of pcre2_substitute() must point to a vari-
|
The default is to return a copy of the subject string with matched sub-
|
||||||
able that contains the length, in code units, of the output buffer. If
|
strings replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set,
|
||||||
the function is successful, the value is updated to contain the length
|
only the replacement substrings are returned. In the global case, mul-
|
||||||
of the new string, excluding the trailing zero that is automatically
|
tiple replacements are concatenated in the output buffer. Substitution
|
||||||
|
callouts (see below) can be used to separate them if necessary.
|
||||||
|
|
||||||
|
The outlengthptr argument of pcre2_substitute() must point to a vari-
|
||||||
|
able that contains the length, in code units, of the output buffer. If
|
||||||
|
the function is successful, the value is updated to contain the length
|
||||||
|
of the new string, excluding the trailing zero that is automatically
|
||||||
added.
|
added.
|
||||||
|
|
||||||
If the function is not successful, the value set via outlengthptr de-
|
If the function is not successful, the value set via outlengthptr de-
|
||||||
pends on the type of error. For syntax errors in the replacement
|
pends on the type of error. For syntax errors in the replacement
|
||||||
string, the value is the offset in the replacement string where the er-
|
string, the value is the offset in the replacement string where the er-
|
||||||
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
||||||
fault. This includes the case of the output buffer being too small, un-
|
fault. This includes the case of the output buffer being too small, un-
|
||||||
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
|
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
|
||||||
the value is the minimum length needed, including space for the trail-
|
the value is the minimum length needed, including space for the trail-
|
||||||
ing zero. Note that in order to compute the required length, pcre2_sub-
|
ing zero. Note that in order to compute the required length, pcre2_sub-
|
||||||
stitute() has to simulate all the matching and copying, instead of giv-
|
stitute() has to simulate all the matching and copying, instead of giv-
|
||||||
ing an error return as soon as the buffer overflows. Note also that the
|
ing an error return as soon as the buffer overflows. Note also that the
|
||||||
length is in code units, not bytes.
|
length is in code units, not bytes.
|
||||||
|
|
||||||
The replacement string, which is interpreted as a UTF string in UTF
|
The replacement string, which is interpreted as a UTF string in UTF
|
||||||
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
|
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
|
||||||
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
|
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
|
||||||
preted in any way. By default, however, a dollar character is an escape
|
preted in any way. By default, however, a dollar character is an escape
|
||||||
character that can specify the insertion of characters from capture
|
character that can specify the insertion of characters from capture
|
||||||
groups and names from (*MARK) or other control verbs in the pattern.
|
groups and names from (*MARK) or other control verbs in the pattern.
|
||||||
The following forms are always recognized:
|
The following forms are always recognized:
|
||||||
|
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
$<n> or ${<n>} insert the contents of group <n>
|
$<n> or ${<n>} insert the contents of group <n>
|
||||||
$*MARK or ${*MARK} insert a control verb name
|
$*MARK or ${*MARK} insert a control verb name
|
||||||
|
|
||||||
Either a group number or a group name can be given for <n>. Curly
|
Either a group number or a group name can be given for <n>. Curly
|
||||||
brackets are required only if the following character would be inter-
|
brackets are required only if the following character would be inter-
|
||||||
preted as part of the number or name. The number may be zero to include
|
preted as part of the number or name. The number may be zero to include
|
||||||
the entire matched string. For example, if the pattern a(b)c is
|
the entire matched string. For example, if the pattern a(b)c is
|
||||||
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
||||||
is "=+babcb+=".
|
is "=+babcb+=".
|
||||||
|
|
||||||
$*MARK inserts the name from the last encountered backtracking control
|
$*MARK inserts the name from the last encountered backtracking control
|
||||||
verb on the matching path that has a name. (*MARK) must always include
|
verb on the matching path that has a name. (*MARK) must always include
|
||||||
a name, but the other verbs need not. For example, in the case of
|
a name, but the other verbs need not. For example, in the case of
|
||||||
(*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
|
(*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
|
||||||
the relevant name is "B". This facility can be used to perform simple
|
the relevant name is "B". This facility can be used to perform simple
|
||||||
simultaneous substitutions, as this pcre2test example shows:
|
simultaneous substitutions, as this pcre2test example shows:
|
||||||
|
|
||||||
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
||||||
|
@ -3288,15 +3297,15 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
2: pear orange
|
2: pear orange
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
|
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
|
||||||
string, replacing every matching substring. If this option is not set,
|
string, replacing every matching substring. If this option is not set,
|
||||||
only the first matching substring is replaced. The search for matches
|
only the first matching substring is replaced. The search for matches
|
||||||
takes place in the original subject string (that is, previous replace-
|
takes place in the original subject string (that is, previous replace-
|
||||||
ments do not affect it). Iteration is implemented by advancing the
|
ments do not affect it). Iteration is implemented by advancing the
|
||||||
startoffset value for each search, which is always passed the entire
|
startoffset value for each search, which is always passed the entire
|
||||||
subject string. If an offset limit is set in the match context, search-
|
subject string. If an offset limit is set in the match context, search-
|
||||||
ing stops when that limit is reached.
|
ing stops when that limit is reached.
|
||||||
|
|
||||||
You can restrict the effect of a global substitution to a portion of
|
You can restrict the effect of a global substitution to a portion of
|
||||||
the subject string by setting either or both of startoffset and an off-
|
the subject string by setting either or both of startoffset and an off-
|
||||||
set limit. Here is a pcre2test example:
|
set limit. Here is a pcre2test example:
|
||||||
|
|
||||||
|
@ -3304,87 +3313,87 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
||||||
2: ABC A!C A!C ABC
|
2: ABC A!C A!C ABC
|
||||||
|
|
||||||
When continuing with global substitutions after matching a substring
|
When continuing with global substitutions after matching a substring
|
||||||
with zero length, an attempt to find a non-empty match at the same off-
|
with zero length, an attempt to find a non-empty match at the same off-
|
||||||
set is performed. If this is not successful, the offset is advanced by
|
set is performed. If this is not successful, the offset is advanced by
|
||||||
one character except when CRLF is a valid newline sequence and the next
|
one character except when CRLF is a valid newline sequence and the next
|
||||||
two characters are CR, LF. In this case, the offset is advanced by two
|
two characters are CR, LF. In this case, the offset is advanced by two
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
||||||
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
||||||
ORY immediately. If this option is set, however, pcre2_substitute()
|
ORY immediately. If this option is set, however, pcre2_substitute()
|
||||||
continues to go through the motions of matching and substituting (with-
|
continues to go through the motions of matching and substituting (with-
|
||||||
out, of course, writing anything) in order to compute the size of buf-
|
out, of course, writing anything) in order to compute the size of buf-
|
||||||
fer that is needed. This value is passed back via the outlengthptr
|
fer that is needed. This value is passed back via the outlengthptr
|
||||||
variable, with the result of the function still being PCRE2_ER-
|
variable, with the result of the function still being PCRE2_ER-
|
||||||
ROR_NOMEMORY.
|
ROR_NOMEMORY.
|
||||||
|
|
||||||
Passing a buffer size of zero is a permitted way of finding out how
|
Passing a buffer size of zero is a permitted way of finding out how
|
||||||
much memory is needed for a given substitution. However, this does mean
|
much memory is needed for a given substitution. However, this does mean
|
||||||
that the entire operation is carried out twice. Depending on the appli-
|
that the entire operation is carried out twice. Depending on the appli-
|
||||||
cation, it may be more efficient to allocate a large buffer and free
|
cation, it may be more efficient to allocate a large buffer and free
|
||||||
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
||||||
FLOW_LENGTH.
|
FLOW_LENGTH.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
|
||||||
do not appear in the pattern to be treated as unset groups. This option
|
do not appear in the pattern to be treated as unset groups. This option
|
||||||
should be used with care, because it means that a typo in a group name
|
should be used with care, because it means that a typo in a group name
|
||||||
or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
|
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
|
||||||
known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
|
known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
|
||||||
as empty strings when inserted as described above. If this option is
|
as empty strings when inserted as described above. If this option is
|
||||||
not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
|
not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
|
||||||
SET error. This option does not influence the extended substitution
|
SET error. This option does not influence the extended substitution
|
||||||
syntax described below.
|
syntax described below.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
||||||
replacement string. Without this option, only the dollar character is
|
replacement string. Without this option, only the dollar character is
|
||||||
special, and only the group insertion forms listed above are valid.
|
special, and only the group insertion forms listed above are valid.
|
||||||
When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
|
When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
|
||||||
|
|
||||||
Firstly, backslash in a replacement string is interpreted as an escape
|
Firstly, backslash in a replacement string is interpreted as an escape
|
||||||
character. The usual forms such as \n or \x{ddd} can be used to specify
|
character. The usual forms such as \n or \x{ddd} can be used to specify
|
||||||
particular character codes, and backslash followed by any non-alphanu-
|
particular character codes, and backslash followed by any non-alphanu-
|
||||||
meric character quotes that character. Extended quoting can be coded
|
meric character quotes that character. Extended quoting can be coded
|
||||||
using \Q...\E, exactly as in pattern strings.
|
using \Q...\E, exactly as in pattern strings.
|
||||||
|
|
||||||
There are also four escape sequences for forcing the case of inserted
|
There are also four escape sequences for forcing the case of inserted
|
||||||
letters. The insertion mechanism has three states: no case forcing,
|
letters. The insertion mechanism has three states: no case forcing,
|
||||||
force upper case, and force lower case. The escape sequences change the
|
force upper case, and force lower case. The escape sequences change the
|
||||||
current state: \U and \L change to upper or lower case forcing, respec-
|
current state: \U and \L change to upper or lower case forcing, respec-
|
||||||
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
||||||
no case forcing. The sequences \u and \l force the next character (if
|
no case forcing. The sequences \u and \l force the next character (if
|
||||||
it is a letter) to upper or lower case, respectively, and then the
|
it is a letter) to upper or lower case, respectively, and then the
|
||||||
state automatically reverts to no case forcing. Case forcing applies to
|
state automatically reverts to no case forcing. Case forcing applies to
|
||||||
all inserted characters, including those from capture groups and let-
|
all inserted characters, including those from capture groups and let-
|
||||||
ters within \Q...\E quoted sequences.
|
ters within \Q...\E quoted sequences.
|
||||||
|
|
||||||
Note that case forcing sequences such as \U...\E do not nest. For exam-
|
Note that case forcing sequences such as \U...\E do not nest. For exam-
|
||||||
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
||||||
\E has no effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EX-
|
\E has no effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EX-
|
||||||
TRA_ALT_BSUX options do not apply to replacement strings.
|
TRA_ALT_BSUX options do not apply to replacement strings.
|
||||||
|
|
||||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||||
flexibility to capture group substitution. The syntax is similar to
|
flexibility to capture group substitution. The syntax is similar to
|
||||||
that used by Bash:
|
that used by Bash:
|
||||||
|
|
||||||
${<n>:-<string>}
|
${<n>:-<string>}
|
||||||
${<n>:+<string1>:<string2>}
|
${<n>:+<string1>:<string2>}
|
||||||
|
|
||||||
As before, <n> may be a group number or a name. The first form speci-
|
As before, <n> may be a group number or a name. The first form speci-
|
||||||
fies a default value. If group <n> is set, its value is inserted; if
|
fies a default value. If group <n> is set, its value is inserted; if
|
||||||
not, <string> is expanded and the result inserted. The second form
|
not, <string> is expanded and the result inserted. The second form
|
||||||
specifies strings that are expanded and inserted when group <n> is set
|
specifies strings that are expanded and inserted when group <n> is set
|
||||||
or unset, respectively. The first form is just a convenient shorthand
|
or unset, respectively. The first form is just a convenient shorthand
|
||||||
for
|
for
|
||||||
|
|
||||||
${<n>:+${<n>}:<string>}
|
${<n>:+${<n>}:<string>}
|
||||||
|
|
||||||
Backslash can be used to escape colons and closing curly brackets in
|
Backslash can be used to escape colons and closing curly brackets in
|
||||||
the replacement strings. A change of the case forcing state within a
|
the replacement strings. A change of the case forcing state within a
|
||||||
replacement string remains in force afterwards, as shown in this
|
replacement string remains in force afterwards, as shown in this
|
||||||
pcre2test example:
|
pcre2test example:
|
||||||
|
|
||||||
/(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
|
/(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
|
||||||
|
@ -3393,8 +3402,8 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
somebody
|
somebody
|
||||||
1: HELLO
|
1: HELLO
|
||||||
|
|
||||||
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
||||||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
|
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
|
||||||
known groups in the extended syntax forms to be treated as unset.
|
known groups in the extended syntax forms to be treated as unset.
|
||||||
|
|
||||||
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
|
||||||
|
@ -3403,37 +3412,37 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
|
|
||||||
Substitution errors
|
Substitution errors
|
||||||
|
|
||||||
In the event of an error, pcre2_substitute() returns a negative error
|
In the event of an error, pcre2_substitute() returns a negative error
|
||||||
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors
|
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors
|
||||||
from pcre2_match() are passed straight back.
|
from pcre2_match() are passed straight back.
|
||||||
|
|
||||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
|
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
|
||||||
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
||||||
|
|
||||||
PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
|
PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
|
||||||
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
||||||
when the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
|
when the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
|
||||||
SET_EMPTY is not set.
|
SET_EMPTY is not set.
|
||||||
|
|
||||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
||||||
enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
|
enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
|
||||||
of buffer that is needed is returned via outlengthptr. Note that this
|
of buffer that is needed is returned via outlengthptr. Note that this
|
||||||
does not happen by default.
|
does not happen by default.
|
||||||
|
|
||||||
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
|
||||||
match_data argument is NULL.
|
match_data argument is NULL.
|
||||||
|
|
||||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
|
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
|
||||||
the replacement string, with more particular errors being PCRE2_ER-
|
the replacement string, with more particular errors being PCRE2_ER-
|
||||||
ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
|
ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
|
||||||
(closing curly bracket not found), PCRE2_ERROR_BADSUBSTITUTION (syntax
|
(closing curly bracket not found), PCRE2_ERROR_BADSUBSTITUTION (syntax
|
||||||
error in extended group substitution), and PCRE2_ERROR_BADSUBSPATTERN
|
error in extended group substitution), and PCRE2_ERROR_BADSUBSPATTERN
|
||||||
(the pattern match ended before it started or the match started earlier
|
(the pattern match ended before it started or the match started earlier
|
||||||
than the current position in the subject, which can happen if \K is
|
than the current position in the subject, which can happen if \K is
|
||||||
used in an assertion).
|
used in an assertion).
|
||||||
|
|
||||||
As for all PCRE2 errors, a text message that describes the error can be
|
As for all PCRE2 errors, a text message that describes the error can be
|
||||||
obtained by calling the pcre2_get_error_message() function (see "Ob-
|
obtained by calling the pcre2_get_error_message() function (see "Ob-
|
||||||
taining a textual error message" above).
|
taining a textual error message" above).
|
||||||
|
|
||||||
Substitution callouts
|
Substitution callouts
|
||||||
|
@ -3442,15 +3451,15 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
int (*callout_function)(pcre2_substitute_callout_block *, void *),
|
int (*callout_function)(pcre2_substitute_callout_block *, void *),
|
||||||
void *callout_data);
|
void *callout_data);
|
||||||
|
|
||||||
The pcre2_set_substitution_callout() function can be used to specify a
|
The pcre2_set_substitution_callout() function can be used to specify a
|
||||||
callout function for pcre2_substitute(). This information is passed in
|
callout function for pcre2_substitute(). This information is passed in
|
||||||
a match context. The callout function is called after each substitution
|
a match context. The callout function is called after each substitution
|
||||||
has been processed, but it can cause the replacement not to happen. The
|
has been processed, but it can cause the replacement not to happen. The
|
||||||
callout function is not called for simulated substitutions that happen
|
callout function is not called for simulated substitutions that happen
|
||||||
as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
|
as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
|
||||||
|
|
||||||
The first argument of the callout function is a pointer to a substitute
|
The first argument of the callout function is a pointer to a substitute
|
||||||
callout block structure, which contains the following fields, not nec-
|
callout block structure, which contains the following fields, not nec-
|
||||||
essarily in this order:
|
essarily in this order:
|
||||||
|
|
||||||
uint32_t version;
|
uint32_t version;
|
||||||
|
@ -3461,34 +3470,34 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
uint32_t oveccount;
|
uint32_t oveccount;
|
||||||
PCRE2_SIZE output_offsets[2];
|
PCRE2_SIZE output_offsets[2];
|
||||||
|
|
||||||
The version field contains the version number of the block format. The
|
The version field contains the version number of the block format. The
|
||||||
current version is 0. The version number will increase in future if
|
current version is 0. The version number will increase in future if
|
||||||
more fields are added, but the intention is never to remove any of the
|
more fields are added, but the intention is never to remove any of the
|
||||||
existing fields.
|
existing fields.
|
||||||
|
|
||||||
The subscount field is the number of the current match. It is 1 for the
|
The subscount field is the number of the current match. It is 1 for the
|
||||||
first callout, 2 for the second, and so on. The input and output point-
|
first callout, 2 for the second, and so on. The input and output point-
|
||||||
ers are copies of the values passed to pcre2_substitute().
|
ers are copies of the values passed to pcre2_substitute().
|
||||||
|
|
||||||
The ovector field points to the ovector, which contains the result of
|
The ovector field points to the ovector, which contains the result of
|
||||||
the most recent match. The oveccount field contains the number of pairs
|
the most recent match. The oveccount field contains the number of pairs
|
||||||
that are set in the ovector, and is always greater than zero.
|
that are set in the ovector, and is always greater than zero.
|
||||||
|
|
||||||
The output_offsets vector contains the offsets of the replacement in
|
The output_offsets vector contains the offsets of the replacement in
|
||||||
the output string. This has already been processed for dollar and (if
|
the output string. This has already been processed for dollar and (if
|
||||||
requested) backslash substitutions as described above.
|
requested) backslash substitutions as described above.
|
||||||
|
|
||||||
The second argument of the callout function is the value passed as
|
The second argument of the callout function is the value passed as
|
||||||
callout_data when the function was registered. The value returned by
|
callout_data when the function was registered. The value returned by
|
||||||
the callout function is interpreted as follows:
|
the callout function is interpreted as follows:
|
||||||
|
|
||||||
If the value is zero, the replacement is accepted, and, if PCRE2_SUB-
|
If the value is zero, the replacement is accepted, and, if PCRE2_SUB-
|
||||||
STITUTE_GLOBAL is set, processing continues with a search for the next
|
STITUTE_GLOBAL is set, processing continues with a search for the next
|
||||||
match. If the value is not zero, the current replacement is not ac-
|
match. If the value is not zero, the current replacement is not ac-
|
||||||
cepted. If the value is greater than zero, processing continues when
|
cepted. If the value is greater than zero, processing continues when
|
||||||
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero
|
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero
|
||||||
or PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is
|
or PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is
|
||||||
copied to the output and the call to pcre2_substitute() exits, return-
|
copied to the output and the call to pcre2_substitute() exits, return-
|
||||||
ing the number of matches so far.
|
ing the number of matches so far.
|
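The return-value rules above can be restated as a small decision function. This is only a sketch: `callout_outcome` and `interpret_callout_result` are hypothetical names that do not exist in PCRE2, and the `global` flag stands in for whether PCRE2_SUBSTITUTE_GLOBAL was passed.

```c
#include <stdbool.h>

/* Hypothetical type, not part of the PCRE2 API: it records what
   pcre2_substitute() is documented to do with a callout's return value. */
typedef struct {
    bool replacement_accepted;  /* the replacement is kept in the output */
    bool continue_matching;     /* a search for the next match follows */
} callout_outcome;

/* rc is the value returned by the callout function; global is true when
   PCRE2_SUBSTITUTE_GLOBAL is set. */
static callout_outcome interpret_callout_result(int rc, bool global)
{
    callout_outcome out;

    /* Only zero accepts the current replacement. */
    out.replacement_accepted = (rc == 0);

    /* A negative value always ends processing; zero or a positive value
       lets processing continue only for a global substitution. */
    out.continue_matching = (rc >= 0) && global;

    return out;
}
```

Under this reading, a callout that returns 1 during a global substitution skips the current match and moves on to the next one, while returning -1 ends the call with the remaining input copied unchanged.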
||||||
|
|
||||||
|
|
||||||
|
@ -3497,56 +3506,56 @@ DUPLICATE CAPTURE GROUP NAMES
|
||||||
int pcre2_substring_nametable_scan(const pcre2_code *code,
|
int pcre2_substring_nametable_scan(const pcre2_code *code,
|
||||||
PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
|
PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
|
||||||
|
|
||||||
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
||||||
capture groups are not required to be unique. Duplicate names are al-
|
capture groups are not required to be unique. Duplicate names are al-
|
||||||
ways allowed for groups with the same number, created by using the (?|
|
ways allowed for groups with the same number, created by using the (?|
|
||||||
feature. Indeed, if such groups are named, they are required to use the
|
feature. Indeed, if such groups are named, they are required to use the
|
||||||
same names.
|
same names.
|
||||||
|
|
||||||
Normally, patterns that use duplicate names are such that in any one
|
Normally, patterns that use duplicate names are such that in any one
|
||||||
match, only one of each set of identically-named groups participates.
|
match, only one of each set of identically-named groups participates.
|
||||||
An example is shown in the pcre2pattern documentation.
|
An example is shown in the pcre2pattern documentation.
|
||||||
|
|
||||||
When duplicates are present, pcre2_substring_copy_byname() and
|
When duplicates are present, pcre2_substring_copy_byname() and
|
||||||
pcre2_substring_get_byname() return the first substring corresponding
|
pcre2_substring_get_byname() return the first substring corresponding
|
||||||
to the given name that is set. Only if none are set is PCRE2_ERROR_UN-
|
to the given name that is set. Only if none are set is PCRE2_ERROR_UN-
|
||||||
SET is returned. The pcre2_substring_number_from_name() function re-
|
SET is returned. The pcre2_substring_number_from_name() function re-
|
||||||
turns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate
|
turns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate
|
||||||
names.
|
names.
|
||||||
|
|
||||||
If you want to get full details of all captured substrings for a given
|
If you want to get full details of all captured substrings for a given
|
||||||
name, you must use the pcre2_substring_nametable_scan() function. The
|
name, you must use the pcre2_substring_nametable_scan() function. The
|
||||||
first argument is the compiled pattern, and the second is the name. If
|
first argument is the compiled pattern, and the second is the name. If
|
||||||
the third and fourth arguments are NULL, the function returns a group
|
the third and fourth arguments are NULL, the function returns a group
|
||||||
number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
|
number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
|
||||||
|
|
||||||
When the third and fourth arguments are not NULL, they must be pointers
|
When the third and fourth arguments are not NULL, they must be pointers
|
||||||
to variables that are updated by the function. After it has run, they
|
to variables that are updated by the function. After it has run, they
|
||||||
point to the first and last entries in the name-to-number table for the
|
point to the first and last entries in the name-to-number table for the
|
||||||
given name, and the function returns the length of each entry in code
|
given name, and the function returns the length of each entry in code
|
||||||
units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
|
units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
|
||||||
no entries for the given name.
|
no entries for the given name.
|
||||||
|
|
||||||
The format of the name table is described above in the section entitled
|
The format of the name table is described above in the section entitled
|
||||||
Information about a pattern. Given all the relevant entries for the
|
Information about a pattern. Given all the relevant entries for the
|
||||||
name, you can extract each of their numbers, and hence the captured
|
name, you can extract each of their numbers, and hence the captured
|
||||||
data.
|
data.
|
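As a sketch of the last step, the loop below walks the entries between the first and last pointers and collects each group number. It assumes the 8-bit library, where each entry starts with a two-byte group number (most significant byte first) followed by the zero-terminated name. `collect_group_numbers` is a hypothetical helper, not a PCRE2 function, and `entry_size` stands for the positive value returned by pcre2_substring_nametable_scan().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: visit every name-table entry from first to last
   (inclusive) and store its group number. Each entry is entry_size code
   units long. Returns how many numbers were stored (at most max). */
static size_t collect_group_numbers(const unsigned char *first,
                                    const unsigned char *last,
                                    int entry_size,
                                    unsigned int *numbers, size_t max)
{
    size_t count = 0;
    const unsigned char *p;

    /* An entry holds two number bytes plus at least a terminating zero. */
    assert(first != NULL && last != NULL && entry_size > 2);

    for (p = first; p <= last && count < max; p += entry_size) {
        /* 8-bit layout: group number in two bytes, high byte first. */
        numbers[count++] = ((unsigned int)p[0] << 8) | p[1];
    }
    return count;
}
```

Each collected number can then be passed to one of the by-number substring functions to find out which of the identically named groups was set.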
||||||
|
|
||||||
|
|
||||||
FINDING ALL POSSIBLE MATCHES AT ONE POSITION
|
FINDING ALL POSSIBLE MATCHES AT ONE POSITION
|
||||||
|
|
||||||
The traditional matching function uses a similar algorithm to Perl,
|
The traditional matching function uses a similar algorithm to Perl,
|
||||||
which stops when it finds the first match at a given point in the sub-
|
which stops when it finds the first match at a given point in the sub-
|
||||||
ject. If you want to find all possible matches, or the longest possible
|
ject. If you want to find all possible matches, or the longest possible
|
||||||
match at a given position, consider using the alternative matching
|
match at a given position, consider using the alternative matching
|
||||||
function (see below) instead. If you cannot use the alternative func-
|
function (see below) instead. If you cannot use the alternative func-
|
||||||
tion, you can kludge it up by making use of the callout facility, which
|
tion, you can kludge it up by making use of the callout facility, which
|
||||||
is described in the pcre2callout documentation.
|
is described in the pcre2callout documentation.
|
||||||
|
|
||||||
What you have to do is to insert a callout right at the end of the pat-
|
What you have to do is to insert a callout right at the end of the pat-
|
||||||
tern. When your callout function is called, extract and save the cur-
|
tern. When your callout function is called, extract and save the cur-
|
||||||
rent matched substring. Then return 1, which forces pcre2_match() to
|
rent matched substring. Then return 1, which forces pcre2_match() to
|
||||||
backtrack and try other alternatives. Ultimately, when it runs out of
|
backtrack and try other alternatives. Ultimately, when it runs out of
|
||||||
matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
|
matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
|
||||||
|
|
||||||
|
|
||||||
|
@ -3558,26 +3567,26 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
pcre2_match_context *mcontext,
|
pcre2_match_context *mcontext,
|
||||||
int *workspace, PCRE2_SIZE wscount);
|
int *workspace, PCRE2_SIZE wscount);
|
||||||
|
|
||||||
The function pcre2_dfa_match() is called to match a subject string
|
The function pcre2_dfa_match() is called to match a subject string
|
||||||
against a compiled pattern, using a matching algorithm that scans the
|
against a compiled pattern, using a matching algorithm that scans the
|
||||||
subject string just once (not counting lookaround assertions), and does
|
subject string just once (not counting lookaround assertions), and does
|
||||||
not backtrack. This has different characteristics to the normal algo-
|
not backtrack. This has different characteristics to the normal algo-
|
||||||
rithm, and is not compatible with Perl. Some of the features of PCRE2
|
rithm, and is not compatible with Perl. Some of the features of PCRE2
|
||||||
patterns are not supported. Nevertheless, there are times when this
|
patterns are not supported. Nevertheless, there are times when this
|
||||||
kind of matching can be useful. For a discussion of the two matching
|
kind of matching can be useful. For a discussion of the two matching
|
||||||
algorithms, and a list of features that pcre2_dfa_match() does not sup-
|
algorithms, and a list of features that pcre2_dfa_match() does not sup-
|
||||||
port, see the pcre2matching documentation.
|
port, see the pcre2matching documentation.
|
||||||
|
|
||||||
The arguments for the pcre2_dfa_match() function are the same as for
|
The arguments for the pcre2_dfa_match() function are the same as for
|
||||||
pcre2_match(), plus two extras. The ovector within the match data block
|
pcre2_match(), plus two extras. The ovector within the match data block
|
||||||
is used in a different way, and this is described below. The other com-
|
is used in a different way, and this is described below. The other com-
|
||||||
mon arguments are used in the same way as for pcre2_match(), so their
|
mon arguments are used in the same way as for pcre2_match(), so their
|
||||||
description is not repeated here.
|
description is not repeated here.
|
||||||
|
|
||||||
The two additional arguments provide workspace for the function. The
|
The two additional arguments provide workspace for the function. The
|
||||||
workspace vector should contain at least 20 elements. It is used for
|
workspace vector should contain at least 20 elements. It is used for
|
||||||
keeping track of multiple paths through the pattern tree. More
|
keeping track of multiple paths through the pattern tree. More
|
||||||
workspace is needed for patterns and subjects where there are a lot of
|
workspace is needed for patterns and subjects where there are a lot of
|
||||||
potential matches.
|
potential matches.
|
||||||
|
|
||||||
Here is an example of a simple call to pcre2_dfa_match():
|
Here is an example of a simple call to pcre2_dfa_match():
|
||||||
|
@ -3597,45 +3606,45 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
|
|
||||||
Option bits for pcre2_dfa_match()
|
Option bits for pcre2_dfa_match()
|
||||||
|
|
||||||
The unused bits of the options argument for pcre2_dfa_match() must be
|
The unused bits of the options argument for pcre2_dfa_match() must be
|
||||||
zero. The only bits that may be set are PCRE2_ANCHORED,
|
zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO-
|
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO-
|
||||||
TEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
|
TEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
|
||||||
PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and
|
PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and
|
||||||
PCRE2_DFA_RESTART. All but the last four of these are exactly the same
|
PCRE2_DFA_RESTART. All but the last four of these are exactly the same
|
||||||
as for pcre2_match(), so their description is not repeated here.
|
as for pcre2_match(), so their description is not repeated here.
|
||||||
|
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
|
||||||
These have the same general effect as they do for pcre2_match(), but
|
These have the same general effect as they do for pcre2_match(), but
|
||||||
the details are slightly different. When PCRE2_PARTIAL_HARD is set for
|
the details are slightly different. When PCRE2_PARTIAL_HARD is set for
|
||||||
pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the
|
pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the
|
||||||
subject is reached and there is still at least one matching possibility
|
subject is reached and there is still at least one matching possibility
|
||||||
that requires additional characters. This happens even if some complete
|
that requires additional characters. This happens even if some complete
|
||||||
matches have already been found. When PCRE2_PARTIAL_SOFT is set, the
|
matches have already been found. When PCRE2_PARTIAL_SOFT is set, the
|
||||||
return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
|
return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
|
||||||
if the end of the subject is reached, there have been no complete
|
if the end of the subject is reached, there have been no complete
|
||||||
matches, but there is still at least one matching possibility. The por-
|
matches, but there is still at least one matching possibility. The por-
|
||||||
tion of the string that was inspected when the longest partial match
|
tion of the string that was inspected when the longest partial match
|
||||||
was found is set as the first matching string in both cases. There is a
|
was found is set as the first matching string in both cases. There is a
|
||||||
more detailed discussion of partial and multi-segment matching, with
|
more detailed discussion of partial and multi-segment matching, with
|
||||||
examples, in the pcre2partial documentation.
|
examples, in the pcre2partial documentation.
|
||||||
|
|
||||||
PCRE2_DFA_SHORTEST
|
PCRE2_DFA_SHORTEST
|
||||||
|
|
||||||
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to
|
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to
|
||||||
stop as soon as it has found one match. Because of the way the alterna-
|
stop as soon as it has found one match. Because of the way the alterna-
|
||||||
tive algorithm works, this is necessarily the shortest possible match
|
tive algorithm works, this is necessarily the shortest possible match
|
||||||
at the first possible matching point in the subject string.
|
at the first possible matching point in the subject string.
|
||||||
|
|
||||||
PCRE2_DFA_RESTART
|
PCRE2_DFA_RESTART
|
||||||
|
|
||||||
When pcre2_dfa_match() returns a partial match, it is possible to call
|
When pcre2_dfa_match() returns a partial match, it is possible to call
|
||||||
it again, with additional subject characters, and have it continue with
|
it again, with additional subject characters, and have it continue with
|
||||||
the same match. The PCRE2_DFA_RESTART option requests this action; when
|
the same match. The PCRE2_DFA_RESTART option requests this action; when
|
||||||
it is set, the workspace and wscount options must reference the same
|
it is set, the workspace and wscount options must reference the same
|
||||||
vector as before because data about the match so far is left in them
|
vector as before because data about the match so far is left in them
|
||||||
after a partial match. There is more discussion of this facility in the
|
after a partial match. There is more discussion of this facility in the
|
||||||
pcre2partial documentation.
|
pcre2partial documentation.
|
||||||
|
|
||||||
|
@ -3643,8 +3652,8 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
|
|
||||||
When pcre2_dfa_match() succeeds, it may have matched more than one sub-
|
When pcre2_dfa_match() succeeds, it may have matched more than one sub-
|
||||||
string in the subject. Note, however, that all the matches from one run
|
string in the subject. Note, however, that all the matches from one run
|
||||||
of the function start at the same point in the subject. The shorter
|
of the function start at the same point in the subject. The shorter
|
||||||
matches are all initial substrings of the longer matches. For example,
|
matches are all initial substrings of the longer matches. For example,
|
||||||
if the pattern
|
if the pattern
|
||||||
|
|
||||||
<.*>
|
<.*>
|
||||||
|
@ -3659,80 +3668,80 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
<something> <something else>
|
<something> <something else>
|
||||||
<something>
|
<something>
|
||||||
|
|
||||||
On success, the yield of the function is a number greater than zero,
|
On success, the yield of the function is a number greater than zero,
|
||||||
which is the number of matched substrings. The offsets of the sub-
|
which is the number of matched substrings. The offsets of the sub-
|
||||||
strings are returned in the ovector, and can be extracted by number in
|
strings are returned in the ovector, and can be extracted by number in
|
||||||
the same way as for pcre2_match(), but the numbers bear no relation to
|
the same way as for pcre2_match(), but the numbers bear no relation to
|
||||||
any capture groups that may exist in the pattern, because DFA matching
|
any capture groups that may exist in the pattern, because DFA matching
|
||||||
does not support capturing.
|
does not support capturing.
|
||||||
|
|
||||||
Calls to the convenience functions that extract substrings by name re-
|
Calls to the convenience functions that extract substrings by name re-
|
||||||
turn the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used af-
|
turn the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used af-
|
||||||
ter a DFA match. The convenience functions that extract substrings by
|
ter a DFA match. The convenience functions that extract substrings by
|
||||||
number never return PCRE2_ERROR_NOSUBSTRING.
|
number never return PCRE2_ERROR_NOSUBSTRING.
|
||||||
|
|
||||||
The matched strings are stored in the ovector in reverse order of
|
The matched strings are stored in the ovector in reverse order of
|
||||||
length; that is, the longest matching string is first. If there were
|
length; that is, the longest matching string is first. If there were
|
||||||
too many matches to fit into the ovector, the yield of the function is
|
too many matches to fit into the ovector, the yield of the function is
|
||||||
zero, and the vector is filled with the longest matches.
|
zero, and the vector is filled with the longest matches.
|
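To illustrate the ovector layout after a DFA match, the sketch below copies match number `idx` out of the subject using its pair of offsets. `copy_dfa_match` is a hypothetical helper, and plain `size_t` stands in for PCRE2_SIZE. In real code the vector would come from the match data block, for example via pcre2_get_ovector_pointer().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the idx-th matched string (0 is the longest)
   into buf as a zero-terminated string. Returns the match length, or 0
   if buf is too small to hold it. */
static size_t copy_dfa_match(const char *subject, const size_t *ovector,
                             int idx, char *buf, size_t bufsize)
{
    size_t start = ovector[2 * idx];      /* offset of first character */
    size_t end = ovector[2 * idx + 1];    /* offset just past the match */
    size_t len;

    assert(subject != NULL && buf != NULL && end >= start);

    len = end - start;
    if (len + 1 > bufsize)
        return 0;                         /* no room for text plus NUL */

    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return len;
}
```

For a subject such as "x <a> <bc> y" and the pattern <.*>, a hand-computed ovector would be {2, 10, 2, 5}: index 0 yields "<a> <bc>" and index 1 yields "<a>", both starting at offset 2.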
||||||
|
|
||||||
NOTE: PCRE2's "auto-possessification" optimization usually applies to
|
NOTE: PCRE2's "auto-possessification" optimization usually applies to
|
||||||
character repeats at the end of a pattern (as well as internally). For
|
character repeats at the end of a pattern (as well as internally). For
|
||||||
example, the pattern "a\d+" is compiled as if it were "a\d++". For DFA
|
example, the pattern "a\d+" is compiled as if it were "a\d++". For DFA
|
||||||
matching, this means that only one possible match is found. If you re-
|
matching, this means that only one possible match is found. If you re-
|
||||||
ally do want multiple matches in such cases, either use an ungreedy re-
|
ally do want multiple matches in such cases, either use an ungreedy re-
|
||||||
peat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when com-
|
peat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when com-
|
||||||
piling.
|
piling.
|
||||||
|
|
||||||
Error returns from pcre2_dfa_match()
|
Error returns from pcre2_dfa_match()
|
||||||
|
|
||||||
The pcre2_dfa_match() function returns a negative number when it fails.
|
The pcre2_dfa_match() function returns a negative number when it fails.
|
||||||
Many of the errors are the same as for pcre2_match(), as described
|
Many of the errors are the same as for pcre2_match(), as described
|
||||||
above. There are in addition the following errors that are specific to
|
above. There are in addition the following errors that are specific to
|
||||||
pcre2_dfa_match():
|
pcre2_dfa_match():
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_UITEM
|
PCRE2_ERROR_DFA_UITEM
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() encounters an item in the
|
This return is given if pcre2_dfa_match() encounters an item in the
|
||||||
pattern that it does not support, for instance, the use of \C in a UTF
|
pattern that it does not support, for instance, the use of \C in a UTF
|
||||||
mode or a backreference.
|
mode or a backreference.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_UCOND
|
PCRE2_ERROR_DFA_UCOND
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() encounters a condition item
|
This return is given if pcre2_dfa_match() encounters a condition item
|
||||||
that uses a backreference for the condition, or a test for recursion in
|
that uses a backreference for the condition, or a test for recursion in
|
||||||
a specific capture group. These are not supported.
|
a specific capture group. These are not supported.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_UINVALID_UTF
|
PCRE2_ERROR_DFA_UINVALID_UTF
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() is called for a pattern that
|
This return is given if pcre2_dfa_match() is called for a pattern that
|
||||||
was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for
|
was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for
|
||||||
DFA matching.
|
DFA matching.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_WSSIZE
|
PCRE2_ERROR_DFA_WSSIZE
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() runs out of space in the
|
This return is given if pcre2_dfa_match() runs out of space in the
|
||||||
workspace vector.
|
workspace vector.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_RECURSE
|
PCRE2_ERROR_DFA_RECURSE
|
||||||
|
|
||||||
When a recursion or subroutine call is processed, the matching function
|
When a recursion or subroutine call is processed, the matching function
|
||||||
calls itself recursively, using private memory for the ovector and
|
calls itself recursively, using private memory for the ovector and
|
||||||
workspace. This error is given if the internal ovector is not large
|
workspace. This error is given if the internal ovector is not large
|
||||||
enough. This should be extremely rare, as a vector of size 1000 is
|
enough. This should be extremely rare, as a vector of size 1000 is
|
||||||
used.
|
used.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_BADRESTART
|
PCRE2_ERROR_DFA_BADRESTART
|
||||||
|
|
||||||
When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option,
|
When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option,
|
||||||
some plausibility checks are made on the contents of the workspace,
|
some plausibility checks are made on the contents of the workspace,
|
||||||
which should contain data about the previous partial match. If any of
|
which should contain data about the previous partial match. If any of
|
||||||
these checks fail, this error is given.
|
these checks fail, this error is given.
|
||||||
|
|
||||||
|
|
||||||
SEE ALSO
|
SEE ALSO
|
||||||
|
|
||||||
pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3),
|
pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3),
|
||||||
pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
|
pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
|
||||||
|
|
||||||
|
|
||||||
|
@ -3745,8 +3754,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 27 December 2019
|
Last updated: 22 January 2020
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2020 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SUBSTITUTE 3 "05 January 2020" "PCRE2 10.35"
|
.TH PCRE2_SUBSTITUTE 3 "22 January 2020" "PCRE2 10.35"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -73,6 +73,7 @@ zero-terminated strings. The options are:
|
||||||
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
|
||||||
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
|
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
|
||||||
|
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s)
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
|
||||||
.sp
|
.sp
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "27 December 2019" "PCRE2 10.35"
|
.TH PCRE2API 3 "22 January 2020" "PCRE2 10.35"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3324,10 +3324,11 @@ same number causes an error at compile time.
|
||||||
This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
|
This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
|
||||||
subject string in \fIoutputbuffer\fP, replacing parts that were matched with
|
subject string in \fIoutputbuffer\fP, replacing parts that were matched with
|
||||||
the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
|
the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
|
||||||
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
|
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
|
||||||
is to perform just one replacement if the pattern matches, but there is an
|
option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
|
||||||
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
replacement string(s). The default action is to perform just one replacement if
|
||||||
for details).
|
the pattern matches, but there is an option that requests multiple replacements
|
||||||
|
(see PCRE2_SUBSTITUTE_GLOBAL below for details).
|
||||||
.P
|
.P
|
||||||
If successful, \fBpcre2_substitute()\fP returns the number of substitutions
|
If successful, \fBpcre2_substitute()\fP returns the number of substitutions
|
||||||
that were carried out. This may be zero if no match was found, and is never
|
that were carried out. This may be zero if no match was found, and is never
|
||||||
|
@ -3362,10 +3363,21 @@ calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
|
||||||
an application to check for a match before choosing to substitute, without
|
an application to check for a match before choosing to substitute, without
|
||||||
having to repeat the match.
|
having to repeat the match.
|
||||||
.P
|
.P
|
||||||
The \fIcode\fP argument is not used for the first substitution, but if
|
The \fIcode\fP argument is not used for the first substitution when
|
||||||
PCRE2_SUBSTITUTE_GLOBAL is set, \fBpcre2_match()\fP will be called after the
|
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
|
||||||
first substitution to check for further matches, and the contents of the
|
\fBpcre2_match()\fP will be called after the first substitution to check for
|
||||||
\fImatch_data\fP block will be changed.
|
further matches, and the contents of the \fImatch_data\fP block will be
|
||||||
|
changed.
|
||||||
|
.P
|
||||||
|
The default is to return a copy of the subject string with matched substrings
|
||||||
|
replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
|
||||||
|
replacement substrings are returned. In the global case, multiple replacements
|
||||||
|
are concatenated in the output buffer. Substitution callouts (see
|
||||||
|
.\" HTML <a href="#subcallouts">
|
||||||
|
.\" </a>
|
||||||
|
below)
|
||||||
|
.\"
|
||||||
|
can be used to separate them if necessary.
|
||||||
.P
|
.P
|
||||||
The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
|
The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
|
||||||
variable that contains the length, in code units, of the output buffer. If the
|
variable that contains the length, in code units, of the output buffer. If the
|
||||||
|
@ -3557,6 +3569,7 @@ above).
|
||||||
.\"
|
.\"
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.\" HTML <a name="subcallouts"></a>
|
||||||
.SS "Substitution callouts"
|
.SS "Substitution callouts"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -3904,6 +3917,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 27 December 2019
|
Last updated: 22 January 2020
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2020 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
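The pcre2api text in the hunk above says that with PCRE2_SUBSTITUTE_REPLACEMENT_ONLY only the replacement substrings are returned, and that in the global case they are concatenated in the output buffer. A minimal sketch of that output rule follows, in plain C with no PCRE2 calls. `span_t` and `build_output()` are hypothetical names invented for illustration, the matches are supplied by the caller rather than found by a regex, and the replacement is treated as literal text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical half-open match span [start, end) within the subject. */
typedef struct { size_t start, end; } span_t;

/* Illustrative stand-in, NOT the PCRE2 API. Builds the output for a list of
   already-found, non-overlapping matches given in ascending order. Returns
   the output length (excluding the terminating zero), or -1 if the buffer
   is too small. */
static long build_output(const char *subject, const span_t *m, size_t nmatches,
                         const char *replacement, int replacement_only,
                         char *out, size_t outsize)
{
  size_t slen = strlen(subject), rlen = strlen(replacement);
  size_t pos = 0, used = 0;

  for (size_t i = 0; i < nmatches; i++) {
    assert(m[i].start >= pos && m[i].end >= m[i].start && m[i].end <= slen);

    /* Copy the text leading up to the match, unless only replacements
       are wanted. */
    if (!replacement_only) {
      size_t frag = m[i].start - pos;
      if (used + frag >= outsize) return -1;
      memcpy(out + used, subject + pos, frag);
      used += frag;
    }

    /* The replacement itself is always copied. */
    if (used + rlen >= outsize) return -1;
    memcpy(out + used, replacement, rlen);
    used += rlen;
    pos = m[i].end;
  }

  /* Copy the rest of the subject, again unless only replacements are wanted. */
  if (!replacement_only) {
    size_t frag = slen - pos;
    if (used + frag >= outsize) return -1;
    memcpy(out + used, subject + pos, frag);
    used += frag;
  }

  /* Terminate the output with a binary zero in both modes. */
  if (used >= outsize) return -1;
  out[used] = '\0';
  return (long)used;
}
```

With the subject "abc123def456", matches at [3,6) and [9,12), and the replacement "<N>", this sketch produces "abc<N>def<N>" by default and "<N><N>" when `replacement_only` is non-zero.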
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "26 December 2019" "PCRE 10.35"
|
.TH PCRE2TEST 1 "22 January 2020" "PCRE 10.35"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -1011,25 +1011,27 @@ modifier list, in which case they are applied to every subject line that is
|
||||||
processed with that pattern. These modifiers do not affect the compilation
|
processed with that pattern. These modifiers do not affect the compilation
|
||||||
process.
|
process.
|
||||||
.sp
|
.sp
|
||||||
aftertext show text after match
|
aftertext show text after match
|
||||||
allaftertext show text after captures
|
allaftertext show text after captures
|
||||||
allcaptures show all captures
|
allcaptures show all captures
|
||||||
allvector show the entire ovector
|
allvector show the entire ovector
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
altglobal alternative global matching
|
altglobal alternative global matching
|
||||||
/g global global matching
|
/g global global matching
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
replace=<string> specify a replacement string
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
substitute_callout use substitution callouts
|
substitute_callout use substitution callouts
|
||||||
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
||||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_skip=<n> skip substitution number n
|
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
substitute_stop=<n> skip substitution number n and greater
|
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_skip=<n> skip substitution <n>
|
||||||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
substitute_stop=<n> skip substitution <n> and following
|
||||||
|
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
|
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||||
.sp
|
.sp
|
||||||
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
||||||
defaults, set them in a \fB#subject\fP command.
|
defaults, set them in a \fB#subject\fP command.
|
||||||
|
@ -1203,7 +1205,9 @@ pattern.
|
||||||
substitute_callout use substitution callouts
|
substitute_callout use substitution callouts
|
||||||
substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
||||||
|
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
|
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_skip=<n> skip substitution number n
|
substitute_skip=<n> skip substitution number n
|
||||||
substitute_stop=<n> skip substitution number n and greater
|
substitute_stop=<n> skip substitution number n and greater
|
||||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
|
@ -1367,9 +1371,10 @@ by name.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
|
If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
|
||||||
called instead of one of the matching functions. Note that replacement strings
|
called instead of one of the matching functions (or after one call of
|
||||||
cannot contain commas, because a comma signifies the end of a modifier. This is
|
\fBpcre2_match()\fP in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
|
||||||
not thought to be an issue in a test program.
|
replacement strings cannot contain commas, because a comma signifies the end of
|
||||||
|
a modifier. This is not thought to be an issue in a test program.
|
||||||
.P
|
.P
|
||||||
Unlike subject strings, \fBpcre2test\fP does not process replacement strings
|
Unlike subject strings, \fBpcre2test\fP does not process replacement strings
|
||||||
for escape sequences. In UTF mode, a replacement string is checked to see if it
|
for escape sequences. In UTF mode, a replacement string is checked to see if it
|
||||||
|
@ -1384,10 +1389,17 @@ for \fBpcre2_substitute()\fP:
|
||||||
global PCRE2_SUBSTITUTE_GLOBAL
|
global PCRE2_SUBSTITUTE_GLOBAL
|
||||||
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal PCRE2_SUBSTITUTE_LITERAL
|
||||||
|
substitute_matched PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
|
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
|
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||||
.sp
|
.sp
|
||||||
|
See the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2api\fP
|
||||||
|
.\"
|
||||||
|
documentation for details of these options.
|
||||||
.P
|
.P
|
||||||
After a successful substitution, the modified string is output, preceded by the
|
After a successful substitution, the modified string is output, preceded by the
|
||||||
number of replacements. This may be zero if there were no matches. Here is a
|
number of replacements. This may be zero if there were no matches. Here is a
|
||||||
|
@ -2076,6 +2088,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 26 December 2019
|
Last updated: 22 January 2020
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2020 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -936,25 +936,27 @@ PATTERN MODIFIERS
|
||||||
ject line that is processed with that pattern. These modifiers do not
|
ject line that is processed with that pattern. These modifiers do not
|
||||||
affect the compilation process.
|
affect the compilation process.
|
||||||
|
|
||||||
aftertext show text after match
|
aftertext show text after match
|
||||||
allaftertext show text after captures
|
allaftertext show text after captures
|
||||||
allcaptures show all captures
|
allcaptures show all captures
|
||||||
allvector show the entire ovector
|
allvector show the entire ovector
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
altglobal alternative global matching
|
altglobal alternative global matching
|
||||||
/g global global matching
|
/g global global matching
|
||||||
jitstack=<n> set size of JIT stack
|
jitstack=<n> set size of JIT stack
|
||||||
mark show mark values
|
mark show mark values
|
||||||
replace=<string> specify a replacement string
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
substitute_callout use substitution callouts
|
substitute_callout use substitution callouts
|
||||||
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
||||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_skip=<n> skip substitution number n
|
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
substitute_stop=<n> skip substitution number n and greater
|
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_skip=<n> skip substitution <n>
|
||||||
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
substitute_stop=<n> skip substitution <n> and following
|
||||||
|
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
|
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||||
|
|
||||||
These modifiers may not appear in a #pattern command. If you want them
|
These modifiers may not appear in a #pattern command. If you want them
|
||||||
as defaults, set them in a #subject command.
|
as defaults, set them in a #subject command.
|
||||||
|
@ -1105,7 +1107,9 @@ SUBJECT MODIFIERS
|
||||||
substitute_callout use substitution callouts
|
substitute_callout use substitution callouts
|
||||||
substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
|
||||||
|
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
|
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_skip=<n> skip substitution number n
|
substitute_skip=<n> skip substitution number n
|
||||||
substitute_stop=<n> skip substitution number n and greater
|
substitute_stop=<n> skip substitution number n and greater
|
||||||
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
|
@ -1251,9 +1255,11 @@ SUBJECT MODIFIERS
|
||||||
Testing the substitution function
|
Testing the substitution function
|
||||||
|
|
||||||
If the replace modifier is set, the pcre2_substitute() function is
|
If the replace modifier is set, the pcre2_substitute() function is
|
||||||
called instead of one of the matching functions. Note that replacement
|
called instead of one of the matching functions (or after one call of
|
||||||
strings cannot contain commas, because a comma signifies the end of a
|
pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that re-
|
||||||
modifier. This is not thought to be an issue in a test program.
|
placement strings cannot contain commas, because a comma signifies the
|
||||||
|
end of a modifier. This is not thought to be an issue in a test pro-
|
||||||
|
gram.
|
||||||
|
|
||||||
Unlike subject strings, pcre2test does not process replacement strings
|
Unlike subject strings, pcre2test does not process replacement strings
|
||||||
for escape sequences. In UTF mode, a replacement string is checked to
|
for escape sequences. In UTF mode, a replacement string is checked to
|
||||||
|
@ -1268,10 +1274,13 @@ SUBJECT MODIFIERS
|
||||||
global PCRE2_SUBSTITUTE_GLOBAL
|
global PCRE2_SUBSTITUTE_GLOBAL
|
||||||
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
|
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
|
||||||
substitute_literal PCRE2_SUBSTITUTE_LITERAL
|
substitute_literal PCRE2_SUBSTITUTE_LITERAL
|
||||||
|
substitute_matched PCRE2_SUBSTITUTE_MATCHED
|
||||||
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
|
||||||
|
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
|
||||||
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
|
||||||
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
|
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
|
||||||
|
|
||||||
|
See the pcre2api documentation for details of these options.
|
||||||
|
|
||||||
After a successful substitution, the modified string is output, pre-
|
After a successful substitution, the modified string is output, pre-
|
||||||
ceded by the number of replacements. This may be zero if there were no
|
ceded by the number of replacements. This may be zero if there were no
|
||||||
|
@ -1905,5 +1914,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 26 December 2019
|
Last updated: 22 January 2020
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2020 University of Cambridge.
|
||||||
|
|
|
@ -5,7 +5,7 @@
|
||||||
/* This is the public header file for the PCRE library, second API, to be
|
/* This is the public header file for the PCRE library, second API, to be
|
||||||
#included by applications that call PCRE2 functions.
|
#included by applications that call PCRE2 functions.
|
||||||
|
|
||||||
Copyright (c) 2016-2019 University of Cambridge
|
Copyright (c) 2016-2020 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -183,6 +183,7 @@ pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
|
||||||
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
|
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
|
||||||
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
|
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
|
||||||
#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */
|
#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */
|
||||||
|
#define PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u /* pcre2_substitute() only */
|
||||||
|
|
||||||
/* Options for pcre2_pattern_convert(). */
|
/* Options for pcre2_pattern_convert(). */
|
||||||
|
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2019 University of Cambridge
|
New API code Copyright (c) 2016-2020 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -50,8 +50,8 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
#define SUBSTITUTE_OPTIONS \
|
#define SUBSTITUTE_OPTIONS \
|
||||||
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
|
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
|
||||||
PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
|
PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_REPLACEMENT_ONLY| \
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY)
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -195,6 +195,7 @@ overflow, either give an error immediately, or keep on, accumulating the
|
||||||
length. */
|
length. */
|
||||||
|
|
||||||
#define CHECKMEMCPY(from,length) \
|
#define CHECKMEMCPY(from,length) \
|
||||||
|
{ \
|
||||||
if (!overflowed && lengthleft < length) \
|
if (!overflowed && lengthleft < length) \
|
||||||
{ \
|
{ \
|
||||||
if ((suboptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) == 0) goto NOROOM; \
|
if ((suboptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) == 0) goto NOROOM; \
|
||||||
|
@ -210,7 +211,8 @@ length. */
|
||||||
memcpy(buffer + buff_offset, from, CU2BYTES(length)); \
|
memcpy(buffer + buff_offset, from, CU2BYTES(length)); \
|
||||||
buff_offset += length; \
|
buff_offset += length; \
|
||||||
lengthleft -= length; \
|
lengthleft -= length; \
|
||||||
}
|
} \
|
||||||
|
}
|
||||||
|
|
||||||
/* Here's the function */
|
/* Here's the function */
|
||||||
|
|
||||||
|
@ -231,6 +233,7 @@ BOOL match_data_created = FALSE;
|
||||||
BOOL escaped_literal = FALSE;
|
BOOL escaped_literal = FALSE;
|
||||||
BOOL overflowed = FALSE;
|
BOOL overflowed = FALSE;
|
||||||
BOOL use_existing_match;
|
BOOL use_existing_match;
|
||||||
|
BOOL replacement_only;
|
||||||
#ifdef SUPPORT_UNICODE
|
#ifdef SUPPORT_UNICODE
|
||||||
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
|
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
|
||||||
#endif
|
#endif
|
||||||
|
@ -260,6 +263,7 @@ if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
|
||||||
pointer in the match data may be NULL after a no-match. */
|
pointer in the match data may be NULL after a no-match. */
|
||||||
|
|
||||||
use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
|
use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
|
||||||
|
replacement_only = ((options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) != 0);
|
||||||
|
|
||||||
if (use_existing_match)
|
if (use_existing_match)
|
||||||
{
|
{
|
||||||
|
@ -312,7 +316,7 @@ if (utf && (options & PCRE2_NO_UTF_CHECK) == 0)
|
||||||
suboptions = options & SUBSTITUTE_OPTIONS;
|
suboptions = options & SUBSTITUTE_OPTIONS;
|
||||||
options &= ~SUBSTITUTE_OPTIONS;
|
options &= ~SUBSTITUTE_OPTIONS;
|
||||||
|
|
||||||
/* Copy up to the start offset */
|
/* Error if the start match offset it greater than the length of the subject. */
|
||||||
|
|
||||||
if (start_offset > length)
|
if (start_offset > length)
|
||||||
{
|
{
|
||||||
|
@ -320,7 +324,10 @@ if (start_offset > length)
|
||||||
rc = PCRE2_ERROR_BADOFFSET;
|
rc = PCRE2_ERROR_BADOFFSET;
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
CHECKMEMCPY(subject, start_offset);
|
|
||||||
|
/* Copy up to the start offset, unless only the replacement is required. */
|
||||||
|
|
||||||
|
if (!replacement_only) CHECKMEMCPY(subject, start_offset);
|
||||||
|
|
||||||
/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
|
/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
|
||||||
match is taken from the match_data that was passed in. */
|
match is taken from the match_data that was passed in. */
|
||||||
|
@ -382,11 +389,11 @@ do
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Copy what we have advanced past, reset the special global options, and
|
/* Copy what we have advanced past (unless not required), reset the special
|
||||||
continue to the next match. */
|
global options, and continue to the next match. */
|
||||||
|
|
||||||
fraglength = start_offset - save_start;
|
fraglength = start_offset - save_start;
|
||||||
CHECKMEMCPY(subject + save_start, fraglength);
|
if (!replacement_only) CHECKMEMCPY(subject + save_start, fraglength);
|
||||||
goptions = 0;
|
goptions = 0;
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
@ -430,12 +437,12 @@ do
|
||||||
}
|
}
|
||||||
subs++;
|
subs++;
|
||||||
|
|
||||||
/* Copy the text leading up to the match, and remember where the insert
|
/* Copy the text leading up to the match (unless not required), and remember
|
||||||
begins and how many ovector pairs are set. */
|
where the insert begins and how many ovector pairs are set. */
|
||||||
|
|
||||||
if (rc == 0) rc = ovector_count;
|
if (rc == 0) rc = ovector_count;
|
||||||
fraglength = ovector[0] - start_offset;
|
fraglength = ovector[0] - start_offset;
|
||||||
CHECKMEMCPY(subject + start_offset, fraglength);
|
if (!replacement_only) CHECKMEMCPY(subject + start_offset, fraglength);
|
||||||
scb.output_offsets[0] = buff_offset;
|
scb.output_offsets[0] = buff_offset;
|
||||||
scb.oveccount = rc;
|
scb.oveccount = rc;
|
||||||
|
|
||||||
|
@ -882,7 +889,7 @@ do
|
||||||
|
|
||||||
buff_offset -= newlength;
|
buff_offset -= newlength;
|
||||||
lengthleft += newlength;
|
lengthleft += newlength;
|
||||||
CHECKMEMCPY(subject + ovector[0], oldlength);
|
if (!replacement_only) CHECKMEMCPY(subject + ovector[0], oldlength);
|
||||||
|
|
||||||
/* A negative return means do not do any more. */
|
/* A negative return means do not do any more. */
|
||||||
|
|
||||||
|
@ -903,12 +910,17 @@ do
|
||||||
start_offset = ovector[1];
|
start_offset = ovector[1];
|
||||||
} while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */
|
} while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */
|
||||||
|
|
||||||
/* Copy the rest of the subject. */
|
/* Copy the rest of the subject unless not required, and terminate the output
|
||||||
|
with a binary zero. */
|
||||||
|
|
||||||
|
if (!replacement_only)
|
||||||
|
{
|
||||||
|
fraglength = length - start_offset;
|
||||||
|
CHECKMEMCPY(subject + start_offset, fraglength);
|
||||||
|
}
|
||||||
|
|
||||||
fraglength = length - start_offset;
|
|
||||||
CHECKMEMCPY(subject + start_offset, fraglength);
|
|
||||||
temp[0] = 0;
|
temp[0] = 0;
|
||||||
CHECKMEMCPY(temp , 1);
|
CHECKMEMCPY(temp, 1);
|
||||||
|
|
||||||
/* If overflowed is set it means the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set,
|
/* If overflowed is set it means the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set,
|
||||||
and matching has carried on after a full buffer, in order to compute the length
|
and matching has carried on after a full buffer, in order to compute the length
|
||||||
|
|
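The pcre2_substitute.c hunks above wrap the body of CHECKMEMCPY in an extra pair of braces and then use the macro as the body of `if (!replacement_only)`. The commit does not state a reason for the braces, and most of the real macro body is not visible in the diff, so the following is only a sketch of the general hazard that bracing a multi-statement macro guards against. `APPEND_UNBRACED`, `APPEND_BRACED` and the two helper functions are hypothetical and are not taken from PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-statement macro with no enclosing braces. */
#define APPEND_UNBRACED(buf, off, from, len) \
  memcpy((buf) + (off), (from), (len)); \
  (off) += (len)

/* The same two statements wrapped in braces, so that the expansion is a
   single compound statement. */
#define APPEND_BRACED(buf, off, from, len) \
  { \
  memcpy((buf) + (off), (from), (len)); \
  (off) += (len); \
  }

/* Only the memcpy() is governed by the "if"; the offset update always runs. */
static size_t offset_after_unbraced(int skip)
{
  char buf[8];
  size_t off = 0;
  if (!skip) APPEND_UNBRACED(buf, off, "abc", 3);
  return off;
}

/* Both statements are governed by the "if". */
static size_t offset_after_braced(int skip)
{
  char buf[8];
  size_t off = 0;
  if (!skip) APPEND_BRACED(buf, off, "abc", 3);
  return off;
}
```

In this sketch `offset_after_unbraced(1)` still advances the offset to 3 although the copy was skipped, whereas `offset_after_braced(1)` leaves it at 0. Plain braces followed by a semicolon cannot be followed directly by an `else`. None of the call sites shown in the diff has one.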
268
src/pcre2test.c
|
@ -11,7 +11,7 @@ hacked-up (non-) design had also run out of steam.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original code Copyright (c) 1997-2012 University of Cambridge
|
Original code Copyright (c) 1997-2012 University of Cambridge
|
||||||
Rewritten code Copyright (c) 2016-2019 University of Cambridge
|
Rewritten code Copyright (c) 2016-2020 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -505,12 +505,13 @@ so many of them that they are split into two fields. */
|
||||||
#define CTL2_SUBSTITUTE_LITERAL 0x00000004u
|
#define CTL2_SUBSTITUTE_LITERAL 0x00000004u
|
||||||
#define CTL2_SUBSTITUTE_MATCHED 0x00000008u
|
#define CTL2_SUBSTITUTE_MATCHED 0x00000008u
|
||||||
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000010u
|
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000010u
|
||||||
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000020u
|
#define CTL2_SUBSTITUTE_REPLACEMENT_ONLY 0x00000020u
|
||||||
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000040u
|
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000040u
|
||||||
#define CTL2_SUBJECT_LITERAL 0x00000080u
|
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000080u
|
||||||
#define CTL2_CALLOUT_NO_WHERE 0x00000100u
|
#define CTL2_SUBJECT_LITERAL 0x00000100u
|
||||||
#define CTL2_CALLOUT_EXTRA 0x00000200u
|
#define CTL2_CALLOUT_NO_WHERE 0x00000200u
|
||||||
#define CTL2_ALLVECTOR 0x00000400u
|
#define CTL2_CALLOUT_EXTRA 0x00000400u
|
||||||
|
#define CTL2_ALLVECTOR 0x00000800u
|
||||||
|
|
||||||
#define CTL2_NL_SET 0x40000000u /* Informational */
|
#define CTL2_NL_SET 0x40000000u /* Informational */
|
||||||
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
||||||
|
@ -535,6 +536,7 @@ different things in the two cases. */
|
||||||
CTL2_SUBSTITUTE_LITERAL|\
|
CTL2_SUBSTITUTE_LITERAL|\
|
||||||
CTL2_SUBSTITUTE_MATCHED|\
|
CTL2_SUBSTITUTE_MATCHED|\
|
||||||
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
|
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
|
||||||
|
CTL2_SUBSTITUTE_REPLACEMENT_ONLY|\
|
||||||
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
|
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
|
||||||
CTL2_SUBSTITUTE_UNSET_EMPTY|\
|
CTL2_SUBSTITUTE_UNSET_EMPTY|\
|
||||||
CTL2_ALLVECTOR)
|
CTL2_ALLVECTOR)
|
||||||
|
@ -614,129 +616,130 @@ typedef struct modstruct {
|
||||||
} modstruct;
|
} modstruct;
|
||||||
|
|
||||||
static modstruct modlist[] = {
|
static modstruct modlist[] = {
|
||||||
{ "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) },
|
{ "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) },
|
||||||
{ "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) },
|
{ "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) },
|
||||||
{ "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) },
|
{ "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) },
|
||||||
{ "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) },
|
{ "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) },
|
||||||
{ "allow_surrogate_escapes", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES, CO(extra_options) },
|
{ "allow_surrogate_escapes", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES, CO(extra_options) },
|
||||||
{ "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) },
|
{ "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) },
|
||||||
{ "allvector", MOD_PND, MOD_CTL, CTL2_ALLVECTOR, PO(control2) },
|
{ "allvector", MOD_PND, MOD_CTL, CTL2_ALLVECTOR, PO(control2) },
|
||||||
{ "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) },
|
{ "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) },
|
||||||
{ "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) },
|
{ "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) },
|
||||||
{ "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) },
|
{ "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) },
|
||||||
{ "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) },
|
{ "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) },
|
||||||
{ "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) },
|
{ "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) },
|
||||||
{ "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) },
|
{ "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) },
|
||||||
{ "bad_escape_is_literal", MOD_CTC, MOD_OPT, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL, CO(extra_options) },
|
{ "bad_escape_is_literal", MOD_CTC, MOD_OPT, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL, CO(extra_options) },
|
||||||
{ "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) },
|
{ "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) },
|
||||||
{ "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) },
|
{ "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) },
|
||||||
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
|
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
|
||||||
{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
|
{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
|
||||||
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
|
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
|
||||||
{ "callout_extra", MOD_DAT, MOD_CTL, CTL2_CALLOUT_EXTRA, DO(control2) },
|
{ "callout_extra", MOD_DAT, MOD_CTL, CTL2_CALLOUT_EXTRA, DO(control2) },
|
||||||
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
|
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
|
||||||
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
|
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
|
||||||
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
|
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
|
||||||
{ "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
|
{ "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
|
||||||
{ "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
|
{ "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
|
||||||
{ "convert", MOD_PAT, MOD_CON, 0, PO(convert_type) },
|
{ "convert", MOD_PAT, MOD_CON, 0, PO(convert_type) },
|
||||||
{ "convert_glob_escape", MOD_PAT, MOD_CHR, 0, PO(convert_glob_escape) },
|
{ "convert_glob_escape", MOD_PAT, MOD_CHR, 0, PO(convert_glob_escape) },
|
||||||
{ "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) },
|
{ "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) },
|
||||||
{ "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) },
|
{ "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) },
|
||||||
{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
|
{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
|
||||||
{ "copy_matched_subject", MOD_DAT, MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
|
{ "copy_matched_subject", MOD_DAT, MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
|
||||||
{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
|
{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
|
||||||
{ "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) },
|
{ "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) },
|
||||||
{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
|
{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
|
||||||
{ "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) },
|
{ "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) },
|
||||||
{ "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) },
|
{ "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) },
|
||||||
{ "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) },
|
{ "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) },
|
||||||
{ "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) },
|
{ "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) },
|
||||||
{ "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) },
|
{ "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) },
|
||||||
{ "endanchored", MOD_PD, MOD_OPT, PCRE2_ENDANCHORED, PD(options) },
|
{ "endanchored", MOD_PD, MOD_OPT, PCRE2_ENDANCHORED, PD(options) },
|
||||||
{ "escaped_cr_is_lf", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ESCAPED_CR_IS_LF, CO(extra_options) },
|
{ "escaped_cr_is_lf", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ESCAPED_CR_IS_LF, CO(extra_options) },
|
||||||
{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
|
{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
|
||||||
{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
|
{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
|
||||||
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
|
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
|
||||||
{ "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, CO(extra_options) },
|
{ "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, CO(extra_options) },
|
||||||
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
|
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
|
||||||
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
|
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
|
||||||
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
|
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
|
||||||
{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
|
{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
|
||||||
{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
|
{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
|
||||||
{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
|
{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
|
||||||
{ "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) },
|
{ "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) },
|
||||||
{ "heap_limit", MOD_CTM, MOD_INT, 0, MO(heap_limit) },
|
{ "heap_limit", MOD_CTM, MOD_INT, 0, MO(heap_limit) },
|
||||||
{ "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) },
|
{ "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) },
|
||||||
{ "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
|
{ "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
|
||||||
{ "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
|
{ "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
|
||||||
{ "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
|
{ "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
|
||||||
{ "jitstack", MOD_PNDP, MOD_INT, 0, PO(jitstack) },
|
{ "jitstack", MOD_PNDP, MOD_INT, 0, PO(jitstack) },
|
||||||
{ "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
|
{ "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
|
||||||
{ "literal", MOD_PAT, MOD_OPT, PCRE2_LITERAL, PO(options) },
|
{ "literal", MOD_PAT, MOD_OPT, PCRE2_LITERAL, PO(options) },
|
||||||
{ "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
|
{ "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
|
||||||
{ "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
|
{ "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
|
||||||
{ "match_invalid_utf", MOD_PAT, MOD_OPT, PCRE2_MATCH_INVALID_UTF, PO(options) },
|
{ "match_invalid_utf", MOD_PAT, MOD_OPT, PCRE2_MATCH_INVALID_UTF, PO(options) },
|
||||||
{ "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) },
|
{ "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) },
|
||||||
{ "match_line", MOD_CTC, MOD_OPT, PCRE2_EXTRA_MATCH_LINE, CO(extra_options) },
|
{ "match_line", MOD_CTC, MOD_OPT, PCRE2_EXTRA_MATCH_LINE, CO(extra_options) },
|
||||||
{ "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
|
{ "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
|
||||||
{ "match_word", MOD_CTC, MOD_OPT, PCRE2_EXTRA_MATCH_WORD, CO(extra_options) },
|
{ "match_word", MOD_CTC, MOD_OPT, PCRE2_EXTRA_MATCH_WORD, CO(extra_options) },
|
||||||
{ "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) },
|
{ "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) },
|
||||||
{ "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) },
|
{ "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) },
|
||||||
{ "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) },
|
{ "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) },
|
||||||
{ "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) },
|
{ "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) },
|
||||||
{ "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) },
|
{ "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) },
|
||||||
{ "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) },
|
{ "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) },
|
||||||
{ "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) },
|
{ "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) },
|
||||||
{ "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) },
|
{ "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) },
|
||||||
{ "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) },
|
{ "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) },
|
||||||
{ "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) },
|
{ "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) },
|
||||||
{ "no_jit", MOD_DAT, MOD_OPT, PCRE2_NO_JIT, DO(options) },
|
{ "no_jit", MOD_DAT, MOD_OPT, PCRE2_NO_JIT, DO(options) },
|
||||||
{ "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) },
|
{ "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) },
|
||||||
{ "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) },
|
{ "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) },
|
||||||
{ "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) },
|
{ "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) },
|
||||||
{ "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) },
|
{ "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) },
|
||||||
{ "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) },
|
{ "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) },
|
||||||
{ "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) },
|
{ "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) },
|
||||||
{ "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) },
|
{ "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) },
|
||||||
{ "offset", MOD_DAT, MOD_INT, 0, DO(offset) },
|
{ "offset", MOD_DAT, MOD_INT, 0, DO(offset) },
|
||||||
{ "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)},
|
{ "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)},
|
||||||
{ "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) },
|
{ "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) },
|
||||||
{ "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) },
|
{ "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) },
|
||||||
{ "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
|
{ "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
|
||||||
{ "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
{ "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
||||||
{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
|
{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
|
||||||
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
|
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
|
||||||
{ "posix_nosub", MOD_PAT, MOD_CTL, CTL_POSIX|CTL_POSIX_NOSUB, PO(control) },
|
{ "posix_nosub", MOD_PAT, MOD_CTL, CTL_POSIX|CTL_POSIX_NOSUB, PO(control) },
|
||||||
{ "posix_startend", MOD_DAT, MOD_IN2, 0, DO(startend) },
|
{ "posix_startend", MOD_DAT, MOD_IN2, 0, DO(startend) },
|
||||||
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
|
||||||
{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
|
{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
|
||||||
{ "pushcopy", MOD_PAT, MOD_CTL, CTL_PUSHCOPY, PO(control) },
|
{ "pushcopy", MOD_PAT, MOD_CTL, CTL_PUSHCOPY, PO(control) },
|
||||||
{ "pushtablescopy", MOD_PAT, MOD_CTL, CTL_PUSHTABLESCOPY, PO(control) },
|
{ "pushtablescopy", MOD_PAT, MOD_CTL, CTL_PUSHTABLESCOPY, PO(control) },
|
||||||
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) }, /* Obsolete synonym */
|
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) }, /* Obsolete synonym */
|
||||||
{ "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) },
|
{ "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) },
|
||||||
{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
|
{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
|
||||||
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
|
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
|
||||||
{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
|
{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
|
||||||
{ "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
|
{ "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
|
||||||
{ "subject_literal", MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL, PO(control2) },
|
{ "subject_literal", MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL, PO(control2) },
|
||||||
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
|
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
|
||||||
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
|
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
|
||||||
{ "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) },
|
{ "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) },
|
||||||
{ "substitute_matched", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_MATCHED, PO(control2) },
|
{ "substitute_matched", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_MATCHED, PO(control2) },
|
||||||
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
|
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
|
||||||
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
|
{ "substitute_replacement_only", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_REPLACEMENT_ONLY, PO(control2) },
|
||||||
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
|
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
|
||||||
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
|
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
|
||||||
{ "substitute_unset_empty", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
|
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
|
||||||
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
|
{ "substitute_unset_empty", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
|
||||||
{ "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
|
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
|
||||||
{ "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) },
|
{ "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
|
||||||
{ "use_length", MOD_PAT, MOD_CTL, CTL_USE_LENGTH, PO(control) },
|
{ "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) },
|
||||||
{ "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) },
|
{ "use_length", MOD_PAT, MOD_CTL, CTL_USE_LENGTH, PO(control) },
|
||||||
{ "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) },
|
{ "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) },
|
||||||
{ "utf8_input", MOD_PAT, MOD_CTL, CTL_UTF8_INPUT, PO(control) },
|
{ "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) },
|
||||||
{ "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) }
|
{ "utf8_input", MOD_PAT, MOD_CTL, CTL_UTF8_INPUT, PO(control) },
|
||||||
|
{ "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) }
|
||||||
};
|
};
|
||||||
|
|
||||||
#define MODLISTCOUNT sizeof(modlist)/sizeof(modstruct)
|
#define MODLISTCOUNT sizeof(modlist)/sizeof(modstruct)
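The new "substitute_replacement_only" entry is placed between "substitute_overflow_length" and "substitute_skip", which keeps the table in alphabetical order. A minimal sketch of why that ordering would matter, assuming the table is searched by binary chop: the cut-down `demo_mod` struct, the `demo_modlist` table and both helper functions below are hypothetical illustrations, not code from pcre2test.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for a modifier table entry: name only. */
typedef struct { const char *name; } demo_mod;

/* Entries must stay in strcmp() order, mirroring the order in the diff. */
static const demo_mod demo_modlist[] = {
  { "substitute_overflow_length" },
  { "substitute_replacement_only" },
  { "substitute_skip" },
  { "substitute_stop" },
  { "substitute_unknown_unset" },
  { "substitute_unset_empty" }
};

#define DEMO_MODLISTCOUNT (sizeof(demo_modlist)/sizeof(demo_modlist[0]))

/* Returns 1 if every entry sorts strictly before its successor, else 0. */
static int demo_modlist_is_sorted(void)
{
  size_t i;
  for (i = 1; i < DEMO_MODLISTCOUNT; i++)
    if (strcmp(demo_modlist[i - 1].name, demo_modlist[i].name) >= 0) return 0;
  return 1;
}

/* Binary chop over the sorted table; returns the index, or -1 if absent. */
static int demo_find_modifier(const char *name)
{
  size_t bot = 0, top = DEMO_MODLISTCOUNT;
  assert(name != NULL);
  while (top > bot)
    {
    size_t mid = bot + (top - bot) / 2;
    int c = strcmp(name, demo_modlist[mid].name);
    if (c == 0) return (int)mid;
    if (c > 0) bot = mid + 1; else top = mid;
    }
  return -1;
}
```

With a lookup of this kind, an entry inserted out of order can make itself or its neighbours unreachable, so a sortedness check such as `demo_modlist_is_sorted()` is a cheap guard when adding a modifier.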
|
||||||
|
@ -4091,7 +4094,7 @@ Returns: nothing
|
||||||
static void
|
static void
|
||||||
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||||
before,
|
before,
|
||||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||||
|
@ -4132,6 +4135,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
|
||||||
((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
|
((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
|
((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
|
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
|
||||||
|
((controls2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) != 0)? " substitute_replacement_only" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
|
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
|
||||||
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
|
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
|
||||||
((controls & CTL_USE_LENGTH) != 0)? " use_length" : "",
|
((controls & CTL_USE_LENGTH) != 0)? " use_length" : "",
|
||||||
|
@ -7283,6 +7287,8 @@ if (dat_datctl.replacement[0] != 0)
|
||||||
PCRE2_SUBSTITUTE_LITERAL) |
|
PCRE2_SUBSTITUTE_LITERAL) |
|
||||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) == 0)? 0 :
|
(((dat_datctl.control2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) == 0)? 0 :
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) |
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) |
|
||||||
|
(((dat_datctl.control2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) == 0)? 0 :
|
||||||
|
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) |
|
||||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) == 0)? 0 :
|
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) == 0)? 0 :
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
|
||||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
|
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
|
||||||
|
|
|
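The hunk above extends a chain of conditional expressions that translates pcre2test control bits into pcre2_substitute() option bits, one pair of lines per option. The same translation can be sketched in table-driven form. This is an alternative shape for illustration only, not what the commit does; every `DEMO_` name and every bit value below is a placeholder, not a real constant from pcre2.h or pcre2test.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder control bits (hypothetical values). */
#define DEMO_CTL2_SUBSTITUTE_OVERFLOW_LENGTH   0x00000001u
#define DEMO_CTL2_SUBSTITUTE_REPLACEMENT_ONLY  0x00000002u
#define DEMO_CTL2_SUBSTITUTE_UNKNOWN_UNSET     0x00000004u

/* Placeholder option bits (hypothetical values). */
#define DEMO_OPT_SUBSTITUTE_OVERFLOW_LENGTH    0x00000100u
#define DEMO_OPT_SUBSTITUTE_REPLACEMENT_ONLY   0x00000200u
#define DEMO_OPT_SUBSTITUTE_UNKNOWN_UNSET      0x00000400u

/* One row per control bit and the option bit it switches on. */
typedef struct { uint32_t ctl; uint32_t opt; } demo_ctl_map;

static const demo_ctl_map demo_map[] = {
  { DEMO_CTL2_SUBSTITUTE_OVERFLOW_LENGTH,  DEMO_OPT_SUBSTITUTE_OVERFLOW_LENGTH },
  { DEMO_CTL2_SUBSTITUTE_REPLACEMENT_ONLY, DEMO_OPT_SUBSTITUTE_REPLACEMENT_ONLY },
  { DEMO_CTL2_SUBSTITUTE_UNKNOWN_UNSET,    DEMO_OPT_SUBSTITUTE_UNKNOWN_UNSET }
};

/* OR together the option bit of every control bit set in control2. */
static uint32_t demo_controls_to_options(uint32_t control2)
{
  uint32_t options = 0;
  size_t i;
  for (i = 0; i < sizeof(demo_map)/sizeof(demo_map[0]); i++)
    if ((control2 & demo_map[i].ctl) != 0) options |= demo_map[i].opt;
  return options;
}
```

Either form gives the same result: a control bit that is clear contributes 0, and a control bit that is set contributes its matching option bit.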
@ -5793,4 +5793,17 @@ a)"xI
|
||||||
/^((\1+)(?C)|\d)+133X$/
|
/^((\1+)(?C)|\d)+133X$/
|
||||||
111133X\=callout_capture
|
111133X\=callout_capture
|
||||||
|
|
||||||
|
/abc/replace=xyz,substitute_replacement_only
|
||||||
|
123abc456
|
||||||
|
|
||||||
|
/a(?<ONE>b)c(?<TWO>d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only
|
||||||
|
"abcde-abcde-"
|
||||||
|
|
||||||
|
/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only
|
||||||
|
abcdefabcpqr
|
||||||
|
abxyzpqrabcxyz
|
||||||
|
12abc34xyz99abc55\=substitute_stop=2
|
||||||
|
12abc34xyz99abc55\=substitute_skip=1
|
||||||
|
12abc34xyz99abc55\=substitute_skip=2
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -17503,6 +17503,39 @@ Callout 0: last capture = 2
|
||||||
1: 11
|
1: 11
|
||||||
2: 11
|
2: 11
|
||||||
|
|
||||||
|
/abc/replace=xyz,substitute_replacement_only
|
||||||
|
123abc456
|
||||||
|
1: xyz
|
||||||
|
|
||||||
|
/a(?<ONE>b)c(?<TWO>d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only
|
||||||
|
"abcde-abcde-"
|
||||||
|
2: Xb+dZXb+dZ
|
||||||
|
|
||||||
|
/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only
|
||||||
|
abcdefabcpqr
|
||||||
|
1(2) Old 0 3 "abc" New 0 5 "<abc>"
|
||||||
|
2(2) Old 6 9 "abc" New 5 10 "<abc>"
|
||||||
|
2: <abc><abc>
|
||||||
|
abxyzpqrabcxyz
|
||||||
|
1(1) Old 2 5 "xyz" New 0 5 "<xyz>"
|
||||||
|
2(2) Old 8 11 "abc" New 5 10 "<abc>"
|
||||||
|
3(1) Old 11 14 "xyz" New 10 15 "<xyz>"
|
||||||
|
3: <xyz><abc><xyz>
|
||||||
|
12abc34xyz99abc55\=substitute_stop=2
|
||||||
|
1(2) Old 2 5 "abc" New 0 5 "<abc>"
|
||||||
|
2(1) Old 7 10 "xyz" New 5 10 "<xyz> STOPPED"
|
||||||
|
2: <abc>
|
||||||
|
12abc34xyz99abc55\=substitute_skip=1
|
||||||
|
1(2) Old 2 5 "abc" New 0 5 "<abc> SKIPPED"
|
||||||
|
2(1) Old 7 10 "xyz" New 0 5 "<xyz>"
|
||||||
|
3(2) Old 12 15 "abc" New 5 10 "<abc>"
|
||||||
|
3: <xyz><abc>
|
||||||
|
12abc34xyz99abc55\=substitute_skip=2
|
||||||
|
1(2) Old 2 5 "abc" New 0 5 "<abc>"
|
||||||
|
2(1) Old 7 10 "xyz" New 5 10 "<xyz> SKIPPED"
|
||||||
|
3(2) Old 12 15 "abc" New 5 10 "<abc>"
|
||||||
|
3: <abc><abc>
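The expected outputs above show the effect of the new option: only the replacement strings are returned, and the unmatched parts of the subject are dropped. A minimal sketch of that behaviour using plain literal search instead of a regex follows. `demo_substitute()` and its parameters are hypothetical and do not call PCRE2; the sketch ignores group references, callouts, and the skip/stop controls.

```c
#include <stddef.h>
#include <string.h>

/* Literal-string model of substitution. Copies the result into out and
   returns the number of replacements made, or -1 if the needle is empty
   or out is too small. When replacement_only is non-zero, unmatched
   subject text is not copied. */
static int demo_substitute(const char *subject, const char *needle,
  const char *replacement, int global, int replacement_only,
  char *out, size_t outsize)
{
  size_t nlen = strlen(needle), rlen = strlen(replacement), used = 0;
  const char *p = subject;
  int count = 0;

  if (nlen == 0 || outsize == 0) return -1;

  for (;;)
    {
    const char *hit = strstr(p, needle);
    if (hit == NULL || (count > 0 && !global)) break;
    if (!replacement_only)
      {
      /* Keep the text between the previous match and this one. */
      size_t gap = (size_t)(hit - p);
      if (used + gap >= outsize) return -1;
      memcpy(out + used, p, gap);
      used += gap;
      }
    if (used + rlen >= outsize) return -1;
    memcpy(out + used, replacement, rlen);
    used += rlen;
    p = hit + nlen;
    count++;
    }

  if (!replacement_only)
    {
    /* Keep whatever follows the last match. */
    size_t tail = strlen(p);
    if (used + tail >= outsize) return -1;
    memcpy(out + used, p, tail);
    used += tail;
    }

  out[used] = '\0';
  return count;
}
```

For the subject 123abc456 with needle abc and replacement xyz, this model yields 123xyz456 normally and xyz in replacement-only mode, matching the shape of the first new test case.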
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||||
Error -62: bad serialized data
|
Error -62: bad serialized data
|
||||||
|
|