Fix global search/replace in pcre2test and pcre2_substitute() when the pattern
matches an empty string, but never at the starting offset.
Parent: 462f25d7d3
Commit: 1c79bdf36f
11
ChangeLog
|
@@ -90,6 +90,17 @@ standard systems:
|
||||||
when linking pcre2test with MSVC. This gets rid of a stack overflow error in
|
when linking pcre2test with MSVC. This gets rid of a stack overflow error in
|
||||||
the standard set of tests.
|
the standard set of tests.
|
||||||
|
|
||||||
|
20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
|
||||||
|
it is given with the "replace" modifier.
|
||||||
|
|
||||||
|
21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
|
||||||
|
that matched an empty string, but never at the starting match offset, was not
|
||||||
|
handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
|
||||||
|
a pattern. Because \G is in a lookbehind assertion, there has to be a
|
||||||
|
"bumpalong" before there can be a match. The automatic "advance by one
|
||||||
|
character after an empty string match" rule is therefore inappropriate. A more
|
||||||
|
complicated algorithm has now been implemented.
|
||||||
|
|
||||||
|
|
||||||
Version 10.31 12-February-2018
|
Version 10.31 12-February-2018
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
2
RunTest
|
@@ -500,7 +500,7 @@ for bmode in "$test8" "$test16" "$test32"; do
|
||||||
for opt in "" $jitopt; do
|
for opt in "" $jitopt; do
|
||||||
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput2 testtry
|
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput2 testtry
|
||||||
if [ $? = 0 ] ; then
|
if [ $? = 0 ] ; then
|
||||||
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -65,-62,-2,-1,0,100,101,191,200 >>testtry
|
$sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -70,-62,-2,-1,0,100,101,191,200 >>testtry
|
||||||
checkresult $? 2 "$opt"
|
checkresult $? 2 "$opt"
|
||||||
fi
|
fi
|
||||||
done
|
done
|
||||||
|
|
|
@@ -3154,7 +3154,10 @@ string in <i>outputbuffer</i>, replacing the part that was matched with the
|
||||||
<i>replacement</i> string, whose length is supplied in <b>rlength</b>. This can
|
<i>replacement</i> string, whose length is supplied in <b>rlength</b>. This can
|
||||||
be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
|
be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
|
||||||
which a \K item in a lookahead in the pattern causes the match to end before
|
which a \K item in a lookahead in the pattern causes the match to end before
|
||||||
it starts are not supported, and give rise to an error return.
|
it starts are not supported, and give rise to an error return. For global
|
||||||
|
replacements, matches in which \K in a lookbehind causes the match to start
|
||||||
|
earlier than the point that was reached in the previous iteration are also not
|
||||||
|
supported.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The first seven arguments of <b>pcre2_substitute()</b> are the same as for
|
The first seven arguments of <b>pcre2_substitute()</b> are the same as for
|
||||||
|
@@ -3631,7 +3634,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 30 June 2018
|
Last updated: 02 July 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@@ -1084,9 +1084,9 @@ sequences but the characters that they represent.)
|
||||||
Resetting the match start
|
Resetting the match start
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The escape sequence \K causes any previously matched characters not to be
|
In normal use, the escape sequence \K causes any previously matched characters
|
||||||
included in the final matched sequence that is returned. For example, the
|
not to be included in the final matched sequence that is returned. For example,
|
||||||
pattern:
|
the pattern:
|
||||||
<pre>
|
<pre>
|
||||||
foo\Kbar
|
foo\Kbar
|
||||||
</pre>
|
</pre>
|
||||||
|
@@ -1115,7 +1115,13 @@ PCRE2, \K is acted upon when it occurs inside positive assertions, but is
|
||||||
ignored in negative assertions. Note that when a pattern such as (?=ab\K)
|
ignored in negative assertions. Note that when a pattern such as (?=ab\K)
|
||||||
matches, the reported start of the match can be greater than the end of the
|
matches, the reported start of the match can be greater than the end of the
|
||||||
match. Using \K in a lookbehind assertion at the start of a pattern can also
|
match. Using \K in a lookbehind assertion at the start of a pattern can also
|
||||||
lead to odd effects.
|
lead to odd effects. For example, consider this pattern:
|
||||||
|
<pre>
|
||||||
|
(?<=\Kfoo)bar
|
||||||
|
</pre>
|
||||||
|
If the subject is "foobar", a call to <b>pcre2_match()</b> with a starting
|
||||||
|
offset of 3 succeeds and reports the matching string as "foobar", that is, the
|
||||||
|
start of the reported match is earlier than where the match started.
|
||||||
<a name="smallassertions"></a></P>
|
<a name="smallassertions"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Simple assertions
|
Simple assertions
|
||||||
|
@@ -3484,7 +3490,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 June 2018
|
Last updated: 30 June 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
371
doc/pcre2.txt
|
@@ -3059,75 +3059,78 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
replacement string, whose length is supplied in rlength. This can be
|
replacement string, whose length is supplied in rlength. This can be
|
||||||
given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
|
given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
|
||||||
which a \K item in a lookahead in the pattern causes the match to end
|
which a \K item in a lookahead in the pattern causes the match to end
|
||||||
before it starts are not supported, and give rise to an error return.
|
before it starts are not supported, and give rise to an error return.
|
||||||
|
For global replacements, matches in which \K in a lookbehind causes the
|
||||||
|
match to start earlier than the point that was reached in the previous
|
||||||
|
iteration are also not supported.
|
||||||
|
|
||||||
The first seven arguments of pcre2_substitute() are the same as for
|
The first seven arguments of pcre2_substitute() are the same as for
|
||||||
pcre2_match(), except that the partial matching options are not permit-
|
pcre2_match(), except that the partial matching options are not permit-
|
||||||
ted, and match_data may be passed as NULL, in which case a match data
|
ted, and match_data may be passed as NULL, in which case a match data
|
||||||
block is obtained and freed within this function, using memory manage-
|
block is obtained and freed within this function, using memory manage-
|
||||||
ment functions from the match context, if provided, or else those that
|
ment functions from the match context, if provided, or else those that
|
||||||
were used to allocate memory for the compiled code.
|
were used to allocate memory for the compiled code.
|
||||||
|
|
||||||
The outlengthptr argument must point to a variable that contains the
|
The outlengthptr argument must point to a variable that contains the
|
||||||
length, in code units, of the output buffer. If the function is suc-
|
length, in code units, of the output buffer. If the function is suc-
|
||||||
cessful, the value is updated to contain the length of the new string,
|
cessful, the value is updated to contain the length of the new string,
|
||||||
excluding the trailing zero that is automatically added.
|
excluding the trailing zero that is automatically added.
|
||||||
|
|
||||||
If the function is not successful, the value set via outlengthptr
|
If the function is not successful, the value set via outlengthptr
|
||||||
depends on the type of error. For syntax errors in the replacement
|
depends on the type of error. For syntax errors in the replacement
|
||||||
string, the value is the offset in the replacement string where the
|
string, the value is the offset in the replacement string where the
|
||||||
error was detected. For other errors, the value is PCRE2_UNSET by
|
error was detected. For other errors, the value is PCRE2_UNSET by
|
||||||
default. This includes the case of the output buffer being too small,
|
default. This includes the case of the output buffer being too small,
|
||||||
unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which
|
unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which
|
||||||
case the value is the minimum length needed, including space for the
|
case the value is the minimum length needed, including space for the
|
||||||
trailing zero. Note that in order to compute the required length,
|
trailing zero. Note that in order to compute the required length,
|
||||||
pcre2_substitute() has to simulate all the matching and copying,
|
pcre2_substitute() has to simulate all the matching and copying,
|
||||||
instead of giving an error return as soon as the buffer overflows. Note
|
instead of giving an error return as soon as the buffer overflows. Note
|
||||||
also that the length is in code units, not bytes.
|
also that the length is in code units, not bytes.
|
||||||
|
|
||||||
In the replacement string, which is interpreted as a UTF string in UTF
|
In the replacement string, which is interpreted as a UTF string in UTF
|
||||||
mode, and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK
|
mode, and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK
|
||||||
option is set, a dollar character is an escape character that can spec-
|
option is set, a dollar character is an escape character that can spec-
|
||||||
ify the insertion of characters from capturing groups or (*MARK),
|
ify the insertion of characters from capturing groups or (*MARK),
|
||||||
(*PRUNE), or (*THEN) items in the pattern. The following forms are
|
(*PRUNE), or (*THEN) items in the pattern. The following forms are
|
||||||
always recognized:
|
always recognized:
|
||||||
|
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
$<n> or ${<n>} insert the contents of group <n>
|
$<n> or ${<n>} insert the contents of group <n>
|
||||||
$*MARK or ${*MARK} insert a (*MARK), (*PRUNE), or (*THEN) name
|
$*MARK or ${*MARK} insert a (*MARK), (*PRUNE), or (*THEN) name
|
||||||
|
|
||||||
Either a group number or a group name can be given for <n>. Curly
|
Either a group number or a group name can be given for <n>. Curly
|
||||||
brackets are required only if the following character would be inter-
|
brackets are required only if the following character would be inter-
|
||||||
preted as part of the number or name. The number may be zero to include
|
preted as part of the number or name. The number may be zero to include
|
||||||
the entire matched string. For example, if the pattern a(b)c is
|
the entire matched string. For example, if the pattern a(b)c is
|
||||||
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
matched with "=abc=" and the replacement string "+$1$0$1+", the result
|
||||||
is "=+babcb+=".
|
is "=+babcb+=".
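The dollar forms listed above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour built on Python's own re module, not PCRE2 code. The helper names expand() and substitute_once() are hypothetical, $*MARK is left out because Python has no equivalent, and a stray dollar is copied through unchanged where PCRE2 would be expected to report an error.

```python
import re

# One token of the simple replacement syntax: $$, $<n> or ${<n>},
# where <n> is a group number or a group name.
_TOKEN = re.compile(r"\$(?:(\$)|(\d+|[A-Za-z_]\w*)|\{(\d+|[A-Za-z_]\w*)\})")


def expand(template, match):
    """Expand the dollar forms in template using the groups of match."""
    def repl(tok):
        if tok.group(1):
            return "$"                      # $$ inserts a dollar character
        ref = tok.group(2) or tok.group(3)  # bare form or curly-bracket form
        key = int(ref) if ref.isdigit() else ref
        value = match.group(key)            # group 0 is the whole match
        if value is None:
            raise ValueError(f"group {ref!r} is unset")
        return value
    return _TOKEN.sub(repl, template)


def substitute_once(pattern, subject, template):
    """Replace the first match only (the non-global case)."""
    m = re.search(pattern, subject)
    if m is None:
        return subject
    return subject[:m.start()] + expand(template, m) + subject[m.end():]


print(substitute_once(r"a(b)c", "=abc=", "+$1$0$1+"))  # =+babcb+=
```

The curly-bracket form matters only when the next character would otherwise be read as part of the number or name, for example "${1}0" as opposed to "$10".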
|
||||||
|
|
||||||
$*MARK inserts the name from the last encountered (*MARK), (*PRUNE), or
|
$*MARK inserts the name from the last encountered (*MARK), (*PRUNE), or
|
||||||
(*THEN) on the matching path that has a name. (*MARK) must always
|
(*THEN) on the matching path that has a name. (*MARK) must always
|
||||||
include a name, but (*PRUNE) and (*THEN) need not. For example, in the
|
include a name, but (*PRUNE) and (*THEN) need not. For example, in the
|
||||||
case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
|
case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
|
||||||
(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be
|
(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be
|
||||||
used to perform simple simultaneous substitutions, as this pcre2test
|
used to perform simple simultaneous substitutions, as this pcre2test
|
||||||
example shows:
|
example shows:
|
||||||
|
|
||||||
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
||||||
apple lemon
|
apple lemon
|
||||||
2: pear orange
|
2: pear orange
|
||||||
|
|
||||||
As well as the usual options for pcre2_match(), a number of additional
|
As well as the usual options for pcre2_match(), a number of additional
|
||||||
options can be set in the options argument of pcre2_substitute().
|
options can be set in the options argument of pcre2_substitute().
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
|
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
|
||||||
string, replacing every matching substring. If this option is not set,
|
string, replacing every matching substring. If this option is not set,
|
||||||
only the first matching substring is replaced. The search for matches
|
only the first matching substring is replaced. The search for matches
|
||||||
takes place in the original subject string (that is, previous replace-
|
takes place in the original subject string (that is, previous replace-
|
||||||
ments do not affect it). Iteration is implemented by advancing the
|
ments do not affect it). Iteration is implemented by advancing the
|
||||||
startoffset value for each search, which is always passed the entire
|
startoffset value for each search, which is always passed the entire
|
||||||
subject string. If an offset limit is set in the match context, search-
|
subject string. If an offset limit is set in the match context, search-
|
||||||
ing stops when that limit is reached.
|
ing stops when that limit is reached.
|
||||||
|
|
||||||
You can restrict the effect of a global substitution to a portion of
|
You can restrict the effect of a global substitution to a portion of
|
||||||
the subject string by setting either or both of startoffset and an off-
|
the subject string by setting either or both of startoffset and an off-
|
||||||
set limit. Here is a pcre2test example:
|
set limit. Here is a pcre2test example:
|
||||||
|
|
||||||
|
@@ -3135,87 +3138,87 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
||||||
2: ABC A!C A!C ABC
|
2: ABC A!C A!C ABC
|
||||||
|
|
||||||
When continuing with global substitutions after matching a substring
|
When continuing with global substitutions after matching a substring
|
||||||
with zero length, an attempt to find a non-empty match at the same off-
|
with zero length, an attempt to find a non-empty match at the same off-
|
||||||
set is performed. If this is not successful, the offset is advanced by
|
set is performed. If this is not successful, the offset is advanced by
|
||||||
one character except when CRLF is a valid newline sequence and the next
|
one character except when CRLF is a valid newline sequence and the next
|
||||||
two characters are CR, LF. In this case, the offset is advanced by two
|
two characters are CR, LF. In this case, the offset is advanced by two
|
||||||
characters.
|
characters.
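The advance rule in the paragraph above is small enough to state as code. The sketch below models only that documented rule. The function name and the crlf_is_newline flag are hypothetical, and, as the ChangeLog entry in this commit notes, the real implementation now uses a more complicated algorithm for patterns that match an empty string but never at the starting offset.

```python
def next_offset_after_empty_match(subject, offset, crlf_is_newline):
    """Return the offset for the next search after an empty match at
    offset, once the retry for a non-empty match there has failed."""
    if crlf_is_newline and subject.startswith("\r\n", offset):
        return offset + 2   # step over CR and LF together
    return offset + 1       # otherwise advance by one character


print(next_offset_after_empty_match("a\r\nb", 1, True))   # 3
print(next_offset_after_empty_match("a\r\nb", 1, False))  # 2
```

Stepping over CR and LF together keeps the next search from starting between the two characters of a single newline sequence.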
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
||||||
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
||||||
ORY immediately. If this option is set, however, pcre2_substitute()
|
ORY immediately. If this option is set, however, pcre2_substitute()
|
||||||
continues to go through the motions of matching and substituting (with-
|
continues to go through the motions of matching and substituting (with-
|
||||||
out, of course, writing anything) in order to compute the size of buf-
|
out, of course, writing anything) in order to compute the size of buf-
|
||||||
fer that is needed. This value is passed back via the outlengthptr
|
fer that is needed. This value is passed back via the outlengthptr
|
||||||
variable, with the result of the function still being
|
variable, with the result of the function still being
|
||||||
PCRE2_ERROR_NOMEMORY.
|
PCRE2_ERROR_NOMEMORY.
|
||||||
|
|
||||||
Passing a buffer size of zero is a permitted way of finding out how
|
Passing a buffer size of zero is a permitted way of finding out how
|
||||||
much memory is needed for a given substitution. However, this does mean
|
much memory is needed for a given substitution. However, this does mean
|
||||||
that the entire operation is carried out twice. Depending on the appli-
|
that the entire operation is carried out twice. Depending on the appli-
|
||||||
cation, it may be more efficient to allocate a large buffer and free
|
cation, it may be more efficient to allocate a large buffer and free
|
||||||
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
||||||
FLOW_LENGTH.
|
FLOW_LENGTH.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups
|
||||||
that do not appear in the pattern to be treated as unset groups. This
|
that do not appear in the pattern to be treated as unset groups. This
|
||||||
option should be used with care, because it means that a typo in a
|
option should be used with care, because it means that a typo in a
|
||||||
group name or number no longer causes the PCRE2_ERROR_NOSUBSTRING
|
group name or number no longer causes the PCRE2_ERROR_NOSUBSTRING
|
||||||
error.
|
error.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including
|
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including
|
||||||
unknown groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be
|
unknown groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be
|
||||||
treated as empty strings when inserted as described above. If this
|
treated as empty strings when inserted as described above. If this
|
||||||
option is not set, an attempt to insert an unset group causes the
|
option is not set, an attempt to insert an unset group causes the
|
||||||
PCRE2_ERROR_UNSET error. This option does not influence the extended
|
PCRE2_ERROR_UNSET error. This option does not influence the extended
|
||||||
substitution syntax described below.
|
substitution syntax described below.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
|
||||||
replacement string. Without this option, only the dollar character is
|
replacement string. Without this option, only the dollar character is
|
||||||
special, and only the group insertion forms listed above are valid.
|
special, and only the group insertion forms listed above are valid.
|
||||||
When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
|
When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
|
||||||
|
|
||||||
Firstly, backslash in a replacement string is interpreted as an escape
|
Firstly, backslash in a replacement string is interpreted as an escape
|
||||||
character. The usual forms such as \n or \x{ddd} can be used to specify
|
character. The usual forms such as \n or \x{ddd} can be used to specify
|
||||||
particular character codes, and backslash followed by any non-alphanu-
|
particular character codes, and backslash followed by any non-alphanu-
|
||||||
meric character quotes that character. Extended quoting can be coded
|
meric character quotes that character. Extended quoting can be coded
|
||||||
using \Q...\E, exactly as in pattern strings.
|
using \Q...\E, exactly as in pattern strings.
|
||||||
|
|
||||||
There are also four escape sequences for forcing the case of inserted
|
There are also four escape sequences for forcing the case of inserted
|
||||||
letters. The insertion mechanism has three states: no case forcing,
|
letters. The insertion mechanism has three states: no case forcing,
|
||||||
force upper case, and force lower case. The escape sequences change the
|
force upper case, and force lower case. The escape sequences change the
|
||||||
current state: \U and \L change to upper or lower case forcing, respec-
|
current state: \U and \L change to upper or lower case forcing, respec-
|
||||||
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
tively, and \E (when not terminating a \Q quoted sequence) reverts to
|
||||||
no case forcing. The sequences \u and \l force the next character (if
|
no case forcing. The sequences \u and \l force the next character (if
|
||||||
it is a letter) to upper or lower case, respectively, and then the
|
it is a letter) to upper or lower case, respectively, and then the
|
||||||
state automatically reverts to no case forcing. Case forcing applies to
|
state automatically reverts to no case forcing. Case forcing applies to
|
||||||
all inserted characters, including those from captured groups and let-
|
all inserted characters, including those from captured groups and let-
|
||||||
ters within \Q...\E quoted sequences.
|
ters within \Q...\E quoted sequences.
|
||||||
|
|
||||||
Note that case forcing sequences such as \U...\E do not nest. For exam-
|
Note that case forcing sequences such as \U...\E do not nest. For exam-
|
||||||
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
|
||||||
\E has no effect.
|
\E has no effect.
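The case-forcing states described above behave like a small state machine, which the following Python sketch models. It follows the wording of the text only and is not taken from the PCRE2 source. The function name apply_case_forcing() is hypothetical, and \Q...\E quoting, dollar insertions and the other backslash escapes are deliberately left out.

```python
def apply_case_forcing(template):
    # Persistent state set by \U, \L and \E: None, "upper" or "lower".
    state = None
    # Pending \u or \l, which applies to the next character only.
    one_shot = None
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if ch == "\\" and nxt and nxt in "ULEul":
            if nxt == "U":
                state, one_shot = "upper", None
            elif nxt == "L":
                state, one_shot = "lower", None
            elif nxt == "E":
                state, one_shot = None, None
            else:
                # \u or \l: force one character, then no case forcing.
                one_shot = "upper" if nxt == "u" else "lower"
                state = None
            i += 2
            continue
        mode = one_shot if one_shot is not None else state
        one_shot = None
        if mode == "upper":
            ch = ch.upper()
        elif mode == "lower":
            ch = ch.lower()
        out.append(ch)
        i += 1
    return "".join(out)


print(apply_case_forcing(r"\Uaa\LBB\Ecc\E"))  # AAbbcc
```

Because there is a single current state and no stack, \L simply overrides the earlier \U, and the first \E already returns to no case forcing. That is why the final \E in the example has nothing left to undo.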
|
||||||
|
|
||||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||||
flexibility to group substitution. The syntax is similar to that used
|
flexibility to group substitution. The syntax is similar to that used
|
||||||
by Bash:
|
by Bash:
|
||||||
|
|
||||||
${<n>:-<string>}
|
${<n>:-<string>}
|
||||||
${<n>:+<string1>:<string2>}
|
${<n>:+<string1>:<string2>}
|
||||||
|
|
||||||
As before, <n> may be a group number or a name. The first form speci-
|
As before, <n> may be a group number or a name. The first form speci-
|
||||||
fies a default value. If group <n> is set, its value is inserted; if
|
fies a default value. If group <n> is set, its value is inserted; if
|
||||||
not, <string> is expanded and the result inserted. The second form
|
not, <string> is expanded and the result inserted. The second form
|
||||||
specifies strings that are expanded and inserted when group <n> is set
|
specifies strings that are expanded and inserted when group <n> is set
|
||||||
or unset, respectively. The first form is just a convenient shorthand
|
or unset, respectively. The first form is just a convenient shorthand
|
||||||
for
|
for
|
||||||
|
|
||||||
${<n>:+${<n>}:<string>}
|
${<n>:+${<n>}:<string>}
|
||||||
|
|
||||||
Backslash can be used to escape colons and closing curly brackets in
|
Backslash can be used to escape colons and closing curly brackets in
|
||||||
the replacement strings. A change of the case forcing state within a
|
the replacement strings. A change of the case forcing state within a
|
||||||
replacement string remains in force afterwards, as shown in this
|
replacement string remains in force afterwards, as shown in this
|
||||||
pcre2test example:
|
pcre2test example:
|
||||||
|
|
||||||
/(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
|
/(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
|
||||||
|
@@ -3224,42 +3227,42 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
somebody
|
somebody
|
||||||
1: HELLO
|
1: HELLO
|
||||||
|
|
||||||
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
|
||||||
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause
|
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause
|
||||||
unknown groups in the extended syntax forms to be treated as unset.
|
unknown groups in the extended syntax forms to be treated as unset.
|
||||||
|
|
||||||
If successful, pcre2_substitute() returns the number of replacements
|
If successful, pcre2_substitute() returns the number of replacements
|
||||||
that were made. This may be zero if no matches were found, and is never
|
that were made. This may be zero if no matches were found, and is never
|
||||||
greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
|
||||||
|
|
||||||
In the event of an error, a negative error code is returned. Except for
|
In the event of an error, a negative error code is returned. Except for
|
||||||
PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
PCRE2_ERROR_NOMATCH (which is never returned), errors from
|
||||||
pcre2_match() are passed straight back.
|
pcre2_match() are passed straight back.
|
||||||
|
|
||||||
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
|
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
|
||||||
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
|
||||||
|
|
||||||
PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
|
PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
|
||||||
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
|
||||||
when the simple (non-extended) syntax is used and PCRE2_SUBSTI-
|
when the simple (non-extended) syntax is used and PCRE2_SUBSTI-
|
||||||
TUTE_UNSET_EMPTY is not set.
|
TUTE_UNSET_EMPTY is not set.
|
||||||
|
|
||||||
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big
|
||||||
enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
|
enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
|
||||||
of buffer that is needed is returned via outlengthptr. Note that this
|
of buffer that is needed is returned via outlengthptr. Note that this
|
||||||
does not happen by default.
|
does not happen by default.
|
||||||
|
|
||||||
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
|
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
|
||||||
the replacement string, with more particular errors being
|
the replacement string, with more particular errors being
|
||||||
PCRE2_ERROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REP-
|
PCRE2_ERROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REP-
|
||||||
MISSINGBRACE (closing curly bracket not found), PCRE2_ERROR_BADSUBSTI-
|
MISSINGBRACE (closing curly bracket not found), PCRE2_ERROR_BADSUBSTI-
|
||||||
TUTION (syntax error in extended group substitution), and
|
TUTION (syntax error in extended group substitution), and
|
||||||
PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before it started
|
PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before it started
|
||||||
or the match started earlier than the current position in the subject,
|
or the match started earlier than the current position in the subject,
|
||||||
which can happen if \K is used in an assertion).
|
which can happen if \K is used in an assertion).
|
||||||
|
|
||||||
As for all PCRE2 errors, a text message that describes the error can be
|
As for all PCRE2 errors, a text message that describes the error can be
|
||||||
obtained by calling the pcre2_get_error_message() function (see
|
obtained by calling the pcre2_get_error_message() function (see
|
||||||
"Obtaining a textual error message" above).
|
"Obtaining a textual error message" above).
|
||||||
|
|
||||||
|
|
||||||
|
@@ -3268,56 +3271,56 @@ DUPLICATE SUBPATTERN NAMES
|
||||||
int pcre2_substring_nametable_scan(const pcre2_code *code,
|
int pcre2_substring_nametable_scan(const pcre2_code *code,
|
||||||
PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
|
PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
|
||||||
|
|
||||||
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
||||||
subpatterns are not required to be unique. Duplicate names are always
|
subpatterns are not required to be unique. Duplicate names are always
|
||||||
allowed for subpatterns with the same number, created by using the (?|
|
allowed for subpatterns with the same number, created by using the (?|
|
||||||
feature. Indeed, if such subpatterns are named, they are required to
|
feature. Indeed, if such subpatterns are named, they are required to
|
||||||
use the same names.
|
use the same names.
|
||||||
|
|
||||||
Normally, patterns with duplicate names are such that in any one match,
|
Normally, patterns with duplicate names are such that in any one match,
|
||||||
only one of the named subpatterns participates. An example is shown in
|
only one of the named subpatterns participates. An example is shown in
|
||||||
the pcre2pattern documentation.
|
the pcre2pattern documentation.
|
||||||
|
|
||||||
When duplicates are present, pcre2_substring_copy_byname() and
|
When duplicates are present, pcre2_substring_copy_byname() and
|
||||||
pcre2_substring_get_byname() return the first substring corresponding
|
pcre2_substring_get_byname() return the first substring corresponding
|
||||||
to the given name that is set. Only if none are set is
|
to the given name that is set. Only if none are set is
|
||||||
PCRE2_ERROR_UNSET returned. The pcre2_substring_number_from_name()
|
PCRE2_ERROR_UNSET returned. The pcre2_substring_number_from_name()
|
||||||
function returns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are
|
function returns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are
|
||||||
duplicate names.
|
duplicate names.
|
||||||
|
|
||||||
If you want to get full details of all captured substrings for a given
|
If you want to get full details of all captured substrings for a given
|
||||||
name, you must use the pcre2_substring_nametable_scan() function. The
|
name, you must use the pcre2_substring_nametable_scan() function. The
|
||||||
first argument is the compiled pattern, and the second is the name. If
|
first argument is the compiled pattern, and the second is the name. If
|
||||||
the third and fourth arguments are NULL, the function returns a group
|
the third and fourth arguments are NULL, the function returns a group
|
||||||
number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
|
number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
|
||||||
|
|
||||||
When the third and fourth arguments are not NULL, they must be pointers
|
When the third and fourth arguments are not NULL, they must be pointers
|
||||||
to variables that are updated by the function. After it has run, they
|
to variables that are updated by the function. After it has run, they
|
||||||
point to the first and last entries in the name-to-number table for the
|
point to the first and last entries in the name-to-number table for the
|
||||||
given name, and the function returns the length of each entry in code
|
given name, and the function returns the length of each entry in code
|
||||||
units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
|
units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
|
||||||
no entries for the given name.
|
no entries for the given name.
|
||||||
|
|
||||||
The format of the name table is described above in the section entitled
|
The format of the name table is described above in the section entitled
|
||||||
Information about a pattern. Given all the relevant entries for the
|
Information about a pattern. Given all the relevant entries for the
|
||||||
name, you can extract each of their numbers, and hence the captured
|
name, you can extract each of their numbers, and hence the captured
|
||||||
data.
|
data.
|
||||||
|
|
||||||
|
|
||||||
FINDING ALL POSSIBLE MATCHES AT ONE POSITION
|
FINDING ALL POSSIBLE MATCHES AT ONE POSITION
|
||||||
|
|
||||||
The traditional matching function uses a similar algorithm to Perl,
|
The traditional matching function uses a similar algorithm to Perl,
|
||||||
which stops when it finds the first match at a given point in the sub-
|
which stops when it finds the first match at a given point in the sub-
|
||||||
ject. If you want to find all possible matches, or the longest possible
|
ject. If you want to find all possible matches, or the longest possible
|
||||||
match at a given position, consider using the alternative matching
|
match at a given position, consider using the alternative matching
|
||||||
function (see below) instead. If you cannot use the alternative func-
|
function (see below) instead. If you cannot use the alternative func-
|
||||||
tion, you can kludge it up by making use of the callout facility, which
|
tion, you can kludge it up by making use of the callout facility, which
|
||||||
is described in the pcre2callout documentation.
|
is described in the pcre2callout documentation.
|
||||||
|
|
||||||
What you have to do is to insert a callout right at the end of the pat-
|
What you have to do is to insert a callout right at the end of the pat-
|
||||||
tern. When your callout function is called, extract and save the cur-
|
tern. When your callout function is called, extract and save the cur-
|
||||||
rent matched substring. Then return 1, which forces pcre2_match() to
|
rent matched substring. Then return 1, which forces pcre2_match() to
|
||||||
backtrack and try other alternatives. Ultimately, when it runs out of
|
backtrack and try other alternatives. Ultimately, when it runs out of
|
||||||
matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
|
matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
|
||||||
|
|
||||||
|
|
||||||
|
@ -3329,26 +3332,26 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
pcre2_match_context *mcontext,
|
pcre2_match_context *mcontext,
|
||||||
int *workspace, PCRE2_SIZE wscount);
|
int *workspace, PCRE2_SIZE wscount);
|
||||||
|
|
||||||
The function pcre2_dfa_match() is called to match a subject string
|
The function pcre2_dfa_match() is called to match a subject string
|
||||||
against a compiled pattern, using a matching algorithm that scans the
|
against a compiled pattern, using a matching algorithm that scans the
|
||||||
subject string just once (not counting lookaround assertions), and does
|
subject string just once (not counting lookaround assertions), and does
|
||||||
not backtrack. This has different characteristics to the normal algo-
|
not backtrack. This has different characteristics to the normal algo-
|
||||||
rithm, and is not compatible with Perl. Some of the features of PCRE2
|
rithm, and is not compatible with Perl. Some of the features of PCRE2
|
||||||
patterns are not supported. Nevertheless, there are times when this
|
patterns are not supported. Nevertheless, there are times when this
|
||||||
kind of matching can be useful. For a discussion of the two matching
|
kind of matching can be useful. For a discussion of the two matching
|
||||||
algorithms, and a list of features that pcre2_dfa_match() does not sup-
|
algorithms, and a list of features that pcre2_dfa_match() does not sup-
|
||||||
port, see the pcre2matching documentation.
|
port, see the pcre2matching documentation.
|
||||||
|
|
||||||
The arguments for the pcre2_dfa_match() function are the same as for
|
The arguments for the pcre2_dfa_match() function are the same as for
|
||||||
pcre2_match(), plus two extras. The ovector within the match data block
|
pcre2_match(), plus two extras. The ovector within the match data block
|
||||||
is used in a different way, and this is described below. The other com-
|
is used in a different way, and this is described below. The other com-
|
||||||
mon arguments are used in the same way as for pcre2_match(), so their
|
mon arguments are used in the same way as for pcre2_match(), so their
|
||||||
description is not repeated here.
|
description is not repeated here.
|
||||||
|
|
||||||
The two additional arguments provide workspace for the function. The
|
The two additional arguments provide workspace for the function. The
|
||||||
workspace vector should contain at least 20 elements. It is used for
|
workspace vector should contain at least 20 elements. It is used for
|
||||||
keeping track of multiple paths through the pattern tree. More
|
keeping track of multiple paths through the pattern tree. More
|
||||||
workspace is needed for patterns and subjects where there are a lot of
|
workspace is needed for patterns and subjects where there are a lot of
|
||||||
potential matches.
|
potential matches.
|
||||||
|
|
||||||
Here is an example of a simple call to pcre2_dfa_match():
|
Here is an example of a simple call to pcre2_dfa_match():
|
||||||
|
@ -3368,45 +3371,45 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
|
|
||||||
Option bits for pcre2_dfa_match()
|
Option bits for pcre2_dfa_match()
|
||||||
|
|
||||||
The unused bits of the options argument for pcre2_dfa_match() must be
|
The unused bits of the options argument for pcre2_dfa_match() must be
|
||||||
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDAN-
|
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDAN-
|
||||||
CHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
CHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
||||||
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
|
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
|
||||||
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but
|
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but
|
||||||
the last four of these are exactly the same as for pcre2_match(), so
|
the last four of these are exactly the same as for pcre2_match(), so
|
||||||
their description is not repeated here.
|
their description is not repeated here.
|
||||||
|
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
|
||||||
These have the same general effect as they do for pcre2_match(), but
|
These have the same general effect as they do for pcre2_match(), but
|
||||||
the details are slightly different. When PCRE2_PARTIAL_HARD is set for
|
the details are slightly different. When PCRE2_PARTIAL_HARD is set for
|
||||||
pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the
|
pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the
|
||||||
subject is reached and there is still at least one matching possibility
|
subject is reached and there is still at least one matching possibility
|
||||||
that requires additional characters. This happens even if some complete
|
that requires additional characters. This happens even if some complete
|
||||||
matches have already been found. When PCRE2_PARTIAL_SOFT is set, the
|
matches have already been found. When PCRE2_PARTIAL_SOFT is set, the
|
||||||
return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
|
return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
|
||||||
if the end of the subject is reached, there have been no complete
|
if the end of the subject is reached, there have been no complete
|
||||||
matches, but there is still at least one matching possibility. The por-
|
matches, but there is still at least one matching possibility. The por-
|
||||||
tion of the string that was inspected when the longest partial match
|
tion of the string that was inspected when the longest partial match
|
||||||
was found is set as the first matching string in both cases. There is a
|
was found is set as the first matching string in both cases. There is a
|
||||||
more detailed discussion of partial and multi-segment matching, with
|
more detailed discussion of partial and multi-segment matching, with
|
||||||
examples, in the pcre2partial documentation.
|
examples, in the pcre2partial documentation.
|
||||||
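The two partial-matching rules described above can be restated as a small decision function. This is only a model of the documented behaviour, not PCRE2's implementation: the enums and the function name are hypothetical, and the three inputs stand for facts the matcher is assumed to know when it finishes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real option bits and return codes. */
enum partial_mode { PARTIAL_NONE, PARTIAL_SOFT, PARTIAL_HARD };
enum outcome { OUTCOME_MATCH, OUTCOME_NOMATCH, OUTCOME_PARTIAL };

/* hit_end:       the end of the subject was reached
   still_pending: at least one matching possibility needs more characters
   complete:      number of complete matches found so far */
static enum outcome dfa_outcome(enum partial_mode mode, bool hit_end,
                                bool still_pending, int complete)
{
  bool could_continue = hit_end && still_pending;

  /* Hard: a possible longer match wins even over complete matches. */
  if (mode == PARTIAL_HARD && could_continue) return OUTCOME_PARTIAL;

  /* Soft: only a would-be "no match" result is converted. */
  if (mode == PARTIAL_SOFT && could_continue && complete == 0)
    return OUTCOME_PARTIAL;

  return (complete > 0) ? OUTCOME_MATCH : OUTCOME_NOMATCH;
}
```

The only input combination on which the two modes differ is the one where complete matches exist and a longer match is still possible: the hard mode reports a partial match, the soft mode reports the complete ones.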
|
|
||||||
PCRE2_DFA_SHORTEST
|
PCRE2_DFA_SHORTEST
|
||||||
|
|
||||||
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to
|
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to
|
||||||
stop as soon as it has found one match. Because of the way the alterna-
|
stop as soon as it has found one match. Because of the way the alterna-
|
||||||
tive algorithm works, this is necessarily the shortest possible match
|
tive algorithm works, this is necessarily the shortest possible match
|
||||||
at the first possible matching point in the subject string.
|
at the first possible matching point in the subject string.
|
||||||
|
|
||||||
PCRE2_DFA_RESTART
|
PCRE2_DFA_RESTART
|
||||||
|
|
||||||
When pcre2_dfa_match() returns a partial match, it is possible to call
|
When pcre2_dfa_match() returns a partial match, it is possible to call
|
||||||
it again, with additional subject characters, and have it continue with
|
it again, with additional subject characters, and have it continue with
|
||||||
the same match. The PCRE2_DFA_RESTART option requests this action; when
|
the same match. The PCRE2_DFA_RESTART option requests this action; when
|
||||||
it is set, the workspace and wscount arguments must reference the same
|
it is set, the workspace and wscount arguments must reference the same
|
||||||
vector as before because data about the match so far is left in them
|
vector as before because data about the match so far is left in them
|
||||||
after a partial match. There is more discussion of this facility in the
|
after a partial match. There is more discussion of this facility in the
|
||||||
pcre2partial documentation.
|
pcre2partial documentation.
|
||||||
|
|
||||||
|
@ -3414,8 +3417,8 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
|
|
||||||
When pcre2_dfa_match() succeeds, it may have matched more than one sub-
|
When pcre2_dfa_match() succeeds, it may have matched more than one sub-
|
||||||
string in the subject. Note, however, that all the matches from one run
|
string in the subject. Note, however, that all the matches from one run
|
||||||
of the function start at the same point in the subject. The shorter
|
of the function start at the same point in the subject. The shorter
|
||||||
matches are all initial substrings of the longer matches. For example,
|
matches are all initial substrings of the longer matches. For example,
|
||||||
if the pattern
|
if the pattern
|
||||||
|
|
||||||
<.*>
|
<.*>
|
||||||
|
@ -3430,73 +3433,73 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
<something> <something else>
|
<something> <something else>
|
||||||
<something>
|
<something>
|
||||||
|
|
||||||
On success, the yield of the function is a number greater than zero,
|
On success, the yield of the function is a number greater than zero,
|
||||||
which is the number of matched substrings. The offsets of the sub-
|
which is the number of matched substrings. The offsets of the sub-
|
||||||
strings are returned in the ovector, and can be extracted by number in
|
strings are returned in the ovector, and can be extracted by number in
|
||||||
the same way as for pcre2_match(), but the numbers bear no relation to
|
the same way as for pcre2_match(), but the numbers bear no relation to
|
||||||
any capturing groups that may exist in the pattern, because DFA match-
|
any capturing groups that may exist in the pattern, because DFA match-
|
||||||
ing does not support group capture.
|
ing does not support group capture.
|
||||||
|
|
||||||
Calls to the convenience functions that extract substrings by name
|
Calls to the convenience functions that extract substrings by name
|
||||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used
|
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used
|
||||||
after a DFA match. The convenience functions that extract substrings by
|
after a DFA match. The convenience functions that extract substrings by
|
||||||
number never return PCRE2_ERROR_NOSUBSTRING.
|
number never return PCRE2_ERROR_NOSUBSTRING.
|
||||||
|
|
||||||
The matched strings are stored in the ovector in reverse order of
|
The matched strings are stored in the ovector in reverse order of
|
||||||
length; that is, the longest matching string is first. If there were
|
length; that is, the longest matching string is first. If there were
|
||||||
too many matches to fit into the ovector, the yield of the function is
|
too many matches to fit into the ovector, the yield of the function is
|
||||||
zero, and the vector is filled with the longest matches.
|
zero, and the vector is filled with the longest matches.
|
||||||
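As a sketch of how the ovector layout described above might be consumed, the helper below copies the i-th match out of a subject given the return value and the offset pairs. It deliberately avoids calling the library: the function name is hypothetical, size_t is assumed to stand in for PCRE2_SIZE, and the caller is assumed to have obtained rc and ovector from a successful pcre2_dfa_match() call.

```c
#include <stddef.h>
#include <string.h>

/* Copy the i-th DFA match (0 = longest) into buf, which holds buflen bytes.
   rc is the positive value returned by the match call; ovector holds rc
   pairs of (start, end) offsets. Returns the match length, or -1 if i is
   out of range or the buffer is too small. */
static long copy_dfa_match(const char *subject, const size_t *ovector,
                           int rc, int i, char *buf, size_t buflen)
{
  if (i < 0 || i >= rc) return -1;

  size_t start = ovector[2 * i];
  size_t end = ovector[2 * i + 1];
  size_t len = end - start;

  if (len + 1 > buflen) return -1;      /* need room for the terminator */
  memcpy(buf, subject + start, len);
  buf[len] = '\0';
  return (long)len;
}
```

Because all matches from one run start at the same point, every pair shares its first offset and index 0 is always the longest match. When the function yields zero (too many matches for the ovector), a caller would have to substitute the number of pairs the ovector can hold for rc.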
|
|
||||||
NOTE: PCRE2's "auto-possessification" optimization usually applies to
|
NOTE: PCRE2's "auto-possessification" optimization usually applies to
|
||||||
character repeats at the end of a pattern (as well as internally). For
|
character repeats at the end of a pattern (as well as internally). For
|
||||||
example, the pattern "a\d+" is compiled as if it were "a\d++". For DFA
|
example, the pattern "a\d+" is compiled as if it were "a\d++". For DFA
|
||||||
matching, this means that only one possible match is found. If you
|
matching, this means that only one possible match is found. If you
|
||||||
really do want multiple matches in such cases, either use an ungreedy
|
really do want multiple matches in such cases, either use an ungreedy
|
||||||
repeat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when
|
repeat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when
|
||||||
compiling.
|
compiling.
|
||||||
|
|
||||||
Error returns from pcre2_dfa_match()
|
Error returns from pcre2_dfa_match()
|
||||||
|
|
||||||
The pcre2_dfa_match() function returns a negative number when it fails.
|
The pcre2_dfa_match() function returns a negative number when it fails.
|
||||||
Many of the errors are the same as for pcre2_match(), as described
|
Many of the errors are the same as for pcre2_match(), as described
|
||||||
above. There are in addition the following errors that are specific to
|
above. There are in addition the following errors that are specific to
|
||||||
pcre2_dfa_match():
|
pcre2_dfa_match():
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_UITEM
|
PCRE2_ERROR_DFA_UITEM
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() encounters an item in the
|
This return is given if pcre2_dfa_match() encounters an item in the
|
||||||
pattern that it does not support, for instance, the use of \C in a UTF
|
pattern that it does not support, for instance, the use of \C in a UTF
|
||||||
mode or a backreference.
|
mode or a backreference.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_UCOND
|
PCRE2_ERROR_DFA_UCOND
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() encounters a condition item
|
This return is given if pcre2_dfa_match() encounters a condition item
|
||||||
that uses a backreference for the condition, or a test for recursion in
|
that uses a backreference for the condition, or a test for recursion in
|
||||||
a specific group. These are not supported.
|
a specific group. These are not supported.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_WSSIZE
|
PCRE2_ERROR_DFA_WSSIZE
|
||||||
|
|
||||||
This return is given if pcre2_dfa_match() runs out of space in the
|
This return is given if pcre2_dfa_match() runs out of space in the
|
||||||
workspace vector.
|
workspace vector.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_RECURSE
|
PCRE2_ERROR_DFA_RECURSE
|
||||||
|
|
||||||
When a recursive subpattern is processed, the matching function calls
|
When a recursive subpattern is processed, the matching function calls
|
||||||
itself recursively, using private memory for the ovector and workspace.
|
itself recursively, using private memory for the ovector and workspace.
|
||||||
This error is given if the internal ovector is not large enough. This
|
This error is given if the internal ovector is not large enough. This
|
||||||
should be extremely rare, as a vector of size 1000 is used.
|
should be extremely rare, as a vector of size 1000 is used.
|
||||||
|
|
||||||
PCRE2_ERROR_DFA_BADRESTART
|
PCRE2_ERROR_DFA_BADRESTART
|
||||||
|
|
||||||
When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option,
|
When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option,
|
||||||
some plausibility checks are made on the contents of the workspace,
|
some plausibility checks are made on the contents of the workspace,
|
||||||
which should contain data about the previous partial match. If any of
|
which should contain data about the previous partial match. If any of
|
||||||
these checks fail, this error is given.
|
these checks fail, this error is given.
|
||||||
|
|
||||||
|
|
||||||
SEE ALSO
|
SEE ALSO
|
||||||
|
|
||||||
pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3),
|
pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3),
|
||||||
pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
|
pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
|
||||||
|
|
||||||
|
|
||||||
|
@ -3509,7 +3512,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 30 June 2018
|
Last updated: 02 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -6664,9 +6667,9 @@ BACKSLASH
|
||||||
|
|
||||||
Resetting the match start
|
Resetting the match start
|
||||||
|
|
||||||
The escape sequence \K causes any previously matched characters not to
|
In normal use, the escape sequence \K causes any previously matched
|
||||||
be included in the final matched sequence that is returned. For exam-
|
characters not to be included in the final matched sequence that is
|
||||||
ple, the pattern:
|
returned. For example, the pattern:
|
||||||
|
|
||||||
foo\Kbar
|
foo\Kbar
|
||||||
|
|
||||||
|
@ -6692,7 +6695,15 @@ BACKSLASH
|
||||||
assertions, but is ignored in negative assertions. Note that when a
|
assertions, but is ignored in negative assertions. Note that when a
|
||||||
pattern such as (?=ab\K) matches, the reported start of the match can
|
pattern such as (?=ab\K) matches, the reported start of the match can
|
||||||
be greater than the end of the match. Using \K in a lookbehind asser-
|
be greater than the end of the match. Using \K in a lookbehind asser-
|
||||||
tion at the start of a pattern can also lead to odd effects.
|
tion at the start of a pattern can also lead to odd effects. For exam-
|
||||||
|
ple, consider this pattern:
|
||||||
|
|
||||||
|
(?<=\Kfoo)bar
|
||||||
|
|
||||||
|
If the subject is "foobar", a call to pcre2_match() with a starting
|
||||||
|
offset of 3 succeeds and reports the matching string as "foobar", that
|
||||||
|
is, the start of the reported match is earlier than where the match
|
||||||
|
started.
|
||||||
|
|
||||||
Simple assertions
|
Simple assertions
|
||||||
|
|
||||||
|
@ -8930,7 +8941,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 28 June 2018
|
Last updated: 30 June 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "30 June 2018" "PCRE2 10.32"
|
.TH PCRE2API 3 "02 July 2018" "PCRE2 10.32"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3163,7 +3163,10 @@ string in \fIoutputbuffer\fP, replacing the part that was matched with the
|
||||||
\fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This can
|
\fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This can
|
||||||
be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
|
be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
|
||||||
which a \eK item in a lookahead in the pattern causes the match to end before
|
which a \eK item in a lookahead in the pattern causes the match to end before
|
||||||
it starts are not supported, and give rise to an error return.
|
it starts are not supported, and give rise to an error return. For global
|
||||||
|
replacements, matches in which \eK in a lookbehind causes the match to start
|
||||||
|
earlier than the point that was reached in the previous iteration are also not
|
||||||
|
supported.
|
||||||
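The two unsupported \K cases described above amount to a pair of range checks on the reported match. The following is a minimal sketch of that idea, not the library's code: the enum, the function name and the result values are hypothetical, and size_t is assumed to stand in for PCRE2_SIZE.

```c
#include <stddef.h>

/* Hypothetical result codes; the real error return is not shown here. */
enum span_check { SPAN_OK, SPAN_END_BEFORE_START, SPAN_START_BEFORE_PREVIOUS };

/* start, end: offsets reported for the current match (ovector[0], ovector[1])
   resume_at:  the point reached in the previous iteration of a global
               replacement; pass 0 on the first iteration, for which the
               text above states no such restriction */
static enum span_check check_reported_span(size_t start, size_t end,
                                           size_t resume_at)
{
  if (end < start) return SPAN_END_BEFORE_START;            /* \K in a lookahead */
  if (start < resume_at) return SPAN_START_BEFORE_PREVIOUS; /* \K in a lookbehind */
  return SPAN_OK;
}
```

For instance, a match reported as 0..6 when the previous iteration had reached offset 3 fails the second check, which is the shape of result a pattern such as (?<=\Kfoo)bar produces on "foobar".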
.P
|
.P
|
||||||
The first seven arguments of \fBpcre2_substitute()\fP are the same as for
|
The first seven arguments of \fBpcre2_substitute()\fP are the same as for
|
||||||
\fBpcre2_match()\fP, except that the partial matching options are not
|
\fBpcre2_match()\fP, except that the partial matching options are not
|
||||||
|
@ -3637,6 +3640,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 30 June 2018
|
Last updated: 02 July 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1110,7 +1110,7 @@ matches, the reported start of the match can be greater than the end of the
|
||||||
match. Using \eK in a lookbehind assertion at the start of a pattern can also
|
match. Using \eK in a lookbehind assertion at the start of a pattern can also
|
||||||
lead to odd effects. For example, consider this pattern:
|
lead to odd effects. For example, consider this pattern:
|
||||||
.sp
|
.sp
|
||||||
(?<=\Kfoo)bar
|
(?<=\eKfoo)bar
|
||||||
.sp
|
.sp
|
||||||
If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting
|
If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting
|
||||||
offset of 3 succeeds and reports the matching string as "foobar", that is, the
|
offset of 3 succeeds and reports the matching string as "foobar", that is, the
|
||||||
|
|
|
@ -5,7 +5,7 @@
|
||||||
/* This is the public header file for the PCRE library, second API, to be
|
/* This is the public header file for the PCRE library, second API, to be
|
||||||
#included by applications that call PCRE2 functions.
|
#included by applications that call PCRE2 functions.
|
||||||
|
|
||||||
Copyright (c) 2016-2017 University of Cambridge
|
Copyright (c) 2016-2018 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -399,6 +399,7 @@ released, the numbers must not be changed. */
|
||||||
#define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
|
#define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
|
||||||
#define PCRE2_ERROR_HEAPLIMIT (-63)
|
#define PCRE2_ERROR_HEAPLIMIT (-63)
|
||||||
#define PCRE2_ERROR_CONVERT_SYNTAX (-64)
|
#define PCRE2_ERROR_CONVERT_SYNTAX (-64)
|
||||||
|
#define PCRE2_ERROR_INTERNAL_DUPMATCH (-65)
|
||||||
|
|
||||||
|
|
||||||
/* Request types for pcre2_pattern_info() */
|
/* Request types for pcre2_pattern_info() */
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2017 University of Cambridge
|
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -260,6 +260,8 @@ static const unsigned char match_error_texts[] =
|
||||||
"bad serialized data\0"
|
"bad serialized data\0"
|
||||||
"heap limit exceeded\0"
|
"heap limit exceeded\0"
|
||||||
"invalid syntax\0"
|
"invalid syntax\0"
|
||||||
|
/* 65 */
|
||||||
|
"internal error - duplicate substitution match\0"
|
||||||
;
|
;
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016 University of Cambridge
|
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -238,10 +238,12 @@ PCRE2_SPTR repend;
|
||||||
PCRE2_SIZE extra_needed = 0;
|
PCRE2_SIZE extra_needed = 0;
|
||||||
PCRE2_SIZE buff_offset, buff_length, lengthleft, fraglength;
|
PCRE2_SIZE buff_offset, buff_length, lengthleft, fraglength;
|
||||||
PCRE2_SIZE *ovector;
|
PCRE2_SIZE *ovector;
|
||||||
|
PCRE2_SIZE ovecsave[3];
|
||||||
|
|
||||||
buff_offset = 0;
|
buff_offset = 0;
|
||||||
lengthleft = buff_length = *blength;
|
lengthleft = buff_length = *blength;
|
||||||
*blength = PCRE2_UNSET;
|
*blength = PCRE2_UNSET;
|
||||||
|
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
|
||||||
|
|
||||||
/* Partial matching is not valid. */
|
/* Partial matching is not valid. */
|
||||||
|
|
||||||
|
@ -369,6 +371,26 @@ do
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Check for the same match as previous. This is legitimate after matching an
|
||||||
|
empty string that starts after the initial match offset. We have tried again
|
||||||
|
at the match point in case the pattern is one like /(?<=\G.)/ which can never
|
||||||
|
match at its starting point, so running the match achieves the bumpalong. If
|
||||||
|
we do get the same (null) match at the original match point, it isn't such a
|
||||||
|
pattern, so we now do the empty string magic. In all other cases, a repeat
|
||||||
|
match should never occur. */
|
||||||
|
|
||||||
|
if (ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
|
||||||
|
{
|
||||||
|
if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)
|
||||||
|
{
|
||||||
|
goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
|
||||||
|
ovecsave[2] = start_offset;
|
||||||
|
continue; /* Back to the top of the loop */
|
||||||
|
}
|
||||||
|
rc = PCRE2_ERROR_INTERNAL_DUPMATCH;
|
||||||
|
goto EXIT;
|
||||||
|
}
|
||||||
|
|
||||||
/* Count substitutions with a paranoid check for integer overflow; surely no
|
/* Count substitutions with a paranoid check for integer overflow; surely no
|
||||||
real call to this function would ever hit this! */
|
real call to this function would ever hit this! */
|
||||||
|
|
||||||
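The repeat-match check added in this hunk can be isolated as a pure function, which makes its three outcomes easier to see. This is a sketch that mirrors the conditions in the patch rather than the patch itself: the enum and function names are hypothetical, and size_t is assumed to stand in for PCRE2_SIZE.

```c
#include <stddef.h>

enum dup_action { DUP_PROCEED, DUP_RETRY_NOTEMPTY, DUP_ERROR };

/* saved[0], saved[1]: start and end of the previous match
   saved[2]:           start offset recorded for the previous attempt
   cur_start, cur_end: start and end of the match just found
   start_offset:       start offset used for the current attempt
   On DUP_RETRY_NOTEMPTY, saved[2] is updated, as in the patch. */
static enum dup_action classify_match(size_t saved[3], size_t cur_start,
                                      size_t cur_end, size_t start_offset)
{
  /* A different match from last time: carry on as normal. */
  if (saved[0] != cur_start || saved[1] != cur_end) return DUP_PROCEED;

  /* The same empty match, seen for the first time at this offset: retry
     with the anchored, not-empty-at-start options. */
  if (cur_start == cur_end && saved[2] != start_offset)
  {
    saved[2] = start_offset;
    return DUP_RETRY_NOTEMPTY;
  }

  /* Any other repeat should never happen. */
  return DUP_ERROR;
}
```

Recording the offset in saved[2] is what stops the retry from looping: if the same match were to come back a second time at the same offset, the function would report an error instead of retrying again.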
|
@ -799,13 +821,18 @@ do
|
||||||
} /* End handling a literal code unit */
|
} /* End handling a literal code unit */
|
||||||
} /* End of loop for scanning the replacement. */
|
} /* End of loop for scanning the replacement. */
|
||||||
|
|
||||||
/* The replacement has been copied to the output. Update the start offset to
|
/* The replacement has been copied to the output. Save the details of this
|
||||||
point to the rest of the subject string. If we matched an empty string,
|
match. See above for how this data is used. If we matched an empty string, do
|
||||||
do the magic for global matches. */
|
the magic for global matches. Finally, update the start offset to point to
|
||||||
|
the rest of the subject string. */
|
||||||
|
|
||||||
start_offset = ovector[1];
|
ovecsave[0] = ovector[0];
|
||||||
goptions = (ovector[0] != ovector[1])? 0 :
|
ovecsave[1] = ovector[1];
|
||||||
|
ovecsave[2] = start_offset;
|
||||||
|
|
||||||
|
goptions = (ovector[0] != ovector[1] || ovector[0] > start_offset)? 0 :
|
||||||
PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
|
PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
|
||||||
|
start_offset = ovector[1];
|
||||||
} while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */
|
} while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */
|
||||||
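The revised option selection at the end of the loop can be read as a two-condition test. The sketch below restates it under stated assumptions: the flag macros are hypothetical stand-ins for PCRE2_ANCHORED and PCRE2_NOTEMPTY_ATSTART with arbitrary values, the function name is invented, and start_offset is the offset used for the match just completed (that is, before it is advanced).

```c
#include <stddef.h>

/* Hypothetical flag values; the real ones come from pcre2.h. */
#define OPT_ANCHORED         0x1u
#define OPT_NOTEMPTY_ATSTART 0x2u

/* Choose the options for the next iteration of a global substitution. */
static unsigned next_goptions(size_t match_start, size_t match_end,
                              size_t start_offset)
{
  int non_empty = (match_start != match_end);
  int bumped = (match_start > start_offset);  /* matched after a bumpalong */

  /* Only an empty match found exactly at the starting offset needs the
     special retry options; every other case resumes normally. */
  return (non_empty || bumped) ? 0u : (OPT_ANCHORED | OPT_NOTEMPTY_ATSTART);
}
```

The second condition is the new part: an empty match that was only found after a bumpalong no longer triggers the retry options immediately, so the next attempt starts with a plain match at the new offset.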
|
|
||||||
/* Copy the rest of the subject. */
|
/* Copy the rest of the subject. */
|
||||||
|
|
|
@ -6302,6 +6302,7 @@ size_t needlen;
|
||||||
void *use_dat_context;
|
void *use_dat_context;
|
||||||
BOOL utf;
|
BOOL utf;
|
||||||
BOOL subject_literal;
|
BOOL subject_literal;
|
||||||
|
PCRE2_SIZE ovecsave[3];
|
||||||
|
|
||||||
#ifdef SUPPORT_PCRE2_8
|
#ifdef SUPPORT_PCRE2_8
|
||||||
uint8_t *q8 = NULL;
|
uint8_t *q8 = NULL;
|
||||||
|
@ -6949,6 +6950,9 @@ if (dat_datctl.replacement[0] != 0)
|
||||||
if (timeitm)
|
if (timeitm)
|
||||||
fprintf(outfile, "** Timing is not supported with replace: ignored\n");
|
fprintf(outfile, "** Timing is not supported with replace: ignored\n");
|
||||||
|
|
||||||
|
if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
|
||||||
|
fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");
|
||||||
|
|
||||||
xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
|
xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
|
||||||
PCRE2_SUBSTITUTE_GLOBAL) |
|
PCRE2_SUBSTITUTE_GLOBAL) |
|
||||||
(((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
|
(((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
|
||||||
@@ -7067,35 +7071,24 @@ if (dat_datctl.replacement[0] != 0)
     }

   fprintf(outfile, "\n");
+  show_memory = FALSE;
+  return PR_OK;
   } /* End of substitution handling */

 /* When a replacement string is not provided, run a loop for global matching
-with one of the basic matching functions. */
+with one of the basic matching functions. For altglobal (or first time round
+the loop), set an "unset" value for the previous match info. */

-else for (gmatched = 0;; gmatched++)
+ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
+
+for (gmatched = 0;; gmatched++)
   {
   PCRE2_SIZE j;
   int capcount;
   PCRE2_SIZE *ovector;
-  PCRE2_SIZE ovecsave[2];

   ovector = FLD(match_data, ovector);

-  /* After the first time round a global loop, for a normal global (/g)
-  iteration, save the current ovector[0,1] so that we can check that they do
-  change each time. Otherwise a matching bug that returns the same string
-  causes an infinite loop. It has happened! */
-
-  if (gmatched > 0 && (dat_datctl.control & CTL_GLOBAL) != 0)
-    {
-    ovecsave[0] = ovector[0];
-    ovecsave[1] = ovector[1];
-    }
-
-  /* For altglobal (or first time round the loop), set an "unset" value. */
-
-  else ovecsave[0] = ovecsave[1] = PCRE2_UNSET;
-
   /* Fill the ovector with junk to detect elements that do not get set
   when they should be. */

@@ -7266,12 +7259,23 @@ else for (gmatched = 0;; gmatched++)
     }

   /* If this is not the first time round a global loop, check that the
-  returned string has changed. If not, there is a bug somewhere and we must
-  break the loop because it will go on for ever. We know that there are
-  always at least two elements in the ovector. */
+  returned string has changed. If it has not, check for an empty string match
+  at different starting offset from the previous match. This is a failed test
+  retry for null-matching patterns that don't match at their starting offset,
+  for example /(?<=\G.)/. A repeated match at the same point is not such a
+  pattern, and must be discarded, and we then proceed to seek a non-null
+  match at the current point. For any other repeated match, there is a bug
+  somewhere and we must break the loop because it will go on for ever. We
+  know that there are always at least two elements in the ovector. */

   if (gmatched > 0 && ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
     {
+    if (ovector[0] == ovector[1] && ovecsave[2] != dat_datctl.offset)
+      {
+      g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
+      ovecsave[2] = dat_datctl.offset;
+      continue; /* Back to the top of the loop */
+      }
     fprintf(outfile,
       "** PCRE2 error: global repeat returned the same string as previous\n");
     fprintf(outfile, "** Global loop abandoned\n");
@@ -7579,6 +7583,7 @@ else for (gmatched = 0;; gmatched++)

   if ((dat_datctl.control & CTL_ANYGLOB) == 0) break; else
     {
+    PCRE2_SIZE match_offset = FLD(match_data, ovector)[0];
     PCRE2_SIZE end_offset = FLD(match_data, ovector)[1];

     /* We must now set up for the next iteration of a global search. If we have
@@ -7586,12 +7591,19 @@ else for (gmatched = 0;; gmatched++)
     subject. If so, the loop is over. Otherwise, mimic what Perl's /g option
     does. Set PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED and try the match again
     at the same point. If this fails it will be picked up above, where a fake
-    match is set up so that at this point we advance to the next character. */
+    match is set up so that at this point we advance to the next character.

-    if (FLD(match_data, ovector)[0] == end_offset)
+    However, in order to cope with patterns that never match at their starting
+    offset (e.g. /(?<=\G.)/) we don't do this when the match offset is greater
+    than the starting offset. This means there will be a retry with the
+    starting offset at the match offset. If this returns the same match again,
+    it is picked up above and ignored, and the special action is then taken. */
+
+    if (match_offset == end_offset)
       {
       if (end_offset == ulen) break; /* End of subject */
-      g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
+      if (match_offset <= dat_datctl.offset)
+        g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
       }

     /* However, even after matching a non-empty string, there is still one
@@ -7629,10 +7641,19 @@ else for (gmatched = 0;; gmatched++)
         }
       }

-    /* For /g (global), update the start offset, leaving the rest alone. */
+    /* For a normal global (/g) iteration, save the current ovector[0,1] and
+    the starting offset so that we can check that they do change each time.
+    Otherwise a matching bug that returns the same string causes an infinite
+    loop. It has happened! Then update the start offset, leaving other
+    parameters alone. */

     if ((dat_datctl.control & CTL_GLOBAL) != 0)
+      {
+      ovecsave[0] = ovector[0];
+      ovecsave[1] = ovector[1];
+      ovecsave[2] = dat_datctl.offset;
       dat_datctl.offset = end_offset;
+      }

     /* For altglobal, just update the pointer and length. */


@@ -6189,4 +6189,7 @@ ef) x/x,mark
 /(?=a+)a(a+)++b/
     aab

+/(?<=\G.)/g,aftertext
+    abc
+
 # End of testinput1

@@ -4938,6 +4938,9 @@ a)"xI
 //replace=0
     \=offset=7

+/(?<=\G.)/g,replace=+
+    abc
+
 ".+\QX\E+"B,no_auto_possess

 ".+\QX\E+"B,auto_callout,no_auto_possess

@@ -9822,4 +9822,13 @@ No match
  0: aab
  1: a

+/(?<=\G.)/g,aftertext
+    abc
+ 0: 
+ 0+ bc
+ 0: 
+ 0+ c
+ 0: 
+ 0+ 
+
 # End of testinput1

@@ -15549,6 +15549,10 @@ Failed: error -57 at offset 2 in replacement: bad escape sequence in replacement
     \=offset=7
 Failed: error -33: bad offset value

+/(?<=\G.)/g,replace=+
+    abc
+ 3: a+b+c+
+
 ".+\QX\E+"B,no_auto_possess
 ------------------------------------------------------------------
         Bra
@@ -16580,7 +16584,7 @@ No match
 ------------------------------------------------------------------

 # End of testinput2
-Error -65: PCRE2_ERROR_BADDATA (unknown error number)
+Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data
 Error -2: partial match
 Error -1: no match
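
The revised global loop in pcre2test can be hard to follow from the hunks alone, so here is a minimal, self-contained sketch of its control flow. It is not the pcre2test code and does not call PCRE2. The two toy matchers (match_after_G, match_after_b), the flag macros, and global_matches() are hypothetical stand-ins invented for illustration. match_after_G models /(?<=\G.)/ as "empty match one character after the start offset", and match_after_b models /(?<=b)/. Newline and UTF handling in the "advance one character" step are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNSET            ((size_t)-1)
#define NOTEMPTY_ATSTART 1u   /* stand-in for PCRE2_NOTEMPTY_ATSTART */
#define ANCHORED         2u   /* stand-in for PCRE2_ANCHORED */

/* Toy matcher: returns 1 and sets the match span [*m0, *m1) on success. */
typedef int (*toy_matcher)(const char *s, size_t len, size_t start,
                           unsigned opts, size_t *m0, size_t *m1);

/* Models /(?<=\G.)/: an empty match exactly one character after 'start'.
It can never match at 'start' itself, so an anchored attempt always fails. */
static int match_after_G(const char *s, size_t len, size_t start,
                         unsigned opts, size_t *m0, size_t *m1)
{
(void)s;
if ((opts & ANCHORED) != 0 || start + 1 > len) return 0;
*m0 = *m1 = start + 1;
return 1;
}

/* Models /(?<=b)/: an empty match at the first position >= start that
follows a 'b', honouring the two option stand-ins. */
static int match_after_b(const char *s, size_t len, size_t start,
                         unsigned opts, size_t *m0, size_t *m1)
{
size_t p;
for (p = start; p <= len; p++)
  {
  if (p > 0 && s[p-1] == 'b' &&
      !(p == start && (opts & NOTEMPTY_ATSTART) != 0))
    { *m0 = *m1 = p; return 1; }
  if ((opts & ANCHORED) != 0) break;
  }
return 0;
}

/* Sketch of the global loop: records the start of each reported match. */
static size_t global_matches(toy_matcher match, const char *s,
                             size_t *starts, size_t max)
{
size_t len = strlen(s), offset = 0, count = 0, gmatched;
size_t save[3] = { UNSET, UNSET, UNSET };  /* previous m0, m1, start offset */
unsigned g_notempty = 0;

for (gmatched = 0;; gmatched++)
  {
  size_t m0, m1;
  if (match(s, len, offset, g_notempty, &m0, &m1))
    {
    if (gmatched > 0 && save[0] == m0 && save[1] == m1)
      {
      /* Same empty match from a new start offset: discard it and retry
      for a non-empty, anchored match at this point. */
      if (m0 == m1 && save[2] != offset)
        {
        g_notempty = NOTEMPTY_ATSTART | ANCHORED;
        save[2] = offset;
        continue;
        }
      break;  /* Genuine repeat: pcre2test reports an error and stops */
      }
    assert(count < max);
    starts[count++] = m0;
    }
  else
    {
    if (g_notempty == 0) break;    /* Ordinary failure ends the loop */
    m0 = offset; m1 = offset + 1;  /* Fake match: advance one character */
    }

  g_notempty = 0;
  if (m0 == m1)
    {
    if (m1 == len) break;          /* Empty match at end of subject */
    if (m0 <= offset) g_notempty = NOTEMPTY_ATSTART | ANCHORED;
    }
  save[0] = m0; save[1] = m1; save[2] = offset;
  offset = m1;
  }
return count;
}
```

With the subject "abc", global_matches(match_after_G, "abc", ...) reports three empty matches at offsets 1, 2 and 3, which lines up with the three "0:" results (after-text "bc", "c" and empty) in the new test output. With match_after_b the second attempt returns the same empty match at offset 2 from a different start offset, so it is discarded and the anchored, non-empty retry takes over, leaving a single reported match.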