Make pcre2test show actual pre-match consulted characters for a partial match,
not the length of the longest lookbehind. Control this by "allusedtext".
parent d21f7daf9b
commit 434e3f7468
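The display that this commit changes can be sketched as follows. This is a minimal illustration in Python, not the pcre2test source (which is written in C). The function name and its offset parameters are hypothetical and exist only for this sketch. It assumes the first and last consulted offsets are already known, which, per the ChangeLog entry below, is only the case for non-JIT matching.

```python
def mark_used_text(subject, leftchar, start, end, rightchar):
    """Build an allusedtext-style display (hypothetical helper).

    leftchar  -- offset of the first character consulted before the match
    start/end -- offsets of the actual (or partial) match
    rightchar -- offset just past the last character consulted after it
    """
    text = subject[leftchar:rightchar]
    markers = (
        "<" * (start - leftchar)    # consulted, but before the match
        + " " * (end - start)       # the match itself is left unmarked
        + ">" * (rightchar - end)   # consulted, but after the match
    )
    return text, markers.rstrip()


if __name__ == "__main__":
    # Complete match of "abc" with "pqr" and "xyz" consulted by the assertions.
    print(mark_used_text("123pqrabcxyz456", 3, 6, 9, 12))
    # Partial match at the end of the subject: only preceding text is marked.
    print(mark_used_text("123pqrabcxy", 3, 6, 11, 11))
```

The second call mirrors the partial-match case in the documentation hunks: nothing follows the end of the subject, so only "<" markers appear.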
@@ -71,6 +71,14 @@ lookbehind value. For example /(?<=a(?<=ba)c)/ previously set a maximum
 lookbehind of 2, because that is the largest individual lookbehind. Now it sets
 it to 3, because matching looks back 3 characters.

+14. For partial matches, pcre2test was always showing the maximum lookbehind
+characters, flagged with "<", which is misleading when the lookbehind didn't
+actually look behind the start (because it was later in the pattern). Showing
+all consulted preceding characters for partial matches is now controlled by the
+existing "allusedtext" modifier and, as for complete matches, this facility is
+available only for non-JIT matching, because JIT does not maintain the first
+and last consulted characters.
+

 Version 10.33 16-April-2019
 ---------------------------

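The context lines of the ChangeLog hunk above mention that /(?<=a(?<=ba)c)/ looks back 3 characters even though each individual lookbehind is only 2 wide. A rough way to see this is the sketch below, which uses Python's `re` module as a stand-in. It is an assumption that this engine handles the nested assertion the same way as PCRE2 for this pattern, and Python does not expose a "maximum lookbehind" value, so only the matching behaviour is shown.

```python
import re

# Outer lookbehind consumes "a" then "c" (width 2); the inner lookbehind,
# evaluated just after "a", checks that the two preceding characters are "ba".
pattern = re.compile(r"(?<=a(?<=ba)c)")


def assertion_position(subject):
    """Return the offset where the assertion first succeeds, or None."""
    m = pattern.search(subject)
    return None if m is None else m.start()


if __name__ == "__main__":
    print(assertion_position("bac"))  # needs all three preceding characters
    print(assertion_position("ac"))   # two preceding characters are not enough
```

With only "ac" available the inner assertion has nowhere to find the "b", which is why the effective lookbehind distance is 3 rather than 2.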
@@ -1252,22 +1252,27 @@ following line with a plus character following the capture number.
 </P>
 <P>
 The <b>allusedtext</b> modifier requests that all the text that was consulted
-during a successful pattern match by the interpreter should be shown. This
-feature is not supported for JIT matching, and if requested with JIT it is
-ignored (with a warning message). Setting this modifier affects the output if
-there is a lookbehind at the start of a match, or a lookahead at the end, or if
-\K is used in the pattern. Characters that precede or follow the start and end
-of the actual match are indicated in the output by '<' or '>' characters
-underneath them. Here is an example:
+during a successful pattern match by the interpreter should be shown, for both
+full and partial matches. This feature is not supported for JIT matching, and
+if requested with JIT it is ignored (with a warning message). Setting this
+modifier affects the output if there is a lookbehind at the start of a match,
+or, for a complete match, a lookahead at the end, or if \K is used in the
+pattern. Characters that precede or follow the start and end of the actual
+match are indicated in the output by '<' or '>' characters underneath them.
+Here is an example:
 <pre>
 re> /(?<=pqr)abc(?=xyz)/
 data> 123pqrabcxyz456\=allusedtext
 0: pqrabcxyz
 <<< >>>
+data> 123pqrabcxy\=ph,allusedtext
+Partial match: pqrabcxy
+<<<
 </pre>
-This shows that the matched string is "abc", with the preceding and following
-strings "pqr" and "xyz" having been consulted during the match (when processing
-the assertions).
+The first, complete match shows that the matched string is "abc", with the
+preceding and following strings "pqr" and "xyz" having been consulted during
+the match (when processing the assertions). The partial match can indicate only
+the preceding string.
 </P>
 <P>
 The <b>startchar</b> modifier requests that the starting character for the match
@@ -2081,7 +2086,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 June 2019
+Last updated: 26 June 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>

@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "20 June 2019" "PCRE 10.34"
+.TH PCRE2TEST 1 "26 June 2019" "PCRE 10.34"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -1220,22 +1220,27 @@ well as the main matched substring. In each case the remainder is output on the
 following line with a plus character following the capture number.
 .P
 The \fBallusedtext\fP modifier requests that all the text that was consulted
-during a successful pattern match by the interpreter should be shown. This
-feature is not supported for JIT matching, and if requested with JIT it is
-ignored (with a warning message). Setting this modifier affects the output if
-there is a lookbehind at the start of a match, or a lookahead at the end, or if
-\eK is used in the pattern. Characters that precede or follow the start and end
-of the actual match are indicated in the output by '<' or '>' characters
-underneath them. Here is an example:
+during a successful pattern match by the interpreter should be shown, for both
+full and partial matches. This feature is not supported for JIT matching, and
+if requested with JIT it is ignored (with a warning message). Setting this
+modifier affects the output if there is a lookbehind at the start of a match,
+or, for a complete match, a lookahead at the end, or if \eK is used in the
+pattern. Characters that precede or follow the start and end of the actual
+match are indicated in the output by '<' or '>' characters underneath them.
+Here is an example:
 .sp
 re> /(?<=pqr)abc(?=xyz)/
 data> 123pqrabcxyz456\e=allusedtext
 0: pqrabcxyz
 <<< >>>
+data> 123pqrabcxy\e=ph,allusedtext
+Partial match: pqrabcxy
+<<<
 .sp
-This shows that the matched string is "abc", with the preceding and following
-strings "pqr" and "xyz" having been consulted during the match (when processing
-the assertions).
+The first, complete match shows that the matched string is "abc", with the
+preceding and following strings "pqr" and "xyz" having been consulted during
+the match (when processing the assertions). The partial match can indicate only
+the preceding string.
 .P
 The \fBstartchar\fP modifier requests that the starting character for the match
 be indicated, if it is different to the start of the matched string. The only
@@ -2062,6 +2067,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 June 2019
+Last updated: 26 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi

@@ -1122,29 +1122,33 @@ SUBJECT MODIFIERS
 capture number.

 The allusedtext modifier requests that all the text that was consulted
-during a successful pattern match by the interpreter should be shown.
-This feature is not supported for JIT matching, and if requested with
-JIT it is ignored (with a warning message). Setting this modifier af-
-fects the output if there is a lookbehind at the start of a match, or a
-lookahead at the end, or if \K is used in the pattern. Characters that
-precede or follow the start and end of the actual match are indicated
-in the output by '<' or '>' characters underneath them. Here is an ex-
-ample:
+during a successful pattern match by the interpreter should be shown,
+for both full and partial matches. This feature is not supported for
+JIT matching, and if requested with JIT it is ignored (with a warning
+message). Setting this modifier affects the output if there is a look-
+behind at the start of a match, or, for a complete match, a lookahead
+at the end, or if \K is used in the pattern. Characters that precede or
+follow the start and end of the actual match are indicated in the out-
+put by '<' or '>' characters underneath them. Here is an example:

 re> /(?<=pqr)abc(?=xyz)/
 data> 123pqrabcxyz456\=allusedtext
 0: pqrabcxyz
 <<< >>>
+data> 123pqrabcxy\=ph,allusedtext
+Partial match: pqrabcxy
+<<<

-This shows that the matched string is "abc", with the preceding and
-following strings "pqr" and "xyz" having been consulted during the
-match (when processing the assertions).
+The first, complete match shows that the matched string is "abc", with
+the preceding and following strings "pqr" and "xyz" having been con-
+sulted during the match (when processing the assertions). The partial
+match can indicate only the preceding string.

 The startchar modifier requests that the starting character for the
 match be indicated, if it is different to the start of the matched
 string. The only time when this occurs is when \K has been processed as
 part of the match. In this situation, the output for the matched string
 is displayed from the starting character instead of from the match
 point, with circumflex characters under the earlier characters. For ex-
 ample:

@@ -1153,7 +1157,7 @@ SUBJECT MODIFIERS
 0: abcxyz
 ^^^

 Unlike allusedtext, the startchar modifier can be used with JIT. How-
 ever, these two modifiers are mutually exclusive.

 Showing the value of all capture groups
@@ -1161,97 +1165,97 @@ SUBJECT MODIFIERS
 The allcaptures modifier requests that the values of all potential cap-
 tured parentheses be output after a match. By default, only those up to
 the highest one actually used in the match are output (corresponding to
 the return code from pcre2_match()). Groups that did not take part in
 the match are output as "<unset>". This modifier is not relevant for
 DFA matching (which does no capturing) and does not apply when replace
 is specified; it is ignored, with a warning message, if present.

 Showing the entire ovector, for all outcomes

 The allvector modifier requests that the entire ovector be shown, what-
 ever the outcome of the match. Compare allcaptures, which shows only up
 to the maximum number of capture groups for the pattern, and then only
 for a successful complete non-DFA match. This modifier, which acts af-
 ter any match result, and also for DFA matching, provides a means of
 checking that there are no unexpected modifications to ovector fields.
 Before each match attempt, the ovector is filled with a special value,
 and if this is found in both elements of a capturing pair, "<un-
 changed>" is output. After a successful match, this applies to all
 groups after the maximum capture group for the pattern. In other cases
 it applies to the entire ovector. After a partial match, the first two
 elements are the only ones that should be set. After a DFA match, the
 amount of ovector that is used depends on the number of matches that
 were found.

 Testing pattern callouts

 A callout function is supplied when pcre2test calls the library match-
 ing functions, unless callout_none is specified. Its behaviour can be
 controlled by various modifiers listed above whose names begin with
 callout_. Details are given in the section entitled "Callouts" below.
 Testing callouts from pcre2_substitute() is decribed separately in
 "Testing the substitution function" below.

 Finding all matches in a string

 Searching for all possible matches within a subject can be requested by
 the global or altglobal modifier. After finding a match, the matching
 function is called again to search the remainder of the subject. The
 difference between global and altglobal is that the former uses the
 start_offset argument to pcre2_match() or pcre2_dfa_match() to start
 searching at a new point within the entire string (which is what Perl
 does), whereas the latter passes over a shortened subject. This makes a
 difference to the matching process if the pattern begins with a lookbe-
 hind assertion (including \b or \B).

 If an empty string is matched, the next match is done with the
 PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
 for another, non-empty, match at the same point in the subject. If this
 match fails, the start offset is advanced, and the normal match is re-
 tried. This imitates the way Perl handles such cases when using the /g
 modifier or the split() function. Normally, the start offset is ad-
 vanced by one character, but if the newline convention recognizes CRLF
 as a newline, and the current character is CR followed by LF, an ad-
 vance of two characters occurs.

 Testing substring extraction functions

 The copy and get modifiers can be used to test the pcre2_sub-
 string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
 given more than once, and each can specify a capture group name or num-
 ber, for example:

 abcd\=copy=1,copy=3,get=G1

 If the #subject command is used to set default copy and/or get lists,
 these can be unset by specifying a negative number to cancel all num-
 bered groups and an empty name to cancel all named groups.

 The getall modifier tests pcre2_substring_list_get(), which extracts
 all captured substrings.

 If the subject line is successfully matched, the substrings extracted
 by the convenience functions are output with C, G, or L after the
 string number instead of a colon. This is in addition to the normal
 full list. The string length (that is, the return from the extraction
 function) is given in parentheses after each substring, followed by the
 name when the extraction was by name.

 Testing the substitution function

 If the replace modifier is set, the pcre2_substitute() function is
 called instead of one of the matching functions. Note that replacement
 strings cannot contain commas, because a comma signifies the end of a
 modifier. This is not thought to be an issue in a test program.

 Unlike subject strings, pcre2test does not process replacement strings
 for escape sequences. In UTF mode, a replacement string is checked to
 see if it is a valid UTF-8 string. If so, it is correctly converted to
 a UTF string of the appropriate code unit width. If it is not a valid
 UTF-8 string, the individual code units are copied directly. This pro-
 vides a means of passing an invalid UTF-8 string for testing purposes.

 The following modifiers set options (in additional to the normal match
 options) for pcre2_substitute():

 global PCRE2_SUBSTITUTE_GLOBAL
@@ -1261,8 +1265,8 @@ SUBJECT MODIFIERS
 substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY


 After a successful substitution, the modified string is output, pre-
 ceded by the number of replacements. This may be zero if there were no
 matches. Here is a simple example of a substitution test:

 /abc/replace=xxx
@@ -1271,12 +1275,12 @@ SUBJECT MODIFIERS
 =abc=abc=\=global
 2: =xxx=xxx=

 Subject and replacement strings should be kept relatively short (fewer
 than 256 characters) for substitution tests, as fixed-size buffers are
 used. To make it easy to test for buffer overflow, if the replacement
 string starts with a number in square brackets, that number is passed
 to pcre2_substitute() as the size of the output buffer, with the re-
 placement string starting at the next character. Here is an example
 that tests the edge case:

 /abc/
@@ -1286,12 +1290,12 @@ SUBJECT MODIFIERS
 Failed: error -47: no more memory

 The default action of pcre2_substitute() is to return PCRE2_ER-
 ROR_NOMEMORY when the output buffer is too small. However, if the
 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi-
 tute_overflow_length modifier), pcre2_substitute() continues to go
 through the motions of matching and substituting (but not doing any
 callouts), in order to compute the size of buffer that is required.
 When this happens, pcre2test shows the required buffer length (which
 includes space for the trailing zero) as part of the error message. For
 example:

@@ -1300,15 +1304,15 @@ SUBJECT MODIFIERS
 Failed: error -47: no more memory: 10 code units are needed

 A replacement string is ignored with POSIX and DFA matching. Specifying
 partial matching provokes an error return ("bad option value") from
 pcre2_substitute().

 Testing substitute callouts

 If the substitute_callout modifier is set, a substitution callout func-
 tion is set up. The null_context modifier must not be set, because the
 address of the callout function is passed in a match context. When the
 callout function is called (after each substitution), details of the
 the input and output strings are output. For example:

 /abc/g,replace=<$0>,substitute_callout
@@ -1317,19 +1321,19 @@ SUBJECT MODIFIERS
 2(1) Old 6 9 "abc" New 8 13 "<abc>"
 2: <abc>def<abc>pqr

 The first number on each callout line is the count of matches. The
 parenthesized number is the number of pairs that are set in the ovector
 (that is, one more than the number of capturing groups that were set).
 Then are listed the offsets of the old substring, its contents, and the
 same for the replacement.

 By default, the substitution callout function returns zero, which ac-
 cepts the replacement and causes matching to continue if /g was used.
 Two further modifiers can be used to test other return values. If sub-
 stitute_skip is set to a value greater than zero the callout function
 returns +1 for the match of that number, and similarly substitute_stop
 returns -1. These cause the replacement to be rejected, and -1 causes
 no further matching to take place. If either of them are set, substi-
 tute_callout is assumed. For example:

 /abc/g,replace=<$0>,substitute_skip=1
@@ -1347,160 +1351,160 @@ SUBJECT MODIFIERS

 Setting the JIT stack size

 The jitstack modifier provides a way of setting the maximum stack size
 that is used by the just-in-time optimization code. It is ignored if
 JIT optimization is not being used. The value is a number of kibibytes
 (units of 1024 bytes). Setting zero reverts to the default of 32KiB.
 Providing a stack that is larger than the default is necessary only for
 very complicated patterns. If jitstack is set non-zero on a subject
 line it overrides any value that was set on the pattern.

 Setting heap, match, and depth limits

 The heap_limit, match_limit, and depth_limit modifiers set the appro-
 priate limits in the match context. These values are ignored when the
 find_limits modifier is specified.

 Finding minimum limits

 If the find_limits modifier is present on a subject line, pcre2test
 calls the relevant matching function several times, setting different
 values in the match context via pcre2_set_heap_limit(),
 pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
 minimum values for each parameter that allows the match to complete
 without error. If JIT is being used, only the match limit is relevant.

 When using this modifier, the pattern should not contain any limit set-
 tings such as (*LIMIT_MATCH=...) within it. If such a setting is
 present and is lower than the minimum matching value, the minimum value
 cannot be found because pcre2_set_match_limit() etc. are only able to
 reduce the value of an in-pattern limit; they cannot increase it.

 For non-DFA matching, the minimum depth_limit number is a measure of
 how much nested backtracking happens (that is, how deeply the pattern's
 tree is searched). In the case of DFA matching, depth_limit controls
 the depth of recursive calls of the internal function that is used for
 handling pattern recursion, lookaround assertions, and atomic groups.

 For non-DFA matching, the match_limit number is a measure of the amount
 of backtracking that takes place, and learning the minimum value can be
 instructive. For most simple matches, the number is quite small, but
 for patterns with very large numbers of matching possibilities, it can
 become large very quickly with increasing length of subject string. In
 the case of DFA matching, match_limit controls the total number of
|
||||||
calls, both recursive and non-recursive, to the internal matching func-
|
calls, both recursive and non-recursive, to the internal matching func-
|
||||||
tion, thus controlling the overall amount of computing resource that is
|
tion, thus controlling the overall amount of computing resource that is
|
||||||
used.
|
used.
|
||||||
|
|
||||||
For both kinds of matching, the heap_limit number, which is in
|
For both kinds of matching, the heap_limit number, which is in
|
||||||
kibibytes (units of 1024 bytes), limits the amount of heap memory used
|
kibibytes (units of 1024 bytes), limits the amount of heap memory used
|
||||||
for matching. A value of zero disables the use of any heap memory; many
|
for matching. A value of zero disables the use of any heap memory; many
|
||||||
simple pattern matches can be done without using the heap, so zero is
|
simple pattern matches can be done without using the heap, so zero is
|
||||||
not an unreasonable setting.
|
not an unreasonable setting.
|
||||||
|
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
||||||
|
|
||||||
The mark modifier causes the names from backtracking control verbs that
|
The mark modifier causes the names from backtracking control verbs that
|
||||||
are returned from calls to pcre2_match() to be displayed. If a mark is
|
are returned from calls to pcre2_match() to be displayed. If a mark is
|
||||||
returned for a match, non-match, or partial match, pcre2test shows it.
|
returned for a match, non-match, or partial match, pcre2test shows it.
|
||||||
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
||||||
it is added to the non-match message.
|
it is added to the non-match message.
|
||||||
|
|
||||||
Showing memory usage
|
Showing memory usage
|
||||||
|
|
||||||
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
The memory modifier causes pcre2test to log the sizes of all heap mem-
|
||||||
ory allocation and freeing calls that occur during a call to
|
ory allocation and freeing calls that occur during a call to
|
||||||
pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
|
pcre2_match() or pcre2_dfa_match(). These occur only when a match re-
|
||||||
quires a bigger vector than the default for remembering backtracking
|
quires a bigger vector than the default for remembering backtracking
|
||||||
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
|
||||||
In many cases there will be no heap memory used and therefore no addi-
|
In many cases there will be no heap memory used and therefore no addi-
|
||||||
tional output. No heap memory is allocated during matching with JIT, so
|
tional output. No heap memory is allocated during matching with JIT, so
|
||||||
in that case the memory modifier never has any effect. For this modi-
|
in that case the memory modifier never has any effect. For this modi-
|
||||||
fier to work, the null_context modifier must not be set on both the
|
fier to work, the null_context modifier must not be set on both the
|
||||||
pattern and the subject, though it can be set on one or the other.
|
pattern and the subject, though it can be set on one or the other.
|
||||||
|
|
||||||
Setting a starting offset
|
Setting a starting offset
|
||||||
|
|
||||||
The offset modifier sets an offset in the subject string at which
|
The offset modifier sets an offset in the subject string at which
|
||||||
matching starts. Its value is a number of code units, not characters.
|
matching starts. Its value is a number of code units, not characters.
|
||||||
|
|
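Because the offset is counted in code units, a character position has to be converted before it is used in UTF mode. The helper below is a hypothetical illustration (its name and parameters are not part of pcre2test); it assumes the subject is encoded as UTF-8, UTF-16 or UTF-32 according to the code unit width.

```python
def char_index_to_code_units(subject, char_index, width=8):
    """Convert a character index into an offset measured in code units.

    width is the code unit width in bits (8, 16 or 32). This helper is
    an illustration only and assumes UTF encoding of the subject.
    """
    encoding = {8: "utf-8", 16: "utf-16-le", 32: "utf-32-le"}[width]
    prefix = subject[:char_index].encode(encoding)
    return len(prefix) // (width // 8)


subject = "na\u00efve caf\u00e9"   # the third character needs two UTF-8 code units
print(char_index_to_code_units(subject, 6, width=8))    # offset of "c" in 8-bit units
print(char_index_to_code_units(subject, 6, width=16))   # offset of "c" in 16-bit units
```

For a subject without UTF mode, each code unit is one character and no conversion is needed.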
||||||
Setting an offset limit
|
Setting an offset limit
|
||||||
|
|
||||||
The offset_limit modifier sets a limit for unanchored matches. If a
|
The offset_limit modifier sets a limit for unanchored matches. If a
|
||||||
match cannot be found starting at or before this offset in the subject,
|
match cannot be found starting at or before this offset in the subject,
|
||||||
a "no match" return is given. The data value is a number of code units,
|
a "no match" return is given. The data value is a number of code units,
|
||||||
not characters. When this modifier is used, the use_offset_limit modi-
|
not characters. When this modifier is used, the use_offset_limit modi-
|
||||||
fier must have been set for the pattern; if not, an error is generated.
|
fier must have been set for the pattern; if not, an error is generated.
|
||||||
|
|
||||||
Setting the size of the output vector
|
Setting the size of the output vector
|
||||||
|
|
||||||
The ovector modifier applies only to the subject line in which it ap-
|
The ovector modifier applies only to the subject line in which it ap-
|
||||||
pears, though of course it can also be used to set a default in a #sub-
|
pears, though of course it can also be used to set a default in a #sub-
|
||||||
ject command. It specifies the number of pairs of offsets that are
|
ject command. It specifies the number of pairs of offsets that are
|
||||||
available for storing matching information. The default is 15.
|
available for storing matching information. The default is 15.
|
||||||
|
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
regexec() to be called with a NULL capture vector. When not testing the
|
regexec() to be called with a NULL capture vector. When not testing the
|
||||||
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
||||||
ate_from_pattern() to be called, in order to create a match block of
|
ate_from_pattern() to be called, in order to create a match block of
|
||||||
exactly the right size for the pattern. (It is not possible to create a
|
exactly the right size for the pattern. (It is not possible to create a
|
||||||
match block with a zero-length ovector; there is always at least one
|
match block with a zero-length ovector; there is always at least one
|
||||||
pair of offsets.)
|
pair of offsets.)
|
||||||
|
|
||||||
Passing the subject as zero-terminated
|
Passing the subject as zero-terminated
|
||||||
|
|
||||||
By default, the subject string is passed to a native API matching func-
|
By default, the subject string is passed to a native API matching func-
|
||||||
tion with its correct length. In order to test the facility for passing
|
tion with its correct length. In order to test the facility for passing
|
||||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||||
causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
|
causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
|
||||||
via the POSIX interface, this modifier is ignored, with a warning.
|
via the POSIX interface, this modifier is ignored, with a warning.
|
||||||
|
|
||||||
When testing pcre2_substitute(), this modifier also has the effect of
|
When testing pcre2_substitute(), this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
|
|
||||||
Passing a NULL context
|
Passing a NULL context
|
||||||
|
|
||||||
Normally, pcre2test passes a context block to pcre2_match(),
|
Normally, pcre2test passes a context block to pcre2_match(),
|
||||||
pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). If the
|
pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). If the
|
||||||
null_context modifier is set, however, NULL is passed. This is for
|
null_context modifier is set, however, NULL is passed. This is for
|
||||||
testing that the matching and substitution functions behave correctly
|
testing that the matching and substitution functions behave correctly
|
||||||
in this case (they use default values). This modifier cannot be used
|
in this case (they use default values). This modifier cannot be used
|
||||||
with the find_limits or substitute_callout modifiers.
|
with the find_limits or substitute_callout modifiers.
|
||||||
|
|
||||||
|
|
||||||
THE ALTERNATIVE MATCHING FUNCTION
|
THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
By default, pcre2test uses the standard PCRE2 matching function,
|
By default, pcre2test uses the standard PCRE2 matching function,
|
||||||
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
||||||
native matching function, pcre2_dfa_match(), which operates in a dif-
|
native matching function, pcre2_dfa_match(), which operates in a dif-
|
||||||
ferent way, and has some restrictions. The differences between the two
|
ferent way, and has some restrictions. The differences between the two
|
||||||
functions are described in the pcre2matching documentation.
|
functions are described in the pcre2matching documentation.
|
||||||
|
|
||||||
If the dfa modifier is set, the alternative matching function is used.
|
If the dfa modifier is set, the alternative matching function is used.
|
||||||
This function finds all possible matches at a given point in the sub-
|
This function finds all possible matches at a given point in the sub-
|
||||||
ject. If, however, the dfa_shortest modifier is set, processing stops
|
ject. If, however, the dfa_shortest modifier is set, processing stops
|
||||||
after the first match is found. This is always the shortest possible
|
after the first match is found. This is always the shortest possible
|
||||||
match.
|
match.
|
||||||
|
|
||||||
|
|
||||||
DEFAULT OUTPUT FROM pcre2test
|
DEFAULT OUTPUT FROM pcre2test
|
||||||
|
|
||||||
This section describes the output when the normal matching function,
|
This section describes the output when the normal matching function,
|
||||||
pcre2_match(), is being used.
|
pcre2_match(), is being used.
|
||||||
|
|
||||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||||
strings, starting with number 0 for the string that matched the whole
|
strings, starting with number 0 for the string that matched the whole
|
||||||
pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
|
pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER-
|
||||||
ROR_NOMATCH, or "Partial match:" followed by the partially matching
|
ROR_NOMATCH, or "Partial match:" followed by the partially matching
|
||||||
substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
|
substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is
|
||||||
the entire substring that was inspected during the partial match; it
|
the entire substring that was inspected during the partial match; it
|
||||||
may include characters before the actual match start if a lookbehind
|
may include characters before the actual match start if a lookbehind
|
||||||
assertion, \K, \b, or \B was involved.)
|
assertion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||||
and a short descriptive phrase. If the error is a failed UTF string
|
and a short descriptive phrase. If the error is a failed UTF string
|
||||||
check, the code unit offset of the start of the failing character is
|
check, the code unit offset of the start of the failing character is
|
||||||
also output. Here is an example of an interactive pcre2test run.
|
also output. Here is an example of an interactive pcre2test run.
|
||||||
|
|
||||||
$ pcre2test
|
$ pcre2test
|
||||||
|
@ -1516,8 +1520,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Unset capturing substrings that are not followed by one that is set are
|
Unset capturing substrings that are not followed by one that is set are
|
||||||
not shown by pcre2test unless the allcaptures modifier is specified. In
|
not shown by pcre2test unless the allcaptures modifier is specified. In
|
||||||
the following example, there are two capturing substrings, but when the
|
the following example, there are two capturing substrings, but when the
|
||||||
first data line is matched, the second, unset substring is not shown.
|
first data line is matched, the second, unset substring is not shown.
|
||||||
An "internal" unset substring is shown as "<unset>", as for the second
|
An "internal" unset substring is shown as "<unset>", as for the second
|
||||||
data line.
|
data line.
|
||||||
|
|
||||||
re> /(a)|(b)/
|
re> /(a)|(b)/
|
||||||
|
@ -1529,11 +1533,11 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
1: <unset>
|
1: <unset>
|
||||||
2: b
|
2: b
|
||||||
|
|
||||||
If the strings contain any non-printing characters, they are output as
|
If the strings contain any non-printing characters, they are output as
|
||||||
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
||||||
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
||||||
nition of non-printing characters. If the aftertext modifier is set,
|
nition of non-printing characters. If the aftertext modifier is set,
|
||||||
the output for substring 0 is followed by the rest of the subject
|
the output for substring 0 is followed by the rest of the subject
|
||||||
string, identified by "0+" like this:
|
string, identified by "0+" like this:
|
||||||
|
|
||||||
re> /cat/aftertext
|
re> /cat/aftertext
|
||||||
|
@ -1553,8 +1557,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: ipp
|
0: ipp
|
||||||
1: pp
|
1: pp
|
||||||
|
|
||||||
"No match" is output only if the first match attempt fails. Here is an
|
"No match" is output only if the first match attempt fails. Here is an
|
||||||
example of a failure message (the offset 4 that is specified by the
|
example of a failure message (the offset 4 that is specified by the
|
||||||
offset modifier is past the end of the subject string):
|
offset modifier is past the end of the subject string):
|
||||||
|
|
||||||
re> /xyz/
|
re> /xyz/
|
||||||
|
@ -1562,7 +1566,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Error -24 (bad offset value)
|
Error -24 (bad offset value)
|
||||||
|
|
||||||
Note that whereas patterns can be continued over several lines (a plain
|
Note that whereas patterns can be continued over several lines (a plain
|
||||||
">" prompt is used for continuations), subject lines may not. However,
|
">" prompt is used for continuations), subject lines may not. However,
|
||||||
newlines can be included in a subject by means of the \n escape (or \r,
|
newlines can be included in a subject by means of the \n escape (or \r,
|
||||||
\r\n, etc., depending on the newline sequence setting).
|
\r\n, etc., depending on the newline sequence setting).
|
||||||
|
|
||||||
|
@ -1570,7 +1574,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
When the alternative matching function, pcre2_dfa_match(), is used, the
|
When the alternative matching function, pcre2_dfa_match(), is used, the
|
||||||
output consists of a list of all the matches that start at the first
|
output consists of a list of all the matches that start at the first
|
||||||
point in the subject where there is at least one match. For example:
|
point in the subject where there is at least one match. For example:
|
||||||
|
|
||||||
re> /(tang|tangerine|tan)/
|
re> /(tang|tangerine|tan)/
|
||||||
|
@ -1579,11 +1583,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tang
|
1: tang
|
||||||
2: tan
|
2: tan
|
||||||
|
|
||||||
Using the normal matching function on this data finds only "tang". The
|
Using the normal matching function on this data finds only "tang". The
|
||||||
longest matching string is always given first (and numbered zero). Af-
|
longest matching string is always given first (and numbered zero). Af-
|
||||||
ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
|
ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol-
|
||||||
lowed by the partially matching substring. Note that this is the entire
|
lowed by the partially matching substring. Note that this is the entire
|
||||||
substring that was inspected during the partial match; it may include
|
substring that was inspected during the partial match; it may include
|
||||||
characters before the actual match start if a lookbehind assertion, \b,
|
characters before the actual match start if a lookbehind assertion, \b,
|
||||||
or \B was involved. (\K is not supported for DFA matching.)
|
or \B was involved. (\K is not supported for DFA matching.)
|
||||||
|
|
||||||
|
@ -1599,16 +1603,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tan
|
1: tan
|
||||||
0: tan
|
0: tan
|
||||||
|
|
||||||
The alternative matching function does not support substring capture,
|
The alternative matching function does not support substring capture,
|
||||||
so the modifiers that are concerned with captured substrings are not
|
so the modifiers that are concerned with captured substrings are not
|
||||||
relevant.
|
relevant.
|
||||||
|
|
||||||
|
|
||||||
RESTARTING AFTER A PARTIAL MATCH
|
RESTARTING AFTER A PARTIAL MATCH
|
||||||
|
|
||||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||||
TIAL return, indicating that the subject partially matched the pattern,
|
TIAL return, indicating that the subject partially matched the pattern,
|
||||||
you can restart the match with additional subject data by means of the
|
you can restart the match with additional subject data by means of the
|
||||||
dfa_restart modifier. For example:
|
dfa_restart modifier. For example:
|
||||||
|
|
||||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||||
|
@ -1617,37 +1621,37 @@ RESTARTING AFTER A PARTIAL MATCH
|
||||||
data> n05\=dfa,dfa_restart
|
data> n05\=dfa,dfa_restart
|
||||||
0: n05
|
0: n05
|
||||||
|
|
||||||
For further information about partial matching, see the pcre2partial
|
For further information about partial matching, see the pcre2partial
|
||||||
documentation.
|
documentation.
|
||||||
|
|
||||||
|
|
||||||
CALLOUTS
|
CALLOUTS
|
||||||
|
|
||||||
If the pattern contains any callout requests, pcre2test's callout func-
|
If the pattern contains any callout requests, pcre2test's callout func-
|
||||||
tion is called during matching unless callout_none is specified. This
|
tion is called during matching unless callout_none is specified. This
|
||||||
works with both matching functions, and with JIT, though there are some
|
works with both matching functions, and with JIT, though there are some
|
||||||
differences in behaviour. The output for callouts with numerical argu-
|
differences in behaviour. The output for callouts with numerical argu-
|
||||||
ments and those with string arguments is slightly different.
|
ments and those with string arguments is slightly different.
|
||||||
|
|
||||||
Callouts with numerical arguments
|
Callouts with numerical arguments
|
||||||
|
|
||||||
By default, the callout function displays the callout number, the start
|
By default, the callout function displays the callout number, the start
|
||||||
and current positions in the subject text at the callout time, and the
|
and current positions in the subject text at the callout time, and the
|
||||||
next pattern item to be tested. For example:
|
next pattern item to be tested. For example:
|
||||||
|
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \d
|
0 ^ ^ \d
|
||||||
|
|
||||||
This output indicates that callout number 0 occurred for a match at-
|
This output indicates that callout number 0 occurred for a match at-
|
||||||
tempt starting at the fourth character of the subject string, when the
|
tempt starting at the fourth character of the subject string, when the
|
||||||
pointer was at the seventh character, and when the next pattern item
|
pointer was at the seventh character, and when the next pattern item
|
||||||
was \d. Just one circumflex is output if the start and current posi-
|
was \d. Just one circumflex is output if the start and current posi-
|
||||||
tions are the same, or if the current position precedes the start posi-
|
tions are the same, or if the current position precedes the start posi-
|
||||||
tion, which can happen if the callout is in a lookbehind assertion.
|
tion, which can happen if the callout is in a lookbehind assertion.
|
||||||
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the auto_callout pattern modifier. In this case, instead of
|
a result of the auto_callout pattern modifier. In this case, instead of
|
||||||
showing the callout number, the offset in the pattern, preceded by a
|
showing the callout number, the offset in the pattern, preceded by a
|
||||||
plus, is output. For example:
|
plus, is output. For example:
|
||||||
|
|
||||||
re> /\d?[A-E]\*/auto_callout
|
re> /\d?[A-E]\*/auto_callout
|
||||||
|
@ -1674,17 +1678,17 @@ CALLOUTS
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
The mark changes between matching "a" and "b", but stays the same for
|
The mark changes between matching "a" and "b", but stays the same for
|
||||||
the rest of the match, so nothing more is output. If, as a result of
|
the rest of the match, so nothing more is output. If, as a result of
|
||||||
backtracking, the mark reverts to being unset, the text "<unset>" is
|
backtracking, the mark reverts to being unset, the text "<unset>" is
|
||||||
output.
|
output.
|
||||||
|
|
||||||
Callouts with string arguments
|
Callouts with string arguments
|
||||||
|
|
||||||
The output for a callout with a string argument is similar, except that
|
The output for a callout with a string argument is similar, except that
|
||||||
instead of outputting a callout number before the position indicators,
|
instead of outputting a callout number before the position indicators,
|
||||||
the callout string and its offset in the pattern string are output be-
|
the callout string and its offset in the pattern string are output be-
|
||||||
fore the reflection of the subject string, and the subject string is
|
fore the reflection of the subject string, and the subject string is
|
||||||
reflected for each callout. For example:
|
reflected for each callout. For example:
|
||||||
|
|
||||||
re> /^ab(?C'first')cd(?C"second")ef/
|
re> /^ab(?C'first')cd(?C"second")ef/
|
||||||
|
@ -1700,26 +1704,26 @@ CALLOUTS
|
||||||
|
|
||||||
Callout modifiers
|
Callout modifiers
|
||||||
|
|
||||||
The callout function in pcre2test returns zero (carry on matching) by
|
The callout function in pcre2test returns zero (carry on matching) by
|
||||||
default, but you can use a callout_fail modifier in a subject line to
|
default, but you can use a callout_fail modifier in a subject line to
|
||||||
change this and other parameters of the callout (see below).
|
change this and other parameters of the callout (see below).
|
||||||
|
|
||||||
If the callout_capture modifier is set, the current captured groups are
|
If the callout_capture modifier is set, the current captured groups are
|
||||||
output when a callout occurs. This is useful only for non-DFA matching,
|
output when a callout occurs. This is useful only for non-DFA matching,
|
||||||
as pcre2_dfa_match() does not support capturing, so no captures are
|
as pcre2_dfa_match() does not support capturing, so no captures are
|
||||||
ever shown.
|
ever shown.
|
||||||
|
|
||||||
The normal callout output, showing the callout number or pattern offset
|
The normal callout output, showing the callout number or pattern offset
|
||||||
(as described above) is suppressed if the callout_no_where modifier is
|
(as described above) is suppressed if the callout_no_where modifier is
|
||||||
set.
|
set.
|
||||||
|
|
||||||
When using the interpretive matching function pcre2_match() without
|
When using the interpretive matching function pcre2_match() without
|
||||||
JIT, setting the callout_extra modifier causes additional output from
|
JIT, setting the callout_extra modifier causes additional output from
|
||||||
pcre2test's callout function to be generated. For the first callout in
|
pcre2test's callout function to be generated. For the first callout in
|
||||||
a match attempt at a new starting position in the subject, "New match
|
a match attempt at a new starting position in the subject, "New match
|
||||||
attempt" is output. If there has been a backtrack since the last call-
|
attempt" is output. If there has been a backtrack since the last call-
|
||||||
out (or start of matching if this is the first callout), "Backtrack" is
|
out (or start of matching if this is the first callout), "Backtrack" is
|
||||||
output, followed by "No other matching paths" if the backtrack ended
|
output, followed by "No other matching paths" if the backtrack ended
|
||||||
the previous match attempt. For example:
|
the previous match attempt. For example:
|
||||||
|
|
||||||
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
||||||
|
@ -1756,86 +1760,86 @@ CALLOUTS
|
||||||
+1 ^ a+
|
+1 ^ a+
|
||||||
No match
|
No match
|
||||||
|
|
||||||
Notice that various optimizations must be turned off if you want all
|
Notice that various optimizations must be turned off if you want all
|
||||||
possible matching paths to be scanned. If no_start_optimize is not
|
possible matching paths to be scanned. If no_start_optimize is not
|
||||||
used, there is an immediate "no match", without any callouts, because
|
used, there is an immediate "no match", without any callouts, because
|
||||||
the starting optimization fails to find "b" in the subject, which it
|
the starting optimization fails to find "b" in the subject, which it
|
||||||
knows must be present for any match. If no_auto_possess is not used,
|
knows must be present for any match. If no_auto_possess is not used,
|
||||||
the "a+" item is turned into "a++", which reduces the number of back-
|
the "a+" item is turned into "a++", which reduces the number of back-
|
||||||
tracks.
|
tracks.
|
||||||
|
|
||||||
The callout_extra modifier has no effect if used with the DFA matching
|
The callout_extra modifier has no effect if used with the DFA matching
|
||||||
function, or with JIT.
|
function, or with JIT.
|
||||||
|
|
||||||
Return values from callouts
|
Return values from callouts
|
||||||
|
|
||||||
The default return from the callout function is zero, which allows
|
The default return from the callout function is zero, which allows
|
||||||
matching to continue. The callout_fail modifier can be given one or two
|
matching to continue. The callout_fail modifier can be given one or two
|
||||||
numbers. If there is only one number, 1 is returned instead of 0 (caus-
|
numbers. If there is only one number, 1 is returned instead of 0 (caus-
|
||||||
ing matching to backtrack) when a callout of that number is reached. If
|
ing matching to backtrack) when a callout of that number is reached. If
|
||||||
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
|
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
|
||||||
reached and there have been at least <m> callouts. The callout_error
|
reached and there have been at least <m> callouts. The callout_error
|
||||||
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
|
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
|
||||||
ing the entire matching process to be aborted. If both these modifiers
|
ing the entire matching process to be aborted. If both these modifiers
|
||||||
are set for the same callout number, callout_error takes precedence.
|
are set for the same callout number, callout_error takes precedence.
|
||||||
Note that callouts with string arguments are always given the number
|
Note that callouts with string arguments are always given the number
|
||||||
zero.
|
zero.
|
||||||
|
|
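The rules for callout_fail and callout_error can be summarised as a small decision function. This is a sketch of the behaviour described above, not pcre2test's source: the function name and arguments are hypothetical, the numeric value given for PCRE2_ERROR_CALLOUT is assumed to match pcre2.h, and it is also an assumption that the callout count includes the callout currently being processed.

```python
# Assumed to match the definition in pcre2.h; treat it as a placeholder.
PCRE2_ERROR_CALLOUT = -37


def callout_return(number, count, fail=None, error=None):
    """Decide what a callout should return.

    number -- the number of the callout that was reached
    count  -- how many callouts have happened so far (assumed to
              include this one)
    fail   -- (n, m) from a callout_fail setting, or None
    error  -- (n, m) from a callout_error setting, or None
    A setting given as a single number corresponds to m == 0.
    """
    def triggered(setting):
        if setting is None:
            return False
        n, m = setting
        return number == n and count >= m

    if triggered(error):          # callout_error takes precedence
        return PCRE2_ERROR_CALLOUT
    if triggered(fail):           # 1 forces a backtrack
        return 1
    return 0                      # 0 lets matching continue


print(callout_return(2, 1, fail=(2, 0)))                 # callout 2 reached
print(callout_return(2, 3, fail=(2, 5)))                 # too few callouts so far
print(callout_return(2, 5, fail=(2, 5), error=(2, 0)))   # both set for callout 2
```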
||||||
The callout_data modifier can be given an unsigned or a negative num-
|
The callout_data modifier can be given an unsigned or a negative num-
|
||||||
ber. This is set as the "user data" that is passed to the matching
|
ber. This is set as the "user data" that is passed to the matching
|
||||||
function, and passed back when the callout function is invoked. Any
|
function, and passed back when the callout function is invoked. Any
|
||||||
value other than zero is used as a return from pcre2test's callout
|
value other than zero is used as a return from pcre2test's callout
|
||||||
function.
|
function.
|
||||||
|
|
||||||
Inserting callouts can be helpful when using pcre2test to check compli-
|
Inserting callouts can be helpful when using pcre2test to check compli-
|
||||||
cated regular expressions. For further information about callouts, see
|
cated regular expressions. For further information about callouts, see
|
||||||
the pcre2callout documentation.
|
the pcre2callout documentation.
|
||||||
|
|
||||||
|
|
||||||
NON-PRINTING CHARACTERS
|
NON-PRINTING CHARACTERS
|
||||||
|
|
||||||
When pcre2test is outputting text in the compiled version of a pattern,
|
When pcre2test is outputting text in the compiled version of a pattern,
|
||||||
bytes other than 32-126 are always treated as non-printing characters
|
bytes other than 32-126 are always treated as non-printing characters
|
||||||
and are therefore shown as hex escapes.
|
and are therefore shown as hex escapes.
|
||||||
|
|
||||||
When pcre2test is outputting text that is a matched part of a subject
|
When pcre2test is outputting text that is a matched part of a subject
|
||||||
string, it behaves in the same way, unless a different locale has been
|
string, it behaves in the same way, unless a different locale has been
|
||||||
set for the pattern (using the locale modifier). In this case, the is-
|
set for the pattern (using the locale modifier). In this case, the is-
|
||||||
print() function is used to distinguish printing and non-printing char-
|
print() function is used to distinguish printing and non-printing char-
|
||||||
acters.
|
acters.
|
||||||
|
|
||||||
|
|
||||||
SAVING AND RESTORING COMPILED PATTERNS
|
SAVING AND RESTORING COMPILED PATTERNS
|
||||||
|
|
||||||
It is possible to save compiled patterns on disc or elsewhere, and
|
It is possible to save compiled patterns on disc or elsewhere, and
|
||||||
reload them later, subject to a number of restrictions. JIT data cannot
|
reload them later, subject to a number of restrictions. JIT data cannot
|
||||||
be saved. The host on which the patterns are reloaded must be running
|
be saved. The host on which the patterns are reloaded must be running
|
||||||
the same version of PCRE2, with the same code unit width, and must also
|
the same version of PCRE2, with the same code unit width, and must also
|
||||||
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||||
compiled patterns can be saved they must be serialized, that is, con-
|
compiled patterns can be saved they must be serialized, that is, con-
|
||||||
verted to a stream of bytes. A single byte stream may contain any num-
|
verted to a stream of bytes. A single byte stream may contain any num-
|
||||||
ber of compiled patterns, but they must all use the same character ta-
|
ber of compiled patterns, but they must all use the same character ta-
|
||||||
bles. A single copy of the tables is included in the byte stream (its
|
bles. A single copy of the tables is included in the byte stream (its
|
||||||
size is 1088 bytes).
|
size is 1088 bytes).
|
||||||
|
|
||||||
The functions whose names begin with pcre2_serialize_ are used for se-
|
The functions whose names begin with pcre2_serialize_ are used for se-
|
||||||
rializing and de-serializing. They are described in the pcre2serialize
|
rializing and de-serializing. They are described in the pcre2serialize
|
||||||
documentation. In this section we describe the features of pcre2test
|
documentation. In this section we describe the features of pcre2test
|
||||||
that can be used to test these functions.
|
that can be used to test these functions.
|
||||||
|
|
||||||
Note that "serialization" in PCRE2 does not convert compiled patterns
|
Note that "serialization" in PCRE2 does not convert compiled patterns
|
||||||
to an abstract format like Java or .NET. It just makes a reloadable
|
to an abstract format like Java or .NET. It just makes a reloadable
|
||||||
byte code stream. Hence the restrictions on reloading mentioned above.
|
byte code stream. Hence the restrictions on reloading mentioned above.
|
||||||
|
|
||||||
In pcre2test, when a pattern with push modifier is successfully com-
|
In pcre2test, when a pattern with push modifier is successfully com-
|
||||||
piled, it is pushed onto a stack of compiled patterns, and pcre2test
|
piled, it is pushed onto a stack of compiled patterns, and pcre2test
|
||||||
expects the next line to contain a new pattern (or command) instead of
|
expects the next line to contain a new pattern (or command) instead of
|
||||||
a subject line. By contrast, the pushcopy modifier causes a copy of the
|
a subject line. By contrast, the pushcopy modifier causes a copy of the
|
||||||
compiled pattern to be stacked, leaving the original available for im-
|
compiled pattern to be stacked, leaving the original available for im-
|
||||||
mediate matching. By using push and/or pushcopy, a number of patterns
|
mediate matching. By using push and/or pushcopy, a number of patterns
|
||||||
can be compiled and retained. These modifiers are incompatible with
|
can be compiled and retained. These modifiers are incompatible with
|
||||||
posix, and control modifiers that act at match time are ignored (with a
|
posix, and control modifiers that act at match time are ignored (with a
|
||||||
message) for the stacked patterns. The jitverify modifier applies only
|
message) for the stacked patterns. The jitverify modifier applies only
|
||||||
at compile time.
|
at compile time.
|
||||||
|
|
||||||
The command
|
The command
|
||||||
|
@ -1843,21 +1847,21 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
#save <filename>
|
#save <filename>
|
||||||
|
|
||||||
causes all the stacked patterns to be serialized and the result written
|
causes all the stacked patterns to be serialized and the result written
|
||||||
to the named file. Afterwards, all the stacked patterns are freed. The
|
to the named file. Afterwards, all the stacked patterns are freed. The
|
||||||
command
|
command
|
||||||
|
|
||||||
#load <filename>
|
#load <filename>
|
||||||
|
|
||||||
reads the data in the file, and then arranges for it to be de-serial-
|
reads the data in the file, and then arranges for it to be de-serial-
|
||||||
ized, with the resulting compiled patterns added to the pattern stack.
|
ized, with the resulting compiled patterns added to the pattern stack.
|
||||||
The pattern on the top of the stack can be retrieved by the #pop com-
|
The pattern on the top of the stack can be retrieved by the #pop com-
|
||||||
mand, which must be followed by lines of subjects that are to be
|
mand, which must be followed by lines of subjects that are to be
|
||||||
matched with the pattern, terminated as usual by an empty line or end
|
matched with the pattern, terminated as usual by an empty line or end
|
||||||
of file. This command may be followed by a modifier list containing
|
of file. This command may be followed by a modifier list containing
|
||||||
only control modifiers that act after a pattern has been compiled. In
|
only control modifiers that act after a pattern has been compiled. In
|
||||||
particular, hex, posix, posix_nosub, push, and pushcopy are not al-
|
particular, hex, posix, posix_nosub, push, and pushcopy are not al-
|
||||||
lowed, nor are any option-setting modifiers. The JIT modifiers are,
|
lowed, nor are any option-setting modifiers. The JIT modifiers are,
|
||||||
however permitted. Here is an example that saves and reloads two pat-
|
however permitted. Here is an example that saves and reloads two pat-
|
||||||
terns.
|
terns.
|
||||||
|
|
||||||
/abc/push
|
/abc/push
|
||||||
|
@ -1870,10 +1874,10 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
#pop jit,bincode
|
#pop jit,bincode
|
||||||
abc
|
abc
|
||||||
|
|
||||||
If jitverify is used with #pop, it does not automatically imply jit,
|
If jitverify is used with #pop, it does not automatically imply jit,
|
||||||
which is different behaviour from when it is used on a pattern.
|
which is different behaviour from when it is used on a pattern.
|
||||||
|
|
||||||
The #popcopy command is analogous to the pushcopy modifier in that it
|
The #popcopy command is analogous to the pushcopy modifier in that it
|
||||||
makes current a copy of the topmost stack pattern, leaving the original
|
makes current a copy of the topmost stack pattern, leaving the original
|
||||||
still on the stack.
|
still on the stack.
|
||||||
|
|
||||||
|
@ -1893,5 +1897,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 20 June 2019
|
Last updated: 26 June 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
|
|
|
@ -5124,7 +5124,7 @@ patlen = p - buffer - 2;
|
||||||
|
|
||||||
if (!decode_modifiers(p, CTX_PAT, &pat_patctl, NULL)) return PR_SKIP;
|
if (!decode_modifiers(p, CTX_PAT, &pat_patctl, NULL)) return PR_SKIP;
|
||||||
|
|
||||||
/* Note that the match_invalid_utf option also sets utf when passed to
|
/* Note that the match_invalid_utf option also sets utf when passed to
|
||||||
pcre2_compile(). */
|
pcre2_compile(). */
|
||||||
|
|
||||||
utf = (pat_patctl.options & (PCRE2_UTF|PCRE2_MATCH_INVALID_UTF)) != 0;
|
utf = (pat_patctl.options & (PCRE2_UTF|PCRE2_MATCH_INVALID_UTF)) != 0;
|
||||||
|
@ -7761,14 +7761,22 @@ for (gmatched = 0;; gmatched++)
|
||||||
} /* End of handling a successful match */
|
} /* End of handling a successful match */
|
||||||
|
|
||||||
/* There was a partial match. The value of ovector[0] is the bumpalong point,
|
/* There was a partial match. The value of ovector[0] is the bumpalong point,
|
||||||
that is, startchar, not any \K point that might have been passed. */
|
that is, startchar, not any \K point that might have been passed. When JIT is
|
||||||
|
not in use, "allusedtext" may be set, in which case we indicate the leftmost
|
||||||
|
consulted character. */
|
||||||
|
|
||||||
else if (capcount == PCRE2_ERROR_PARTIAL)
|
else if (capcount == PCRE2_ERROR_PARTIAL)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE poffset;
|
PCRE2_SIZE leftchar;
|
||||||
int backlength;
|
int backlength;
|
||||||
int rubriclength = 0;
|
int rubriclength = 0;
|
||||||
|
|
||||||
|
if ((dat_datctl.control & CTL_ALLUSEDTEXT) != 0)
|
||||||
|
{
|
||||||
|
leftchar = FLD(match_data, leftchar);
|
||||||
|
}
|
||||||
|
else leftchar = ovector[0];
|
||||||
|
|
||||||
fprintf(outfile, "Partial match");
|
fprintf(outfile, "Partial match");
|
||||||
if ((dat_datctl.control & CTL_MARK) != 0 &&
|
if ((dat_datctl.control & CTL_MARK) != 0 &&
|
||||||
TESTFLD(match_data, mark, !=, NULL))
|
TESTFLD(match_data, mark, !=, NULL))
|
||||||
|
@ -7781,8 +7789,7 @@ for (gmatched = 0;; gmatched++)
|
||||||
fprintf(outfile, ": ");
|
fprintf(outfile, ": ");
|
||||||
rubriclength += 15;
|
rubriclength += 15;
|
||||||
|
|
||||||
poffset = backchars(pp, ovector[0], maxlookbehind, utf);
|
PCHARS(backlength, pp, leftchar, ovector[0] - leftchar, utf, outfile);
|
||||||
PCHARS(backlength, pp, poffset, ovector[0] - poffset, utf, outfile);
|
|
||||||
PCHARSV(pp, ovector[0], ulen - ovector[0], utf, outfile);
|
PCHARSV(pp, ovector[0], ulen - ovector[0], utf, outfile);
|
||||||
|
|
||||||
if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
|
if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
|
||||||
|
|
|
@ -16,14 +16,14 @@
|
||||||
/^(?>a)++/
|
/^(?>a)++/
|
||||||
aa\=find_limits
|
aa\=find_limits
|
||||||
aaaaaaaaa\=find_limits
|
aaaaaaaaa\=find_limits
|
||||||
|
|
||||||
/(a)(?1)++/
|
/(a)(?1)++/
|
||||||
aa\=find_limits
|
aa\=find_limits
|
||||||
aaaaaaaaa\=find_limits
|
aaaaaaaaa\=find_limits
|
||||||
|
|
||||||
/a(?:.)*?a/ims
|
/a(?:.)*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
||||||
|
|
||||||
/a(?:.(*THEN))*?a/ims
|
/a(?:.(*THEN))*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
||||||
|
|
||||||
|
@ -86,9 +86,9 @@
|
||||||
aaaaaaaaaaaaaz
|
aaaaaaaaaaaaaz
|
||||||
\= Expect limit exceeded
|
\= Expect limit exceeded
|
||||||
aaaaaaaaaaaaaz\=depth_limit=10
|
aaaaaaaaaaaaaz\=depth_limit=10
|
||||||
|
|
||||||
# These three have infinitely nested recursions.
|
# These three have infinitely nested recursions.
|
||||||
|
|
||||||
/((?2))((?1))/
|
/((?2))((?1))/
|
||||||
abc
|
abc
|
||||||
|
|
||||||
|
@ -97,21 +97,21 @@
|
||||||
|
|
||||||
/(?(R)a*(?1)|((?R))b)/
|
/(?(R)a*(?1)|((?R))b)/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
|
|
||||||
# The allusedtext modifier does not work with JIT, which does not maintain
|
# The allusedtext modifier does not work with JIT, which does not maintain
|
||||||
# the leftchar/rightchar data.
|
# the leftchar/rightchar data.
|
||||||
|
|
||||||
/abc(?=xyz)/allusedtext
|
/abc(?=xyz)/allusedtext
|
||||||
abcxyzpqr
|
abcxyzpqr
|
||||||
abcxyzpqr\=aftertext
|
abcxyzpqr\=aftertext
|
||||||
|
|
||||||
/(?<=pqr)abc(?=xyz)/allusedtext
|
/(?<=pqr)abc(?=xyz)/allusedtext
|
||||||
xyzpqrabcxyzpqr
|
xyzpqrabcxyzpqr
|
||||||
xyzpqrabcxyzpqr\=aftertext
|
xyzpqrabcxyzpqr\=aftertext
|
||||||
|
|
||||||
/a\b/
|
/a\b/
|
||||||
a.\=allusedtext
|
a.\=allusedtext
|
||||||
a\=allusedtext
|
a\=allusedtext
|
||||||
|
|
||||||
/abc\Kxyz/
|
/abc\Kxyz/
|
||||||
abcxyz\=allusedtext
|
abcxyz\=allusedtext
|
||||||
|
@ -121,7 +121,45 @@
|
||||||
|
|
||||||
/abc(?=abcde)(?=ab)/allusedtext
|
/abc(?=abcde)(?=ab)/allusedtext
|
||||||
abcabcdefg
|
abcabcdefg
|
||||||
|
|
||||||
|
#subject allusedtext
|
||||||
|
|
||||||
|
/(?<=abc)123/
|
||||||
|
xyzabc123pqr
|
||||||
|
xyzabc12\=ps
|
||||||
|
xyzabc12\=ph
|
||||||
|
|
||||||
|
/\babc\b/
|
||||||
|
+++abc+++
|
||||||
|
+++ab\=ps
|
||||||
|
+++ab\=ph
|
||||||
|
|
||||||
|
/(?<=abc)def/
|
||||||
|
abc\=ph
|
||||||
|
|
||||||
|
/(?<=123)(*MARK:xx)abc/mark
|
||||||
|
xxxx123a\=ph
|
||||||
|
xxxx123a\=ps
|
||||||
|
|
||||||
|
/(?<=(?<=a)b)c.*/I
|
||||||
|
abc\=ph
|
||||||
|
\= Expect no match
|
||||||
|
xbc\=ph
|
||||||
|
|
||||||
|
/(?<=ab)c.*/I
|
||||||
|
abc\=ph
|
||||||
|
\= Expect no match
|
||||||
|
xbc\=ph
|
||||||
|
|
||||||
|
/abc(?<=bc)def/
|
||||||
|
xxxabcd\=ph
|
||||||
|
|
||||||
|
/(?<=ab)cdef/
|
||||||
|
xxabcd\=ph
|
||||||
|
|
||||||
|
#subject
|
||||||
|
# -------------------------------------------------------------------
|
||||||
|
|
||||||
# These tests provoke recursion loops, which give a different error message
|
# These tests provoke recursion loops, which give a different error message
|
||||||
# when JIT is used.
|
# when JIT is used.
|
||||||
|
|
||||||
|
@ -130,26 +168,26 @@
|
||||||
|
|
||||||
/(a|(?R))/I
|
/(a|(?R))/I
|
||||||
abcd
|
abcd
|
||||||
defg
|
defg
|
||||||
|
|
||||||
/(ab|(bc|(de|(?R))))/I
|
/(ab|(bc|(de|(?R))))/I
|
||||||
abcd
|
abcd
|
||||||
fghi
|
fghi
|
||||||
|
|
||||||
/(ab|(bc|(de|(?1))))/I
|
/(ab|(bc|(de|(?1))))/I
|
||||||
abcd
|
abcd
|
||||||
fghi
|
fghi
|
||||||
|
|
||||||
/x(ab|(bc|(de|(?1)x)x)x)/I
|
/x(ab|(bc|(de|(?1)x)x)x)/I
|
||||||
xab123
|
xab123
|
||||||
xfghi
|
xfghi
|
||||||
|
|
||||||
/(?!\w)(?R)/
|
/(?!\w)(?R)/
|
||||||
abcd
|
abcd
|
||||||
=abc
|
=abc
|
||||||
|
|
||||||
/(?=\w)(?R)/
|
/(?=\w)(?R)/
|
||||||
=abc
|
=abc
|
||||||
abcd
|
abcd
|
||||||
|
|
||||||
/(?<!\w)(?R)/
|
/(?<!\w)(?R)/
|
||||||
|
@ -160,12 +198,12 @@
|
||||||
|
|
||||||
/(a+|(?R)b)/
|
/(a+|(?R)b)/
|
||||||
aaa
|
aaa
|
||||||
bbb
|
bbb
|
||||||
|
|
||||||
/[^\xff]((?1))/BI
|
/[^\xff]((?1))/BI
|
||||||
abcd
|
abcd
|
||||||
|
|
||||||
# These tests don't behave the same with JIT
|
# These tests don't behave the same with JIT
|
||||||
|
|
||||||
/\w+(?C1)/BI,no_auto_possess
|
/\w+(?C1)/BI,no_auto_possess
|
||||||
abc\=callout_fail=1
|
abc\=callout_fail=1
|
||||||
|
@ -173,7 +211,7 @@
|
||||||
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
|
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
|
||||||
abc\=callout_fail=1
|
abc\=callout_fail=1
|
||||||
|
|
||||||
# This test breaks the JIT stack limit
|
# This test breaks the JIT stack limit
|
||||||
|
|
||||||
/(|]+){2,2452}/
|
/(|]+){2,2452}/
|
||||||
(|]+){2,2452}
|
(|]+){2,2452}
|
||||||
|
|
|
@ -486,7 +486,7 @@
|
||||||
def\=dfa_restart
|
def\=dfa_restart
|
||||||
|
|
||||||
/(?<=foo)bar/
|
/(?<=foo)bar/
|
||||||
foob\=ps,offset=2
|
foob\=ps,offset=2,allusedtext
|
||||||
foobar...\=ps,dfa_restart,offset=4
|
foobar...\=ps,dfa_restart,offset=4
|
||||||
foobar\=offset=2
|
foobar\=offset=2
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
|
@ -4415,12 +4415,12 @@
|
||||||
/abc\K123/
|
/abc\K123/
|
||||||
xyzabc123pqr
|
xyzabc123pqr
|
||||||
|
|
||||||
/(?<=abc)123/
|
/(?<=abc)123/allusedtext
|
||||||
xyzabc123pqr
|
xyzabc123pqr
|
||||||
xyzabc12\=ps
|
xyzabc12\=ps
|
||||||
xyzabc12\=ph
|
xyzabc12\=ph
|
||||||
|
|
||||||
/\babc\b/
|
/\babc\b/allusedtext
|
||||||
+++abc+++
|
+++abc+++
|
||||||
+++ab\=ps
|
+++ab\=ps
|
||||||
+++ab\=ph
|
+++ab\=ph
|
||||||
|
@ -4490,7 +4490,7 @@
|
||||||
/^(?(?!a(*SKIP)b))/
|
/^(?(?!a(*SKIP)b))/
|
||||||
ac
|
ac
|
||||||
|
|
||||||
/(?<=abc)def/
|
/(?<=abc)def/allusedtext
|
||||||
abc\=ph
|
abc\=ph
|
||||||
|
|
||||||
/abc$/
|
/abc$/
|
||||||
|
|
|
@ -45,7 +45,7 @@ Minimum heap limit = 0
|
||||||
Minimum match limit = 12
|
Minimum match limit = 12
|
||||||
Minimum depth limit = 3
|
Minimum depth limit = 3
|
||||||
0: aaaaaaaaa
|
0: aaaaaaaaa
|
||||||
|
|
||||||
/(a)(?1)++/
|
/(a)(?1)++/
|
||||||
aa\=find_limits
|
aa\=find_limits
|
||||||
Minimum heap limit = 0
|
Minimum heap limit = 0
|
||||||
|
@ -66,7 +66,7 @@ Minimum heap limit = 0
|
||||||
Minimum match limit = 24
|
Minimum match limit = 24
|
||||||
Minimum depth limit = 3
|
Minimum depth limit = 3
|
||||||
0: abbbbbbbbbbbbbbbbbbbbba
|
0: abbbbbbbbbbbbbbbbbbbbba
|
||||||
|
|
||||||
/a(?:.(*THEN))*?a/ims
|
/a(?:.(*THEN))*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
||||||
Minimum heap limit = 0
|
Minimum heap limit = 0
|
||||||
|
@ -207,9 +207,9 @@ No match
|
||||||
\= Expect limit exceeded
|
\= Expect limit exceeded
|
||||||
aaaaaaaaaaaaaz\=depth_limit=10
|
aaaaaaaaaaaaaz\=depth_limit=10
|
||||||
Failed: error -53: matching depth limit exceeded
|
Failed: error -53: matching depth limit exceeded
|
||||||
|
|
||||||
# These three have infinitely nested recursions.
|
# These three have infinitely nested recursions.
|
||||||
|
|
||||||
/((?2))((?1))/
|
/((?2))((?1))/
|
||||||
abc
|
abc
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
@ -221,7 +221,7 @@ Failed: error -52: nested recursion at the same subject position
|
||||||
/(?(R)a*(?1)|((?R))b)/
|
/(?(R)a*(?1)|((?R))b)/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
# The allusedtext modifier does not work with JIT, which does not maintain
|
# The allusedtext modifier does not work with JIT, which does not maintain
|
||||||
# the leftchar/rightchar data.
|
# the leftchar/rightchar data.
|
||||||
|
|
||||||
|
@ -233,7 +233,7 @@ Failed: error -52: nested recursion at the same subject position
|
||||||
0: abcxyz
|
0: abcxyz
|
||||||
>>>
|
>>>
|
||||||
0+ xyzpqr
|
0+ xyzpqr
|
||||||
|
|
||||||
/(?<=pqr)abc(?=xyz)/allusedtext
|
/(?<=pqr)abc(?=xyz)/allusedtext
|
||||||
xyzpqrabcxyzpqr
|
xyzpqrabcxyzpqr
|
||||||
0: pqrabcxyz
|
0: pqrabcxyz
|
||||||
|
@ -242,12 +242,12 @@ Failed: error -52: nested recursion at the same subject position
|
||||||
0: pqrabcxyz
|
0: pqrabcxyz
|
||||||
<<< >>>
|
<<< >>>
|
||||||
0+ xyzpqr
|
0+ xyzpqr
|
||||||
|
|
||||||
/a\b/
|
/a\b/
|
||||||
a.\=allusedtext
|
a.\=allusedtext
|
||||||
0: a.
|
0: a.
|
||||||
>
|
>
|
||||||
a\=allusedtext
|
a\=allusedtext
|
||||||
0: a
|
0: a
|
||||||
|
|
||||||
/abc\Kxyz/
|
/abc\Kxyz/
|
||||||
|
@ -264,7 +264,80 @@ Failed: error -52: nested recursion at the same subject position
|
||||||
abcabcdefg
|
abcabcdefg
|
||||||
0: abcabcde
|
0: abcabcde
|
||||||
>>>>>
|
>>>>>
|
||||||
|
|
||||||
|
#subject allusedtext
|
||||||
|
|
||||||
|
/(?<=abc)123/
|
||||||
|
xyzabc123pqr
|
||||||
|
0: abc123
|
||||||
|
<<<
|
||||||
|
xyzabc12\=ps
|
||||||
|
Partial match: abc12
|
||||||
|
<<<
|
||||||
|
xyzabc12\=ph
|
||||||
|
Partial match: abc12
|
||||||
|
<<<
|
||||||
|
|
||||||
|
/\babc\b/
|
||||||
|
+++abc+++
|
||||||
|
0: +abc+
|
||||||
|
< >
|
||||||
|
+++ab\=ps
|
||||||
|
Partial match: +ab
|
||||||
|
<
|
||||||
|
+++ab\=ph
|
||||||
|
Partial match: +ab
|
||||||
|
<
|
||||||
|
|
||||||
|
/(?<=abc)def/
|
||||||
|
abc\=ph
|
||||||
|
Partial match: abc
|
||||||
|
<<<
|
||||||
|
|
||||||
|
/(?<=123)(*MARK:xx)abc/mark
|
||||||
|
xxxx123a\=ph
|
||||||
|
Partial match, mark=xx: 123a
|
||||||
|
<<<
|
||||||
|
xxxx123a\=ps
|
||||||
|
Partial match, mark=xx: 123a
|
||||||
|
<<<
|
||||||
|
|
||||||
|
/(?<=(?<=a)b)c.*/I
|
||||||
|
Capture group count = 0
|
||||||
|
Max lookbehind = 2
|
||||||
|
First code unit = 'c'
|
||||||
|
Subject length lower bound = 1
|
||||||
|
abc\=ph
|
||||||
|
Partial match: abc
|
||||||
|
<<
|
||||||
|
\= Expect no match
|
||||||
|
xbc\=ph
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(?<=ab)c.*/I
|
||||||
|
Capture group count = 0
|
||||||
|
Max lookbehind = 2
|
||||||
|
First code unit = 'c'
|
||||||
|
Subject length lower bound = 1
|
||||||
|
abc\=ph
|
||||||
|
Partial match: abc
|
||||||
|
<<
|
||||||
|
\= Expect no match
|
||||||
|
xbc\=ph
|
||||||
|
No match
|
||||||
|
|
||||||
|
/abc(?<=bc)def/
|
||||||
|
xxxabcd\=ph
|
||||||
|
Partial match: abcd
|
||||||
|
|
||||||
|
/(?<=ab)cdef/
|
||||||
|
xxabcd\=ph
|
||||||
|
Partial match: abcd
|
||||||
|
<<
|
||||||
|
|
||||||
|
#subject
|
||||||
|
# -------------------------------------------------------------------
|
||||||
|
|
||||||
# These tests provoke recursion loops, which give a different error message
|
# These tests provoke recursion loops, which give a different error message
|
||||||
# when JIT is used.
|
# when JIT is used.
|
||||||
|
|
||||||
|
@ -282,7 +355,7 @@ Subject length lower bound = 0
|
||||||
abcd
|
abcd
|
||||||
0: a
|
0: a
|
||||||
1: a
|
1: a
|
||||||
defg
|
defg
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
/(ab|(bc|(de|(?R))))/I
|
/(ab|(bc|(de|(?R))))/I
|
||||||
|
@ -292,7 +365,7 @@ Subject length lower bound = 0
|
||||||
abcd
|
abcd
|
||||||
0: ab
|
0: ab
|
||||||
1: ab
|
1: ab
|
||||||
fghi
|
fghi
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
/(ab|(bc|(de|(?1))))/I
|
/(ab|(bc|(de|(?1))))/I
|
||||||
|
@ -302,7 +375,7 @@ Subject length lower bound = 0
|
||||||
abcd
|
abcd
|
||||||
0: ab
|
0: ab
|
||||||
1: ab
|
1: ab
|
||||||
fghi
|
fghi
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
/x(ab|(bc|(de|(?1)x)x)x)/I
|
/x(ab|(bc|(de|(?1)x)x)x)/I
|
||||||
|
@ -312,17 +385,17 @@ Subject length lower bound = 3
|
||||||
xab123
|
xab123
|
||||||
0: xab
|
0: xab
|
||||||
1: ab
|
1: ab
|
||||||
xfghi
|
xfghi
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
/(?!\w)(?R)/
|
/(?!\w)(?R)/
|
||||||
abcd
|
abcd
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
=abc
|
=abc
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
/(?=\w)(?R)/
|
/(?=\w)(?R)/
|
||||||
=abc
|
=abc
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
abcd
|
abcd
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
@ -339,7 +412,7 @@ Failed: error -52: nested recursion at the same subject position
|
||||||
aaa
|
aaa
|
||||||
0: aaa
|
0: aaa
|
||||||
1: aaa
|
1: aaa
|
||||||
bbb
|
bbb
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
/[^\xff]((?1))/BI
|
/[^\xff]((?1))/BI
|
||||||
|
@ -356,8 +429,8 @@ Capture group count = 1
|
||||||
Subject length lower bound = 1
|
Subject length lower bound = 1
|
||||||
abcd
|
abcd
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
# These tests don't behave the same with JIT
|
# These tests don't behave the same with JIT
|
||||||
|
|
||||||
/\w+(?C1)/BI,no_auto_possess
|
/\w+(?C1)/BI,no_auto_possess
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
@ -406,7 +479,7 @@ Subject length lower bound = 1
|
||||||
1 ^^ End of pattern
|
1 ^^ End of pattern
|
||||||
No match
|
No match
|
||||||
|
|
||||||
# This test breaks the JIT stack limit
|
# This test breaks the JIT stack limit
|
||||||
|
|
||||||
/(|]+){2,2452}/
|
/(|]+){2,2452}/
|
||||||
(|]+){2,2452}
|
(|]+){2,2452}
|
||||||
|
|
|
@ -9369,21 +9369,17 @@ Partial match: abc12
|
||||||
xyzabc123pqr
|
xyzabc123pqr
|
||||||
0: 123
|
0: 123
|
||||||
xyzabc12\=ps
|
xyzabc12\=ps
|
||||||
Partial match: abc12
|
Partial match: 12
|
||||||
<<<
|
|
||||||
xyzabc12\=ph
|
xyzabc12\=ph
|
||||||
Partial match: abc12
|
Partial match: 12
|
||||||
<<<
|
|
||||||
|
|
||||||
/\babc\b/
|
/\babc\b/
|
||||||
+++abc+++
|
+++abc+++
|
||||||
0: abc
|
0: abc
|
||||||
+++ab\=ps
|
+++ab\=ps
|
||||||
Partial match: +ab
|
Partial match: ab
|
||||||
<
|
|
||||||
+++ab\=ph
|
+++ab\=ph
|
||||||
Partial match: +ab
|
Partial match: ab
|
||||||
<
|
|
||||||
|
|
||||||
/(?&word)(?&element)(?(DEFINE)(?<element><[^m][^>]>[^<])(?<word>\w*+))/B
|
/(?&word)(?&element)(?(DEFINE)(?<element><[^m][^>]>[^<])(?<word>\w*+))/B
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
@ -10401,8 +10397,7 @@ No match
|
||||||
|
|
||||||
/(?<=abc)def/
|
/(?<=abc)def/
|
||||||
abc\=ph
|
abc\=ph
|
||||||
Partial match: abc
|
Partial match:
|
||||||
<<<
|
|
||||||
|
|
||||||
/abc$/
|
/abc$/
|
||||||
abc
|
abc
|
||||||
|
@ -11959,11 +11954,9 @@ Callout 2: last capture = 0
|
||||||
|
|
||||||
/(?<=123)(*MARK:xx)abc/mark
|
/(?<=123)(*MARK:xx)abc/mark
|
||||||
xxxx123a\=ph
|
xxxx123a\=ph
|
||||||
Partial match, mark=xx: 123a
|
Partial match, mark=xx: a
|
||||||
<<<
|
|
||||||
xxxx123a\=ps
|
xxxx123a\=ps
|
||||||
Partial match, mark=xx: 123a
|
Partial match, mark=xx: a
|
||||||
<<<
|
|
||||||
|
|
||||||
/123\Kabc/startchar
|
/123\Kabc/startchar
|
||||||
xxxx123a\=ph
|
xxxx123a\=ph
|
||||||
|
@ -17045,8 +17038,7 @@ Max lookbehind = 2
|
||||||
First code unit = 'c'
|
First code unit = 'c'
|
||||||
Subject length lower bound = 1
|
Subject length lower bound = 1
|
||||||
abc\=ph
|
abc\=ph
|
||||||
Partial match: abc
|
Partial match: c
|
||||||
<<
|
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
xbc\=ph
|
xbc\=ph
|
||||||
No match
|
No match
|
||||||
|
@ -17057,8 +17049,7 @@ Max lookbehind = 2
|
||||||
First code unit = 'c'
|
First code unit = 'c'
|
||||||
Subject length lower bound = 1
|
Subject length lower bound = 1
|
||||||
abc\=ph
|
abc\=ph
|
||||||
Partial match: abc
|
Partial match: c
|
||||||
<<
|
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
xbc\=ph
|
xbc\=ph
|
||||||
No match
|
No match
|
||||||
|
|
|
@ -876,7 +876,7 @@ Partial match: abc
|
||||||
0: def
|
0: def
|
||||||
|
|
||||||
/(?<=foo)bar/
|
/(?<=foo)bar/
|
||||||
foob\=ps,offset=2
|
foob\=ps,offset=2,allusedtext
|
||||||
Partial match: foob
|
Partial match: foob
|
||||||
<<<
|
<<<
|
||||||
foobar...\=ps,dfa_restart,offset=4
|
foobar...\=ps,dfa_restart,offset=4
|
||||||
|
@ -6803,9 +6803,10 @@ Partial match: dogs
|
||||||
xyzabc123pqr
|
xyzabc123pqr
|
||||||
Failed: error -42: pattern contains an item that is not supported for DFA matching
|
Failed: error -42: pattern contains an item that is not supported for DFA matching
|
||||||
|
|
||||||
/(?<=abc)123/
|
/(?<=abc)123/allusedtext
|
||||||
xyzabc123pqr
|
xyzabc123pqr
|
||||||
0: 123
|
0: abc123
|
||||||
|
<<<
|
||||||
xyzabc12\=ps
|
xyzabc12\=ps
|
||||||
Partial match: abc12
|
Partial match: abc12
|
||||||
<<<
|
<<<
|
||||||
|
@ -6813,9 +6814,10 @@ Partial match: abc12
|
||||||
Partial match: abc12
|
Partial match: abc12
|
||||||
<<<
|
<<<
|
||||||
|
|
||||||
/\babc\b/
|
/\babc\b/allusedtext
|
||||||
+++abc+++
|
+++abc+++
|
||||||
0: abc
|
0: +abc+
|
||||||
|
< >
|
||||||
+++ab\=ps
|
+++ab\=ps
|
||||||
Partial match: +ab
|
Partial match: +ab
|
||||||
<
|
<
|
||||||
|
@ -6932,7 +6934,7 @@ Failed: error -42: pattern contains an item that is not supported for DFA matchi
|
||||||
ac
|
ac
|
||||||
Failed: error -42: pattern contains an item that is not supported for DFA matching
|
Failed: error -42: pattern contains an item that is not supported for DFA matching
|
||||||
|
|
||||||
/(?<=abc)def/
|
/(?<=abc)def/allusedtext
|
||||||
abc\=ph
|
abc\=ph
|
||||||
Partial match: abc
|
Partial match: abc
|
||||||
<<<
|
<<<
|
||||||
|
|
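The pcre2test.c hunk above changes which preceding characters are printed for a partial match: with "allusedtext" set, output starts at the leftmost consulted character (leftchar); otherwise it starts at ovector[0]. The following is a minimal standalone sketch of that selection and of the "<" marker line seen in the expected test output. It is not the pcre2test code: format_partial() and its parameters are hypothetical names introduced for illustration, it handles plain ASCII only, and it ignores the mark rubric and non-printing characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of pcre2test): writes the "Partial match"
   line, and a "<" marker line when text before startchar is shown, into out.
   startchar plays the role of ovector[0]; leftchar is the leftmost consulted
   offset that the interpreter would report. Returns 0 on success, -1 if the
   buffer is too small. */
static int format_partial(const char *subject, size_t startchar,
                          size_t leftchar, int allusedtext,
                          char *out, size_t outsize)
{
  static const char rubric[] = "Partial match: ";
  size_t pad = sizeof(rubric) - 1;   /* marker line is indented this far */
  size_t left, back;
  int n;

  assert(subject != NULL && out != NULL);
  assert(startchar <= strlen(subject) && leftchar <= startchar);

  /* Mirrors the diff: use leftchar only when allusedtext is requested. */
  left = allusedtext ? leftchar : startchar;
  back = startchar - left;           /* number of pre-match characters shown */

  n = snprintf(out, outsize, "%s%s\n", rubric, subject + left);
  if (n < 0 || (size_t)n >= outsize) return -1;

  if (back > 0)
    {
    size_t used = (size_t)n;
    if (used + pad + back + 2 > outsize) return -1;
    memset(out + used, ' ', pad);          /* align under the subject text */
    memset(out + used + pad, '<', back);   /* flag consulted characters */
    out[used + pad + back] = '\n';
    out[used + pad + back + 1] = '\0';
    }
  return 0;
}
```

Called with the subject "xyzabc12", startchar 6 and leftchar 3, the sketch yields "Partial match: abc12" followed by a "<<<" marker line when allusedtext is on, and just "Partial match: 12" when it is off, which matches the before and after expectations for /(?<=abc)123/ in the test output above.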