Add callout_flags to callout blocks, and set bits within it from pcre2_match()
interpretation.
parent 814cc96bc5
commit 94d5f4a050
54 ChangeLog
@ -16,10 +16,10 @@ that is called by both pcre2_match() and pcre2_dfa_match().
 4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
 PCRE2_CONFIG_COMPILED_WIDTHS.
 
 5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is
 defined (e.g. by --enable-never-backslash-C).
 
 6. Defined public names for all the pcre2_compile() error numbers, and used
 the public names in pcre2_convert.c.
 
 7. Fixed a small memory leak in pcre2test (convert contexts).
|
@ -30,8 +30,8 @@ the public names in pcre2_convert.c.
 PCRE2GREP_RC to the exit status, because VMS does not distinguish between
 exit(0) and exit(1).
 
 10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain
 about a bad option only if the following argument item does not start with a
 hyphen.
 
 11. pcre2grep was truncating components of file names to 128 characters when
|
@ -39,20 +39,20 @@ processing files with the -r option, and also (some very odd code) truncating
 path names to 512 characters. There is now a check on the absolute length of
 full path file names, which may be up to 2047 characters long.
 
 12. When an assertion contained (*ACCEPT) it caused all open capturing groups
 to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to
 misbehaviour for subsequent references to groups that started outside the
 recursion. ACCEPT in an assertion now closes only those groups that were
 started within that assertion. Fixes oss-fuzz issues 3852 and 3891.
 
 13. Multiline matching in pcre2grep was misbehaving if the pattern matched
 within a line, and then matched again at the end of the line and over into
 subsequent lines. Behaviour was different with and without colouring, and
 sometimes context lines were incorrectly printed and/or line endings were lost.
 All these issues should now be fixed.
 
 14. If --line-buffered was specified for pcre2grep when input was from a
 compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be
 ignored for compressed files.)
 
 15. Although pcre2_jit_match checks whether the pattern is compiled
|
@ -60,26 +60,26 @@ in a given mode, it was also expected that at least one mode is available.
 This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION
 when the pattern is not optimized by JIT at all.
 
 16. The line number and related variables such as match counts in pcre2grep
 were all int variables, causing overflow when files with more than 2147483647
 lines were processed (assuming 32-bit ints). They have all been changed to
 unsigned long ints.
 
 17. If a backreference with a minimum repeat count of zero was first in a
 pattern, apart from assertions, an incorrect first matching character could be
 recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
 as the first character of a match.
 
 18. Characters in a leading positive assertion are considered for recording a
 first character of a match when the rest of the pattern does not provide one.
 However, a character in a non-assertive group within a leading assertion such
 as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an
 infelicity rather than an outright bug, because it did not affect the result of
 a match, just its speed. (In fact, in this case, the starting 'a' was
 subsequently picked up in the study.)
 
 19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return"
 instead of "RRETURN" saves unwinding the backtracks in these cases (only one
 didn't).
 
 20. Allocate a single callout block on the stack at the start of pcre2_match()
|
@ -89,6 +89,12 @@ and set its never-changing fields once only.
 compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
 to retrieve them, and update pcre2test to show them.
 
+22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
+field callout_flags in callout blocks. The bits are set by pcre2_match(), but
+not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
+if the callout_extra subject modifier is set. These bits are provided to help
+with tracking how a backtracking match is proceeding.
+
 
 Version 10.30 14-August-2017
 ----------------------------
|
|
|
@ -30,7 +30,13 @@ DESCRIPTION
 <P>
 This function matches a compiled regular expression against a given subject
 string, using a matching algorithm that is similar to Perl's. It returns
-offsets to captured substrings. Its arguments are:
+offsets to what it has matched and to captured substrings via the
+<b>match_data</b> block, which can be processed by functions with names that
+start with <b>pcre2_get_ovector_...()</b> or <b>pcre2_substring_...()</b>. The
+return from <b>pcre2_match()</b> is one more than the highest numbered capturing
+pair that has been set (for example, 1 if there are no captures), zero if the
+vector of offsets is too small, or a negative error code for no match and other
+errors. The function arguments are:
 <pre>
 <i>code</i> Points to the compiled pattern
 <i>subject</i> Points to the subject string
|
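The return-value rule in the new text can be illustrated without linking PCRE2. This is a minimal sketch under stated assumptions: size_t stands in for PCRE2_SIZE, the offset vector is built by hand instead of being obtained with pcre2_get_ovector_pointer(), SKETCH_UNSET only mirrors the value PCRE2 uses for PCRE2_UNSET, and the helpers pairs_set() and copy_pair() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors PCRE2_UNSET (~(PCRE2_SIZE)0); size_t stands in for PCRE2_SIZE
   here, which is an assumption of this sketch. */
#define SKETCH_UNSET (~(size_t)0)

/* rc > 0: pairs 0 .. rc-1 may be set; rc == 0: the offset vector was too
   small; rc < 0: no match or another error, so no pairs are usable. */
static int pairs_set(int rc)
{
  return rc > 0 ? rc : 0;
}

/* Copies substring i (0 = whole match) into buf as a C string and returns
   its length. Returns -1 if rc does not cover pair i, the group is unset,
   the offsets are reversed, or buf is too small. */
static long copy_pair(const char *subject, const size_t *ovector, int rc,
                      int i, char *buf, size_t bufsize)
{
  if (i < 0 || i >= pairs_set(rc)) return -1;
  size_t start = ovector[2 * i];
  size_t end = ovector[2 * i + 1];
  if (start == SKETCH_UNSET || end < start) return -1;
  size_t len = end - start;
  if (len >= bufsize) return -1;   /* need room for the terminating NUL */
  memcpy(buf, subject + start, len);
  buf[len] = '\0';
  return (long)len;
}
```

For instance, a pattern such as /a(b)c/ matched against "xxabcyy" sets pairs 0 and 1, so the return is 2 and the vector holds {2, 5, 3, 4}; copy_pair() then yields "abc" for pair 0 and "b" for pair 1.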
|
|
@ -27,7 +27,7 @@ DESCRIPTION
 <P>
 This function returns information about a compiled pattern. Its arguments are:
 <pre>
-<i>code</i> Pointer to a compiled regular expression
+<i>code</i> Pointer to a compiled regular expression pattern
 <i>what</i> What information is required
 <i>where</i> Where to put the information
 </pre>
|
@ -42,6 +42,8 @@ request are as follows:
 PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
 PCRE2_INFO_CAPTURECOUNT Number of capturing subpatterns
 PCRE2_INFO_DEPTHLIMIT Backtracking depth limit if set, otherwise PCRE2_ERROR_UNSET
+PCRE2_INFO_EXTRAOPTIONS Extra options that were passed in the
+compile context
 PCRE2_INFO_FIRSTBITMAP Bitmap of first code units, or NULL
 PCRE2_INFO_FIRSTCODETYPE Type of start-of-match information
 0 nothing set
|
|
|
@ -920,11 +920,15 @@ The <i>offset_limit</i> parameter limits how far an unanchored search can
 advance in the subject string. The default value is PCRE2_UNSET. The
 <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> functions return
 PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given
-offset is not found. For example, if the pattern /abc/ is matched against
-"123abc" with an offset limit less than 3, the result is PCRE2_ERROR_NO_MATCH.
-A match can never be found if the <i>startoffset</i> argument of
-<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> is greater than the offset
-limit.
+offset is not found. The <b>pcre2_substitute()</b> function makes no more
+substitutions.
+</P>
+<P>
+For example, if the pattern /abc/ is matched against "123abc" with an offset
+limit less than 3, the result is PCRE2_ERROR_NO_MATCH. A match can never be
+found if the <i>startoffset</i> argument of <b>pcre2_match()</b>,
+<b>pcre2_dfa_match()</b>, or <b>pcre2_substitute()</b> is greater than the offset
+limit set in the match context.
 </P>
 <P>
 When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
|
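The offset-limit rule above (a match must start at or before the limit, and nothing can match when the start offset exceeds it) can be modelled with a toy literal search. This sketch involves no regex engine and is not PCRE2's implementation; toy_find() is a hypothetical helper and TOY_NOMATCH is only a stand-in for PCRE2_ERROR_NOMATCH.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NOMATCH (-1L)  /* stand-in for PCRE2_ERROR_NOMATCH */

/* Toy literal search, not a regex engine: a match is accepted only if it
   starts at or after startoffset and at or before offset_limit. */
static long toy_find(const char *subject, const char *needle,
                     size_t startoffset, size_t offset_limit)
{
  size_t slen = strlen(subject);
  size_t nlen = strlen(needle);
  if (startoffset > offset_limit)
    return TOY_NOMATCH;  /* a match can never be found */
  for (size_t p = startoffset; p <= offset_limit && p + nlen <= slen; p++) {
    if (memcmp(subject + p, needle, nlen) == 0)
      return (long)p;
  }
  return TOY_NOMATCH;
}
```

With the document's example, toy_find("123abc", "abc", 0, 2) reports no match, while a limit of 3 finds "abc" at offset 3.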
@ -934,10 +938,11 @@ PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
 </P>
 <P>
 The offset limit facility can be used to track progress when searching large
-subject strings. See also the PCRE2_FIRSTLINE option, which requires a match to
-start within the first line of the subject. If this is set with an offset
-limit, a match must occur in the first line and also within the offset limit.
-In other words, whichever limit comes first is used.
+subject strings or to limit the extent of global substitutions. See also the
+PCRE2_FIRSTLINE option, which requires a match to start within the first line
+of the subject. If this is set with an offset limit, a match must occur in the
+first line and also within the offset limit. In other words, whichever limit
+comes first is used.
 <br>
 <br>
 <b>int pcre2_set_heap_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
@ -1940,12 +1945,15 @@ are as follows:
 <pre>
 PCRE2_INFO_ALLOPTIONS
 PCRE2_INFO_ARGOPTIONS
+PCRE2_INFO_EXTRAOPTIONS
 </pre>
-Return a copy of the pattern's options. The third argument should point to a
+Return copies of the pattern's options. The third argument should point to a
 <b>uint32_t</b> variable. PCRE2_INFO_ARGOPTIONS returns exactly the options that
 were passed to <b>pcre2_compile()</b>, whereas PCRE2_INFO_ALLOPTIONS returns
 the compile options as modified by any top-level (*XXX) option settings such as
-(*UTF) at the start of the pattern itself.
+(*UTF) at the start of the pattern itself. PCRE2_INFO_EXTRAOPTIONS returns the
+extra options that were set in the compile context by calling the
+pcre2_set_compile_extra_options() function.
 </P>
 <P>
 For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
|
@ -3157,13 +3165,27 @@ options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
 </P>
 <P>
 PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
-replacing every matching substring. If this is not set, only the first matching
-substring is replaced. If any matched substring has zero length, after the
-substitution has happened, an attempt to find a non-empty match at the same
-position is performed. If this is not successful, the current position is
-advanced by one character except when CRLF is a valid newline sequence and the
-next two characters are CR, LF. In this case, the current position is advanced
-by two characters.
+replacing every matching substring. If this option is not set, only the first
+matching substring is replaced. The search for matches takes place in the
+original subject string (that is, previous replacements do not affect it).
+Iteration is implemented by advancing the <i>startoffset</i> value for each
+search, which is always passed the entire subject string. If an offset limit is
+set in the match context, searching stops when that limit is reached.
+</P>
+<P>
+You can restrict the effect of a global substitution to a portion of the
+subject string by setting either or both of <i>startoffset</i> and an offset
+limit. Here is a \fPpcre2test\fP example:
+<pre>
+/B/g,replace=!,use_offset_limit
+ABC ABC ABC ABC\=offset=3,offset_limit=12
+2: ABC A!C A!C ABC
+</pre>
+When continuing with global substitutions after matching a substring with zero
+length, an attempt to find a non-empty match at the same offset is performed.
+If this is not successful, the offset is advanced by one character except when
+CRLF is a valid newline sequence and the next two characters are CR, LF. In
+this case, the offset is advanced by two characters.
 </P>
 <P>
 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
|
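The two rules described for PCRE2_SUBSTITUTE_GLOBAL (every search sees the original subject and stops at the offset limit, and an unextendable empty match advances by one character or by two over CR LF) can be sketched as toy models. Assumptions: the "pattern" is a single literal character, characters are single code units (no UTF), and toy_global_replace() and toy_advance_after_empty() are hypothetical helpers, not PCRE2 functions.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a global substitution with a start offset and an offset
   limit, for a "pattern" that is one literal character. Every search looks
   at the original subject, and a match must start at or before the limit.
   The replacement is one character, so out has the same length as subject. */
static int toy_global_replace(char *out, const char *subject, char target,
                              char replacement, size_t startoffset,
                              size_t offset_limit)
{
  size_t len = strlen(subject);
  int count = 0;
  memcpy(out, subject, len + 1);
  for (size_t i = startoffset; i < len && i <= offset_limit; i++) {
    if (subject[i] == target) {   /* test the original, not out */
      out[i] = replacement;
      count++;
    }
  }
  return count;
}

/* Advance rule after an empty match that could not be extended to a
   non-empty one: move on one character, or two when CRLF is a valid newline
   and CR LF starts at the current offset. Assumes one code unit per
   character (no UTF). */
static size_t toy_advance_after_empty(const char *subject, size_t len,
                                      size_t offset, int crlf_is_newline)
{
  if (crlf_is_newline && offset + 1 < len &&
      subject[offset] == '\r' && subject[offset + 1] == '\n')
    return offset + 2;
  return offset + 1;
}
```

Calling toy_global_replace(out, "ABC ABC ABC ABC", 'B', '!', 3, 12) returns 2 and leaves "ABC A!C A!C ABC" in out, which agrees with the pcre2test output in the hunk: the B at offset 1 precedes the start offset and the B at offset 13 lies beyond the limit.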
@ -3398,7 +3420,7 @@ Here is an example of a simple call to <b>pcre2_dfa_match()</b>:
 11, /* the length of the subject string */
 0, /* start at offset 0 in the subject */
 0, /* default options */
-match_data, /* the match data block */
+md, /* the match data block */
 NULL, /* a match context; NULL means use defaults */
 wspace, /* working space vector */
 20); /* number of elements (NOT size in bytes) */
|
@ -3567,7 +3589,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 13 October 2017
+Last updated: 16 December 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
|
|
|
@ -206,18 +206,20 @@ callouts such as the example above are obeyed.
 <br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br>
 <P>
 During matching, when PCRE2 reaches a callout point, if an external function is
-provided in the match context, it is called. This applies to both normal and
-DFA matching. The first argument to the callout function is a pointer to a
-<b>pcre2_callout</b> block. The second argument is the void * callout data that
-was supplied when the callout was set up by calling <b>pcre2_set_callout()</b>
-(see the
+provided in the match context, it is called. This applies to both normal,
+DFA, and JIT matching. The first argument to the callout function is a pointer
+to a <b>pcre2_callout</b> block. The second argument is the void * callout data
+that was supplied when the callout was set up by calling
+<b>pcre2_set_callout()</b> (see the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-documentation). The callout block structure contains the following fields:
+documentation). The callout block structure contains the following fields, not
+necessarily in this order:
 <pre>
 uint32_t <i>version</i>;
 uint32_t <i>callout_number</i>;
 uint32_t <i>capture_top</i>;
 uint32_t <i>capture_last</i>;
+uint32_t <i>callout_flags</i>;
 PCRE2_SIZE *<i>offset_vector</i>;
 PCRE2_SPTR <i>mark</i>;
 PCRE2_SPTR <i>subject</i>;
|
@ -231,11 +233,12 @@ documentation). The callout block structure contains the following fields:
 PCRE2_SPTR <i>callout_string</i>;
 </pre>
 The <i>version</i> field contains the version number of the block format. The
-current version is 1; the three callout string fields were added for this
-version. If you are writing an application that might use an earlier release of
-PCRE2, you should check the version number before accessing any of these
-fields. The version number will increase in future if more fields are added,
-but the intention is never to remove any of the existing fields.
+current version is 2; the three callout string fields were added for version 1,
+and the <i>callout_flags</i> field for version 2. If you are writing an
+application that might use an earlier release of PCRE2, you should check the
+version number before accessing any of these fields. The version number will
+increase in future if more fields are added, but the intention is never to
+remove any of the existing fields.
 </P>
 <br><b>
 Fields for numerical callouts
|
@ -358,6 +361,36 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
 of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
 callouts from the DFA matching function this field always contains NULL.
 </P>
+<P>
+The <i>callout_flags</i> field is always zero in callouts from
+<b>pcre2_dfa_match()</b> or when JIT is being used. When <b>pcre2_match()</b>
+without JIT is used, the following bits may be set:
+<pre>
+PCRE2_CALLOUT_STARTMATCH
+</pre>
+This is set for the first callout after the start of matching for each new
+starting position in the subject.
+<pre>
+PCRE2_CALLOUT_BACKTRACK
+</pre>
+This is set if there has been a matching backtrack since the previous callout,
+or since the start of matching if this is the first callout from a
+<b>pcre2_match()</b> run.
+</P>
+<P>
+Both bits are set when a backtrack has caused a "bumpalong" to a new starting
+position in the subject. Output from <b>pcre2test</b> does not indicate the
+presence of these bits unless the <b>callout_extra</b> modifier is set.
+</P>
+<P>
+The information in the <b>callout_flags</b> field is provided so that
+applications can track and tell their users how matching with backtracking is
+done. This can be useful when trying to optimize patterns, or just to
+understand how PCRE2 works. There is no support in <b>pcre2_dfa_match()</b>
+because there is no backtracking in DFA matching, and there is no support in
+JIT because JIT is all about maximimizing matching performance. In both these
+cases the <b>callout_flags</b> field is always zero.
+</P>
 <br><a name="SEC5" href="#TOC1">RETURN VALUES FROM CALLOUTS</a><br>
 <P>
 The external callout function returns an integer to PCRE2. If the value is
|
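An application that wants to trace how a backtracking match proceeds, as the new text suggests, might decode callout_flags as below. This is a minimal sketch: the two SKETCH_ macros mirror what are assumed to be the pcre2.h values of PCRE2_CALLOUT_STARTMATCH (0x1) and PCRE2_CALLOUT_BACKTRACK (0x2), so real code should include pcre2.h and use those names instead; describe_callout_flags() and trace_callout() are hypothetical. The commented callout shows the version check the documentation asks for, since callout_flags exists only from block version 2.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match the pcre2.h values; real code should include pcre2.h
   and test PCRE2_CALLOUT_STARTMATCH / PCRE2_CALLOUT_BACKTRACK directly. */
#define SKETCH_CALLOUT_STARTMATCH 0x00000001u
#define SKETCH_CALLOUT_BACKTRACK  0x00000002u

/* Writes a short description of a callout_flags value into buf. Both bits
   set means a backtrack caused a bumpalong to a new starting position. */
static const char *describe_callout_flags(uint32_t flags, char *buf,
                                          size_t bufsize)
{
  int start = (flags & SKETCH_CALLOUT_STARTMATCH) != 0;
  int back = (flags & SKETCH_CALLOUT_BACKTRACK) != 0;
  if (!start && !back)
    snprintf(buf, bufsize, "none");
  else
    snprintf(buf, bufsize, "%s%s%s", start ? "STARTMATCH" : "",
             (start && back) ? "|" : "", back ? "BACKTRACK" : "");
  return buf;
}

/* In a real callout (not compiled here), read the field only when the
   block is new enough to contain it:

     static int trace_callout(pcre2_callout_block *cb, void *data)
     {
       char buf[32];
       (void)data;
       if (cb->version >= 2)
         printf("callout %u: %s\n", cb->callout_number,
                describe_callout_flags(cb->callout_flags, buf, sizeof buf));
       return 0;
     }
*/
```

With pcre2_dfa_match() or JIT the field is always zero, so such a trace would print "none" for every callout.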
@ -428,7 +461,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 April 2017
+Last updated: 22 December 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
|
|
|
@ -133,11 +133,13 @@ The <b>--locale</b> option can be used to override this.
 <br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
 <P>
 It is possible to compile <b>pcre2grep</b> so that it uses <b>libz</b> or
-<b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
-respectively. You can find out whether your binary has support for one or both
-of these file types by running it with the <b>--help</b> option. If the
-appropriate support is not present, files are treated as plain text. The
-standard input is always so treated.
+<b>libbz2</b> to read compressed files whose names end in <b>.gz</b> or
+<b>.bz2</b>, respectively. You can find out whether your <b>pcre2grep</b> binary
+has support for one or both of these file types by running it with the
+<b>--help</b> option. If the appropriate support is not present, all files are
+treated as plain text. The standard input is always so treated. When input is
+from a compressed .gz or .bz2 file, the <b>--line-buffered</b> option is
+ignored.
 </P>
 <br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
 <P>
|
@ -151,7 +153,7 @@ of changing the way binary files are handled.
 <br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
 <P>
 The order in which some of the options appear can affect the output. For
-example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
+example, both the <b>-H</b> and <b>-l</b> options affect the printing of file
 names. Whichever comes later in the command line will be the one that takes
 effect. Similarly, except where noted below, if an option is given twice, the
 later setting is used. Numerical values for options may be followed by K or M,
|
@ -396,14 +398,16 @@ searching a single file. By default, the file name is not shown in this case.
 For matching lines, the file name is followed by a colon; for context lines, a
 hyphen separator is used. If a line number is also being output, it follows the
 file name. When the <b>-M</b> option causes a pattern to match more than one
-line, only the first is preceded by the file name.
+line, only the first is preceded by the file name. This option overrides any
+previous <b>-h</b>, <b>-l</b>, or <b>-L</b> options.
 </P>
 <P>
 <b>-h</b>, <b>--no-filename</b>
 Suppress the output file names when searching multiple files. By default,
 file names are shown when multiple files are searched. For matching lines, the
 file name is followed by a colon; for context lines, a hyphen separator is used.
-If a line number is also being output, it follows the file name.
+If a line number is also being output, it follows the file name. This option
+overrides any previous <b>-H</b>, <b>-L</b>, or <b>-l</b> options.
 </P>
 <P>
 <b>--heap-limit</b>=<i>number</i>
|
@ -460,17 +464,19 @@ given any number of times. If a directory matches both <b>--include-dir</b> and
 <b>-L</b>, <b>--files-without-match</b>
 Instead of outputting lines from the files, just output the names of the files
 that do not contain any lines that would have been output. Each file name is
-output once, on a separate line.
+output once, on a separate line. This option overrides any previous <b>-H</b>,
+<b>-h</b>, or <b>-l</b> options.
 </P>
 <P>
 <b>-l</b>, <b>--files-with-matches</b>
 Instead of outputting lines from the files, just output the names of the files
-containing lines that would have been output. Each file name is output
-once, on a separate line. Searching normally stops as soon as a matching line
-is found in a file. However, if the <b>-c</b> (count) option is also used,
-matching continues in order to obtain the correct count, and those files that
-have at least one match are listed along with their counts. Using this option
-with <b>-c</b> is a way of suppressing the listing of files with no matches.
+containing lines that would have been output. Each file name is output once, on
+a separate line. Searching normally stops as soon as a matching line is found
+in a file. However, if the <b>-c</b> (count) option is also used, matching
+continues in order to obtain the correct count, and those files that have at
+least one match are listed along with their counts. Using this option with
+<b>-c</b> is a way of suppressing the listing of files with no matches. This
+opeion overrides any previous <b>-H</b>, <b>-h</b>, or <b>-L</b> options.
 </P>
 <P>
 <b>--label</b>=<i>name</i>
|
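The new override rules for <b>-H</b>, <b>-h</b>, <b>-l</b>, and <b>-L</b> amount to "the last of the four wins". The following is a hypothetical model of that rule, not code from pcre2grep: the enum and final_filename_mode() are invented for illustration, and long option names, bundled short options, and the interaction of <b>-l</b> with <b>-c</b> are deliberately left out.

```c
#include <string.h>

/* Hypothetical model, not pcre2grep's source: -H, -h, -l and -L each
   override any earlier one of the four, so the last one given wins. */
enum filename_mode {
  FN_DEFAULT,          /* none of the four options given */
  FN_WITH_NAME,        /* -H */
  FN_NO_NAME,          /* -h */
  FN_LIST_MATCHING,    /* -l */
  FN_LIST_NONMATCHING  /* -L */
};

/* Scans the arguments in order; a later option replaces an earlier one. */
static enum filename_mode final_filename_mode(int argc, char **argv)
{
  enum filename_mode mode = FN_DEFAULT;
  for (int i = 1; i < argc; i++) {
    if (strcmp(argv[i], "-H") == 0) mode = FN_WITH_NAME;
    else if (strcmp(argv[i], "-h") == 0) mode = FN_NO_NAME;
    else if (strcmp(argv[i], "-l") == 0) mode = FN_LIST_MATCHING;
    else if (strcmp(argv[i], "-L") == 0) mode = FN_LIST_NONMATCHING;
  }
  return mode;
}
```

So a command line containing "-l -H" ends up printing matching lines with file names, while "-H -h -L" lists only the files without matches.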
@ -480,14 +486,16 @@ short form for this option.
 </P>
 <P>
 <b>--line-buffered</b>
-When this option is given, input is read and processed line by line, and the
-output is flushed after each write. By default, input is read in large chunks,
-unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
-is currently possible only in Unix-like environments). Output to terminal is
-normally automatically flushed by the operating system. This option can be
-useful when the input or output is attached to a pipe and you do not want
-<b>pcre2grep</b> to buffer up large amounts of data. However, its use will
-affect performance, and the <b>-M</b> (multiline) option ceases to work.
+When this option is given, non-compressed input is read and processed line by
+line, and the output is flushed after each write. By default, input is read in
+large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
+terminal (which is currently possible only in Unix-like environments). Output
+to terminal is normally automatically flushed by the operating system. This
+option can be useful when the input or output is attached to a pipe and you do
+not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
+will affect performance, and the <b>-M</b> (multiline) option ceases to work.
+When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
+ignored.
 </P>
 <P>
 <b>--line-offsets</b>
|
@ -941,7 +949,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 11 October 2017
|
Last updated: 13 November 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -159,6 +159,12 @@ Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
|
||||||
automatic callouts into every pattern that is compiled.
|
automatic callouts into every pattern that is compiled.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
<b>-AC</b>
|
||||||
|
As for <b>-ac</b>, but in addition behave as if each subject line has the
|
||||||
|
<b>callout_extra</b> modifier, that is, show additional information from
|
||||||
|
callouts.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
<b>-b</b>
|
<b>-b</b>
|
||||||
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
|
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
|
||||||
internal binary form of the pattern is output after compilation.
|
internal binary form of the pattern is output after compilation.
|
||||||
|
@ -243,8 +249,8 @@ available, and the use of JIT is verified.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-LM</b>
|
<b>-LM</b>
|
||||||
List modifiers: write a list of available pattern and subject modifiers to the
|
List modifiers: write a list of available pattern and subject modifiers to the
|
||||||
standard output, then exit with zero exit code. All other options are ignored.
|
standard output, then exit with zero exit code. All other options are ignored.
|
||||||
If both -C and -LM are present, whichever is first is recognized.
|
If both -C and -LM are present, whichever is first is recognized.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -1182,6 +1188,7 @@ pattern.
|
||||||
callout_capture show captures at callout time
|
callout_capture show captures at callout time
|
||||||
callout_data=<n> set a value to pass via callouts
|
callout_data=<n> set a value to pass via callouts
|
||||||
callout_error=<n>[:<m>] control callout error
|
callout_error=<n>[:<m>] control callout error
|
||||||
|
callout_extra show extra callout information
|
||||||
callout_fail=<n>[:<m>] control callout failure
|
callout_fail=<n>[:<m>] control callout failure
|
||||||
callout_no_where do not show position of a callout
|
callout_no_where do not show position of a callout
|
||||||
callout_none do not supply a callout function
|
callout_none do not supply a callout function
|
||||||
|
@ -1694,49 +1701,10 @@ documentation.
|
||||||
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
|
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
|
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
|
||||||
function is called during matching unless <b>callout_none</b> is specified.
|
function is called during matching unless <b>callout_none</b> is specified. This
|
||||||
This works with both matching functions.
|
works with both matching functions, and with JIT, though there are some
|
||||||
</P>
|
differences in behaviour. The output for callouts with numerical arguments and
|
||||||
<P>
|
those with string arguments is slightly different.
|
||||||
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
|
|
||||||
default, but you can use a <b>callout_fail</b> modifier in a subject line to
|
|
||||||
change this and other parameters of the callout.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
If <b>callout_capture</b> is set, the current captured groups are output when a
|
|
||||||
callout occurs. By default, the callout function then generates output that
|
|
||||||
indicates where the current match start and matching points are in the subject,
|
|
||||||
and what the next pattern item is. This output is suppressed if the
|
|
||||||
<b>callout_no_where</b> modifier is set.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
The default return from the callout function is zero, which allows matching to
|
|
||||||
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
|
|
||||||
there is only one number, 1 is returned instead of 0 (causing matching to
|
|
||||||
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
|
|
||||||
are given, 1 is returned when callout <n> is reached and there have been at
|
|
||||||
least <m> callouts. The <b>callout_error</b> modifier is similar, except that
|
|
||||||
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
|
||||||
aborted. If both these modifiers are set for the same callout number,
|
|
||||||
<b>callout_error</b> takes precedence. Note that callouts with string arguments
|
|
||||||
are always given the number zero. See
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
|
|
||||||
This is set as the "user data" that is passed to the matching function, and
|
|
||||||
passed back when the callout function is invoked. Any value other than zero is
|
|
||||||
used as a return from <b>pcre2test</b>'s callout function.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
Inserting callouts can be helpful when using <b>pcre2test</b> to check
|
|
||||||
complicated regular expressions. For further information about callouts, see
|
|
||||||
the
|
|
||||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
|
||||||
documentation.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
The output for callouts with numerical arguments and those with string
|
|
||||||
arguments is slightly different.
|
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Callouts with numerical arguments
|
Callouts with numerical arguments
|
||||||
|
@ -1811,6 +1779,107 @@ example:
|
||||||
|
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Callout modifiers
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
|
||||||
|
default, but you can use a <b>callout_fail</b> modifier in a subject line to
|
||||||
|
change this and other parameters of the callout (see below).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If the <b>callout_capture</b> modifier is set, the current captured groups are
|
||||||
|
output when a callout occurs. This is useful only for non-DFA matching, as
|
||||||
|
<b>pcre2_dfa_match()</b> does not support capturing, so no captures are ever
|
||||||
|
shown.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The normal callout output, showing the callout number or pattern offset (as
|
||||||
|
described above), is suppressed if the <b>callout_no_where</b> modifier is set.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
When using the interpretive matching function <b>pcre2_match()</b> without JIT,
|
||||||
|
setting the <b>callout_extra</b> modifier causes additional output from
|
||||||
|
<b>pcre2test</b>'s callout function to be generated. For the first callout in a
|
||||||
|
match attempt at a new starting position in the subject, "New match attempt" is
|
||||||
|
output. If there has been a backtrack since the last callout (or start of
|
||||||
|
matching if this is the first callout), "Backtrack" is output, followed by "No
|
||||||
|
other matching paths" if the backtrack ended the previous match attempt. For
|
||||||
|
example:
|
||||||
|
<pre>
|
||||||
|
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
||||||
|
data> aac\=callout_extra
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
+3 ^ ^ )
|
||||||
|
+4 ^ ^ b
|
||||||
|
Backtrack
|
||||||
|
--->aac
|
||||||
|
+3 ^^ )
|
||||||
|
+4 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
+3 ^^ )
|
||||||
|
+4 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
No match
|
||||||
|
</pre>
|
||||||
|
Notice that various optimizations must be turned off if you want all possible
|
||||||
|
matching paths to be scanned. If <b>no_start_optimize</b> is not used, there is
|
||||||
|
an immediate "no match", without any callouts, because the starting
|
||||||
|
optimization fails to find "b" in the subject, which it knows must be present
|
||||||
|
for any match. If <b>no_auto_possess</b> is not used, the "a+" item is turned
|
||||||
|
into "a++", which reduces the number of backtracks.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The <b>callout_extra</b> modifier has no effect if used with the DFA matching
|
||||||
|
function, or with JIT.
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Return values from callouts
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
The default return from the callout function is zero, which allows matching to
|
||||||
|
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
|
||||||
|
there is only one number, 1 is returned instead of 0 (causing matching to
|
||||||
|
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
|
||||||
|
are given, 1 is returned when callout <n> is reached and there have been at
|
||||||
|
least <m> callouts. The <b>callout_error</b> modifier is similar, except that
|
||||||
|
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
||||||
|
aborted. If both these modifiers are set for the same callout number,
|
||||||
|
<b>callout_error</b> takes precedence. Note that callouts with string arguments
|
||||||
|
are always given the number zero.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
|
||||||
|
This is set as the "user data" that is passed to the matching function, and
|
||||||
|
passed back when the callout function is invoked. Any value other than zero is
|
||||||
|
used as a return from <b>pcre2test</b>'s callout function.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Inserting callouts can be helpful when using <b>pcre2test</b> to check
|
||||||
|
complicated regular expressions. For further information about callouts, see
|
||||||
|
the
|
||||||
|
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||||
|
documentation.
|
||||||
|
</P>
|
||||||
<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
|
<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
|
When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
|
||||||
|
@ -1913,7 +1982,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 17 October 2017
|
Last updated: 21 December 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
1394
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -3185,7 +3185,7 @@ subject string by setting either or both of \fIstartoffset\fP and an offset
|
||||||
limit. Here is a \fBpcre2test\fP example:
|
limit. Here is a \fBpcre2test\fP example:
|
||||||
.sp
|
.sp
|
||||||
/B/g,replace=!,use_offset_limit
|
/B/g,replace=!,use_offset_limit
|
||||||
ABC ABC ABC ABC\=offset=3,offset_limit=12
|
ABC ABC ABC ABC\e=offset=3,offset_limit=12
|
||||||
2: ABC A!C A!C ABC
|
2: ABC A!C A!C ABC
|
||||||
.sp
|
.sp
|
||||||
When continuing with global substitutions after matching a substring with zero
|
When continuing with global substitutions after matching a substring with zero
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2CALLOUT 3 "14 April 2017" "PCRE2 10.30"
|
.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -191,20 +191,22 @@ callouts such as the example above are obeyed.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
During matching, when PCRE2 reaches a callout point, if an external function is
|
During matching, when PCRE2 reaches a callout point, if an external function is
|
||||||
provided in the match context, it is called. This applies to both normal and
|
provided in the match context, it is called. This applies to both normal,
|
||||||
DFA matching. The first argument to the callout function is a pointer to a
|
DFA, and JIT matching. The first argument to the callout function is a pointer
|
||||||
\fBpcre2_callout\fP block. The second argument is the void * callout data that
|
to a \fBpcre2_callout\fP block. The second argument is the void * callout data
|
||||||
was supplied when the callout was set up by calling \fBpcre2_set_callout()\fP
|
that was supplied when the callout was set up by calling
|
||||||
(see the
|
\fBpcre2_set_callout()\fP (see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2api\fP
|
\fBpcre2api\fP
|
||||||
.\"
|
.\"
|
||||||
documentation). The callout block structure contains the following fields:
|
documentation). The callout block structure contains the following fields, not
|
||||||
|
necessarily in this order:
|
||||||
.sp
|
.sp
|
||||||
uint32_t \fIversion\fP;
|
uint32_t \fIversion\fP;
|
||||||
uint32_t \fIcallout_number\fP;
|
uint32_t \fIcallout_number\fP;
|
||||||
uint32_t \fIcapture_top\fP;
|
uint32_t \fIcapture_top\fP;
|
||||||
uint32_t \fIcapture_last\fP;
|
uint32_t \fIcapture_last\fP;
|
||||||
|
uint32_t \fIcallout_flags\fP;
|
||||||
PCRE2_SIZE *\fIoffset_vector\fP;
|
PCRE2_SIZE *\fIoffset_vector\fP;
|
||||||
PCRE2_SPTR \fImark\fP;
|
PCRE2_SPTR \fImark\fP;
|
||||||
PCRE2_SPTR \fIsubject\fP;
|
PCRE2_SPTR \fIsubject\fP;
|
||||||
|
@ -218,11 +220,12 @@ documentation). The callout block structure contains the following fields:
|
||||||
PCRE2_SPTR \fIcallout_string\fP;
|
PCRE2_SPTR \fIcallout_string\fP;
|
||||||
.sp
|
.sp
|
||||||
The \fIversion\fP field contains the version number of the block format. The
|
The \fIversion\fP field contains the version number of the block format. The
|
||||||
current version is 1; the three callout string fields were added for this
|
current version is 2; the three callout string fields were added for version 1,
|
||||||
version. If you are writing an application that might use an earlier release of
|
and the \fIcallout_flags\fP field for version 2. If you are writing an
|
||||||
PCRE2, you should check the version number before accessing any of these
|
application that might use an earlier release of PCRE2, you should check the
|
||||||
fields. The version number will increase in future if more fields are added,
|
version number before accessing any of these fields. The version number will
|
||||||
but the intention is never to remove any of the existing fields.
|
increase in future if more fields are added, but the intention is never to
|
||||||
|
remove any of the existing fields.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Fields for numerical callouts"
|
.SS "Fields for numerical callouts"
|
||||||
|
@ -331,6 +334,33 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
|
||||||
(*THEN) item in the match, or NULL if no such items have been passed. Instances
|
(*THEN) item in the match, or NULL if no such items have been passed. Instances
|
||||||
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
|
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
|
||||||
callouts from the DFA matching function this field always contains NULL.
|
callouts from the DFA matching function this field always contains NULL.
|
||||||
|
.P
|
||||||
|
The \fIcallout_flags\fP field is always zero in callouts from
|
||||||
|
\fBpcre2_dfa_match()\fP or when JIT is being used. When \fBpcre2_match()\fP
|
||||||
|
without JIT is used, the following bits may be set:
|
||||||
|
.sp
|
||||||
|
PCRE2_CALLOUT_STARTMATCH
|
||||||
|
.sp
|
||||||
|
This is set for the first callout after the start of matching for each new
|
||||||
|
starting position in the subject.
|
||||||
|
.sp
|
||||||
|
PCRE2_CALLOUT_BACKTRACK
|
||||||
|
.sp
|
||||||
|
This is set if there has been a matching backtrack since the previous callout,
|
||||||
|
or since the start of matching if this is the first callout from a
|
||||||
|
\fBpcre2_match()\fP run.
|
||||||
|
.P
|
||||||
|
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
|
||||||
|
position in the subject. Output from \fBpcre2test\fP does not indicate the
|
||||||
|
presence of these bits unless the \fBcallout_extra\fP modifier is set.
|
||||||
|
.P
|
||||||
|
The information in the \fBcallout_flags\fP field is provided so that
|
||||||
|
applications can track and tell their users how matching with backtracking is
|
||||||
|
done. This can be useful when trying to optimize patterns, or just to
|
||||||
|
understand how PCRE2 works. There is no support in \fBpcre2_dfa_match()\fP
|
||||||
|
because there is no backtracking in DFA matching, and there is no support in
|
||||||
|
JIT because JIT is all about maximizing matching performance. In both these
|
||||||
|
cases the \fBcallout_flags\fP field is always zero.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "RETURN VALUES FROM CALLOUTS"
|
.SH "RETURN VALUES FROM CALLOUTS"
|
||||||
|
@ -411,6 +441,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 14 April 2017
|
Last updated: 22 December 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
File diff suppressed because it is too large
154
doc/pcre2test.1
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "17 October 2017" "PCRE 10.31"
|
.TH PCRE2TEST 1 "21 December 2017" "PCRE 10.31"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -129,6 +129,11 @@ has not been built, this option causes an error.
|
||||||
Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
|
Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
|
||||||
automatic callouts into every pattern that is compiled.
|
automatic callouts into every pattern that is compiled.
|
||||||
.TP 10
|
.TP 10
|
||||||
|
\fB-AC\fP
|
||||||
|
As for \fB-ac\fP, but in addition behave as if each subject line has the
|
||||||
|
\fBcallout_extra\fP modifier, that is, show additional information from
|
||||||
|
callouts.
|
||||||
|
.TP 10
|
||||||
\fB-b\fP
|
\fB-b\fP
|
||||||
Behave as if each pattern has the \fBfullbincode\fP modifier; the full
|
Behave as if each pattern has the \fBfullbincode\fP modifier; the full
|
||||||
internal binary form of the pattern is output after compilation.
|
internal binary form of the pattern is output after compilation.
|
||||||
|
@ -203,8 +208,8 @@ successful compilation, each pattern is passed to the just-in-time compiler, if
|
||||||
available, and the use of JIT is verified.
|
available, and the use of JIT is verified.
|
||||||
.TP 10
|
.TP 10
|
||||||
\fB-LM\fP
|
\fB-LM\fP
|
||||||
List modifiers: write a list of available pattern and subject modifiers to the
|
List modifiers: write a list of available pattern and subject modifiers to the
|
||||||
standard output, then exit with zero exit code. All other options are ignored.
|
standard output, then exit with zero exit code. All other options are ignored.
|
||||||
If both -C and -LM are present, whichever is first is recognized.
|
If both -C and -LM are present, whichever is first is recognized.
|
||||||
.TP 10
|
.TP 10
|
||||||
\fB-pattern\fB \fImodifier-list\fP
|
\fB-pattern\fB \fImodifier-list\fP
|
||||||
|
@ -1152,6 +1157,7 @@ pattern.
|
||||||
callout_capture show captures at callout time
|
callout_capture show captures at callout time
|
||||||
callout_data=<n> set a value to pass via callouts
|
callout_data=<n> set a value to pass via callouts
|
||||||
callout_error=<n>[:<m>] control callout error
|
callout_error=<n>[:<m>] control callout error
|
||||||
|
callout_extra show extra callout information
|
||||||
callout_fail=<n>[:<m>] control callout failure
|
callout_fail=<n>[:<m>] control callout failure
|
||||||
callout_no_where do not show position of a callout
|
callout_no_where do not show position of a callout
|
||||||
callout_none do not supply a callout function
|
callout_none do not supply a callout function
|
||||||
|
@ -1664,45 +1670,10 @@ documentation.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
If the pattern contains any callout requests, \fBpcre2test\fP's callout
|
If the pattern contains any callout requests, \fBpcre2test\fP's callout
|
||||||
function is called during matching unless \fBcallout_none\fP is specified.
|
function is called during matching unless \fBcallout_none\fP is specified. This
|
||||||
This works with both matching functions.
|
works with both matching functions, and with JIT, though there are some
|
||||||
.P
|
differences in behaviour. The output for callouts with numerical arguments and
|
||||||
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
|
those with string arguments is slightly different.
|
||||||
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
|
|
||||||
change this and other parameters of the callout.
|
|
||||||
.P
|
|
||||||
If \fBcallout_capture\fP is set, the current captured groups are output when a
|
|
||||||
callout occurs. By default, the callout function then generates output that
|
|
||||||
indicates where the current match start and matching points are in the subject,
|
|
||||||
and what the next pattern item is. This output is suppressed if the
|
|
||||||
\fBcallout_no_where\fP modifier is set.
|
|
||||||
.P
|
|
||||||
The default return from the callout function is zero, which allows matching to
|
|
||||||
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
|
|
||||||
there is only one number, 1 is returned instead of 0 (causing matching to
|
|
||||||
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
|
|
||||||
are given, 1 is returned when callout <n> is reached and there have been at
|
|
||||||
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
|
|
||||||
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
|
||||||
aborted. If both these modifiers are set for the same callout number,
|
|
||||||
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
|
|
||||||
are always given the number zero. See
|
|
||||||
.P
|
|
||||||
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
|
|
||||||
This is set as the "user data" that is passed to the matching function, and
|
|
||||||
passed back when the callout function is invoked. Any value other than zero is
|
|
||||||
used as a return from \fBpcre2test\fP's callout function.
|
|
||||||
.P
|
|
||||||
Inserting callouts can be helpful when using \fBpcre2test\fP to check
|
|
||||||
complicated regular expressions. For further information about callouts, see
|
|
||||||
the
|
|
||||||
.\" HREF
|
|
||||||
\fBpcre2callout\fP
|
|
||||||
.\"
|
|
||||||
documentation.
|
|
||||||
.P
|
|
||||||
The output for callouts with numerical arguments and those with string
|
|
||||||
arguments is slightly different.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Callouts with numerical arguments"
|
.SS "Callouts with numerical arguments"
|
||||||
|
@ -1776,6 +1747,103 @@ example:
|
||||||
.sp
|
.sp
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SS "Callout modifiers"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
|
||||||
|
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
|
||||||
|
change this and other parameters of the callout (see below).
|
||||||
|
.P
|
||||||
|
If the \fBcallout_capture\fP modifier is set, the current captured groups are
|
||||||
|
output when a callout occurs. This is useful only for non-DFA matching, as
|
||||||
|
\fBpcre2_dfa_match()\fP does not support capturing, so no captures are ever
|
||||||
|
shown.
|
||||||
|
.P
|
||||||
|
The normal callout output, showing the callout number or pattern offset (as
|
||||||
|
described above), is suppressed if the \fBcallout_no_where\fP modifier is set.
|
||||||
|
.P
|
||||||
|
When using the interpretive matching function \fBpcre2_match()\fP without JIT,
|
||||||
|
setting the \fBcallout_extra\fP modifier causes additional output from
|
||||||
|
\fBpcre2test\fP's callout function to be generated. For the first callout in a
|
||||||
|
match attempt at a new starting position in the subject, "New match attempt" is
|
||||||
|
output. If there has been a backtrack since the last callout (or start of
|
||||||
|
matching if this is the first callout), "Backtrack" is output, followed by "No
|
||||||
|
other matching paths" if the backtrack ended the previous match attempt. For
|
||||||
|
example:
|
||||||
|
.sp
|
||||||
|
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
||||||
|
data> aac\e=callout_extra
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
+3 ^ ^ )
|
||||||
|
+4 ^ ^ b
|
||||||
|
Backtrack
|
||||||
|
--->aac
|
||||||
|
+3 ^^ )
|
||||||
|
+4 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
+3 ^^ )
|
||||||
|
+4 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
No match
|
||||||
|
.sp
|
||||||
|
Notice that various optimizations must be turned off if you want all possible
|
||||||
|
matching paths to be scanned. If \fBno_start_optimize\fP is not used, there is
|
||||||
|
an immediate "no match", without any callouts, because the starting
|
||||||
|
optimization fails to find "b" in the subject, which it knows must be present
|
||||||
|
for any match. If \fBno_auto_possess\fP is not used, the "a+" item is turned
|
||||||
|
into "a++", which reduces the number of backtracks.
|
||||||
|
.P
|
||||||
|
The \fBcallout_extra\fP modifier has no effect if used with the DFA matching
|
||||||
|
function, or with JIT.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Return values from callouts"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
The default return from the callout function is zero, which allows matching to
|
||||||
|
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
|
||||||
|
there is only one number, 1 is returned instead of 0 (causing matching to
|
||||||
|
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
|
||||||
|
are given, 1 is returned when callout <n> is reached and there have been at
|
||||||
|
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
|
||||||
|
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
|
||||||
|
aborted. If both these modifiers are set for the same callout number,
|
||||||
|
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
|
||||||
|
are always given the number zero.
|
||||||
|
.P
|
||||||
|
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
|
||||||
|
This is set as the "user data" that is passed to the matching function, and
|
||||||
|
passed back when the callout function is invoked. Any value other than zero is
|
||||||
|
used as a return from \fBpcre2test\fP's callout function.
|
||||||
|
.P
|
||||||
|
Inserting callouts can be helpful when using \fBpcre2test\fP to check
|
||||||
|
complicated regular expressions. For further information about callouts, see
|
||||||
|
the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2callout\fP
|
||||||
|
.\"
|
||||||
|
documentation.
|
||||||
|
.
|
||||||
|
.
|
||||||
.
|
.
|
||||||
.SH "NON-PRINTING CHARACTERS"
|
.SH "NON-PRINTING CHARACTERS"
|
||||||
.rs
|
.rs
|
||||||
|
@ -1894,6 +1962,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 17 October 2017
|
Last updated: 21 December 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -120,6 +120,10 @@ COMMAND LINE OPTIONS
|
||||||
is, insert automatic callouts into every pattern that is com-
|
is, insert automatic callouts into every pattern that is com-
|
||||||
piled.
|
piled.
|
||||||
|
|
||||||
|
-AC As for -ac, but in addition behave as if each subject line
|
||||||
|
has the callout_extra modifier, that is, show additional
|
||||||
|
information from callouts.
|
||||||
|
|
||||||
-b Behave as if each pattern has the fullbincode modifier; the
|
-b Behave as if each pattern has the fullbincode modifier; the
|
||||||
full internal binary form of the pattern is output after com-
|
full internal binary form of the pattern is output after com-
|
||||||
pilation.
|
pilation.
|
||||||
|
@ -1056,6 +1060,7 @@ SUBJECT MODIFIERS
|
||||||
callout_capture show captures at callout time
|
callout_capture show captures at callout time
|
||||||
callout_data=<n> set a value to pass via callouts
|
callout_data=<n> set a value to pass via callouts
|
||||||
callout_error=<n>[:<m>] control callout error
|
callout_error=<n>[:<m>] control callout error
|
||||||
|
callout_extra show extra callout information
|
||||||
callout_fail=<n>[:<m>] control callout failure
|
callout_fail=<n>[:<m>] control callout failure
|
||||||
callout_no_where do not show position of a callout
|
callout_no_where do not show position of a callout
|
||||||
callout_none do not supply a callout function
|
callout_none do not supply a callout function
|
||||||
|
@ -1529,63 +1534,30 @@ RESTARTING AFTER A PARTIAL MATCH
|
||||||
CALLOUTS
|
CALLOUTS
|
||||||
|
|
||||||
If the pattern contains any callout requests, pcre2test's callout func-
|
If the pattern contains any callout requests, pcre2test's callout func-
|
||||||
tion is called during matching unless callout_none is specified. This
|
tion is called during matching unless callout_none is specified. This
|
||||||
works with both matching functions.
|
works with both matching functions, and with JIT, though there are some
|
||||||
|
differences in behaviour. The output for callouts with numerical argu-
|
||||||
The callout function in pcre2test returns zero (carry on matching) by
|
ments and those with string arguments is slightly different.
|
||||||
default, but you can use a callout_fail modifier in a subject line to
|
|
||||||
change this and other parameters of the callout.
|
|
||||||
|
|
||||||
If callout_capture is set, the current captured groups are output when
|
|
||||||
a callout occurs. By default, the callout function then generates out-
|
|
||||||
put that indicates where the current match start and matching points
|
|
||||||
are in the subject, and what the next pattern item is. This output is
|
|
||||||
suppressed if the callout_no_where modifier is set.
|
|
||||||
|
|
||||||
The default return from the callout function is zero, which allows
|
|
||||||
matching to continue. The callout_fail modifier can be given one or two
|
|
||||||
numbers. If there is only one number, 1 is returned instead of 0 (caus-
|
|
||||||
ing matching to backtrack) when a callout of that number is reached. If
|
|
||||||
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
|
|
||||||
reached and there have been at least <m> callouts. The callout_error
|
|
||||||
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
|
|
||||||
ing the entire matching process to be aborted. If both these modifiers
|
|
||||||
are set for the same callout number, callout_error takes precedence.
|
|
||||||
Note that callouts with string arguments are always given the number
|
|
||||||
zero. See
|
|
||||||
|
|
||||||
The callout_data modifier can be given an unsigned or a negative num-
|
|
||||||
ber. This is set as the "user data" that is passed to the matching
|
|
||||||
function, and passed back when the callout function is invoked. Any
|
|
||||||
value other than zero is used as a return from pcre2test's callout
|
|
||||||
function.
|
|
||||||
|
|
||||||
Inserting callouts can be helpful when using pcre2test to check compli-
|
|
||||||
cated regular expressions. For further information about callouts, see
|
|
||||||
the pcre2callout documentation.
|
|
||||||
|
|
||||||
The output for callouts with numerical arguments and those with string
|
|
||||||
arguments is slightly different.
|
|
||||||
|
|
||||||
Callouts with numerical arguments
|
Callouts with numerical arguments
|
||||||
|
|
||||||
By default, the callout function displays the callout number, the start
|
By default, the callout function displays the callout number, the start
|
||||||
and current positions in the subject text at the callout time, and the
|
and current positions in the subject text at the callout time, and the
|
||||||
next pattern item to be tested. For example:
|
next pattern item to be tested. For example:
|
||||||
|
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \d
|
0 ^ ^ \d
|
||||||
|
|
||||||
This output indicates that callout number 0 occurred for a match
|
This output indicates that callout number 0 occurred for a match
|
||||||
attempt starting at the fourth character of the subject string, when
|
attempt starting at the fourth character of the subject string, when
|
||||||
the pointer was at the seventh character, and when the next pattern
|
the pointer was at the seventh character, and when the next pattern
|
||||||
item was \d. Just one circumflex is output if the start and current
|
item was \d. Just one circumflex is output if the start and current
|
||||||
positions are the same, or if the current position precedes the start
|
positions are the same, or if the current position precedes the start
|
||||||
position, which can happen if the callout is in a lookbehind assertion.
|
position, which can happen if the callout is in a lookbehind assertion.
|
||||||
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the auto_callout pattern modifier. In this case, instead of
|
a result of the auto_callout pattern modifier. In this case, instead of
|
||||||
showing the callout number, the offset in the pattern, preceded by a
|
showing the callout number, the offset in the pattern, preceded by a
|
||||||
plus, is output. For example:
|
plus, is output. For example:
|
||||||
|
|
||||||
re> /\d?[A-E]\*/auto_callout
|
re> /\d?[A-E]\*/auto_callout
|
||||||
|
@ -1598,7 +1570,7 @@ CALLOUTS
|
||||||
0: E*
|
0: E*
|
||||||
|
|
||||||
If a pattern contains (*MARK) items, an additional line is output when-
|
If a pattern contains (*MARK) items, an additional line is output when-
|
||||||
ever a change of latest mark is passed to the callout function. For
|
ever a change of latest mark is passed to the callout function. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
re> /a(*MARK:X)bc/auto_callout
|
re> /a(*MARK:X)bc/auto_callout
|
||||||
|
@ -1612,17 +1584,17 @@ CALLOUTS
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
The mark changes between matching "a" and "b", but stays the same for
|
The mark changes between matching "a" and "b", but stays the same for
|
||||||
the rest of the match, so nothing more is output. If, as a result of
|
the rest of the match, so nothing more is output. If, as a result of
|
||||||
backtracking, the mark reverts to being unset, the text "<unset>" is
|
backtracking, the mark reverts to being unset, the text "<unset>" is
|
||||||
output.
|
output.
|
||||||
|
|
||||||
Callouts with string arguments
|
Callouts with string arguments
|
||||||
|
|
||||||
The output for a callout with a string argument is similar, except that
|
The output for a callout with a string argument is similar, except that
|
||||||
instead of outputting a callout number before the position indicators,
|
instead of outputting a callout number before the position indicators,
|
||||||
the callout string and its offset in the pattern string are output
|
the callout string and its offset in the pattern string are output
|
||||||
before the reflection of the subject string, and the subject string is
|
before the reflection of the subject string, and the subject string is
|
||||||
reflected for each callout. For example:
|
reflected for each callout. For example:
|
||||||
|
|
||||||
re> /^ab(?C'first')cd(?C"second")ef/
|
re> /^ab(?C'first')cd(?C"second")ef/
|
||||||
|
@ -1636,6 +1608,100 @@ CALLOUTS
|
||||||
0: abcdef
|
0: abcdef
|
||||||
|
|
||||||
|
|
||||||
|
Callout modifiers
|
||||||
|
|
||||||
|
The callout function in pcre2test returns zero (carry on matching) by
|
||||||
|
default, but you can use a callout_fail modifier in a subject line to
|
||||||
|
change this and other parameters of the callout (see below).
|
||||||
|
|
||||||
|
If the callout_capture modifier is set, the current captured groups are
|
||||||
|
output when a callout occurs. This is useful only for non-DFA matching,
|
||||||
|
as pcre2_dfa_match() does not support capturing, so no captures are
|
||||||
|
ever shown.
|
||||||
|
|
||||||
|
The normal callout output, showing the callout number or pattern offset
|
||||||
|
(as described above) is suppressed if the callout_no_where modifier is
|
||||||
|
set.
|
||||||
|
|
||||||
|
When using the interpretive matching function pcre2_match() without
|
||||||
|
JIT, setting the callout_extra modifier causes additional output from
|
||||||
|
pcre2test's callout function to be generated. For the first callout in
|
||||||
|
a match attempt at a new starting position in the subject, "New match
|
||||||
|
attempt" is output. If there has been a backtrack since the last call-
|
||||||
|
out (or start of matching if this is the first callout), "Backtrack" is
|
||||||
|
output, followed by "No other matching paths" if the backtrack ended
|
||||||
|
the previous match attempt. For example:
|
||||||
|
|
||||||
|
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
||||||
|
data> aac\=callout_extra
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
+3 ^ ^ )
|
||||||
|
+4 ^ ^ b
|
||||||
|
Backtrack
|
||||||
|
--->aac
|
||||||
|
+3 ^^ )
|
||||||
|
+4 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
+3 ^^ )
|
||||||
|
+4 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+0 ^ (
|
||||||
|
+1 ^ a+
|
||||||
|
No match
|
||||||
|
|
||||||
|
Notice that various optimizations must be turned off if you want all
|
||||||
|
possible matching paths to be scanned. If no_start_optimize is not
|
||||||
|
used, there is an immediate "no match", without any callouts, because
|
||||||
|
the starting optimization fails to find "b" in the subject, which it
|
||||||
|
knows must be present for any match. If no_auto_possess is not used,
|
||||||
|
the "a+" item is turned into "a++", which reduces the number of back-
|
||||||
|
tracks.
|
||||||
|
|
||||||
|
The callout_extra modifier has no effect if used with the DFA matching
|
||||||
|
function, or with JIT.
|
||||||
|
|
||||||
|
Return values from callouts
|
||||||
|
|
||||||
|
The default return from the callout function is zero, which allows
|
||||||
|
matching to continue. The callout_fail modifier can be given one or two
|
||||||
|
numbers. If there is only one number, 1 is returned instead of 0 (caus-
|
||||||
|
ing matching to backtrack) when a callout of that number is reached. If
|
||||||
|
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
|
||||||
|
reached and there have been at least <m> callouts. The callout_error
|
||||||
|
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
|
||||||
|
ing the entire matching process to be aborted. If both these modifiers
|
||||||
|
are set for the same callout number, callout_error takes precedence.
|
||||||
|
Note that callouts with string arguments are always given the number
|
||||||
|
zero.
|
||||||
|
|
||||||
|
The callout_data modifier can be given an unsigned or a negative num-
|
||||||
|
ber. This is set as the "user data" that is passed to the matching
|
||||||
|
function, and passed back when the callout function is invoked. Any
|
||||||
|
value other than zero is used as a return from pcre2test's callout
|
||||||
|
function.
|
||||||
|
|
||||||
|
Inserting callouts can be helpful when using pcre2test to check compli-
|
||||||
|
cated regular expressions. For further information about callouts, see
|
||||||
|
the pcre2callout documentation.
|
||||||
|
|
||||||
|
|
||||||
NON-PRINTING CHARACTERS
|
NON-PRINTING CHARACTERS
|
||||||
|
|
||||||
When pcre2test is outputting text in the compiled version of a pattern,
|
When pcre2test is outputting text in the compiled version of a pattern,
|
||||||
|
@ -1733,5 +1799,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 17 October 2017
|
Last updated: 21 December 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
|
|
|
@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
|
||||||
without modification. Define the generic version in a macro; the width-specific
|
without modification. Define the generic version in a macro; the width-specific
|
||||||
versions are generated from this macro below. */
|
versions are generated from this macro below. */
|
||||||
|
|
||||||
|
/* Flags for the callout_flags field. These are cleared after a callout. */
|
||||||
|
|
||||||
|
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
|
||||||
|
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */
|
||||||
|
|
||||||
#define PCRE2_STRUCTURE_LIST \
|
#define PCRE2_STRUCTURE_LIST \
|
||||||
typedef struct pcre2_callout_block { \
|
typedef struct pcre2_callout_block { \
|
||||||
uint32_t version; /* Identifies version of block */ \
|
uint32_t version; /* Identifies version of block */ \
|
||||||
|
@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
|
||||||
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
|
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
|
||||||
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
|
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
|
||||||
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
|
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
|
||||||
|
/* ------------------- Added for Version 2 -------------------------- */ \
|
||||||
|
uint32_t callout_flags; /* See above for list */ \
|
||||||
/* ------------------------------------------------------------------ */ \
|
/* ------------------------------------------------------------------ */ \
|
||||||
} pcre2_callout_block; \
|
} pcre2_callout_block; \
|
||||||
\
|
\
|
||||||
|
|
|
@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
|
||||||
without modification. Define the generic version in a macro; the width-specific
|
without modification. Define the generic version in a macro; the width-specific
|
||||||
versions are generated from this macro below. */
|
versions are generated from this macro below. */
|
||||||
|
|
||||||
|
/* Flags for the callout_flags field. These are cleared after a callout. */
|
||||||
|
|
||||||
|
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
|
||||||
|
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */
|
||||||
|
|
||||||
#define PCRE2_STRUCTURE_LIST \
|
#define PCRE2_STRUCTURE_LIST \
|
||||||
typedef struct pcre2_callout_block { \
|
typedef struct pcre2_callout_block { \
|
||||||
uint32_t version; /* Identifies version of block */ \
|
uint32_t version; /* Identifies version of block */ \
|
||||||
|
@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
|
||||||
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
|
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
|
||||||
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
|
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
|
||||||
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
|
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
|
||||||
|
/* ------------------- Added for Version 2 -------------------------- */ \
|
||||||
|
uint32_t callout_flags; /* See above for list */ \
|
||||||
/* ------------------------------------------------------------------ */ \
|
/* ------------------------------------------------------------------ */ \
|
||||||
} pcre2_callout_block; \
|
} pcre2_callout_block; \
|
||||||
\
|
\
|
||||||
|
|
|
@ -2574,7 +2574,8 @@ for (;;)
|
||||||
if (mb->callout != NULL)
|
if (mb->callout != NULL)
|
||||||
{
|
{
|
||||||
pcre2_callout_block cb;
|
pcre2_callout_block cb;
|
||||||
cb.version = 1;
|
cb.version = 2;
|
||||||
|
cb.callout_flags = 0;
|
||||||
cb.capture_top = 1;
|
cb.capture_top = 1;
|
||||||
cb.capture_last = 0;
|
cb.capture_last = 0;
|
||||||
cb.offset_vector = offsets;
|
cb.offset_vector = offsets;
|
||||||
|
@ -2943,7 +2944,8 @@ for (;;)
|
||||||
if (mb->callout != NULL)
|
if (mb->callout != NULL)
|
||||||
{
|
{
|
||||||
pcre2_callout_block cb;
|
pcre2_callout_block cb;
|
||||||
cb.version = 1;
|
cb.version = 2;
|
||||||
|
cb.callout_flags = 0;
|
||||||
cb.capture_top = 1;
|
cb.capture_top = 1;
|
||||||
cb.capture_last = 0;
|
cb.capture_last = 0;
|
||||||
cb.offset_vector = offsets;
|
cb.offset_vector = offsets;
|
||||||
|
|
|
@ -7952,7 +7952,8 @@ oveccount = callout_block->capture_top;
|
||||||
|
|
||||||
SLJIT_ASSERT(oveccount >= 1);
|
SLJIT_ASSERT(oveccount >= 1);
|
||||||
|
|
||||||
callout_block->version = 1;
|
callout_block->version = 2;
|
||||||
|
callout_block->callout_flags = 0;
|
||||||
|
|
||||||
/* Offsets in subject. */
|
/* Offsets in subject. */
|
||||||
callout_block->subject_length = arguments->end - arguments->begin;
|
callout_block->subject_length = arguments->end - arguments->begin;
|
||||||
|
|
|
@ -321,6 +321,7 @@ callout_ovector[0] = callout_ovector[1] = PCRE2_UNSET;
|
||||||
rc = mb->callout(cb, mb->callout_data);
|
rc = mb->callout(cb, mb->callout_data);
|
||||||
callout_ovector[0] = save0;
|
callout_ovector[0] = save0;
|
||||||
callout_ovector[1] = save1;
|
callout_ovector[1] = save1;
|
||||||
|
cb->callout_flags = 0;
|
||||||
return rc;
|
return rc;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -5919,8 +5920,9 @@ in rrc. */
|
||||||
#define LBL(val) case val: goto L_RM##val;
|
#define LBL(val) case val: goto L_RM##val;
|
||||||
|
|
||||||
RETURN_SWITCH:
|
RETURN_SWITCH:
|
||||||
if (Frdepth == 0) return rrc; /* Exit from the top level */
|
if (Frdepth == 0) return rrc; /* Exit from the top level */
|
||||||
F = (heapframe *)((char *)F - Fback_frame); /* Back track */
|
F = (heapframe *)((char *)F - Fback_frame); /* Back track */
|
||||||
|
mb->cb->callout_flags |= PCRE2_CALLOUT_BACKTRACK; /* Note for callouts */
|
||||||
|
|
||||||
#ifdef DEBUG_SHOW_RMATCH
|
#ifdef DEBUG_SHOW_RMATCH
|
||||||
fprintf(stderr, "++ RETURN %d to %d\n", rrc, Freturn_id);
|
fprintf(stderr, "++ RETURN %d to %d\n", rrc, Freturn_id);
|
||||||
|
@ -6171,13 +6173,14 @@ startline = (re->flags & PCRE2_STARTLINE) != 0;
|
||||||
bumpalong_limit = (mcontext->offset_limit == PCRE2_UNSET)?
|
bumpalong_limit = (mcontext->offset_limit == PCRE2_UNSET)?
|
||||||
end_subject : subject + mcontext->offset_limit;
|
end_subject : subject + mcontext->offset_limit;
|
||||||
|
|
||||||
/* Set up the fixed fields in the callout block, with a pointer in the
|
/* Initialize and set up the fixed fields in the callout block, with a pointer
|
||||||
match block. */
|
in the match block. */
|
||||||
|
|
||||||
mb->cb = &cb;
|
mb->cb = &cb;
|
||||||
cb.version = 1;
|
cb.version = 2;
|
||||||
cb.subject = subject;
|
cb.subject = subject;
|
||||||
cb.subject_length = (PCRE2_SIZE)(end_subject - subject);
|
cb.subject_length = (PCRE2_SIZE)(end_subject - subject);
|
||||||
|
cb.callout_flags = 0;
|
||||||
|
|
||||||
/* Fill in the remaining fields in the match block. */
|
/* Fill in the remaining fields in the match block. */
|
||||||
|
|
||||||
|
@ -6644,6 +6647,8 @@ for(;;)
|
||||||
first starting point for which a partial match was found. */
|
first starting point for which a partial match was found. */
|
||||||
|
|
||||||
cb.start_match = (PCRE2_SIZE)(start_match - subject);
|
cb.start_match = (PCRE2_SIZE)(start_match - subject);
|
||||||
|
cb.callout_flags |= PCRE2_CALLOUT_STARTMATCH;
|
||||||
|
|
||||||
mb->start_used_ptr = start_match;
|
mb->start_used_ptr = start_match;
|
||||||
mb->last_used_ptr = start_match;
|
mb->last_used_ptr = start_match;
|
||||||
mb->match_call_count = 0;
|
mb->match_call_count = 0;
|
||||||
|
|
|
@ -485,6 +485,7 @@ so many of them that they are split into two fields. */
|
||||||
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
|
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
|
||||||
#define CTL2_SUBJECT_LITERAL 0x00000010u
|
#define CTL2_SUBJECT_LITERAL 0x00000010u
|
||||||
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
|
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
|
||||||
|
#define CTL2_CALLOUT_EXTRA 0x00000040u
|
||||||
|
|
||||||
#define CTL2_NL_SET 0x40000000u /* Informational */
|
#define CTL2_NL_SET 0x40000000u /* Informational */
|
||||||
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
||||||
|
@ -598,6 +599,7 @@ static modstruct modlist[] = {
|
||||||
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
|
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
|
||||||
{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
|
{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
|
||||||
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
|
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
|
||||||
|
{ "callout_extra", MOD_DAT, MOD_CTL, CTL2_CALLOUT_EXTRA, DO(control2) },
|
||||||
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
|
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
|
||||||
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
|
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
|
||||||
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
|
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
|
||||||
|
@ -3971,7 +3973,7 @@ Returns: nothing
|
||||||
static void
|
static void
|
||||||
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||||
before,
|
before,
|
||||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||||
|
@ -3981,6 +3983,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
|
||||||
((controls & CTL_BINCODE) != 0)? " bincode" : "",
|
((controls & CTL_BINCODE) != 0)? " bincode" : "",
|
||||||
((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
|
((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
|
||||||
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
|
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
|
||||||
|
((controls2 & CTL2_CALLOUT_EXTRA) != 0)? " callout_extra" : "",
|
||||||
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
|
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
|
||||||
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
|
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
|
||||||
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
|
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
|
||||||
|
@ -4409,7 +4412,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
|
||||||
|
|
||||||
pattern_info(PCRE2_INFO_ARGOPTIONS, &compile_options, FALSE);
|
pattern_info(PCRE2_INFO_ARGOPTIONS, &compile_options, FALSE);
|
||||||
pattern_info(PCRE2_INFO_ALLOPTIONS, &overall_options, FALSE);
|
pattern_info(PCRE2_INFO_ALLOPTIONS, &overall_options, FALSE);
|
||||||
pattern_info(PCRE2_INFO_EXTRAOPTIONS, &extra_options, FALSE);
|
pattern_info(PCRE2_INFO_EXTRAOPTIONS, &extra_options, FALSE);
|
||||||
|
|
||||||
/* Remove UTF/UCP if they were there only because of forbid_utf. This saves
|
/* Remove UTF/UCP if they were there only because of forbid_utf. This saves
|
||||||
cluttering up the verification output of non-UTF test files. */
|
cluttering up the verification output of non-UTF test files. */
|
||||||
|
@ -4436,9 +4439,9 @@ if ((pat_patctl.control & CTL_INFO) != 0)
|
||||||
show_compile_options(overall_options, "Overall options:", "\n");
|
show_compile_options(overall_options, "Overall options:", "\n");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
if (extra_options != 0)
|
if (extra_options != 0)
|
||||||
show_compile_extra_options(extra_options, "Extra options:", "\n");
|
show_compile_extra_options(extra_options, "Extra options:", "\n");
|
||||||
|
|
||||||
if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
|
if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
|
||||||
|
|
||||||
|
@ -5842,17 +5845,43 @@ Return:
|
||||||
static int
|
static int
|
||||||
callout_function(pcre2_callout_block_8 *cb, void *callout_data_ptr)
|
callout_function(pcre2_callout_block_8 *cb, void *callout_data_ptr)
|
||||||
{
|
{
|
||||||
|
FILE *f, *fdefault;
|
||||||
uint32_t i, pre_start, post_start, subject_length;
|
uint32_t i, pre_start, post_start, subject_length;
|
||||||
PCRE2_SIZE current_position;
|
PCRE2_SIZE current_position;
|
||||||
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
|
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
|
||||||
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
|
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
|
||||||
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;
|
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;
|
||||||
|
|
||||||
/* This FILE is used for echoing the subject. This is done only once in simple
|
/* The FILE f is used for echoing the subject string if it is non-NULL. This
|
||||||
cases. */
|
happens only once in simple cases, but we want to repeat after any additional
|
||||||
|
output caused by CALLOUT_EXTRA. */
|
||||||
|
|
||||||
FILE *f = (first_callout || callout_capture || cb->callout_string != NULL)?
|
fdefault = (!first_callout && !callout_capture && cb->callout_string == NULL)?
|
||||||
outfile : NULL;
|
NULL : outfile;
|
||||||
|
|
||||||
|
if ((dat_datctl.control2 & CTL2_CALLOUT_EXTRA) != 0)
|
||||||
|
{
|
||||||
|
f = outfile;
|
||||||
|
switch (cb->callout_flags)
|
||||||
|
{
|
||||||
|
case PCRE2_CALLOUT_BACKTRACK:
|
||||||
|
fprintf(f, "Backtrack\n");
|
||||||
|
break;
|
||||||
|
|
||||||
|
case PCRE2_CALLOUT_STARTMATCH|PCRE2_CALLOUT_BACKTRACK:
|
||||||
|
fprintf(f, "Backtrack\nNo other matching paths\n");
|
||||||
|
/* Fall through */
|
||||||
|
|
||||||
|
case PCRE2_CALLOUT_STARTMATCH:
|
||||||
|
fprintf(f, "New match attempt\n");
|
||||||
|
break;
|
||||||
|
|
||||||
|
default:
|
||||||
|
f = fdefault;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else f = fdefault;
|
||||||
|
|
||||||
/* For a callout with a string argument, show the string first because there
|
/* For a callout with a string argument, show the string first because there
|
||||||
isn't a tidy way to fit it in the rest of the data. */
|
isn't a tidy way to fit it in the rest of the data. */
|
||||||
|
@ -5902,7 +5931,6 @@ lengths of the substrings. */
|
||||||
|
|
||||||
if (callout_where)
|
if (callout_where)
|
||||||
{
|
{
|
||||||
|
|
||||||
if (f != NULL) fprintf(f, "--->");
|
if (f != NULL) fprintf(f, "--->");
|
||||||
|
|
||||||
/* The subject before the match start. */
|
/* The subject before the match start. */
|
||||||
|
@ -5931,9 +5959,10 @@ if (callout_where)
|
||||||
|
|
||||||
if (f != NULL) fprintf(f, "\n");
|
if (f != NULL) fprintf(f, "\n");
|
||||||
|
|
||||||
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
|
/* For automatic callouts, show the pattern offset. Otherwise, for a
|
||||||
callout whose number has not already been shown with captured strings, show the
|
numerical callout whose number has not already been shown with captured
|
||||||
number here. A callout with a string argument has been displayed above. */
|
strings, show the number here. A callout with a string argument has been
|
||||||
|
displayed above. */
|
||||||
|
|
||||||
if (cb->callout_number == 255)
|
if (cb->callout_number == 255)
|
||||||
{
|
{
|
||||||
|
@ -5963,6 +5992,8 @@ if (callout_where)
|
||||||
if (cb->next_item_length != 0)
|
if (cb->next_item_length != 0)
|
||||||
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
|
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
|
||||||
pbuffer8 + cb->pattern_position);
|
pbuffer8 + cb->pattern_position);
|
||||||
|
else
|
||||||
|
fprintf(outfile, "End of pattern");
|
||||||
|
|
||||||
fprintf(outfile, "\n");
|
fprintf(outfile, "\n");
|
||||||
}
|
}
|
||||||
|
@ -7685,7 +7716,8 @@ printf(" -16 use the 16-bit library\n");
|
||||||
#ifdef SUPPORT_PCRE2_32
|
#ifdef SUPPORT_PCRE2_32
|
||||||
printf(" -32 use the 32-bit library\n");
|
printf(" -32 use the 32-bit library\n");
|
||||||
#endif
|
#endif
|
||||||
printf(" -ac set default pattern option PCRE2_AUTO_CALLOUT\n");
|
printf(" -ac set default pattern modifier PCRE2_AUTO_CALLOUT\n");
|
||||||
|
printf(" -AC as -ac, but also set subject 'callout_extra' modifier\n");
|
||||||
printf(" -b set default pattern modifier 'fullbincode'\n");
|
printf(" -b set default pattern modifier 'fullbincode'\n");
|
||||||
printf(" -C show PCRE2 compile-time options and exit\n");
|
printf(" -C show PCRE2 compile-time options and exit\n");
|
||||||
printf(" -C arg show a specific compile-time option and exit with its\n");
|
printf(" -C arg show a specific compile-time option and exit with its\n");
|
||||||
|
@ -8181,6 +8213,11 @@ while (argc > 1 && argv[op][0] == '-' && argv[op][1] != 0)
|
||||||
|
|
||||||
/* Set some common pattern and subject controls */
|
/* Set some common pattern and subject controls */
|
||||||
|
|
||||||
|
else if (strcmp(arg, "-AC") == 0)
|
||||||
|
{
|
||||||
|
def_patctl.options |= PCRE2_AUTO_CALLOUT;
|
||||||
|
def_datctl.control2 |= CTL2_CALLOUT_EXTRA;
|
||||||
|
}
|
||||||
else if (strcmp(arg, "-ac") == 0) def_patctl.options |= PCRE2_AUTO_CALLOUT;
|
else if (strcmp(arg, "-ac") == 0) def_patctl.options |= PCRE2_AUTO_CALLOUT;
|
||||||
else if (strcmp(arg, "-b") == 0) def_patctl.control |= CTL_FULLBINCODE;
|
else if (strcmp(arg, "-b") == 0) def_patctl.control |= CTL_FULLBINCODE;
|
||||||
else if (strcmp(arg, "-d") == 0) def_patctl.control |= CTL_DEBUG;
|
else if (strcmp(arg, "-d") == 0) def_patctl.control |= CTL_DEBUG;
|
||||||
|
|
|
@ -5383,6 +5383,16 @@ a)"xI
|
||||||
|
|
||||||
"(?=(a))\1?b"I
|
"(?=(a))\1?b"I
|
||||||
ab
|
ab
|
||||||
aaab
|
aaab
|
||||||
|
|
||||||
|
# JIT does not support callout_extra
|
||||||
|
|
||||||
|
/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
||||||
|
\= Expect no match
|
||||||
|
aac\=callout_extra
|
||||||
|
|
||||||
|
/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
|
||||||
|
\= Expect no match
|
||||||
|
aac\=callout_extra
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -361,12 +361,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||||
Subject length lower bound = 1
|
Subject length lower bound = 1
|
||||||
abc\=callout_fail=1
|
abc\=callout_fail=1
|
||||||
--->abc
|
--->abc
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
1 ^^
|
1 ^^ End of pattern
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
1 ^^
|
1 ^^ End of pattern
|
||||||
1 ^^
|
1 ^^ End of pattern
|
||||||
No match
|
No match
|
||||||
|
|
||||||
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
|
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
|
||||||
|
@ -385,12 +385,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||||
Subject length lower bound = 1
|
Subject length lower bound = 1
|
||||||
abc\=callout_fail=1
|
abc\=callout_fail=1
|
||||||
--->abc
|
--->abc
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
1 ^^
|
1 ^^ End of pattern
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
1 ^^
|
1 ^^ End of pattern
|
||||||
1 ^^
|
1 ^^ End of pattern
|
||||||
No match
|
No match
|
||||||
|
|
||||||
# This test breaks the JIT stack limit
|
# This test breaks the JIT stack limit
|
||||||
|
|
|
@ -3832,7 +3832,7 @@ Subject length lower bound = 2
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abbbbbccc\=callout_data=1
|
abbbbbccc\=callout_data=1
|
||||||
--->abbbbbccc
|
--->abbbbbccc
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
@ -3844,21 +3844,21 @@ Subject length lower bound = 2
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abbbbbccc\=callout_data=1
|
abbbbbccc\=callout_data=1
|
||||||
--->abbbbbccc
|
--->abbbbbccc
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
1 ^ ^
|
1 ^ ^ End of pattern
|
||||||
Callout data = 1
|
Callout data = 1
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
@ -4718,7 +4718,7 @@ Subject length lower bound = 5
|
||||||
+2 ^ ^ c
|
+2 ^ ^ c
|
||||||
+3 ^ ^ d
|
+3 ^ ^ d
|
||||||
+4 ^ ^ e
|
+4 ^ ^ e
|
||||||
+5 ^ ^
|
+5 ^ ^ End of pattern
|
||||||
0: abcde
|
0: abcde
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abcdfe
|
abcdfe
|
||||||
|
@ -4750,13 +4750,13 @@ Subject length lower bound = 1
|
||||||
--->ab
|
--->ab
|
||||||
+0 ^ a*
|
+0 ^ a*
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: ab
|
0: ab
|
||||||
aaaab
|
aaaab
|
||||||
--->aaaab
|
--->aaaab
|
||||||
+0 ^ a*
|
+0 ^ a*
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: aaaab
|
0: aaaab
|
||||||
aaaacb
|
aaaacb
|
||||||
--->aaaacb
|
--->aaaacb
|
||||||
|
@ -4770,7 +4770,7 @@ Subject length lower bound = 1
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
+0 ^ a*
|
+0 ^ a*
|
||||||
+2 ^ b
|
+2 ^ b
|
||||||
+3 ^^
|
+3 ^^ End of pattern
|
||||||
0: b
|
0: b
|
||||||
|
|
||||||
/a*b/IB,auto_callout
|
/a*b/IB,auto_callout
|
||||||
|
@ -4793,13 +4793,13 @@ Subject length lower bound = 1
|
||||||
--->ab
|
--->ab
|
||||||
+0 ^ a*
|
+0 ^ a*
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: ab
|
0: ab
|
||||||
aaaab
|
aaaab
|
||||||
--->aaaab
|
--->aaaab
|
||||||
+0 ^ a*
|
+0 ^ a*
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: aaaab
|
0: aaaab
|
||||||
aaaacb
|
aaaacb
|
||||||
--->aaaacb
|
--->aaaacb
|
||||||
|
@ -4813,7 +4813,7 @@ Subject length lower bound = 1
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
+0 ^ a*
|
+0 ^ a*
|
||||||
+2 ^ b
|
+2 ^ b
|
||||||
+3 ^^
|
+3 ^^ End of pattern
|
||||||
0: b
|
0: b
|
||||||
|
|
||||||
/a+b/IB,auto_callout
|
/a+b/IB,auto_callout
|
||||||
|
@ -4836,13 +4836,13 @@ Subject length lower bound = 2
|
||||||
--->ab
|
--->ab
|
||||||
+0 ^ a+
|
+0 ^ a+
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: ab
|
0: ab
|
||||||
aaaab
|
aaaab
|
||||||
--->aaaab
|
--->aaaab
|
||||||
+0 ^ a+
|
+0 ^ a+
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: aaaab
|
0: aaaab
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
aaaacb
|
aaaacb
|
||||||
|
@ -4897,7 +4897,7 @@ Subject length lower bound = 4
|
||||||
+3 ^ ^ c
|
+3 ^ ^ c
|
||||||
+4 ^ ^ |
|
+4 ^ ^ |
|
||||||
+9 ^ ^ x
|
+9 ^ ^ x
|
||||||
+10 ^ ^
|
+10 ^ ^ End of pattern
|
||||||
0: abcx
|
0: abcx
|
||||||
1: abc
|
1: abc
|
||||||
defx
|
defx
|
||||||
|
@ -4909,7 +4909,7 @@ Subject length lower bound = 4
|
||||||
+7 ^ ^ f
|
+7 ^ ^ f
|
||||||
+8 ^ ^ )
|
+8 ^ ^ )
|
||||||
+9 ^ ^ x
|
+9 ^ ^ x
|
||||||
+10 ^ ^
|
+10 ^ ^ End of pattern
|
||||||
0: defx
|
0: defx
|
||||||
1: def
|
1: def
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
|
@ -4971,7 +4971,7 @@ Subject length lower bound = 4
|
||||||
+3 ^ ^ c
|
+3 ^ ^ c
|
||||||
+4 ^ ^ |
|
+4 ^ ^ |
|
||||||
+9 ^ ^ x
|
+9 ^ ^ x
|
||||||
+10 ^ ^
|
+10 ^ ^ End of pattern
|
||||||
0: abcx
|
0: abcx
|
||||||
1: abc
|
1: abc
|
||||||
defx
|
defx
|
||||||
|
@ -4983,7 +4983,7 @@ Subject length lower bound = 4
|
||||||
+7 ^ ^ f
|
+7 ^ ^ f
|
||||||
+8 ^ ^ )
|
+8 ^ ^ )
|
||||||
+9 ^ ^ x
|
+9 ^ ^ x
|
||||||
+10 ^ ^
|
+10 ^ ^ End of pattern
|
||||||
0: defx
|
0: defx
|
||||||
1: def
|
1: def
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
|
@ -5024,7 +5024,7 @@ Subject length lower bound = 6
|
||||||
+3 ^ ^ |
|
+3 ^ ^ |
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
0: ababab
|
0: ababab
|
||||||
1: ab
|
1: ab
|
||||||
abcdabcd
|
abcdabcd
|
||||||
|
@ -5044,7 +5044,7 @@ Subject length lower bound = 6
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ ){3,4}
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
0: abcdabcd
|
0: abcdabcd
|
||||||
1: cd
|
1: cd
|
||||||
abcdcdcdcdcd
|
abcdcdcdcdcd
|
||||||
|
@ -5065,7 +5065,7 @@ Subject length lower bound = 6
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ ){3,4}
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
0: abcdcdcd
|
0: abcdcdcd
|
||||||
1: cd
|
1: cd
|
||||||
|
|
||||||
|
@ -5276,7 +5276,7 @@ Subject length lower bound = 11
|
||||||
+21 ^ ^ 1
|
+21 ^ ^ 1
|
||||||
+22 ^ ^ 2
|
+22 ^ ^ 2
|
||||||
+23 ^ ^ 3
|
+23 ^ ^ 3
|
||||||
+24 ^ ^
|
+24 ^ ^ End of pattern
|
||||||
0: aacaacaacaacaac123
|
0: aacaacaacaacaac123
|
||||||
1: aac
|
1: aac
|
||||||
|
|
||||||
|
@ -8900,7 +8900,7 @@ Subject length lower bound = 0
|
||||||
+7 ^ b
|
+7 ^ b
|
||||||
+11 ^ ^
|
+11 ^ ^
|
||||||
+12 ^ )
|
+12 ^ )
|
||||||
+13 ^
|
+13 ^ End of pattern
|
||||||
0:
|
0:
|
||||||
abc
|
abc
|
||||||
--->abc
|
--->abc
|
||||||
|
@ -8921,7 +8921,7 @@ Subject length lower bound = 0
|
||||||
+8 ^^ )
|
+8 ^^ )
|
||||||
+9 ^ b
|
+9 ^ b
|
||||||
+10 ^^ |
|
+10 ^^ |
|
||||||
+13 ^^
|
+13 ^^ End of pattern
|
||||||
0: b
|
0: b
|
||||||
|
|
||||||
/(?(?=b).*b|^d)/I
|
/(?(?=b).*b|^d)/I
|
||||||
|
@ -8938,14 +8938,14 @@ Subject length lower bound = 1
|
||||||
+0 ^ x
|
+0 ^ x
|
||||||
+1 ^^ y
|
+1 ^^ y
|
||||||
+2 ^ ^ z
|
+2 ^ ^ z
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
abcxyz
|
abcxyz
|
||||||
--->abcxyz
|
--->abcxyz
|
||||||
+0 ^ x
|
+0 ^ x
|
||||||
+1 ^^ y
|
+1 ^^ y
|
||||||
+2 ^ ^ z
|
+2 ^ ^ z
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abc
|
abc
|
||||||
|
@ -8962,7 +8962,7 @@ No match
|
||||||
+0 ^ x
|
+0 ^ x
|
||||||
+1 ^^ y
|
+1 ^^ y
|
||||||
+2 ^ ^ z
|
+2 ^ ^ z
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abc
|
abc
|
||||||
|
@ -8996,7 +8996,7 @@ No match
|
||||||
+15 ^ x
|
+15 ^ x
|
||||||
+16 ^^ y
|
+16 ^^ y
|
||||||
+17 ^ ^ z
|
+17 ^ ^ z
|
||||||
+18 ^ ^
|
+18 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
|
|
||||||
/(*NO_AUTO_POSSESS)a+b/B
|
/(*NO_AUTO_POSSESS)a+b/B
|
||||||
|
@ -9017,7 +9017,7 @@ No match
|
||||||
+0 ^ x
|
+0 ^ x
|
||||||
+1 ^^ y
|
+1 ^^ y
|
||||||
+2 ^ ^ z
|
+2 ^ ^ z
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
|
|
||||||
/^"((?(?=[a])[^"])|b)*"$/auto_callout
|
/^"((?(?=[a])[^"])|b)*"$/auto_callout
|
||||||
|
@ -9046,7 +9046,7 @@ No match
|
||||||
+17 ^ ^ |
|
+17 ^ ^ |
|
||||||
+21 ^ ^ "
|
+21 ^ ^ "
|
||||||
+22 ^ ^ $
|
+22 ^ ^ $
|
||||||
+23 ^ ^
|
+23 ^ ^ End of pattern
|
||||||
0: "ab"
|
0: "ab"
|
||||||
1:
|
1:
|
||||||
|
|
||||||
|
@ -11136,7 +11136,7 @@ Latest Mark: A
|
||||||
+10 ^ ^ |
|
+10 ^ ^ |
|
||||||
+18 ^ ^ z
|
+18 ^ ^ z
|
||||||
+19 ^ ^ |
|
+19 ^ ^ |
|
||||||
+24 ^ ^
|
+24 ^ ^ End of pattern
|
||||||
0: adz
|
0: adz
|
||||||
1: adz
|
1: adz
|
||||||
2: d
|
2: d
|
||||||
|
@ -11155,7 +11155,7 @@ Latest Mark: A
|
||||||
Latest Mark: B
|
Latest Mark: B
|
||||||
+18 ^ ^ z
|
+18 ^ ^ z
|
||||||
+19 ^ ^ |
|
+19 ^ ^ |
|
||||||
+24 ^ ^
|
+24 ^ ^ End of pattern
|
||||||
0: aez
|
0: aez
|
||||||
1: aez
|
1: aez
|
||||||
2: e
|
2: e
|
||||||
|
@ -11177,7 +11177,7 @@ Latest Mark: B
|
||||||
+21 ^^ e
|
+21 ^^ e
|
||||||
+22 ^ ^ q
|
+22 ^ ^ q
|
||||||
+23 ^ ^ )
|
+23 ^ ^ )
|
||||||
+24 ^ ^
|
+24 ^ ^ End of pattern
|
||||||
0: aeq
|
0: aeq
|
||||||
1: aeq
|
1: aeq
|
||||||
|
|
||||||
|
@ -11951,7 +11951,7 @@ Partial match: 123a
|
||||||
+11 ^ b
|
+11 ^ b
|
||||||
+12 ^^ b
|
+12 ^^ b
|
||||||
+13 ^ ^ )
|
+13 ^ ^ )
|
||||||
+14 ^ ^
|
+14 ^ ^ End of pattern
|
||||||
0: bb
|
0: bb
|
||||||
|
|
||||||
/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/
|
/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/
|
||||||
|
@ -11964,7 +11964,7 @@ Partial match: 123a
|
||||||
8 ^ b
|
8 ^ b
|
||||||
9 ^^ b
|
9 ^^ b
|
||||||
10 ^ ^ )
|
10 ^ ^ )
|
||||||
11 ^ ^
|
11 ^ ^ End of pattern
|
||||||
0: bb
|
0: bb
|
||||||
|
|
||||||
# Perl seems to have a bug with this one.
|
# Perl seems to have a bug with this one.
|
||||||
|
@ -15144,7 +15144,7 @@ Subject length lower bound = 0
|
||||||
+0 ^ (
|
+0 ^ (
|
||||||
+1 ^ )\Q\E*
|
+1 ^ )\Q\E*
|
||||||
+7 ^ ]
|
+7 ^ ]
|
||||||
+8 ^^
|
+8 ^^ End of pattern
|
||||||
0: ]
|
0: ]
|
||||||
1:
|
1:
|
||||||
|
|
||||||
|
@ -15428,7 +15428,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
|
||||||
+0 ^ a
|
+0 ^ a
|
||||||
+1 ^^ b
|
+1 ^^ b
|
||||||
1 ^ ^ c
|
1 ^ ^ c
|
||||||
+8 ^ ^
|
+8 ^ ^ End of pattern
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
/'ab(?C1)c'/hex,auto_callout
|
/'ab(?C1)c'/hex,auto_callout
|
||||||
|
@ -15437,7 +15437,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
|
||||||
+0 ^ a
|
+0 ^ a
|
||||||
+1 ^^ b
|
+1 ^^ b
|
||||||
1 ^ ^ c
|
1 ^ ^ c
|
||||||
+8 ^ ^
|
+8 ^ ^ End of pattern
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
# Perl accepts these, but gives a warning. We can't warn, so give an error.
|
# Perl accepts these, but gives a warning. We can't warn, so give an error.
|
||||||
|
@ -16256,7 +16256,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
+3 ^ ^ (
|
+3 ^ ^ (
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^
|
+5 ^ ^ End of pattern
|
||||||
0: a\b(c
|
0: a\b(c
|
||||||
|
|
||||||
/a\b(c/literal,auto_callout
|
/a\b(c/literal,auto_callout
|
||||||
|
@ -16267,7 +16267,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
+3 ^ ^ (
|
+3 ^ ^ (
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^
|
+5 ^ ^ End of pattern
|
||||||
0: a\b(c
|
0: a\b(c
|
||||||
|
|
||||||
/(*CR)abc/literal
|
/(*CR)abc/literal
|
||||||
|
@ -16380,9 +16380,65 @@ Subject length lower bound = 1
|
||||||
ab
|
ab
|
||||||
0: ab
|
0: ab
|
||||||
1: a
|
1: a
|
||||||
aaab
|
aaab
|
||||||
0: ab
|
0: ab
|
||||||
1: a
|
1: a
|
||||||
|
|
||||||
|
# JIT does not support callout_extra
|
||||||
|
|
||||||
|
/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
|
||||||
|
\= Expect no match
|
||||||
|
aac\=callout_extra
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+9 ^ (
|
||||||
|
+10 ^ a+
|
||||||
|
+12 ^ ^ )
|
||||||
|
+13 ^ ^ b
|
||||||
|
Backtrack
|
||||||
|
--->aac
|
||||||
|
+12 ^^ )
|
||||||
|
+13 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+9 ^ (
|
||||||
|
+10 ^ a+
|
||||||
|
+12 ^^ )
|
||||||
|
+13 ^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+9 ^ (
|
||||||
|
+10 ^ a+
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
--->aac
|
||||||
|
+9 ^ (
|
||||||
|
+10 ^ a+
|
||||||
|
No match
|
||||||
|
|
||||||
|
/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
|
||||||
|
\= Expect no match
|
||||||
|
aac\=callout_extra
|
||||||
|
New match attempt
|
||||||
|
Callout (15): 'XXX'
|
||||||
|
--->aac
|
||||||
|
^ ^ b
|
||||||
|
Backtrack
|
||||||
|
Callout (15): 'XXX'
|
||||||
|
--->aac
|
||||||
|
^^ b
|
||||||
|
Backtrack
|
||||||
|
No other matching paths
|
||||||
|
New match attempt
|
||||||
|
Callout (15): 'XXX'
|
||||||
|
--->aac
|
||||||
|
^^ b
|
||||||
|
No match
|
||||||
|
|
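The new expected output above shows pcre2test's callout_extra modifier printing "New match attempt", "Backtrack" and "No other matching paths" lines. These lines are derived from the callout_flags bits that this commit adds to the callout block. The sketch below is a minimal, library-free illustration of decoding such a flag word into those lines, under some assumptions:

- The SKETCH_CALLOUT_STARTMATCH and SKETCH_CALLOUT_BACKTRACK constants are local stand-ins assumed to mirror PCRE2_CALLOUT_STARTMATCH (0x1) and PCRE2_CALLOUT_BACKTRACK (0x2) from pcre2.h.
- describe_callout_flags() is a hypothetical helper.
- The line ordering is inferred from the expected output rather than copied from pcre2test.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins, assumed to mirror PCRE2_CALLOUT_STARTMATCH and
   PCRE2_CALLOUT_BACKTRACK in pcre2.h, so this builds without libpcre2. */
#define SKETCH_CALLOUT_STARTMATCH 0x00000001u
#define SKETCH_CALLOUT_BACKTRACK  0x00000002u

/* Big enough for all three lines (52 chars) plus the terminating NUL. */
#define SKETCH_FLAGS_TEXT_MAX 64

/* Hypothetical helper: fill buf with the extra trace lines that appear
   before a callout in the callout_extra output, in the same order.
   buf must hold at least SKETCH_FLAGS_TEXT_MAX bytes. Returns buf. */
static char *describe_callout_flags(uint32_t flags, char *buf, size_t size)
{
  assert(buf != NULL && size >= SKETCH_FLAGS_TEXT_MAX);
  buf[0] = '\0';

  if ((flags & SKETCH_CALLOUT_BACKTRACK) != 0)
    strcat(buf, "Backtrack\n");

  if ((flags & SKETCH_CALLOUT_STARTMATCH) != 0)
    {
    /* Both bits set: the previous starting position has been abandoned
       after backtracking, which the test output labels this way. */
    if ((flags & SKETCH_CALLOUT_BACKTRACK) != 0)
      strcat(buf, "No other matching paths\n");
    strcat(buf, "New match attempt\n");
    }

  return buf;
}
```

In a real callout function registered with pcre2_set_callout(), the flag word would come from the callout_flags field of the pcre2_callout_block argument. The test comment notes that JIT does not support callout_extra, which is why both patterns above start with (*NO_JIT).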
||||||
# End of testinput2
|
# End of testinput2
|
||||||
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
|
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
|
||||||
|
|
|
@ -3763,7 +3763,7 @@ No match
|
||||||
abcd
|
abcd
|
||||||
--->abcd
|
--->abcd
|
||||||
+0 ^ \w+
|
+0 ^ \w+
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: abcd
|
0: abcd
|
||||||
|
|
||||||
/[\p{N}]?+/B,no_auto_possess
|
/[\p{N}]?+/B,no_auto_possess
|
||||||
|
@ -4165,7 +4165,7 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
|
||||||
+0 ^ .
|
+0 ^ .
|
||||||
+0 ^ .
|
+0 ^ .
|
||||||
+1 ^ ^ .
|
+1 ^ ^ .
|
||||||
+2 ^ ^
|
+2 ^ ^ End of pattern
|
||||||
0: \x{123}\x{123}
|
0: \x{123}\x{123}
|
||||||
|
|
||||||
# This tests processing wide characters in extended mode.
|
# This tests processing wide characters in extended mode.
|
||||||
|
|
|
@ -726,7 +726,7 @@ No match
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
+3 ^ ^ |
|
+3 ^ ^ |
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
0: ababab
|
0: ababab
|
||||||
|
@ -745,12 +745,12 @@ No match
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
+3 ^ ^ |
|
+3 ^ ^ |
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ ){3,4}
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
0: abcdabcd
|
0: abcdabcd
|
||||||
1: abcdab
|
1: abcdab
|
||||||
abcdcdcdcdcd
|
abcdcdcdcdcd
|
||||||
|
@ -768,12 +768,12 @@ No match
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ ){3,4}
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ ){3,4}
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^ End of pattern
|
||||||
0: abcdcdcd
|
0: abcdcdcd
|
||||||
1: abcdcd
|
1: abcdcd
|
||||||
|
|
||||||
|
@ -6610,14 +6610,14 @@ No match
|
||||||
+0 ^ x
|
+0 ^ x
|
||||||
+1 ^^ y
|
+1 ^^ y
|
||||||
+2 ^ ^ z
|
+2 ^ ^ z
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
abcxyz
|
abcxyz
|
||||||
--->abcxyz
|
--->abcxyz
|
||||||
+0 ^ x
|
+0 ^ x
|
||||||
+1 ^^ y
|
+1 ^^ y
|
||||||
+2 ^ ^ z
|
+2 ^ ^ z
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abc
|
abc
|
||||||
|
@ -6634,7 +6634,7 @@ No match
|
||||||
+0 ^ x
|
+0 ^ x
|
||||||
+1 ^^ y
|
+1 ^^ y
|
||||||
+2 ^ ^ z
|
+2 ^ ^ z
|
||||||
+3 ^ ^
|
+3 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
abc
|
abc
|
||||||
|
@ -6668,7 +6668,7 @@ No match
|
||||||
+15 ^ x
|
+15 ^ x
|
||||||
+16 ^^ y
|
+16 ^^ y
|
||||||
+17 ^ ^ z
|
+17 ^ ^ z
|
||||||
+18 ^ ^
|
+18 ^ ^ End of pattern
|
||||||
0: xyz
|
0: xyz
|
||||||
|
|
||||||
/(?C)ab/
|
/(?C)ab/
|
||||||
|
@ -6684,7 +6684,7 @@ No match
|
||||||
--->ab
|
--->ab
|
||||||
+0 ^ a
|
+0 ^ a
|
||||||
+1 ^^ b
|
+1 ^^ b
|
||||||
+2 ^ ^
|
+2 ^ ^ End of pattern
|
||||||
0: ab
|
0: ab
|
||||||
ab\=callout_none
|
ab\=callout_none
|
||||||
0: ab
|
0: ab
|
||||||
|
@ -6717,7 +6717,7 @@ No match
|
||||||
+8 ^ [a]
|
+8 ^ [a]
|
||||||
+17 ^ ^ |
|
+17 ^ ^ |
|
||||||
+22 ^ ^ $
|
+22 ^ ^ $
|
||||||
+23 ^ ^
|
+23 ^ ^ End of pattern
|
||||||
0: "ab"
|
0: "ab"
|
||||||
"ab"\=callout_none
|
"ab"\=callout_none
|
||||||
0: "ab"
|
0: "ab"
|
||||||
|
|