Add callout_flags to callout blocks, and set bits within it from pcre2_match() interpretation.
Philip.Hazel 2017-12-22 15:56:27 +00:00
parent 814cc96bc5
commit 94d5f4a050
24 changed files with 1896 additions and 1402 deletions

@ -89,6 +89,12 @@ and set its never-changing fields once only.
compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
to retrieve them, and update pcre2test to show them.
22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks. The bits are set by pcre2_match(), but
not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
if the callout_extra subject modifier is set. These bits are provided to help
with tracking how a backtracking match is proceeding.
Version 10.30 14-August-2017
----------------------------

@ -30,7 +30,13 @@ DESCRIPTION
<P>
This function matches a compiled regular expression against a given subject
string, using a matching algorithm that is similar to Perl's. It returns
offsets to captured substrings. Its arguments are:
offsets to what it has matched and to captured substrings via the
<b>match_data</b> block, which can be processed by functions with names that
start with <b>pcre2_get_ovector_...()</b> or <b>pcre2_substring_...()</b>. The
return from <b>pcre2_match()</b> is one more than the highest numbered capturing
pair that has been set (for example, 1 if there are no captures), zero if the
vector of offsets is too small, or a negative error code for no match and other
errors. The function arguments are:
<pre>
<i>code</i> Points to the compiled pattern
<i>subject</i> Points to the subject string

@ -27,7 +27,7 @@ DESCRIPTION
<P>
This function returns information about a compiled pattern. Its arguments are:
<pre>
<i>code</i> Pointer to a compiled regular expression
<i>code</i> Pointer to a compiled regular expression pattern
<i>what</i> What information is required
<i>where</i> Where to put the information
</pre>
@ -42,6 +42,8 @@ request are as follows:
PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
PCRE2_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE2_INFO_DEPTHLIMIT Backtracking depth limit if set, otherwise PCRE2_ERROR_UNSET
PCRE2_INFO_EXTRAOPTIONS Extra options that were passed in the
compile context
PCRE2_INFO_FIRSTBITMAP Bitmap of first code units, or NULL
PCRE2_INFO_FIRSTCODETYPE Type of start-of-match information
0 nothing set

@ -920,11 +920,15 @@ The <i>offset_limit</i> parameter limits how far an unanchored search can
advance in the subject string. The default value is PCRE2_UNSET. The
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> functions return
PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given
offset is not found. For example, if the pattern /abc/ is matched against
"123abc" with an offset limit less than 3, the result is PCRE2_ERROR_NOMATCH.
A match can never be found if the <i>startoffset</i> argument of
<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> is greater than the offset
limit.
offset is not found. The <b>pcre2_substitute()</b> function makes no more
substitutions.
</P>
<P>
For example, if the pattern /abc/ is matched against "123abc" with an offset
limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be
found if the <i>startoffset</i> argument of <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, or <b>pcre2_substitute()</b> is greater than the offset
limit set in the match context.
</P>
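The offset limit rule can be illustrated with a toy literal search that does not use PCRE2 at all. This is only a sketch under stated assumptions: find_within_limit and NO_MATCH are hypothetical names, the "pattern" is a plain string, and an out-of-range start is simply treated as no match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "no match" result. */
#define NO_MATCH ((size_t)-1)

/* Find the first occurrence of the literal needle whose starting offset
   is at or after startoffset and before or at offset_limit. */
size_t find_within_limit(const char *subject, const char *needle,
                         size_t startoffset, size_t offset_limit)
{
  size_t len = strlen(subject);

  /* A start beyond the limit (or beyond the subject) can never match. */
  if (startoffset > len || startoffset > offset_limit) return NO_MATCH;

  const char *hit = strstr(subject + startoffset, needle);
  if (hit == NULL) return NO_MATCH;

  /* The match is accepted only if it starts before or at the limit. */
  size_t start = (size_t)(hit - subject);
  return (start <= offset_limit) ? start : NO_MATCH;
}
```

With the subject "123abc" and the needle "abc", a limit of 2 gives NO_MATCH, while a limit of 3 finds the match at offset 3.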
<P>
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
@ -934,10 +938,11 @@ PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
</P>
<P>
The offset limit facility can be used to track progress when searching large
subject strings. See also the PCRE2_FIRSTLINE option, which requires a match to
start within the first line of the subject. If this is set with an offset
limit, a match must occur in the first line and also within the offset limit.
In other words, whichever limit comes first is used.
subject strings or to limit the extent of global substitutions. See also the
PCRE2_FIRSTLINE option, which requires a match to start within the first line
of the subject. If this is set with an offset limit, a match must occur in the
first line and also within the offset limit. In other words, whichever limit
comes first is used.
<br>
<br>
<b>int pcre2_set_heap_limit(pcre2_match_context *<i>mcontext</i>,</b>
@ -1940,12 +1945,15 @@ are as follows:
<pre>
PCRE2_INFO_ALLOPTIONS
PCRE2_INFO_ARGOPTIONS
PCRE2_INFO_EXTRAOPTIONS
</pre>
Return a copy of the pattern's options. The third argument should point to a
Return copies of the pattern's options. The third argument should point to a
<b>uint32_t</b> variable. PCRE2_INFO_ARGOPTIONS returns exactly the options that
were passed to <b>pcre2_compile()</b>, whereas PCRE2_INFO_ALLOPTIONS returns
the compile options as modified by any top-level (*XXX) option settings such as
(*UTF) at the start of the pattern itself.
(*UTF) at the start of the pattern itself. PCRE2_INFO_EXTRAOPTIONS returns the
extra options that were set in the compile context by calling the
pcre2_set_compile_extra_options() function.
</P>
<P>
For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
@ -3157,13 +3165,27 @@ options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
</P>
<P>
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
replacing every matching substring. If this is not set, only the first matching
substring is replaced. If any matched substring has zero length, after the
substitution has happened, an attempt to find a non-empty match at the same
position is performed. If this is not successful, the current position is
advanced by one character except when CRLF is a valid newline sequence and the
next two characters are CR, LF. In this case, the current position is advanced
by two characters.
replacing every matching substring. If this option is not set, only the first
matching substring is replaced. The search for matches takes place in the
original subject string (that is, previous replacements do not affect it).
Iteration is implemented by advancing the <i>startoffset</i> value for each
search, which is always passed the entire subject string. If an offset limit is
set in the match context, searching stops when that limit is reached.
</P>
<P>
You can restrict the effect of a global substitution to a portion of the
subject string by setting either or both of <i>startoffset</i> and an offset
limit. Here is a <b>pcre2test</b> example:
<pre>
/B/g,replace=!,use_offset_limit
ABC ABC ABC ABC\=offset=3,offset_limit=12
2: ABC A!C A!C ABC
</pre>
When continuing with global substitutions after matching a substring with zero
length, an attempt to find a non-empty match at the same offset is performed.
If this is not successful, the offset is advanced by one character except when
CRLF is a valid newline sequence and the next two characters are CR, LF. In
this case, the offset is advanced by two characters.
</P>
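The windowing effect in the pcre2test example above can be reproduced with a toy single-character replacement. This is a sketch, not how pcre2_substitute() is implemented: replace_in_window is a hypothetical name, it edits the buffer in place (the real function writes to a separate output buffer), and it handles only one-character matches.

```c
#include <stddef.h>
#include <string.h>

/* Replace every `from` character whose offset p satisfies
   startoffset <= p <= offset_limit. Returns the number of replacements. */
size_t replace_in_window(char *buf, char from, char to,
                         size_t startoffset, size_t offset_limit)
{
  size_t count = 0;
  size_t len = strlen(buf);

  /* Searching begins at startoffset and stops once the limit is passed. */
  for (size_t p = startoffset; p < len && p <= offset_limit; p++) {
    if (buf[p] == from) {
      buf[p] = to;
      count++;
    }
  }
  return count;
}
```

Applied to "ABC ABC ABC ABC" with 'B', '!', a start offset of 3 and a limit of 12, only the B characters at offsets 5 and 9 fall inside the window, which is consistent with the two substitutions shown in the example.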
<P>
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
@ -3398,7 +3420,7 @@ Here is an example of a simple call to <b>pcre2_dfa_match()</b>:
11, /* the length of the subject string */
0, /* start at offset 0 in the subject */
0, /* default options */
match_data, /* the match data block */
md, /* the match data block */
NULL, /* a match context; NULL means use defaults */
wspace, /* working space vector */
20); /* number of elements (NOT size in bytes) */
@ -3567,7 +3589,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 13 October 2017
Last updated: 16 December 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -206,18 +206,20 @@ callouts such as the example above are obeyed.
<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br>
<P>
During matching, when PCRE2 reaches a callout point, if an external function is
provided in the match context, it is called. This applies to both normal and
DFA matching. The first argument to the callout function is a pointer to a
<b>pcre2_callout</b> block. The second argument is the void * callout data that
was supplied when the callout was set up by calling <b>pcre2_set_callout()</b>
(see the
provided in the match context, it is called. This applies to normal, DFA,
and JIT matching. The first argument to the callout function is a pointer
to a <b>pcre2_callout</b> block. The second argument is the void * callout data
that was supplied when the callout was set up by calling
<b>pcre2_set_callout()</b> (see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation). The callout block structure contains the following fields:
documentation). The callout block structure contains the following fields, not
necessarily in this order:
<pre>
uint32_t <i>version</i>;
uint32_t <i>callout_number</i>;
uint32_t <i>capture_top</i>;
uint32_t <i>capture_last</i>;
uint32_t <i>callout_flags</i>;
PCRE2_SIZE *<i>offset_vector</i>;
PCRE2_SPTR <i>mark</i>;
PCRE2_SPTR <i>subject</i>;
@ -231,11 +233,12 @@ documentation). The callout block structure contains the following fields:
PCRE2_SPTR <i>callout_string</i>;
</pre>
The <i>version</i> field contains the version number of the block format. The
current version is 1; the three callout string fields were added for this
version. If you are writing an application that might use an earlier release of
PCRE2, you should check the version number before accessing any of these
fields. The version number will increase in future if more fields are added,
but the intention is never to remove any of the existing fields.
current version is 2; the three callout string fields were added for version 1,
and the <i>callout_flags</i> field for version 2. If you are writing an
application that might use an earlier release of PCRE2, you should check the
version number before accessing any of these fields. The version number will
increase in future if more fields are added, but the intention is never to
remove any of the existing fields.
</P>
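The version check recommended above might look like the following. This is a minimal sketch: demo_callout_block is a hypothetical, cut-down stand-in for the real pcre2_callout block that carries only the two fields used here, and safe_callout_flags is an illustrative name.

```c
#include <stdint.h>

/* Hypothetical stand-in with only the fields this sketch needs. */
typedef struct {
  uint32_t version;        /* version number of the block format */
  uint32_t callout_flags;  /* present from block version 2 onwards */
} demo_callout_block;

/* Read callout_flags only when the block format is new enough;
   otherwise behave as if no bits were set. */
uint32_t safe_callout_flags(const demo_callout_block *cb)
{
  return (cb->version >= 2) ? cb->callout_flags : 0;
}
```

In a real callout function the same test would be applied to the version field of the block that PCRE2 passes in.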
<br><b>
Fields for numerical callouts
@ -358,6 +361,36 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from the DFA matching function this field always contains NULL.
</P>
<P>
The <i>callout_flags</i> field is always zero in callouts from
<b>pcre2_dfa_match()</b> or when JIT is being used. When <b>pcre2_match()</b>
without JIT is used, the following bits may be set:
<pre>
PCRE2_CALLOUT_STARTMATCH
</pre>
This is set for the first callout after the start of matching for each new
starting position in the subject.
<pre>
PCRE2_CALLOUT_BACKTRACK
</pre>
This is set if there has been a matching backtrack since the previous callout,
or since the start of matching if this is the first callout from a
<b>pcre2_match()</b> run.
</P>
<P>
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
position in the subject. Output from <b>pcre2test</b> does not indicate the
presence of these bits unless the <b>callout_extra</b> modifier is set.
</P>
<P>
The information in the <b>callout_flags</b> field is provided so that
applications can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in <b>pcre2_dfa_match()</b>
because there is no backtracking in DFA matching, and there is no support in
JIT because JIT is all about maximizing matching performance. In both these
cases the <b>callout_flags</b> field is always zero.
</P>
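An application could use these bits to keep simple statistics about a match. The sketch below is an assumption-laden illustration: match_stats and record_callout_flags are hypothetical names, and the two fallback macro values are stand-ins that are only used when the real pcre2.h has not been included.

```c
#include <stdint.h>

/* Stand-in values, used only if pcre2.h has not already defined them. */
#ifndef PCRE2_CALLOUT_STARTMATCH
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u
#define PCRE2_CALLOUT_BACKTRACK  0x00000002u
#endif

/* Hypothetical counters an application might keep per match run. */
typedef struct {
  unsigned attempts;    /* callouts that began a new starting position */
  unsigned backtracks;  /* callouts preceded by a backtrack */
  unsigned bumpalongs;  /* both bits set: a backtrack forced a new start */
} match_stats;

/* Update the counters from one callout's callout_flags value. */
void record_callout_flags(match_stats *st, uint32_t flags)
{
  int start = (flags & PCRE2_CALLOUT_STARTMATCH) != 0;
  int back  = (flags & PCRE2_CALLOUT_BACKTRACK) != 0;

  if (start) st->attempts++;
  if (back) st->backtracks++;
  if (start && back) st->bumpalongs++;
}
```

A real callout function could pass its callout data pointer as the match_stats block and the block's callout_flags field as flags, then return zero to carry on matching.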
<br><a name="SEC5" href="#TOC1">RETURN VALUES FROM CALLOUTS</a><br>
<P>
The external callout function returns an integer to PCRE2. If the value is
@ -428,7 +461,7 @@ Cambridge, England.
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 14 April 2017
Last updated: 22 December 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -133,11 +133,13 @@ The <b>--locale</b> option can be used to override this.
<br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
<P>
It is possible to compile <b>pcre2grep</b> so that it uses <b>libz</b> or
<b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
respectively. You can find out whether your binary has support for one or both
of these file types by running it with the <b>--help</b> option. If the
appropriate support is not present, files are treated as plain text. The
standard input is always so treated.
<b>libbz2</b> to read compressed files whose names end in <b>.gz</b> or
<b>.bz2</b>, respectively. You can find out whether your <b>pcre2grep</b> binary
has support for one or both of these file types by running it with the
<b>--help</b> option. If the appropriate support is not present, all files are
treated as plain text. The standard input is always so treated. When input is
from a compressed .gz or .bz2 file, the <b>--line-buffered</b> option is
ignored.
</P>
<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
<P>
@ -151,7 +153,7 @@ of changing the way binary files are handled.
<br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
<P>
The order in which some of the options appear can affect the output. For
example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
example, both the <b>-H</b> and <b>-l</b> options affect the printing of file
names. Whichever comes later in the command line will be the one that takes
effect. Similarly, except where noted below, if an option is given twice, the
later setting is used. Numerical values for options may be followed by K or M,
@ -396,14 +398,16 @@ searching a single file. By default, the file name is not shown in this case.
For matching lines, the file name is followed by a colon; for context lines, a
hyphen separator is used. If a line number is also being output, it follows the
file name. When the <b>-M</b> option causes a pattern to match more than one
line, only the first is preceded by the file name.
line, only the first is preceded by the file name. This option overrides any
previous <b>-h</b>, <b>-l</b>, or <b>-L</b> options.
</P>
<P>
<b>-h</b>, <b>--no-filename</b>
Suppress the output file names when searching multiple files. By default,
file names are shown when multiple files are searched. For matching lines, the
file name is followed by a colon; for context lines, a hyphen separator is used.
If a line number is also being output, it follows the file name.
If a line number is also being output, it follows the file name. This option
overrides any previous <b>-H</b>, <b>-L</b>, or <b>-l</b> options.
</P>
<P>
<b>--heap-limit</b>=<i>number</i>
@ -460,17 +464,19 @@ given any number of times. If a directory matches both <b>--include-dir</b> and
<b>-L</b>, <b>--files-without-match</b>
Instead of outputting lines from the files, just output the names of the files
that do not contain any lines that would have been output. Each file name is
output once, on a separate line.
output once, on a separate line. This option overrides any previous <b>-H</b>,
<b>-h</b>, or <b>-l</b> options.
</P>
<P>
<b>-l</b>, <b>--files-with-matches</b>
Instead of outputting lines from the files, just output the names of the files
containing lines that would have been output. Each file name is output
once, on a separate line. Searching normally stops as soon as a matching line
is found in a file. However, if the <b>-c</b> (count) option is also used,
matching continues in order to obtain the correct count, and those files that
have at least one match are listed along with their counts. Using this option
with <b>-c</b> is a way of suppressing the listing of files with no matches.
containing lines that would have been output. Each file name is output once, on
a separate line. Searching normally stops as soon as a matching line is found
in a file. However, if the <b>-c</b> (count) option is also used, matching
continues in order to obtain the correct count, and those files that have at
least one match are listed along with their counts. Using this option with
<b>-c</b> is a way of suppressing the listing of files with no matches. This
option overrides any previous <b>-H</b>, <b>-h</b>, or <b>-L</b> options.
</P>
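The "later option overrides earlier ones" rule for <b>-H</b>, <b>-h</b>, <b>-l</b>, and <b>-L</b> can be sketched as a scan that remembers the last relevant letter. This is not pcre2grep's actual option parser: effective_filename_option is a hypothetical name, and the input is assumed to be a string of single-letter options in command-line order.

```c
#include <string.h>

/* Return whichever of H, h, l, L appears last in opts,
   or 0 if none of them is present. */
char effective_filename_option(const char *opts)
{
  char last = 0;

  for (; *opts != '\0'; opts++) {
    /* Each of these four options cancels any earlier one in the set. */
    if (strchr("HhlL", *opts) != NULL) last = *opts;
  }
  return last;
}
```

For example, the option letters "Hl" leave l in effect, while "lnH" leave H in effect.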
<P>
<b>--label</b>=<i>name</i>
@ -480,14 +486,16 @@ short form for this option.
</P>
<P>
<b>--line-buffered</b>
When this option is given, input is read and processed line by line, and the
output is flushed after each write. By default, input is read in large chunks,
unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
is currently possible only in Unix-like environments). Output to terminal is
normally automatically flushed by the operating system. This option can be
useful when the input or output is attached to a pipe and you do not want
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will
affect performance, and the <b>-M</b> (multiline) option ceases to work.
When this option is given, non-compressed input is read and processed line by
line, and the output is flushed after each write. By default, input is read in
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
terminal (which is currently possible only in Unix-like environments). Output
to terminal is normally automatically flushed by the operating system. This
option can be useful when the input or output is attached to a pipe and you do
not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
will affect performance, and the <b>-M</b> (multiline) option ceases to work.
When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
ignored.
</P>
<P>
<b>--line-offsets</b>
@ -941,7 +949,7 @@ Cambridge, England.
</P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P>
Last updated: 11 October 2017
Last updated: 13 November 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -159,6 +159,12 @@ Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
automatic callouts into every pattern that is compiled.
</P>
<P>
<b>-AC</b>
As for <b>-ac</b>, but in addition behave as if each subject line has the
<b>callout_extra</b> modifier, that is, show additional information from
callouts.
</P>
<P>
<b>-b</b>
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
internal binary form of the pattern is output after compilation.
@ -1182,6 +1188,7 @@ pattern.
callout_capture show captures at callout time
callout_data=&#60;n&#62; set a value to pass via callouts
callout_error=&#60;n&#62;[:&#60;m&#62;] control callout error
callout_extra show extra callout information
callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
@ -1694,49 +1701,10 @@ documentation.
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
function is called during matching unless <b>callout_none</b> is specified.
This works with both matching functions.
</P>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout.
</P>
<P>
If <b>callout_capture</b> is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
<b>callout_no_where</b> modifier is set.
</P>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero. See
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P>
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<P>
The output for callouts with numerical arguments and those with string
arguments is slightly different.
function is called during matching unless <b>callout_none</b> is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
</P>
<br><b>
Callouts with numerical arguments
@ -1811,6 +1779,107 @@ example:
</PRE>
</P>
<br><b>
Callout modifiers
</b><br>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout (see below).
</P>
<P>
If the <b>callout_capture</b> modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching, as
<b>pcre2_dfa_match()</b> does not support capturing, so no captures are ever
shown.
</P>
<P>
The normal callout output, showing the callout number or pattern offset (as
described above) is suppressed if the <b>callout_no_where</b> modifier is set.
</P>
<P>
When using the interpretive matching function <b>pcre2_match()</b> without JIT,
setting the <b>callout_extra</b> modifier causes additional output from
<b>pcre2test</b>'s callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or start of
matching if this is the first callout), "Backtrack" is output, followed by "No
other matching paths" if the backtrack ended the previous match attempt. For
example:
<pre>
re&#62; /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data&#62; aac\=callout_extra
New match attempt
---&#62;aac
+0 ^ (
+1 ^ a+
+3 ^ ^ )
+4 ^ ^ b
Backtrack
---&#62;aac
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
---&#62;aac
+0 ^ (
+1 ^ a+
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
---&#62;aac
+0 ^ (
+1 ^ a+
Backtrack
No other matching paths
New match attempt
---&#62;aac
+0 ^ (
+1 ^ a+
No match
</pre>
Notice that various optimizations must be turned off if you want all possible
matching paths to be scanned. If <b>no_start_optimize</b> is not used, there is
an immediate "no match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match. If <b>no_auto_possess</b> is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
</P>
<P>
The <b>callout_extra</b> modifier has no effect if used with the DFA matching
function, or with JIT.
</P>
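The extra lines in the example output can be derived from the two callout_flags bits. The following is a sketch that follows the description and sample output above, not a copy of pcre2test's code: format_callout_extra is a hypothetical name, and the fallback macro values are stand-ins used only when pcre2.h has not been included.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in values, used only if pcre2.h has not already defined them. */
#ifndef PCRE2_CALLOUT_STARTMATCH
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u
#define PCRE2_CALLOUT_BACKTRACK  0x00000002u
#endif

/* Write the extra message lines for one callout into buf, in the order
   seen in the sample output. Returns the value from snprintf(). */
int format_callout_extra(uint32_t flags, char *buf, size_t buflen)
{
  int start = (flags & PCRE2_CALLOUT_STARTMATCH) != 0;
  int back  = (flags & PCRE2_CALLOUT_BACKTRACK) != 0;

  return snprintf(buf, buflen, "%s%s%s",
                  back ? "Backtrack\n" : "",
                  /* A backtrack that ended the previous attempt. */
                  (back && start) ? "No other matching paths\n" : "",
                  start ? "New match attempt\n" : "");
}
```

A callout with neither bit set produces no extra text, which corresponds to the ordinary callout lines in the example.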
<br><b>
Return values from callouts
</b><br>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
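The interaction of <b>callout_fail</b> and <b>callout_error</b> can be sketched as a function that chooses the callout's return value. This follows the wording above rather than pcre2test's source: callout_rule, rule_fires, and callout_return are hypothetical names, the count is assumed to include the current callout, a single-number modifier is modelled as a minimum count of zero, and the fallback error value is a stand-in used only when pcre2.h has not been included.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in value, used only if pcre2.h has not already defined it. */
#ifndef PCRE2_ERROR_CALLOUT
#define PCRE2_ERROR_CALLOUT (-37)
#endif

/* Hypothetical representation of one <n>[:<m>] modifier. */
typedef struct {
  int set;             /* non-zero if the modifier was given */
  uint32_t number;     /* <n>: the callout number to act on */
  uint32_t min_count;  /* <m>: minimum callouts so far (0 if omitted) */
} callout_rule;

static int rule_fires(const callout_rule *r, uint32_t number, uint32_t count)
{
  return r != NULL && r->set && number == r->number && count >= r->min_count;
}

/* Decide what the callout function returns for this callout. */
int callout_return(uint32_t callout_number, uint32_t callouts_so_far,
                   const callout_rule *fail, const callout_rule *error)
{
  /* callout_error takes precedence when both apply. */
  if (rule_fires(error, callout_number, callouts_so_far))
    return PCRE2_ERROR_CALLOUT;
  if (rule_fires(fail, callout_number, callouts_so_far))
    return 1;   /* causes matching to backtrack */
  return 0;     /* carry on matching */
}
```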
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P>
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
<P>
When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
@ -1913,7 +1982,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 October 2017
Last updated: 21 December 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -929,11 +929,14 @@ PCRE2 CONTEXTS
advance in the subject string. The default value is PCRE2_UNSET. The
pcre2_match() and pcre2_dfa_match() functions return
PCRE2_ERROR_NOMATCH if a match with a starting point before or at the
given offset is not found. For example, if the pattern /abc/ is matched
against "123abc" with an offset limit less than 3, the result is
PCRE2_ERROR_NOMATCH. A match can never be found if the startoffset
argument of pcre2_match() or pcre2_dfa_match() is greater than the off-
set limit.
given offset is not found. The pcre2_substitute() function makes no
more substitutions.
For example, if the pattern /abc/ is matched against "123abc" with an
offset limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match
can never be found if the startoffset argument of pcre2_match(),
pcre2_dfa_match(), or pcre2_substitute() is greater than the offset
limit set in the match context.
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT
option when calling pcre2_compile() so that when JIT is in use, differ-
@ -942,11 +945,11 @@ PCRE2 CONTEXTS
ated.
The offset limit facility can be used to track progress when searching
large subject strings. See also the PCRE2_FIRSTLINE option, which
requires a match to start within the first line of the subject. If this
is set with an offset limit, a match must occur in the first line and
also within the offset limit. In other words, whichever limit comes
first is used.
large subject strings or to limit the extent of global substitutions.
See also the PCRE2_FIRSTLINE option, which requires a match to start
within the first line of the subject. If this is set with an offset
limit, a match must occur in the first line and also within the offset
limit. In other words, whichever limit comes first is used.
int pcre2_set_heap_limit(pcre2_match_context *mcontext,
uint32_t value);
@ -1910,12 +1913,16 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_INFO_ALLOPTIONS
PCRE2_INFO_ARGOPTIONS
PCRE2_INFO_EXTRAOPTIONS
Return a copy of the pattern's options. The third argument should point
Return copies of the pattern's options. The third argument should point
to a uint32_t variable. PCRE2_INFO_ARGOPTIONS returns exactly the
options that were passed to pcre2_compile(), whereas PCRE2_INFO_ALLOP-
TIONS returns the compile options as modified by any top-level (*XXX)
option settings such as (*UTF) at the start of the pattern itself.
PCRE2_INFO_EXTRAOPTIONS returns the extra options that were set in the
compile context by calling the pcre2_set_compile_extra_options() func-
tion.
For example, if the pattern /(*UTF)abc/ is compiled with the
PCRE2_EXTENDED option, the result for PCRE2_INFO_ALLOPTIONS is
@ -3062,13 +3069,28 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
options can be set in the options argument of pcre2_substitute().
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
string, replacing every matching substring. If this is not set, only
the first matching substring is replaced. If any matched substring has
zero length, after the substitution has happened, an attempt to find a
non-empty match at the same position is performed. If this is not suc-
cessful, the current position is advanced by one character except when
CRLF is a valid newline sequence and the next two characters are CR,
LF. In this case, the current position is advanced by two characters.
string, replacing every matching substring. If this option is not set,
only the first matching substring is replaced. The search for matches
takes place in the original subject string (that is, previous replace-
ments do not affect it). Iteration is implemented by advancing the
startoffset value for each search, which is always passed the entire
subject string. If an offset limit is set in the match context, search-
ing stops when that limit is reached.
You can restrict the effect of a global substitution to a portion of
the subject string by setting either or both of startoffset and an off-
set limit. Here is a pcre2test example:
/B/g,replace=!,use_offset_limit
ABC ABC ABC ABC\=offset=3,offset_limit=12
2: ABC A!C A!C ABC
When continuing with global substitutions after matching a substring
with zero length, an attempt to find a non-empty match at the same off-
set is performed. If this is not successful, the offset is advanced by
one character except when CRLF is a valid newline sequence and the next
two characters are CR, LF. In this case, the offset is advanced by two
characters.
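The advance rule for empty matches can be sketched as follows. This is a simplified illustration, not the library's code: advance_after_empty_match is a hypothetical name, the caller is assumed to know whether CRLF is a valid newline sequence, and one character is assumed to be one code unit (which does not hold for multi-unit characters in UTF mode).

```c
#include <stddef.h>

/* After an empty match at offset, with no non-empty match found there,
   return the offset at which the next search should begin. */
size_t advance_after_empty_match(const char *subject, size_t length,
                                 size_t offset, int crlf_is_newline)
{
  /* Step over CR, LF as a unit when CRLF is a valid newline. */
  if (crlf_is_newline && offset + 1 < length &&
      subject[offset] == '\r' && subject[offset + 1] == '\n')
    return offset + 2;

  /* Otherwise advance by a single character. */
  return offset + 1;
}
```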
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
@ -3288,7 +3310,7 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
11, /* the length of the subject string */
0, /* start at offset 0 in the subject */
0, /* default options */
match_data, /* the match data block */
md, /* the match data block */
NULL, /* a match context; NULL means use defaults */
wspace, /* working space vector */
20); /* number of elements (NOT size in bytes) */
@ -3447,7 +3469,7 @@ AUTHOR
REVISION
Last updated: 13 October 2017
Last updated: 16 December 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@ -4183,16 +4205,18 @@ THE CALLOUT INTERFACE
During matching, when PCRE2 reaches a callout point, if an external
function is provided in the match context, it is called. This applies
to both normal and DFA matching. The first argument to the callout
function is a pointer to a pcre2_callout block. The second argument is
the void * callout data that was supplied when the callout was set up
by calling pcre2_set_callout() (see the pcre2api documentation). The
callout block structure contains the following fields:
to normal, DFA, and JIT matching. The first argument to the callout
function is a pointer to a pcre2_callout block. The second argument
is the void * callout data that was supplied when the callout was set
up by calling pcre2_set_callout() (see the pcre2api documentation). The
callout block structure contains the following fields, not necessarily
in this order:
uint32_t version;
uint32_t callout_number;
uint32_t capture_top;
uint32_t capture_last;
uint32_t callout_flags;
PCRE2_SIZE *offset_vector;
PCRE2_SPTR mark;
PCRE2_SPTR subject;
@ -4206,12 +4230,12 @@ THE CALLOUT INTERFACE
PCRE2_SPTR callout_string;
The version field contains the version number of the block format. The
current version is 1; the three callout string fields were added for
this version. If you are writing an application that might use an ear-
lier release of PCRE2, you should check the version number before
accessing any of these fields. The version number will increase in
future if more fields are added, but the intention is never to remove
any of the existing fields.
current version is 2; the three callout string fields were added for
version 1, and the callout_flags field for version 2. If you are writ-
ing an application that might use an earlier release of PCRE2, you
should check the version number before accessing any of these fields.
The version number will increase in future if more fields are added,
but the intention is never to remove any of the existing fields.
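As a minimal sketch of the version check described above, the following C fragment reads callout_flags only when the block reports version 2 or later. The structure and the function name are hypothetical stand-ins that hold just the two fields involved; a real application would include pcre2.h and receive a pointer to a pcre2_callout_block instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two relevant fields of a callout block.
   Real code would use pcre2_callout_block from pcre2.h. */
typedef struct sketch_callout_block {
  uint32_t version;        /* version number of the block format */
  uint32_t callout_flags;  /* only meaningful when version >= 2 */
} sketch_callout_block;

/* Return the flags if the block is new enough to carry them, otherwise
   zero. With an older library the field does not exist at all, so it must
   not be read before the version has been checked. */
uint32_t sketch_get_callout_flags(const sketch_callout_block *cb)
{
  return (cb->version >= 2) ? cb->callout_flags : 0;
}
```

Treating a missing field as zero matches what the DFA and JIT matchers report, so callers need no separate code path for older releases.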
Fields for numerical callouts
@ -4318,6 +4342,34 @@ THE CALLOUT INTERFACE
previous (*MARK). In callouts from the DFA matching function this field
always contains NULL.
The callout_flags field is always zero in callouts from
pcre2_dfa_match() or when JIT is being used. When pcre2_match() without
JIT is used, the following bits may be set:
PCRE2_CALLOUT_STARTMATCH
This is set for the first callout after the start of matching for each
new starting position in the subject.
PCRE2_CALLOUT_BACKTRACK
This is set if there has been a matching backtrack since the previous
callout, or since the start of matching if this is the first callout
from a pcre2_match() run.
Both bits are set when a backtrack has caused a "bumpalong" to a new
starting position in the subject. Output from pcre2test does not indi-
cate the presence of these bits unless the callout_extra modifier is
set.
The information in the callout_flags field is provided so that applica-
tions can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in pcre2_dfa_match()
because there is no backtracking in DFA matching, and there is no sup-
port in JIT because JIT is all about maximizing matching performance.
In both these cases the callout_flags field is always zero.
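The three flag combinations described above could be decoded along the following lines. This is only a sketch: the two macro values are copied from the definitions this commit adds to pcre2.h, while the helper name and the returned strings are hypothetical and not part of PCRE2.

```c
#include <stdint.h>

/* Stand-in definitions with the values this commit adds to pcre2.h.
   Real code would include pcre2.h rather than define them. */
#ifndef PCRE2_CALLOUT_STARTMATCH
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u
#define PCRE2_CALLOUT_BACKTRACK  0x00000002u
#endif

/* Hypothetical helper: turn a callout_flags value into a short label. */
const char *sketch_describe_flags(uint32_t flags)
{
  switch (flags & (PCRE2_CALLOUT_STARTMATCH | PCRE2_CALLOUT_BACKTRACK))
  {
    case PCRE2_CALLOUT_STARTMATCH | PCRE2_CALLOUT_BACKTRACK:
      return "bumpalong after a backtrack";
    case PCRE2_CALLOUT_STARTMATCH:
      return "first callout at a new starting position";
    case PCRE2_CALLOUT_BACKTRACK:
      return "backtrack since the previous callout";
    default:
      return "no flags set";   /* always the case for DFA and JIT */
  }
}
```

A callout function registered with pcre2_set_callout() could call such a helper on the callout_flags field of the block it is given, after checking that the block's version is at least 2.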
RETURN VALUES FROM CALLOUTS
@ -4387,7 +4439,7 @@ AUTHOR
REVISION
Last updated: 14 April 2017
Last updated: 22 December 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -3185,7 +3185,7 @@ subject string by setting either or both of \fIstartoffset\fP and an offset
limit. Here is a \fBpcre2test\fP example:
.sp
/B/g,replace=!,use_offset_limit
ABC ABC ABC ABC\=offset=3,offset_limit=12
ABC ABC ABC ABC\e=offset=3,offset_limit=12
2: ABC A!C A!C ABC
.sp
When continuing with global substitutions after matching a substring with zero

View File

@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "14 April 2017" "PCRE2 10.30"
.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -191,20 +191,22 @@ callouts such as the example above are obeyed.
.rs
.sp
During matching, when PCRE2 reaches a callout point, if an external function is
provided in the match context, it is called. This applies to both normal and
DFA matching. The first argument to the callout function is a pointer to a
\fBpcre2_callout\fP block. The second argument is the void * callout data that
was supplied when the callout was set up by calling \fBpcre2_set_callout()\fP
(see the
provided in the match context, it is called. This applies to normal,
DFA, and JIT matching. The first argument to the callout function is a pointer
to a \fBpcre2_callout\fP block. The second argument is the void * callout data
that was supplied when the callout was set up by calling
\fBpcre2_set_callout()\fP (see the
.\" HREF
\fBpcre2api\fP
.\"
documentation). The callout block structure contains the following fields:
documentation). The callout block structure contains the following fields, not
necessarily in this order:
.sp
uint32_t \fIversion\fP;
uint32_t \fIcallout_number\fP;
uint32_t \fIcapture_top\fP;
uint32_t \fIcapture_last\fP;
uint32_t \fIcallout_flags\fP;
PCRE2_SIZE *\fIoffset_vector\fP;
PCRE2_SPTR \fImark\fP;
PCRE2_SPTR \fIsubject\fP;
@ -218,11 +220,12 @@ documentation). The callout block structure contains the following fields:
PCRE2_SPTR \fIcallout_string\fP;
.sp
The \fIversion\fP field contains the version number of the block format. The
current version is 1; the three callout string fields were added for this
version. If you are writing an application that might use an earlier release of
PCRE2, you should check the version number before accessing any of these
fields. The version number will increase in future if more fields are added,
but the intention is never to remove any of the existing fields.
current version is 2; the three callout string fields were added for version 1,
and the \fIcallout_flags\fP field for version 2. If you are writing an
application that might use an earlier release of PCRE2, you should check the
version number before accessing any of these fields. The version number will
increase in future if more fields are added, but the intention is never to
remove any of the existing fields.
.
.
.SS "Fields for numerical callouts"
@ -331,6 +334,33 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
(*THEN) item in the match, or NULL if no such items have been passed. Instances
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from the DFA matching function this field always contains NULL.
.P
The \fIcallout_flags\fP field is always zero in callouts from
\fBpcre2_dfa_match()\fP or when JIT is being used. When \fBpcre2_match()\fP
without JIT is used, the following bits may be set:
.sp
PCRE2_CALLOUT_STARTMATCH
.sp
This is set for the first callout after the start of matching for each new
starting position in the subject.
.sp
PCRE2_CALLOUT_BACKTRACK
.sp
This is set if there has been a matching backtrack since the previous callout,
or since the start of matching if this is the first callout from a
\fBpcre2_match()\fP run.
.P
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
position in the subject. Output from \fBpcre2test\fP does not indicate the
presence of these bits unless the \fBcallout_extra\fP modifier is set.
.P
The information in the \fBcallout_flags\fP field is provided so that
applications can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in \fBpcre2_dfa_match()\fP
because there is no backtracking in DFA matching, and there is no support in
JIT because JIT is all about maximizing matching performance. In both these
cases the \fBcallout_flags\fP field is always zero.
.
.
.SH "RETURN VALUES FROM CALLOUTS"
@ -411,6 +441,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 14 April 2017
Last updated: 22 December 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

View File

@ -103,11 +103,12 @@ DESCRIPTION
SUPPORT FOR COMPRESSED FILES
It is possible to compile pcre2grep so that it uses libz or libbz2 to
read files whose names end in .gz or .bz2, respectively. You can find
out whether your binary has support for one or both of these file types
by running it with the --help option. If the appropriate support is not
present, files are treated as plain text. The standard input is always
so treated.
read compressed files whose names end in .gz or .bz2, respectively. You
can find out whether your pcre2grep binary has support for one or both
of these file types by running it with the --help option. If the appro-
priate support is not present, all files are treated as plain text. The
standard input is always so treated. When input is from a compressed
.gz or .bz2 file, the --line-buffered option is ignored.
BINARY FILES
@ -124,7 +125,7 @@ BINARY FILES
OPTIONS
The order in which some of the options appear can affect the output.
For example, both the -h and -l options affect the printing of file
For example, both the -H and -l options affect the printing of file
names. Whichever comes later in the command line will be the one that
takes effect. Similarly, except where noted below, if an option is
given twice, the later setting is used. Numerical values for options
@ -376,7 +377,8 @@ OPTIONS
is used. If a line number is also being output, it follows
the file name. When the -M option causes a pattern to match
more than one line, only the first is preceded by the file
name.
name. This option overrides any previous -h, -l, or -L
options.
-h, --no-filename
Suppress the output file names when searching multiple files.
@ -384,6 +386,7 @@ OPTIONS
searched. For matching lines, the file name is followed by a
colon; for context lines, a hyphen separator is used. If a
line number is also being output, it follows the file name.
This option overrides any previous -H, -L, or -l options.
--heap-limit=number
See --match-limit below.
@ -436,7 +439,8 @@ OPTIONS
Instead of outputting lines from the files, just output the
names of the files that do not contain any lines that would
have been output. Each file name is output once, on a sepa-
rate line.
rate line. This option overrides any previous -H, -h, or -l
options.
-l, --files-with-matches
Instead of outputting lines from the files, just output the
@ -447,7 +451,8 @@ OPTIONS
matching continues in order to obtain the correct count, and
those files that have at least one match are listed along
with their counts. Using this option with -c is a way of sup-
pressing the listing of files with no matches.
pressing the listing of files with no matches. This option
overrides any previous -H, -h, or -L options.
--label=name
This option supplies a name to be used for the standard input
@ -455,16 +460,18 @@ OPTIONS
input)" is used. There is no short form for this option.
--line-buffered
When this option is given, input is read and processed line
by line, and the output is flushed after each write. By
default, input is read in large chunks, unless pcre2grep can
determine that it is reading from a terminal (which is cur-
rently possible only in Unix-like environments). Output to
terminal is normally automatically flushed by the operating
system. This option can be useful when the input or output is
attached to a pipe and you do not want pcre2grep to buffer up
large amounts of data. However, its use will affect perfor-
mance, and the -M (multiline) option ceases to work.
When this option is given, non-compressed input is read and
processed line by line, and the output is flushed after each
write. By default, input is read in large chunks, unless
pcre2grep can determine that it is reading from a terminal
(which is currently possible only in Unix-like environments).
Output to terminal is normally automatically flushed by the
operating system. This option can be useful when the input or
output is attached to a pipe and you do not want pcre2grep to
buffer up large amounts of data. However, its use will affect
performance, and the -M (multiline) option ceases to work.
When input is from a compressed .gz or .bz2 file, --line-
buffered is ignored.
--line-offsets
Instead of showing lines or parts of lines that match, show
@ -922,5 +929,5 @@ AUTHOR
REVISION
Last updated: 11 October 2017
Last updated: 13 November 2017
Copyright (c) 1997-2017 University of Cambridge.

View File

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "17 October 2017" "PCRE 10.31"
.TH PCRE2TEST 1 "21 December 2017" "PCRE 10.31"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -129,6 +129,11 @@ has not been built, this option causes an error.
Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
automatic callouts into every pattern that is compiled.
.TP 10
\fB-AC\fP
As for \fB-ac\fP, but in addition behave as if each subject line has the
\fBcallout_extra\fP modifier, that is, show additional information from
callouts.
.TP 10
\fB-b\fP
Behave as if each pattern has the \fBfullbincode\fP modifier; the full
internal binary form of the pattern is output after compilation.
@ -1152,6 +1157,7 @@ pattern.
callout_capture show captures at callout time
callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error
callout_extra show extra callout information
callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
@ -1664,45 +1670,10 @@ documentation.
.rs
.sp
If the pattern contains any callout requests, \fBpcre2test\fP's callout
function is called during matching unless \fBcallout_none\fP is specified.
This works with both matching functions.
.P
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
change this and other parameters of the callout.
.P
If \fBcallout_capture\fP is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
\fBcallout_no_where\fP modifier is set.
.P
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero. See
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.P
The output for callouts with numerical arguments and those with string
arguments is slightly different.
function is called during matching unless \fBcallout_none\fP is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
.
.
.SS "Callouts with numerical arguments"
@ -1776,6 +1747,103 @@ example:
.sp
.
.
.SS "Callout modifiers"
.rs
.sp
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
change this and other parameters of the callout (see below).
.P
If the \fBcallout_capture\fP modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching, as
\fBpcre2_dfa_match()\fP does not support capturing, so no captures are ever
shown.
.P
The normal callout output, showing the callout number or pattern offset (as
described above), is suppressed if the \fBcallout_no_where\fP modifier is set.
.P
When using the interpretive matching function \fBpcre2_match()\fP without JIT,
setting the \fBcallout_extra\fP modifier causes additional output from
\fBpcre2test\fP's callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or start of
matching if this is the first callout), "Backtrack" is output, followed by "No
other matching paths" if the backtrack ended the previous match attempt. For
example:
.sp
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data> aac\e=callout_extra
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^ ^ )
+4 ^ ^ b
Backtrack
--->aac
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
No match
.sp
Notice that various optimizations must be turned off if you want all possible
matching paths to be scanned. If \fBno_start_optimize\fP is not used, there is
an immediate "no match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match. If \fBno_auto_possess\fP is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
.P
The \fBcallout_extra\fP modifier has no effect if used with the DFA matching
function, or with JIT.
.
.
.SS "Return values from callouts"
.rs
.sp
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero.
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.
.
.
.SH "NON-PRINTING CHARACTERS"
.rs
@ -1894,6 +1962,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 October 2017
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

View File

@ -120,6 +120,10 @@ COMMAND LINE OPTIONS
is, insert automatic callouts into every pattern that is com-
piled.
-AC As for -ac, but in addition behave as if each subject line
has the callout_extra modifier, that is, show additional
information from callouts.
-b Behave as if each pattern has the fullbincode modifier; the
full internal binary form of the pattern is output after com-
pilation.
@ -1056,6 +1060,7 @@ SUBJECT MODIFIERS
callout_capture show captures at callout time
callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error
callout_extra show extra callout information
callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
@ -1530,42 +1535,9 @@ CALLOUTS
If the pattern contains any callout requests, pcre2test's callout func-
tion is called during matching unless callout_none is specified. This
works with both matching functions.
The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout.
If callout_capture is set, the current captured groups are output when
a callout occurs. By default, the callout function then generates out-
put that indicates where the current match start and matching points
are in the subject, and what the next pattern item is. This output is
suppressed if the callout_no_where modifier is set.
The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (caus-
ing matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
ing the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero. See
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see
the pcre2callout documentation.
The output for callouts with numerical arguments and those with string
arguments is slightly different.
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical argu-
ments and those with string arguments is slightly different.
Callouts with numerical arguments
@ -1636,6 +1608,100 @@ CALLOUTS
0: abcdef
Callout modifiers
The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout (see below).
If the callout_capture modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching,
as pcre2_dfa_match() does not support capturing, so no captures are
ever shown.
The normal callout output, showing the callout number or pattern offset
(as described above), is suppressed if the callout_no_where modifier is
set.
When using the interpretive matching function pcre2_match() without
JIT, setting the callout_extra modifier causes additional output from
pcre2test's callout function to be generated. For the first callout in
a match attempt at a new starting position in the subject, "New match
attempt" is output. If there has been a backtrack since the last call-
out (or start of matching if this is the first callout), "Backtrack" is
output, followed by "No other matching paths" if the backtrack ended
the previous match attempt. For example:
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data> aac\=callout_extra
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^ ^ )
+4 ^ ^ b
Backtrack
--->aac
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
No match
Notice that various optimizations must be turned off if you want all
possible matching paths to be scanned. If no_start_optimize is not
used, there is an immediate "no match", without any callouts, because
the starting optimization fails to find "b" in the subject, which it
knows must be present for any match. If no_auto_possess is not used,
the "a+" item is turned into "a++", which reduces the number of back-
tracks.
The callout_extra modifier has no effect if used with the DFA matching
function, or with JIT.
Return values from callouts
The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (caus-
ing matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
ing the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero.
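The rules in the paragraph above can be sketched as a small decision function. Everything here is an illustrative assumption rather than pcre2test's actual code: the settings structure, the function name, and SKETCH_ERROR_CALLOUT, which is an arbitrary stand-in for the PCRE2_ERROR_CALLOUT value that real code would take from pcre2.h. The callout count is passed in by the caller, and a missing <m> is modelled as zero.

```c
#include <stdint.h>

/* Arbitrary stand-in; real code would use PCRE2_ERROR_CALLOUT from pcre2.h. */
#define SKETCH_ERROR_CALLOUT (-1)

/* Hypothetical holder for callout_fail=<n>[:<m>] and callout_error=<n>[:<m>]. */
typedef struct sketch_callout_settings {
  int      fail_set;    /* nonzero if callout_fail was given */
  uint32_t fail_n;      /* callout number that triggers a failure */
  uint32_t fail_m;      /* minimum number of callouts so far */
  int      error_set;   /* nonzero if callout_error was given */
  uint32_t error_n;     /* callout number that triggers an error */
  uint32_t error_m;     /* minimum number of callouts so far */
} sketch_callout_settings;

/* Decide what the callout function returns for one callout. */
int sketch_callout_return(const sketch_callout_settings *s,
                          uint32_t callout_number, uint32_t callout_count)
{
  /* callout_error is tested first, so it takes precedence. */
  if (s->error_set && callout_number == s->error_n &&
      callout_count >= s->error_m)
    return SKETCH_ERROR_CALLOUT;   /* abort the entire match */

  if (s->fail_set && callout_number == s->fail_n &&
      callout_count >= s->fail_m)
    return 1;                      /* cause matching to backtrack */

  return 0;                        /* carry on matching */
}
```

For instance, with settings modelled on callout_fail=5:3, the function returns 0 for callout 5 while fewer than three callouts have occurred and 1 from the third callout onwards.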
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see
the pcre2callout documentation.
NON-PRINTING CHARACTERS
When pcre2test is outputting text in the compiled version of a pattern,
@ -1733,5 +1799,5 @@ AUTHOR
REVISION
Last updated: 17 October 2017
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.

View File

@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
without modification. Define the generic version in a macro; the width-specific
versions are generated from this macro below. */
/* Flags for the callout_flags field. These are cleared after a callout. */
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */
#define PCRE2_STRUCTURE_LIST \
typedef struct pcre2_callout_block { \
uint32_t version; /* Identifies version of block */ \
@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
/* ------------------- Added for Version 2 -------------------------- */ \
uint32_t callout_flags; /* See above for list */ \
/* ------------------------------------------------------------------ */ \
} pcre2_callout_block; \
\

View File

@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
without modification. Define the generic version in a macro; the width-specific
versions are generated from this macro below. */
/* Flags for the callout_flags field. These are cleared after a callout. */
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */
#define PCRE2_STRUCTURE_LIST \
typedef struct pcre2_callout_block { \
uint32_t version; /* Identifies version of block */ \
@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
/* ------------------- Added for Version 2 -------------------------- */ \
uint32_t callout_flags; /* See above for list */ \
/* ------------------------------------------------------------------ */ \
} pcre2_callout_block; \
\

View File

@ -2574,7 +2574,8 @@ for (;;)
if (mb->callout != NULL)
{
pcre2_callout_block cb;
cb.version = 1;
cb.version = 2;
cb.callout_flags = 0;
cb.capture_top = 1;
cb.capture_last = 0;
cb.offset_vector = offsets;
@ -2943,7 +2944,8 @@ for (;;)
if (mb->callout != NULL)
{
pcre2_callout_block cb;
cb.version = 1;
cb.version = 2;
cb.callout_flags = 0;
cb.capture_top = 1;
cb.capture_last = 0;
cb.offset_vector = offsets;

View File

@ -7952,7 +7952,8 @@ oveccount = callout_block->capture_top;
SLJIT_ASSERT(oveccount >= 1);
callout_block->version = 1;
callout_block->version = 2;
callout_block->callout_flags = 0;
/* Offsets in subject. */
callout_block->subject_length = arguments->end - arguments->begin;

View File

@ -321,6 +321,7 @@ callout_ovector[0] = callout_ovector[1] = PCRE2_UNSET;
rc = mb->callout(cb, mb->callout_data);
callout_ovector[0] = save0;
callout_ovector[1] = save1;
cb->callout_flags = 0;
return rc;
}
@ -5921,6 +5922,7 @@ in rrc. */
RETURN_SWITCH:
if (Frdepth == 0) return rrc; /* Exit from the top level */
F = (heapframe *)((char *)F - Fback_frame); /* Back track */
mb->cb->callout_flags |= PCRE2_CALLOUT_BACKTRACK; /* Note for callouts */
#ifdef DEBUG_SHOW_RMATCH
fprintf(stderr, "++ RETURN %d to %d\n", rrc, Freturn_id);
@ -6171,13 +6173,14 @@ startline = (re->flags & PCRE2_STARTLINE) != 0;
bumpalong_limit = (mcontext->offset_limit == PCRE2_UNSET)?
end_subject : subject + mcontext->offset_limit;
/* Set up the fixed fields in the callout block, with a pointer in the
match block. */
/* Initialize and set up the fixed fields in the callout block, with a pointer
in the match block. */
mb->cb = &cb;
cb.version = 1;
cb.version = 2;
cb.subject = subject;
cb.subject_length = (PCRE2_SIZE)(end_subject - subject);
cb.callout_flags = 0;
/* Fill in the remaining fields in the match block. */
@ -6644,6 +6647,8 @@ for(;;)
first starting point for which a partial match was found. */
cb.start_match = (PCRE2_SIZE)(start_match - subject);
cb.callout_flags |= PCRE2_CALLOUT_STARTMATCH;
mb->start_used_ptr = start_match;
mb->last_used_ptr = start_match;
mb->match_call_count = 0;

View File

@ -485,6 +485,7 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
#define CTL2_SUBJECT_LITERAL 0x00000010u
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
#define CTL2_CALLOUT_EXTRA 0x00000040u
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */
@ -598,6 +599,7 @@ static modstruct modlist[] = {
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
{ "callout_extra", MOD_DAT, MOD_CTL, CTL2_CALLOUT_EXTRA, DO(control2) },
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
@ -3971,7 +3973,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -3981,6 +3983,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_BINCODE) != 0)? " bincode" : "",
((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
((controls2 & CTL2_CALLOUT_EXTRA) != 0)? " callout_extra" : "",
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
@ -5842,17 +5845,43 @@ Return:
static int
callout_function(pcre2_callout_block_8 *cb, void *callout_data_ptr)
{
FILE *f, *fdefault;
uint32_t i, pre_start, post_start, subject_length;
PCRE2_SIZE current_position;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;
/* This FILE is used for echoing the subject. This is done only once in simple
cases. */
/* The FILE f is used for echoing the subject string if it is non-NULL. This
happens only once in simple cases, but we want to repeat after any additional
output caused by CALLOUT_EXTRA. */
FILE *f = (first_callout || callout_capture || cb->callout_string != NULL)?
outfile : NULL;
fdefault = (!first_callout && !callout_capture && cb->callout_string == NULL)?
NULL : outfile;
if ((dat_datctl.control2 & CTL2_CALLOUT_EXTRA) != 0)
{
f = outfile;
switch (cb->callout_flags)
{
case PCRE2_CALLOUT_BACKTRACK:
fprintf(f, "Backtrack\n");
break;
case PCRE2_CALLOUT_STARTMATCH|PCRE2_CALLOUT_BACKTRACK:
fprintf(f, "Backtrack\nNo other matching paths\n");
/* Fall through */
case PCRE2_CALLOUT_STARTMATCH:
fprintf(f, "New match attempt\n");
break;
default:
f = fdefault;
break;
}
}
else f = fdefault;
/* For a callout with a string argument, show the string first because there
isn't a tidy way to fit it in the rest of the data. */
@ -5902,7 +5931,6 @@ lengths of the substrings. */
if (callout_where)
{
if (f != NULL) fprintf(f, "--->");
/* The subject before the match start. */
@ -5931,9 +5959,10 @@ if (callout_where)
if (f != NULL) fprintf(f, "\n");
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
callout whose number has not already been shown with captured strings, show the
number here. A callout with a string argument has been displayed above. */
/* For automatic callouts, show the pattern offset. Otherwise, for a
numerical callout whose number has not already been shown with captured
strings, show the number here. A callout with a string argument has been
displayed above. */
if (cb->callout_number == 255)
{
@ -5963,6 +5992,8 @@ if (callout_where)
if (cb->next_item_length != 0)
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
pbuffer8 + cb->pattern_position);
else
fprintf(outfile, "End of pattern");
fprintf(outfile, "\n");
}
@ -7685,7 +7716,8 @@ printf(" -16 use the 16-bit library\n");
#ifdef SUPPORT_PCRE2_32
printf(" -32 use the 32-bit library\n");
#endif
printf(" -ac set default pattern option PCRE2_AUTO_CALLOUT\n");
printf(" -ac set default pattern modifier PCRE2_AUTO_CALLOUT\n");
printf(" -AC as -ac, but also set subject 'callout_extra' modifier\n");
printf(" -b set default pattern modifier 'fullbincode'\n");
printf(" -C show PCRE2 compile-time options and exit\n");
printf(" -C arg show a specific compile-time option and exit with its\n");
@ -8181,6 +8213,11 @@ while (argc > 1 && argv[op][0] == '-' && argv[op][1] != 0)
/* Set some common pattern and subject controls */
else if (strcmp(arg, "-AC") == 0)
{
def_patctl.options |= PCRE2_AUTO_CALLOUT;
def_datctl.control2 |= CTL2_CALLOUT_EXTRA;
}
else if (strcmp(arg, "-ac") == 0) def_patctl.options |= PCRE2_AUTO_CALLOUT;
else if (strcmp(arg, "-b") == 0) def_patctl.control |= CTL_FULLBINCODE;
else if (strcmp(arg, "-d") == 0) def_patctl.control |= CTL_DEBUG;

10
testdata/testinput2 vendored

@ -5385,4 +5385,14 @@ a)"xI
ab
aaab
# JIT does not support callout_extra
/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
# End of testinput2

24
testdata/testoutput15 vendored

@ -361,12 +361,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
@ -385,12 +385,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match
# This test breaks the JIT stack limit

144
testdata/testoutput2 vendored

@ -3832,7 +3832,7 @@ Subject length lower bound = 2
\= Expect no match
abbbbbccc\=callout_data=1
--->abbbbbccc
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
No match
@ -3844,21 +3844,21 @@ Subject length lower bound = 2
\= Expect no match
abbbbbccc\=callout_data=1
--->abbbbbccc
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
No match
@ -4718,7 +4718,7 @@ Subject length lower bound = 5
+2 ^ ^ c
+3 ^ ^ d
+4 ^ ^ e
+5 ^ ^
+5 ^ ^ End of pattern
0: abcde
\= Expect no match
abcdfe
@ -4750,13 +4750,13 @@ Subject length lower bound = 1
--->ab
+0 ^ a*
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a*
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
aaaacb
--->aaaacb
@ -4770,7 +4770,7 @@ Subject length lower bound = 1
+2 ^^ b
+0 ^ a*
+2 ^ b
+3 ^^
+3 ^^ End of pattern
0: b
/a*b/IB,auto_callout
@ -4793,13 +4793,13 @@ Subject length lower bound = 1
--->ab
+0 ^ a*
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a*
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
aaaacb
--->aaaacb
@ -4813,7 +4813,7 @@ Subject length lower bound = 1
+2 ^^ b
+0 ^ a*
+2 ^ b
+3 ^^
+3 ^^ End of pattern
0: b
/a+b/IB,auto_callout
@ -4836,13 +4836,13 @@ Subject length lower bound = 2
--->ab
+0 ^ a+
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a+
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
\= Expect no match
aaaacb
@ -4897,7 +4897,7 @@ Subject length lower bound = 4
+3 ^ ^ c
+4 ^ ^ |
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: abcx
1: abc
defx
@ -4909,7 +4909,7 @@ Subject length lower bound = 4
+7 ^ ^ f
+8 ^ ^ )
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: defx
1: def
\= Expect no match
@ -4971,7 +4971,7 @@ Subject length lower bound = 4
+3 ^ ^ c
+4 ^ ^ |
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: abcx
1: abc
defx
@ -4983,7 +4983,7 @@ Subject length lower bound = 4
+7 ^ ^ f
+8 ^ ^ )
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: defx
1: def
\= Expect no match
@ -5024,7 +5024,7 @@ Subject length lower bound = 6
+3 ^ ^ |
+1 ^ ^ a
+4 ^ ^ c
+12 ^ ^
+12 ^ ^ End of pattern
0: ababab
1: ab
abcdabcd
@ -5044,7 +5044,7 @@ Subject length lower bound = 6
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdabcd
1: cd
abcdcdcdcdcd
@ -5065,7 +5065,7 @@ Subject length lower bound = 6
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdcdcd
1: cd
@ -5276,7 +5276,7 @@ Subject length lower bound = 11
+21 ^ ^ 1
+22 ^ ^ 2
+23 ^ ^ 3
+24 ^ ^
+24 ^ ^ End of pattern
0: aacaacaacaacaac123
1: aac
@ -8900,7 +8900,7 @@ Subject length lower bound = 0
+7 ^ b
+11 ^ ^
+12 ^ )
+13 ^
+13 ^ End of pattern
0:
abc
--->abc
@ -8921,7 +8921,7 @@ Subject length lower bound = 0
+8 ^^ )
+9 ^ b
+10 ^^ |
+13 ^^
+13 ^^ End of pattern
0: b
/(?(?=b).*b|^d)/I
@ -8938,14 +8938,14 @@ Subject length lower bound = 1
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
abcxyz
--->abcxyz
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -8962,7 +8962,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -8996,7 +8996,7 @@ No match
+15 ^ x
+16 ^^ y
+17 ^ ^ z
+18 ^ ^
+18 ^ ^ End of pattern
0: xyz
/(*NO_AUTO_POSSESS)a+b/B
@ -9017,7 +9017,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
/^"((?(?=[a])[^"])|b)*"$/auto_callout
@ -9046,7 +9046,7 @@ No match
+17 ^ ^ |
+21 ^ ^ "
+22 ^ ^ $
+23 ^ ^
+23 ^ ^ End of pattern
0: "ab"
1:
@ -11136,7 +11136,7 @@ Latest Mark: A
+10 ^ ^ |
+18 ^ ^ z
+19 ^ ^ |
+24 ^ ^
+24 ^ ^ End of pattern
0: adz
1: adz
2: d
@ -11155,7 +11155,7 @@ Latest Mark: A
Latest Mark: B
+18 ^ ^ z
+19 ^ ^ |
+24 ^ ^
+24 ^ ^ End of pattern
0: aez
1: aez
2: e
@ -11177,7 +11177,7 @@ Latest Mark: B
+21 ^^ e
+22 ^ ^ q
+23 ^ ^ )
+24 ^ ^
+24 ^ ^ End of pattern
0: aeq
1: aeq
@ -11951,7 +11951,7 @@ Partial match: 123a
+11 ^ b
+12 ^^ b
+13 ^ ^ )
+14 ^ ^
+14 ^ ^ End of pattern
0: bb
/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/
@ -11964,7 +11964,7 @@ Partial match: 123a
8 ^ b
9 ^^ b
10 ^ ^ )
11 ^ ^
11 ^ ^ End of pattern
0: bb
# Perl seems to have a bug with this one.
@ -15144,7 +15144,7 @@ Subject length lower bound = 0
+0 ^ (
+1 ^ )\Q\E*
+7 ^ ]
+8 ^^
+8 ^^ End of pattern
0: ]
1:
@ -15428,7 +15428,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
+0 ^ a
+1 ^^ b
1 ^ ^ c
+8 ^ ^
+8 ^ ^ End of pattern
0: abc
/'ab(?C1)c'/hex,auto_callout
@ -15437,7 +15437,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
+0 ^ a
+1 ^^ b
1 ^ ^ c
+8 ^ ^
+8 ^ ^ End of pattern
0: abc
# Perl accepts these, but gives a warning. We can't warn, so give an error.
@ -16256,7 +16256,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
+2 ^ ^ b
+3 ^ ^ (
+4 ^ ^ c
+5 ^ ^
+5 ^ ^ End of pattern
0: a\b(c
/a\b(c/literal,auto_callout
@ -16267,7 +16267,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
+2 ^ ^ b
+3 ^ ^ (
+4 ^ ^ c
+5 ^ ^
+5 ^ ^ End of pattern
0: a\b(c
/(*CR)abc/literal
@ -16384,6 +16384,62 @@ Subject length lower bound = 1
0: ab
1: a
# JIT does not support callout_extra
/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
New match attempt
--->aac
+9 ^ (
+10 ^ a+
+12 ^ ^ )
+13 ^ ^ b
Backtrack
--->aac
+12 ^^ )
+13 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
+12 ^^ )
+13 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
No match
/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
New match attempt
Callout (15): 'XXX'
--->aac
^ ^ b
Backtrack
Callout (15): 'XXX'
--->aac
^^ b
Backtrack
No other matching paths
New match attempt
Callout (15): 'XXX'
--->aac
^^ b
No match
# End of testinput2
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data


@ -3763,7 +3763,7 @@ No match
abcd
--->abcd
+0 ^ \w+
+3 ^ ^
+3 ^ ^ End of pattern
0: abcd
/[\p{N}]?+/B,no_auto_possess
@ -4165,7 +4165,7 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
+0 ^ .
+0 ^ .
+1 ^ ^ .
+2 ^ ^
+2 ^ ^ End of pattern
0: \x{123}\x{123}
# This tests processing wide characters in extended mode.

22
testdata/testoutput6 vendored

@ -726,7 +726,7 @@ No match
+4 ^ ^ c
+2 ^ ^ b
+3 ^ ^ |
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
0: ababab
@ -745,12 +745,12 @@ No match
+4 ^ ^ c
+2 ^ ^ b
+3 ^ ^ |
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdabcd
1: abcdab
abcdcdcdcdcd
@ -768,12 +768,12 @@ No match
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdcdcd
1: abcdcd
@ -6610,14 +6610,14 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
abcxyz
--->abcxyz
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -6634,7 +6634,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -6668,7 +6668,7 @@ No match
+15 ^ x
+16 ^^ y
+17 ^ ^ z
+18 ^ ^
+18 ^ ^ End of pattern
0: xyz
/(?C)ab/
@ -6684,7 +6684,7 @@ No match
--->ab
+0 ^ a
+1 ^^ b
+2 ^ ^
+2 ^ ^ End of pattern
0: ab
ab\=callout_none
0: ab
@ -6717,7 +6717,7 @@ No match
+8 ^ [a]
+17 ^ ^ |
+22 ^ ^ $
+23 ^ ^
+23 ^ ^ End of pattern
0: "ab"
"ab"\=callout_none
0: "ab"