Add callout_flags to callout blocks, and set bits within it from pcre2_match()
interpretation.
parent 814cc96bc5
commit 94d5f4a050
@@ -89,6 +89,12 @@ and set its never-changing fields once only.
compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
to retrieve them, and update pcre2test to show them.

22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks. The bits are set by pcre2_match(), but
not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
if the callout_extra subject modifier is set. These bits are provided to help
with tracking how a backtracking match is proceeding.


Version 10.30 14-August-2017
----------------------------
@@ -30,7 +30,13 @@ DESCRIPTION
<P>
This function matches a compiled regular expression against a given subject
string, using a matching algorithm that is similar to Perl's. It returns
offsets to captured substrings. Its arguments are:
offsets to what it has matched and to captured substrings via the
<b>match_data</b> block, which can be processed by functions with names that
start with <b>pcre2_get_ovector_...()</b> or <b>pcre2_substring_...()</b>. The
return from <b>pcre2_match()</b> is one more than the highest numbered capturing
pair that has been set (for example, 1 if there are no captures), zero if the
vector of offsets is too small, or a negative error code for no match and other
errors. The function arguments are:
<pre>
<i>code</i> Points to the compiled pattern
<i>subject</i> Points to the subject string
@@ -27,7 +27,7 @@ DESCRIPTION
<P>
This function returns information about a compiled pattern. Its arguments are:
<pre>
<i>code</i> Pointer to a compiled regular expression
<i>code</i> Pointer to a compiled regular expression pattern
<i>what</i> What information is required
<i>where</i> Where to put the information
</pre>
@@ -42,6 +42,8 @@ request are as follows:
PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
PCRE2_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE2_INFO_DEPTHLIMIT Backtracking depth limit if set, otherwise PCRE2_ERROR_UNSET
PCRE2_INFO_EXTRAOPTIONS Extra options that were passed in the
compile context
PCRE2_INFO_FIRSTBITMAP Bitmap of first code units, or NULL
PCRE2_INFO_FIRSTCODETYPE Type of start-of-match information
0 nothing set
@@ -920,11 +920,15 @@ The <i>offset_limit</i> parameter limits how far an unanchored search can
advance in the subject string. The default value is PCRE2_UNSET. The
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> functions return
PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given
offset is not found. For example, if the pattern /abc/ is matched against
"123abc" with an offset limit less than 3, the result is PCRE2_ERROR_NOMATCH.
A match can never be found if the <i>startoffset</i> argument of
<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> is greater than the offset
limit.
offset is not found. The <b>pcre2_substitute()</b> function makes no more
substitutions.
</P>
<P>
For example, if the pattern /abc/ is matched against "123abc" with an offset
limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be
found if the <i>startoffset</i> argument of <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, or <b>pcre2_substitute()</b> is greater than the offset
limit set in the match context.
</P>
<P>
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
@@ -934,10 +938,11 @@ PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
</P>
<P>
The offset limit facility can be used to track progress when searching large
subject strings. See also the PCRE2_FIRSTLINE option, which requires a match to
start within the first line of the subject. If this is set with an offset
limit, a match must occur in the first line and also within the offset limit.
In other words, whichever limit comes first is used.
subject strings or to limit the extent of global substitutions. See also the
PCRE2_FIRSTLINE option, which requires a match to start within the first line
of the subject. If this is set with an offset limit, a match must occur in the
first line and also within the offset limit. In other words, whichever limit
comes first is used.
<br>
<br>
<b>int pcre2_set_heap_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -1940,12 +1945,15 @@ are as follows:
<pre>
PCRE2_INFO_ALLOPTIONS
PCRE2_INFO_ARGOPTIONS
PCRE2_INFO_EXTRAOPTIONS
</pre>
Return a copy of the pattern's options. The third argument should point to a
Return copies of the pattern's options. The third argument should point to a
<b>uint32_t</b> variable. PCRE2_INFO_ARGOPTIONS returns exactly the options that
were passed to <b>pcre2_compile()</b>, whereas PCRE2_INFO_ALLOPTIONS returns
the compile options as modified by any top-level (*XXX) option settings such as
(*UTF) at the start of the pattern itself.
(*UTF) at the start of the pattern itself. PCRE2_INFO_EXTRAOPTIONS returns the
extra options that were set in the compile context by calling the
pcre2_set_compile_extra_options() function.
</P>
<P>
For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
@@ -3157,13 +3165,27 @@ options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
</P>
<P>
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
replacing every matching substring. If this is not set, only the first matching
substring is replaced. If any matched substring has zero length, after the
substitution has happened, an attempt to find a non-empty match at the same
position is performed. If this is not successful, the current position is
advanced by one character except when CRLF is a valid newline sequence and the
next two characters are CR, LF. In this case, the current position is advanced
by two characters.
replacing every matching substring. If this option is not set, only the first
matching substring is replaced. The search for matches takes place in the
original subject string (that is, previous replacements do not affect it).
Iteration is implemented by advancing the <i>startoffset</i> value for each
search, which is always passed the entire subject string. If an offset limit is
set in the match context, searching stops when that limit is reached.
</P>
<P>
You can restrict the effect of a global substitution to a portion of the
subject string by setting either or both of <i>startoffset</i> and an offset
limit. Here is a <b>pcre2test</b> example:
<pre>
/B/g,replace=!,use_offset_limit
ABC ABC ABC ABC\=offset=3,offset_limit=12
2: ABC A!C A!C ABC
</pre>
When continuing with global substitutions after matching a substring with zero
length, an attempt to find a non-empty match at the same offset is performed.
If this is not successful, the offset is advanced by one character except when
CRLF is a valid newline sequence and the next two characters are CR, LF. In
this case, the offset is advanced by two characters.
</P>
<P>
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
@@ -3398,7 +3420,7 @@ Here is an example of a simple call to <b>pcre2_dfa_match()</b>:
11, /* the length of the subject string */
0, /* start at offset 0 in the subject */
0, /* default options */
match_data, /* the match data block */
md, /* the match data block */
NULL, /* a match context; NULL means use defaults */
wspace, /* working space vector */
20); /* number of elements (NOT size in bytes) */
@@ -3567,7 +3589,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 13 October 2017
Last updated: 16 December 2017
<br>
Copyright © 1997-2017 University of Cambridge.
<br>
@@ -206,18 +206,20 @@ callouts such as the example above are obeyed.
<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br>
<P>
During matching, when PCRE2 reaches a callout point, if an external function is
provided in the match context, it is called. This applies to both normal and
DFA matching. The first argument to the callout function is a pointer to a
<b>pcre2_callout</b> block. The second argument is the void * callout data that
was supplied when the callout was set up by calling <b>pcre2_set_callout()</b>
(see the
provided in the match context, it is called. This applies to normal,
DFA, and JIT matching. The first argument to the callout function is a pointer
to a <b>pcre2_callout</b> block. The second argument is the void * callout data
that was supplied when the callout was set up by calling
<b>pcre2_set_callout()</b> (see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation). The callout block structure contains the following fields:
documentation). The callout block structure contains the following fields, not
necessarily in this order:
<pre>
uint32_t <i>version</i>;
uint32_t <i>callout_number</i>;
uint32_t <i>capture_top</i>;
uint32_t <i>capture_last</i>;
uint32_t <i>callout_flags</i>;
PCRE2_SIZE *<i>offset_vector</i>;
PCRE2_SPTR <i>mark</i>;
PCRE2_SPTR <i>subject</i>;
@@ -231,11 +233,12 @@ documentation). The callout block structure contains the following fields:
PCRE2_SPTR <i>callout_string</i>;
</pre>
The <i>version</i> field contains the version number of the block format. The
current version is 1; the three callout string fields were added for this
version. If you are writing an application that might use an earlier release of
PCRE2, you should check the version number before accessing any of these
fields. The version number will increase in future if more fields are added,
but the intention is never to remove any of the existing fields.
current version is 2; the three callout string fields were added for version 1,
and the <i>callout_flags</i> field for version 2. If you are writing an
application that might use an earlier release of PCRE2, you should check the
version number before accessing any of these fields. The version number will
increase in future if more fields are added, but the intention is never to
remove any of the existing fields.
</P>
<br><b>
Fields for numerical callouts
@@ -358,6 +361,36 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from the DFA matching function this field always contains NULL.
</P>
<P>
The <i>callout_flags</i> field is always zero in callouts from
<b>pcre2_dfa_match()</b> or when JIT is being used. When <b>pcre2_match()</b>
without JIT is used, the following bits may be set:
<pre>
PCRE2_CALLOUT_STARTMATCH
</pre>
This is set for the first callout after the start of matching for each new
starting position in the subject.
<pre>
PCRE2_CALLOUT_BACKTRACK
</pre>
This is set if there has been a matching backtrack since the previous callout,
or since the start of matching if this is the first callout from a
<b>pcre2_match()</b> run.
</P>
<P>
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
position in the subject. Output from <b>pcre2test</b> does not indicate the
presence of these bits unless the <b>callout_extra</b> modifier is set.
</P>
<P>
The information in the <b>callout_flags</b> field is provided so that
applications can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in <b>pcre2_dfa_match()</b>
because there is no backtracking in DFA matching, and there is no support in
JIT because JIT is all about maximizing matching performance. In both these
cases the <b>callout_flags</b> field is always zero.
</P>
<br><a name="SEC5" href="#TOC1">RETURN VALUES FROM CALLOUTS</a><br>
<P>
The external callout function returns an integer to PCRE2. If the value is
@@ -428,7 +461,7 @@ Cambridge, England.
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 14 April 2017
Last updated: 22 December 2017
<br>
Copyright © 1997-2017 University of Cambridge.
<br>
@@ -133,11 +133,13 @@ The <b>--locale</b> option can be used to override this.
<br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
<P>
It is possible to compile <b>pcre2grep</b> so that it uses <b>libz</b> or
<b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
respectively. You can find out whether your binary has support for one or both
of these file types by running it with the <b>--help</b> option. If the
appropriate support is not present, files are treated as plain text. The
standard input is always so treated.
<b>libbz2</b> to read compressed files whose names end in <b>.gz</b> or
<b>.bz2</b>, respectively. You can find out whether your <b>pcre2grep</b> binary
has support for one or both of these file types by running it with the
<b>--help</b> option. If the appropriate support is not present, all files are
treated as plain text. The standard input is always so treated. When input is
from a compressed .gz or .bz2 file, the <b>--line-buffered</b> option is
ignored.
</P>
<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
<P>
@@ -151,7 +153,7 @@ of changing the way binary files are handled.
<br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
<P>
The order in which some of the options appear can affect the output. For
example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
example, both the <b>-H</b> and <b>-l</b> options affect the printing of file
names. Whichever comes later in the command line will be the one that takes
effect. Similarly, except where noted below, if an option is given twice, the
later setting is used. Numerical values for options may be followed by K or M,
@@ -396,14 +398,16 @@ searching a single file. By default, the file name is not shown in this case.
For matching lines, the file name is followed by a colon; for context lines, a
hyphen separator is used. If a line number is also being output, it follows the
file name. When the <b>-M</b> option causes a pattern to match more than one
line, only the first is preceded by the file name.
line, only the first is preceded by the file name. This option overrides any
previous <b>-h</b>, <b>-l</b>, or <b>-L</b> options.
</P>
<P>
<b>-h</b>, <b>--no-filename</b>
Suppress the output file names when searching multiple files. By default,
file names are shown when multiple files are searched. For matching lines, the
file name is followed by a colon; for context lines, a hyphen separator is used.
If a line number is also being output, it follows the file name.
If a line number is also being output, it follows the file name. This option
overrides any previous <b>-H</b>, <b>-L</b>, or <b>-l</b> options.
</P>
<P>
<b>--heap-limit</b>=<i>number</i>
@@ -460,17 +464,19 @@ given any number of times. If a directory matches both <b>--include-dir</b> and
<b>-L</b>, <b>--files-without-match</b>
Instead of outputting lines from the files, just output the names of the files
that do not contain any lines that would have been output. Each file name is
output once, on a separate line.
output once, on a separate line. This option overrides any previous <b>-H</b>,
<b>-h</b>, or <b>-l</b> options.
</P>
<P>
<b>-l</b>, <b>--files-with-matches</b>
Instead of outputting lines from the files, just output the names of the files
containing lines that would have been output. Each file name is output
once, on a separate line. Searching normally stops as soon as a matching line
is found in a file. However, if the <b>-c</b> (count) option is also used,
matching continues in order to obtain the correct count, and those files that
have at least one match are listed along with their counts. Using this option
with <b>-c</b> is a way of suppressing the listing of files with no matches.
containing lines that would have been output. Each file name is output once, on
a separate line. Searching normally stops as soon as a matching line is found
in a file. However, if the <b>-c</b> (count) option is also used, matching
continues in order to obtain the correct count, and those files that have at
least one match are listed along with their counts. Using this option with
<b>-c</b> is a way of suppressing the listing of files with no matches. This
option overrides any previous <b>-H</b>, <b>-h</b>, or <b>-L</b> options.
</P>
<P>
<b>--label</b>=<i>name</i>
@@ -480,14 +486,16 @@ short form for this option.
</P>
<P>
<b>--line-buffered</b>
When this option is given, input is read and processed line by line, and the
output is flushed after each write. By default, input is read in large chunks,
unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
is currently possible only in Unix-like environments). Output to terminal is
normally automatically flushed by the operating system. This option can be
useful when the input or output is attached to a pipe and you do not want
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will
affect performance, and the <b>-M</b> (multiline) option ceases to work.
When this option is given, non-compressed input is read and processed line by
line, and the output is flushed after each write. By default, input is read in
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
terminal (which is currently possible only in Unix-like environments). Output
to terminal is normally automatically flushed by the operating system. This
option can be useful when the input or output is attached to a pipe and you do
not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
will affect performance, and the <b>-M</b> (multiline) option ceases to work.
When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
ignored.
</P>
<P>
<b>--line-offsets</b>
@@ -941,7 +949,7 @@ Cambridge, England.
</P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P>
Last updated: 11 October 2017
Last updated: 13 November 2017
<br>
Copyright © 1997-2017 University of Cambridge.
<br>
@@ -159,6 +159,12 @@ Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
automatic callouts into every pattern that is compiled.
</P>
<P>
<b>-AC</b>
As for <b>-ac</b>, but in addition behave as if each subject line has the
<b>callout_extra</b> modifier, that is, show additional information from
callouts.
</P>
<P>
<b>-b</b>
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
internal binary form of the pattern is output after compilation.
@@ -1182,6 +1188,7 @@ pattern.
callout_capture            show captures at callout time
callout_data=<n>           set a value to pass via callouts
callout_error=<n>[:<m>]    control callout error
callout_extra              show extra callout information
callout_fail=<n>[:<m>]     control callout failure
callout_no_where           do not show position of a callout
callout_none               do not supply a callout function
@@ -1694,49 +1701,10 @@ documentation.
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
function is called during matching unless <b>callout_none</b> is specified.
This works with both matching functions.
</P>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout.
</P>
<P>
If <b>callout_capture</b> is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
<b>callout_no_where</b> modifier is set.
</P>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero. See
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P>
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<P>
The output for callouts with numerical arguments and those with string
arguments is slightly different.
function is called during matching unless <b>callout_none</b> is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
</P>
<br><b>
Callouts with numerical arguments
@@ -1811,6 +1779,107 @@ example:

</PRE>
</P>
<br><b>
Callout modifiers
</b><br>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout (see below).
</P>
<P>
If the <b>callout_capture</b> modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching, as
<b>pcre2_dfa_match()</b> does not support capturing, so no captures are ever
shown.
</P>
<P>
The normal callout output, showing the callout number or pattern offset (as
described above) is suppressed if the <b>callout_no_where</b> modifier is set.
</P>
<P>
When using the interpretive matching function <b>pcre2_match()</b> without JIT,
setting the <b>callout_extra</b> modifier causes additional output from
<b>pcre2test</b>'s callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or start of
matching if this is the first callout), "Backtrack" is output, followed by "No
other matching paths" if the backtrack ended the previous match attempt. For
example:
<pre>
  re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data> aac\=callout_extra
New match attempt
--->aac
 +0 ^       (
 +1 ^       a+
 +3 ^ ^     )
 +4 ^ ^     b
Backtrack
--->aac
 +3 ^^      )
 +4 ^^      b
Backtrack
No other matching paths
New match attempt
--->aac
 +0  ^      (
 +1  ^      a+
 +3  ^^     )
 +4  ^^     b
Backtrack
No other matching paths
New match attempt
--->aac
 +0   ^     (
 +1   ^     a+
Backtrack
No other matching paths
New match attempt
--->aac
 +0    ^    (
 +1    ^    a+
No match
</pre>
Notice that various optimizations must be turned off if you want all possible
matching paths to be scanned. If <b>no_start_optimize</b> is not used, there is
an immediate "no match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match. If <b>no_auto_possess</b> is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
</P>
<P>
The <b>callout_extra</b> modifier has no effect if used with the DFA matching
function, or with JIT.
</P>
<br><b>
Return values from callouts
</b><br>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P>
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
<P>
When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
@@ -1913,7 +1982,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 October 2017
Last updated: 21 December 2017
<br>
Copyright © 1997-2017 University of Cambridge.
<br>
1394 doc/pcre2.txt
File diff suppressed because it is too large
@@ -3185,7 +3185,7 @@ subject string by setting either or both of \fIstartoffset\fP and an offset
limit. Here is a \fBpcre2test\fP example:
.sp
/B/g,replace=!,use_offset_limit
ABC ABC ABC ABC\=offset=3,offset_limit=12
ABC ABC ABC ABC\e=offset=3,offset_limit=12
2: ABC A!C A!C ABC
.sp
When continuing with global substitutions after matching a substring with zero
@@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "14 April 2017" "PCRE2 10.30"
.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@@ -191,20 +191,22 @@ callouts such as the example above are obeyed.
.rs
.sp
During matching, when PCRE2 reaches a callout point, if an external function is
provided in the match context, it is called. This applies to both normal and
DFA matching. The first argument to the callout function is a pointer to a
\fBpcre2_callout\fP block. The second argument is the void * callout data that
was supplied when the callout was set up by calling \fBpcre2_set_callout()\fP
(see the
provided in the match context, it is called. This applies to normal,
DFA, and JIT matching. The first argument to the callout function is a pointer
to a \fBpcre2_callout\fP block. The second argument is the void * callout data
that was supplied when the callout was set up by calling
\fBpcre2_set_callout()\fP (see the
.\" HREF
\fBpcre2api\fP
.\"
documentation). The callout block structure contains the following fields:
documentation). The callout block structure contains the following fields, not
necessarily in this order:
.sp
uint32_t \fIversion\fP;
uint32_t \fIcallout_number\fP;
uint32_t \fIcapture_top\fP;
uint32_t \fIcapture_last\fP;
uint32_t \fIcallout_flags\fP;
PCRE2_SIZE *\fIoffset_vector\fP;
PCRE2_SPTR \fImark\fP;
PCRE2_SPTR \fIsubject\fP;
@ -218,11 +220,12 @@ documentation). The callout block structure contains the following fields:
|
|||
PCRE2_SPTR \fIcallout_string\fP;
|
||||
.sp
|
||||
The \fIversion\fP field contains the version number of the block format. The
|
||||
current version is 1; the three callout string fields were added for this
|
||||
version. If you are writing an application that might use an earlier release of
|
||||
PCRE2, you should check the version number before accessing any of these
|
||||
fields. The version number will increase in future if more fields are added,
|
||||
but the intention is never to remove any of the existing fields.
|
||||
current version is 2; the three callout string fields were added for version 1,
|
||||
and the \fIcallout_flags\fP field for version 2. If you are writing an
|
||||
application that might use an earlier release of PCRE2, you should check the
|
||||
version number before accessing any of these fields. The version number will
|
||||
increase in future if more fields are added, but the intention is never to
|
||||
remove any of the existing fields.
|
||||
.
|
||||
.
|
||||
.SS "Fields for numerical callouts"
|
||||
|
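A minimal C sketch of the version check described above. It assumes a hypothetical stand-in type, `sketch_callout_block`, that holds only the two fields the sketch reads; real code would include `pcre2.h` and receive a `pcre2_callout_block *` in its callout function. The flag values are copied from the `PCRE2_CALLOUT_STARTMATCH` and `PCRE2_CALLOUT_BACKTRACK` definitions added to `pcre2.h` in this commit; all `sketch_` names are illustrative only.

```c
#include <stdint.h>

/* Values copied from the PCRE2_CALLOUT_* definitions added to pcre2.h. */
#define SKETCH_CALLOUT_STARTMATCH 0x00000001u
#define SKETCH_CALLOUT_BACKTRACK  0x00000002u

/* Hypothetical stand-in for pcre2_callout_block, reduced to the fields this
   sketch reads. In a block from a library older than version 2 the
   callout_flags field does not exist at all, which is why the version must
   be checked before the field is touched. */
typedef struct {
  uint32_t version;
  uint32_t callout_flags;
} sketch_callout_block;

/* Return callout_flags only when the block is new enough to contain it;
   older blocks are treated as having no flags set. */
static uint32_t sketch_get_callout_flags(const sketch_callout_block *cb)
{
  return (cb->version >= 2)? cb->callout_flags : 0;
}

/* Convenience tests built on the version-checked value. */
static int sketch_is_new_start(const sketch_callout_block *cb)
{
  return (sketch_get_callout_flags(cb) & SKETCH_CALLOUT_STARTMATCH) != 0;
}

static int sketch_has_backtracked(const sketch_callout_block *cb)
{
  return (sketch_get_callout_flags(cb) & SKETCH_CALLOUT_BACKTRACK) != 0;
}
```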
@@ -331,6 +334,33 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
(*THEN) item in the match, or NULL if no such items have been passed. Instances
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from the DFA matching function this field always contains NULL.
.P
The \fIcallout_flags\fP field is always zero in callouts from
\fBpcre2_dfa_match()\fP or when JIT is being used. When \fBpcre2_match()\fP
without JIT is used, the following bits may be set:
.sp
PCRE2_CALLOUT_STARTMATCH
.sp
This is set for the first callout after the start of matching for each new
starting position in the subject.
.sp
PCRE2_CALLOUT_BACKTRACK
.sp
This is set if there has been a matching backtrack since the previous callout,
or since the start of matching if this is the first callout from a
\fBpcre2_match()\fP run.
.P
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
position in the subject. Output from \fBpcre2test\fP does not indicate the
presence of these bits unless the \fBcallout_extra\fP modifier is set.
.P
The information in the \fBcallout_flags\fP field is provided so that
applications can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in \fBpcre2_dfa_match()\fP
because there is no backtracking in DFA matching, and there is no support in
JIT because JIT is all about maximizing matching performance. In both these
cases the \fBcallout_flags\fP field is always zero.
.
.
.SH "RETURN VALUES FROM CALLOUTS"
@@ -411,6 +441,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 14 April 2017
Last updated: 22 December 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi
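One way an application might use the new bits is to accumulate simple statistics across the callouts of one `pcre2_match()` call. This is a sketch under stated assumptions: `sketch_match_stats` and the helper functions are hypothetical, and the flag values are copied from the `pcre2.h` change in this commit. Because `PCRE2_CALLOUT_BACKTRACK` only records that at least one backtrack happened since the previous callout, the sketch counts callouts that followed backtracking rather than individual backtracks.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the PCRE2_CALLOUT_* definitions added to pcre2.h. */
#define SKETCH_CALLOUT_STARTMATCH 0x00000001u
#define SKETCH_CALLOUT_BACKTRACK  0x00000002u

/* Hypothetical statistics an application might gather. */
typedef struct {
  unsigned long start_positions;           /* STARTMATCH was set */
  unsigned long callouts_after_backtrack;  /* BACKTRACK was set */
  unsigned long backtrack_bumpalongs;      /* both bits were set */
} sketch_match_stats;

/* Fold one callout's callout_flags value into the statistics. In a real
   callout function this would be called with cb->callout_flags, with a
   sketch_match_stats pointer passed as the callout data. */
static void sketch_record_flags(sketch_match_stats *st, uint32_t flags)
{
  const uint32_t both = SKETCH_CALLOUT_STARTMATCH|SKETCH_CALLOUT_BACKTRACK;
  if ((flags & SKETCH_CALLOUT_STARTMATCH) != 0) st->start_positions++;
  if ((flags & SKETCH_CALLOUT_BACKTRACK) != 0) st->callouts_after_backtrack++;
  if ((flags & both) == both) st->backtrack_bumpalongs++;
}

/* Apply sketch_record_flags() to a recorded sequence of flag values. */
static sketch_match_stats sketch_summarize(const uint32_t *flags, size_t n)
{
  sketch_match_stats st = { 0, 0, 0 };
  for (size_t i = 0; i < n; i++) sketch_record_flags(&st, flags[i]);
  return st;
}
```

Reading each callout's flags off the "New match attempt" and "Backtrack" lines in the `callout_extra` example for `/(a+)b/` against `aac` in the `pcre2test` documentation changes in this commit gives the sequence 1, 0, 0, 0, 2, 0, 3, 0, 0, 0, 3, 0, 3, 0. Summarizing it yields 4 starting positions, 4 callouts after a backtrack, and 3 bumpalongs caused by backtracking.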
File diff suppressed because it is too large
150
doc/pcre2test.1
@@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "17 October 2017" "PCRE 10.31"
.TH PCRE2TEST 1 "21 December 2017" "PCRE 10.31"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -129,6 +129,11 @@ has not been built, this option causes an error.
Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
automatic callouts into every pattern that is compiled.
.TP 10
\fB-AC\fP
As for \fB-ac\fP, but in addition behave as if each subject line has the
\fBcallout_extra\fP modifier, that is, show additional information from
callouts.
.TP 10
\fB-b\fP
Behave as if each pattern has the \fBfullbincode\fP modifier; the full
internal binary form of the pattern is output after compilation.
@@ -1152,6 +1157,7 @@ pattern.
callout_capture            show captures at callout time
callout_data=<n>           set a value to pass via callouts
callout_error=<n>[:<m>]    control callout error
callout_extra              show extra callout information
callout_fail=<n>[:<m>]     control callout failure
callout_no_where           do not show position of a callout
callout_none               do not supply a callout function
@@ -1664,45 +1670,10 @@ documentation.
.rs
.sp
If the pattern contains any callout requests, \fBpcre2test\fP's callout
function is called during matching unless \fBcallout_none\fP is specified.
This works with both matching functions.
.P
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
change this and other parameters of the callout.
.P
If \fBcallout_capture\fP is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
\fBcallout_no_where\fP modifier is set.
.P
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero. See
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.P
The output for callouts with numerical arguments and those with string
arguments is slightly different.
function is called during matching unless \fBcallout_none\fP is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
.
.
.SS "Callouts with numerical arguments"
@@ -1776,6 +1747,103 @@ example:
.sp
.
.
.SS "Callout modifiers"
.rs
.sp
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
change this and other parameters of the callout (see below).
.P
If the \fBcallout_capture\fP modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching, as
\fBpcre2_dfa_match()\fP does not support capturing, so no captures are ever
shown.
.P
The normal callout output, showing the callout number or pattern offset (as
described above) is suppressed if the \fBcallout_no_where\fP modifier is set.
.P
When using the interpretive matching function \fBpcre2_match()\fP without JIT,
setting the \fBcallout_extra\fP modifier causes additional output from
\fBpcre2test\fP's callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or start of
matching if this is the first callout), "Backtrack" is output, followed by "No
other matching paths" if the backtrack ended the previous match attempt. For
example:
.sp
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data> aac\e=callout_extra
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^ ^ )
+4 ^ ^ b
Backtrack
--->aac
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
No match
.sp
Notice that various optimizations must be turned off if you want all possible
matching paths to be scanned. If \fBno_start_optimize\fP is not used, there is
an immediate "no match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match. If \fBno_auto_possess\fP is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
.P
The \fBcallout_extra\fP modifier has no effect if used with the DFA matching
function, or with JIT.
.
.
.SS "Return values from callouts"
.rs
.sp
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero.
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.
.
.
.SH "NON-PRINTING CHARACTERS"
.rs
@@ -1894,6 +1962,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 October 2017
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi
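The `callout_fail` and `callout_error` rules described above can be restated as a small decision function. This is a sketch, not `pcre2test`'s actual code: `sketch_callout_settings` and `sketch_callout_return()` are hypothetical, `SKETCH_ERROR_CALLOUT` stands in for `PCRE2_ERROR_CALLOUT` (the value -37 is an assumption here; real code should use the constant from `pcre2.h`), and treating a missing `<m>` as 0 and counting the current callout in `callout_count` are also assumptions. The `callout_data` return path is left out.

```c
#include <stdint.h>

/* Stand-in for PCRE2_ERROR_CALLOUT; the value is assumed, so real code
   should use the constant from pcre2.h instead. */
#define SKETCH_ERROR_CALLOUT (-37)

/* Hypothetical parsed form of callout_fail=<n>[:<m>] and
   callout_error=<n>[:<m>]. When only <n> is given, <m> is taken as 0 here,
   so the count condition is always satisfied (an assumption). */
typedef struct {
  int fail_set;              /* nonzero if callout_fail was given */
  uint32_t fail_n, fail_m;
  int error_set;             /* nonzero if callout_error was given */
  uint32_t error_n, error_m;
} sketch_callout_settings;

/* Decide what a pcre2test-like callout function returns: callout_error
   wins over callout_fail for the same callout number, 1 forces a
   backtrack, and 0 lets matching continue. callout_count is the number of
   callouts so far, including this one (assumed). Callouts with string
   arguments should be passed callout_number 0. */
static int sketch_callout_return(const sketch_callout_settings *s,
  uint32_t callout_number, uint32_t callout_count)
{
  if (s->error_set && callout_number == s->error_n &&
      callout_count >= s->error_m)
    return SKETCH_ERROR_CALLOUT;     /* abort the whole match */

  if (s->fail_set && callout_number == s->fail_n &&
      callout_count >= s->fail_m)
    return 1;                        /* fail at this point: backtrack */

  return 0;                          /* carry on matching */
}
```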
@@ -120,6 +120,10 @@ COMMAND LINE OPTIONS
is, insert automatic callouts into every pattern that is compiled.

-AC As for -ac, but in addition behave as if each subject line
has the callout_extra modifier, that is, show additional
information from callouts.

-b Behave as if each pattern has the fullbincode modifier; the
full internal binary form of the pattern is output after compilation.
@@ -1056,6 +1060,7 @@ SUBJECT MODIFIERS
callout_capture            show captures at callout time
callout_data=<n>           set a value to pass via callouts
callout_error=<n>[:<m>]    control callout error
callout_extra              show extra callout information
callout_fail=<n>[:<m>]     control callout failure
callout_no_where           do not show position of a callout
callout_none               do not supply a callout function
@@ -1529,63 +1534,30 @@ RESTARTING AFTER A PARTIAL MATCH
CALLOUTS

If the pattern contains any callout requests, pcre2test's callout function
is called during matching unless callout_none is specified. This
works with both matching functions.

The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout.

If callout_capture is set, the current captured groups are output when
a callout occurs. By default, the callout function then generates output
that indicates where the current match start and matching points
are in the subject, and what the next pattern item is. This output is
suppressed if the callout_no_where modifier is set.

The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (causing
matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, causing
the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero. See

The callout_data modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.

Inserting callouts can be helpful when using pcre2test to check complicated
regular expressions. For further information about callouts, see
the pcre2callout documentation.

The output for callouts with numerical arguments and those with string
arguments is slightly different.
is called during matching unless callout_none is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments
and those with string arguments is slightly different.

Callouts with numerical arguments

By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example:

--->pqrabcdef
0 ^ ^ \d

This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion.

Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the auto_callout pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example:

re> /\d?[A-E]\*/auto_callout
@@ -1598,7 +1570,7 @@ CALLOUTS
0: E*

If a pattern contains (*MARK) items, an additional line is output whenever
a change of latest mark is passed to the callout function. For
example:

re> /a(*MARK:X)bc/auto_callout
@@ -1612,17 +1584,17 @@ CALLOUTS
+12 ^ ^
0: abc

The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is
output.

Callouts with string arguments

The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is
reflected for each callout. For example:

re> /^ab(?C'first')cd(?C"second")ef/
@@ -1636,6 +1608,100 @@ CALLOUTS
0: abcdef


Callout modifiers

The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout (see below).

If the callout_capture modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching,
as pcre2_dfa_match() does not support capturing, so no captures are
ever shown.

The normal callout output, showing the callout number or pattern offset
(as described above) is suppressed if the callout_no_where modifier is
set.

When using the interpretive matching function pcre2_match() without
JIT, setting the callout_extra modifier causes additional output from
pcre2test's callout function to be generated. For the first callout in
a match attempt at a new starting position in the subject, "New match
attempt" is output. If there has been a backtrack since the last callout
(or start of matching if this is the first callout), "Backtrack" is
output, followed by "No other matching paths" if the backtrack ended
the previous match attempt. For example:

re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data> aac\=callout_extra
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^ ^ )
+4 ^ ^ b
Backtrack
--->aac
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
No match

Notice that various optimizations must be turned off if you want all
possible matching paths to be scanned. If no_start_optimize is not
used, there is an immediate "no match", without any callouts, because
the starting optimization fails to find "b" in the subject, which it
knows must be present for any match. If no_auto_possess is not used,
the "a+" item is turned into "a++", which reduces the number of backtracks.

The callout_extra modifier has no effect if used with the DFA matching
function, or with JIT.

Return values from callouts

The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (causing
matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, causing
the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero.

The callout_data modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.

Inserting callouts can be helpful when using pcre2test to check complicated
regular expressions. For further information about callouts, see
the pcre2callout documentation.


NON-PRINTING CHARACTERS

When pcre2test is outputting text in the compiled version of a pattern,
@@ -1733,5 +1799,5 @@ AUTHOR

REVISION

Last updated: 17 October 2017
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.
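The position line described in the `pcre2test` text above (one circumflex for the match start, a second for the current position) can be sketched as follows. This is an illustration, not `pcre2test`'s implementation: `sketch_marker_line()` is hypothetical, it assumes one byte per character (no UTF handling), and it assumes the single circumflex goes under the match start when the current position does not lie beyond it. In the `pqrabcdef` example, the fourth and seventh characters are 0-based offsets 3 and 6.

```c
#include <stddef.h>

/* Build the circumflex line for a callout: one '^' under the match start
   and, only when the current position is beyond the start, a second '^'
   under the current position. Offsets count characters from the start of
   the subject, assuming one byte per character. buf must have room for
   max(start, current) + 2 bytes. The "--->" prefix column, the callout
   number, and the following pattern item are left out. */
static void sketch_marker_line(size_t start, size_t current, char *buf)
{
  size_t i = 0;

  while (i < start) buf[i++] = ' ';
  buf[i++] = '^';                 /* match start */

  if (current > start)            /* equal or earlier: one '^' only */
    {
    while (i < current) buf[i++] = ' ';
    buf[i++] = '^';               /* current matching point */
    }
  buf[i] = '\0';
}
```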
@@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
without modification. Define the generic version in a macro; the width-specific
versions are generated from this macro below. */

/* Flags for the callout_flags field. These are cleared after a callout. */

#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */

#define PCRE2_STRUCTURE_LIST \
typedef struct pcre2_callout_block { \
uint32_t version; /* Identifies version of block */ \

@@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
/* ------------------- Added for Version 2 -------------------------- */ \
uint32_t callout_flags; /* See above for list */ \
/* ------------------------------------------------------------------ */ \
} pcre2_callout_block; \
\
@@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
without modification. Define the generic version in a macro; the width-specific
versions are generated from this macro below. */

/* Flags for the callout_flags field. These are cleared after a callout. */

#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */

#define PCRE2_STRUCTURE_LIST \
typedef struct pcre2_callout_block { \
uint32_t version; /* Identifies version of block */ \

@@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
/* ------------------- Added for Version 2 -------------------------- */ \
uint32_t callout_flags; /* See above for list */ \
/* ------------------------------------------------------------------ */ \
} pcre2_callout_block; \
\
@@ -2574,7 +2574,8 @@ for (;;)
if (mb->callout != NULL)
{
pcre2_callout_block cb;
cb.version = 1;
cb.version = 2;
cb.callout_flags = 0;
cb.capture_top = 1;
cb.capture_last = 0;
cb.offset_vector = offsets;

@@ -2943,7 +2944,8 @@ for (;;)
if (mb->callout != NULL)
{
pcre2_callout_block cb;
cb.version = 1;
cb.version = 2;
cb.callout_flags = 0;
cb.capture_top = 1;
cb.capture_last = 0;
cb.offset_vector = offsets;
@@ -7952,7 +7952,8 @@ oveccount = callout_block->capture_top;

SLJIT_ASSERT(oveccount >= 1);

callout_block->version = 1;
callout_block->version = 2;
callout_block->callout_flags = 0;

/* Offsets in subject. */
callout_block->subject_length = arguments->end - arguments->begin;
@@ -321,6 +321,7 @@ callout_ovector[0] = callout_ovector[1] = PCRE2_UNSET;
rc = mb->callout(cb, mb->callout_data);
callout_ovector[0] = save0;
callout_ovector[1] = save1;
cb->callout_flags = 0;
return rc;
}


@@ -5919,8 +5920,9 @@ in rrc. */
#define LBL(val) case val: goto L_RM##val;

RETURN_SWITCH:
if (Frdepth == 0) return rrc; /* Exit from the top level */
F = (heapframe *)((char *)F - Fback_frame); /* Back track */
mb->cb->callout_flags |= PCRE2_CALLOUT_BACKTRACK; /* Note for callouts */

#ifdef DEBUG_SHOW_RMATCH
fprintf(stderr, "++ RETURN %d to %d\n", rrc, Freturn_id);

@@ -6171,13 +6173,14 @@ startline = (re->flags & PCRE2_STARTLINE) != 0;
bumpalong_limit = (mcontext->offset_limit == PCRE2_UNSET)?
end_subject : subject + mcontext->offset_limit;

/* Set up the fixed fields in the callout block, with a pointer in the
match block. */
/* Initialize and set up the fixed fields in the callout block, with a pointer
in the match block. */

mb->cb = &cb;
cb.version = 1;
cb.version = 2;
cb.subject = subject;
cb.subject_length = (PCRE2_SIZE)(end_subject - subject);
cb.callout_flags = 0;

/* Fill in the remaining fields in the match block. */


@@ -6644,6 +6647,8 @@ for(;;)
first starting point for which a partial match was found. */

cb.start_match = (PCRE2_SIZE)(start_match - subject);
cb.callout_flags |= PCRE2_CALLOUT_STARTMATCH;

mb->start_used_ptr = start_match;
mb->last_used_ptr = start_match;
mb->match_call_count = 0;
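The lifecycle of `callout_flags` in the `pcre2_match()` changes above (zeroed when the callout block is set up, `PCRE2_CALLOUT_STARTMATCH` ORed in at each new starting position, `PCRE2_CALLOUT_BACKTRACK` ORed in at `RETURN_SWITCH`, and the field reset after each callout returns) can be simulated with a small event replay. The `sketch_` event type and function are hypothetical and model only the flag bookkeeping, not matching itself; flag values are copied from the `pcre2.h` change in this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the PCRE2_CALLOUT_* definitions added to pcre2.h. */
#define SKETCH_CALLOUT_STARTMATCH 0x00000001u
#define SKETCH_CALLOUT_BACKTRACK  0x00000002u

/* Hypothetical matcher events, used only to drive this simulation. */
typedef enum {
  SKETCH_EV_START_POSITION,   /* new starting position (bumpalong) */
  SKETCH_EV_BACKTRACK,        /* the matcher backtracks */
  SKETCH_EV_CALLOUT           /* a callout point is reached */
} sketch_event;

/* Replay events with the same flag bookkeeping as the pcre2_match()
   changes. The value each callout would see is stored in seen[], which
   must have room for one entry per callout event; the number of callouts
   is returned. */
static size_t sketch_replay_flags(const sketch_event *ev, size_t n,
  uint32_t *seen)
{
  uint32_t flags = 0;             /* like cb.callout_flags = 0 at setup */
  size_t callouts = 0;

  for (size_t i = 0; i < n; i++)
    {
    switch (ev[i])
      {
      case SKETCH_EV_START_POSITION:
      flags |= SKETCH_CALLOUT_STARTMATCH;
      break;

      case SKETCH_EV_BACKTRACK:
      flags |= SKETCH_CALLOUT_BACKTRACK;
      break;

      case SKETCH_EV_CALLOUT:
      seen[callouts++] = flags;   /* what the callout function receives */
      flags = 0;                  /* cleared once the callout returns */
      break;
      }
    }
  return callouts;
}
```

Several backtracks between two callouts collapse into a single `BACKTRACK` bit, which is why the bit reports only that backtracking happened, not how often.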
@@ -485,6 +485,7 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
#define CTL2_SUBJECT_LITERAL 0x00000010u
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
#define CTL2_CALLOUT_EXTRA 0x00000040u

#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */

@@ -598,6 +599,7 @@ static modstruct modlist[] = {
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
{ "callout_extra", MOD_DAT, MOD_CTL, CTL2_CALLOUT_EXTRA, DO(control2) },
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },

@@ -3971,7 +3973,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -3981,6 +3983,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_BINCODE) != 0)? " bincode" : "",
((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
((controls2 & CTL2_CALLOUT_EXTRA) != 0)? " callout_extra" : "",
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",

@@ -5842,17 +5845,43 @@ Return:
static int
callout_function(pcre2_callout_block_8 *cb, void *callout_data_ptr)
{
FILE *f, *fdefault;
uint32_t i, pre_start, post_start, subject_length;
PCRE2_SIZE current_position;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;

/* This FILE is used for echoing the subject. This is done only once in simple
cases. */
/* The FILE f is used for echoing the subject string if it is non-NULL. This
happens only once in simple cases, but we want to repeat after any additional
output caused by CALLOUT_EXTRA. */

FILE *f = (first_callout || callout_capture || cb->callout_string != NULL)?
outfile : NULL;
fdefault = (!first_callout && !callout_capture && cb->callout_string == NULL)?
NULL : outfile;

if ((dat_datctl.control2 & CTL2_CALLOUT_EXTRA) != 0)
{
f = outfile;
switch (cb->callout_flags)
{
case PCRE2_CALLOUT_BACKTRACK:
fprintf(f, "Backtrack\n");
break;

case PCRE2_CALLOUT_STARTMATCH|PCRE2_CALLOUT_BACKTRACK:
fprintf(f, "Backtrack\nNo other matching paths\n");
/* Fall through */

case PCRE2_CALLOUT_STARTMATCH:
fprintf(f, "New match attempt\n");
break;

default:
f = fdefault;
break;
}
}
else f = fdefault;

/* For a callout with a string argument, show the string first because there
isn't a tidy way to fit it in the rest of the data. */
@ -5902,7 +5931,6 @@ lengths of the substrings. */
|
|||
|
||||
if (callout_where)
|
||||
{
|
||||
|
||||
if (f != NULL) fprintf(f, "--->");
|
||||
|
||||
/* The subject before the match start. */
|
||||
|
@@ -5931,9 +5959,10 @@ if (callout_where)

if (f != NULL) fprintf(f, "\n");

/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
callout whose number has not already been shown with captured strings, show the
number here. A callout with a string argument has been displayed above. */
/* For automatic callouts, show the pattern offset. Otherwise, for a
numerical callout whose number has not already been shown with captured
strings, show the number here. A callout with a string argument has been
displayed above. */

if (cb->callout_number == 255)
{
@@ -5963,6 +5992,8 @@ if (callout_where)
if (cb->next_item_length != 0)
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
pbuffer8 + cb->pattern_position);
else
fprintf(outfile, "End of pattern");

fprintf(outfile, "\n");
}
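With the new `else` branch, a callout whose `next_item_length` is zero (that is, a callout at the very end of the pattern) now prints "End of pattern" after the carets. Before this change, nothing followed them. That change accounts for the many expected-output lines in the test files that gain this suffix.

Below is a minimal standalone sketch of the same choice. `format_next_item()` is a hypothetical helper, not a pcre2test or PCRE2 function. It writes into a caller-supplied buffer instead of `outfile`, and it takes plain parameters in place of the `pcre2_callout_block` fields and `pbuffer8`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper mirroring the pcre2test logic: show the next pattern
   item at a callout, or "End of pattern" when there is no next item
   (next_item_length == 0). Output is written to buf, truncated to bufsize. */
static void format_next_item(char *buf, size_t bufsize, const char *pattern,
  size_t pattern_position, size_t next_item_length)
{
if (next_item_length != 0)
  snprintf(buf, bufsize, "%.*s", (int)next_item_length,
    pattern + pattern_position);
else
  snprintf(buf, bufsize, "End of pattern");
}
```

For the pattern `a+b`, a callout at offset 2 with item length 1 yields `b`, and a callout at offset 3 with length 0 yields `End of pattern`.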
@@ -7685,7 +7716,8 @@ printf(" -16 use the 16-bit library\n");
#ifdef SUPPORT_PCRE2_32
printf(" -32 use the 32-bit library\n");
#endif
printf(" -ac set default pattern option PCRE2_AUTO_CALLOUT\n");
printf(" -ac set default pattern modifier PCRE2_AUTO_CALLOUT\n");
printf(" -AC as -ac, but also set subject 'callout_extra' modifier\n");
printf(" -b set default pattern modifier 'fullbincode'\n");
printf(" -C show PCRE2 compile-time options and exit\n");
printf(" -C arg show a specific compile-time option and exit with its\n");
@@ -8181,6 +8213,11 @@ while (argc > 1 && argv[op][0] == '-' && argv[op][1] != 0)

/* Set some common pattern and subject controls */

else if (strcmp(arg, "-AC") == 0)
{
def_patctl.options |= PCRE2_AUTO_CALLOUT;
def_datctl.control2 |= CTL2_CALLOUT_EXTRA;
}
else if (strcmp(arg, "-ac") == 0) def_patctl.options |= PCRE2_AUTO_CALLOUT;
else if (strcmp(arg, "-b") == 0) def_patctl.control |= CTL_FULLBINCODE;
else if (strcmp(arg, "-d") == 0) def_patctl.control |= CTL_DEBUG;

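The `switch` added to the pcre2test callout display decodes the two new `callout_flags` bits:

- `PCRE2_CALLOUT_BACKTRACK` alone prints "Backtrack".
- `PCRE2_CALLOUT_STARTMATCH` alone prints "New match attempt".
- Both bits together print all three messages, because that case falls through.

Below is a minimal standalone sketch of that mapping. It assumes the values defined in `pcre2.h` are 0x1 for `PCRE2_CALLOUT_STARTMATCH` and 0x2 for `PCRE2_CALLOUT_BACKTRACK`, and redefines them only so that the sketch compiles without the library. `describe_callout_flags()` is a hypothetical helper, not part of PCRE2 or pcre2test.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror pcre2.h; real code should include pcre2.h instead. */
#ifndef PCRE2_CALLOUT_STARTMATCH
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u  /* first callout at a new start */
#define PCRE2_CALLOUT_BACKTRACK  0x00000002u  /* backtrack since last callout */
#endif

/* Hypothetical helper: return the extra text pcre2test prints for a given
   callout_flags value when callout_extra is set, or NULL if it prints none. */
static const char *describe_callout_flags(uint32_t flags)
{
switch (flags)
  {
  case PCRE2_CALLOUT_BACKTRACK:
  return "Backtrack\n";

  /* pcre2test's switch falls through here, so all three lines appear. */
  case PCRE2_CALLOUT_STARTMATCH|PCRE2_CALLOUT_BACKTRACK:
  return "Backtrack\nNo other matching paths\nNew match attempt\n";

  case PCRE2_CALLOUT_STARTMATCH:
  return "New match attempt\n";

  default:
  return NULL;
  }
}
```

In a real program, a callout function registered with `pcre2_set_callout()` could pass `cb->callout_flags` to a helper like this. The bits are set only by `pcre2_match()`, not by JIT or `pcre2_dfa_match()`. That is why the new tests that exercise `callout_extra` start their patterns with `(*NO_JIT)`.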
@@ -5385,4 +5385,14 @@ a)"xI
ab
aaab

# JIT does not support callout_extra

/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra

/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra

# End of testinput2

@@ -361,12 +361,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match

/(*NO_AUTO_POSSESS)\w+(?C1)/BI
@@ -385,12 +385,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match

# This test breaks the JIT stack limit

@@ -3832,7 +3832,7 @@ Subject length lower bound = 2
\= Expect no match
abbbbbccc\=callout_data=1
--->abbbbbccc
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
No match

@@ -3844,21 +3844,21 @@ Subject length lower bound = 2
\= Expect no match
abbbbbccc\=callout_data=1
--->abbbbbccc
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
No match

@@ -4718,7 +4718,7 @@ Subject length lower bound = 5
+2 ^ ^ c
+3 ^ ^ d
+4 ^ ^ e
+5 ^ ^
+5 ^ ^ End of pattern
0: abcde
\= Expect no match
abcdfe
@@ -4750,13 +4750,13 @@ Subject length lower bound = 1
--->ab
+0 ^ a*
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a*
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
aaaacb
--->aaaacb
@@ -4770,7 +4770,7 @@ Subject length lower bound = 1
+2 ^^ b
+0 ^ a*
+2 ^ b
+3 ^^
+3 ^^ End of pattern
0: b

/a*b/IB,auto_callout
@@ -4793,13 +4793,13 @@ Subject length lower bound = 1
--->ab
+0 ^ a*
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a*
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
aaaacb
--->aaaacb
@@ -4813,7 +4813,7 @@ Subject length lower bound = 1
+2 ^^ b
+0 ^ a*
+2 ^ b
+3 ^^
+3 ^^ End of pattern
0: b

/a+b/IB,auto_callout
@@ -4836,13 +4836,13 @@ Subject length lower bound = 2
--->ab
+0 ^ a+
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a+
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
\= Expect no match
aaaacb
@@ -4897,7 +4897,7 @@ Subject length lower bound = 4
+3 ^ ^ c
+4 ^ ^ |
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: abcx
1: abc
defx
@@ -4909,7 +4909,7 @@ Subject length lower bound = 4
+7 ^ ^ f
+8 ^ ^ )
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: defx
1: def
\= Expect no match
@@ -4971,7 +4971,7 @@ Subject length lower bound = 4
+3 ^ ^ c
+4 ^ ^ |
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: abcx
1: abc
defx
@@ -4983,7 +4983,7 @@ Subject length lower bound = 4
+7 ^ ^ f
+8 ^ ^ )
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: defx
1: def
\= Expect no match
@@ -5024,7 +5024,7 @@ Subject length lower bound = 6
+3 ^ ^ |
+1 ^ ^ a
+4 ^ ^ c
+12 ^ ^
+12 ^ ^ End of pattern
0: ababab
1: ab
abcdabcd
@@ -5044,7 +5044,7 @@ Subject length lower bound = 6
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdabcd
1: cd
abcdcdcdcdcd
@@ -5065,7 +5065,7 @@ Subject length lower bound = 6
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdcdcd
1: cd

@@ -5276,7 +5276,7 @@ Subject length lower bound = 11
+21 ^ ^ 1
+22 ^ ^ 2
+23 ^ ^ 3
+24 ^ ^
+24 ^ ^ End of pattern
0: aacaacaacaacaac123
1: aac

@@ -8900,7 +8900,7 @@ Subject length lower bound = 0
+7 ^ b
+11 ^ ^
+12 ^ )
+13 ^
+13 ^ End of pattern
0:
abc
--->abc
@@ -8921,7 +8921,7 @@ Subject length lower bound = 0
+8 ^^ )
+9 ^ b
+10 ^^ |
+13 ^^
+13 ^^ End of pattern
0: b

/(?(?=b).*b|^d)/I
@@ -8938,14 +8938,14 @@ Subject length lower bound = 1
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
abcxyz
--->abcxyz
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@@ -8962,7 +8962,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@@ -8996,7 +8996,7 @@ No match
+15 ^ x
+16 ^^ y
+17 ^ ^ z
+18 ^ ^
+18 ^ ^ End of pattern
0: xyz

/(*NO_AUTO_POSSESS)a+b/B
@@ -9017,7 +9017,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz

/^"((?(?=[a])[^"])|b)*"$/auto_callout
@@ -9046,7 +9046,7 @@ No match
+17 ^ ^ |
+21 ^ ^ "
+22 ^ ^ $
+23 ^ ^
+23 ^ ^ End of pattern
0: "ab"
1:

@@ -11136,7 +11136,7 @@ Latest Mark: A
+10 ^ ^ |
+18 ^ ^ z
+19 ^ ^ |
+24 ^ ^
+24 ^ ^ End of pattern
0: adz
1: adz
2: d
@@ -11155,7 +11155,7 @@ Latest Mark: A
Latest Mark: B
+18 ^ ^ z
+19 ^ ^ |
+24 ^ ^
+24 ^ ^ End of pattern
0: aez
1: aez
2: e
@@ -11177,7 +11177,7 @@ Latest Mark: B
+21 ^^ e
+22 ^ ^ q
+23 ^ ^ )
+24 ^ ^
+24 ^ ^ End of pattern
0: aeq
1: aeq

@@ -11951,7 +11951,7 @@ Partial match: 123a
+11 ^ b
+12 ^^ b
+13 ^ ^ )
+14 ^ ^
+14 ^ ^ End of pattern
0: bb

/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/
@@ -11964,7 +11964,7 @@ Partial match: 123a
8 ^ b
9 ^^ b
10 ^ ^ )
11 ^ ^
11 ^ ^ End of pattern
0: bb

# Perl seems to have a bug with this one.
@@ -15144,7 +15144,7 @@ Subject length lower bound = 0
+0 ^ (
+1 ^ )\Q\E*
+7 ^ ]
+8 ^^
+8 ^^ End of pattern
0: ]
1:

@@ -15428,7 +15428,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
+0 ^ a
+1 ^^ b
1 ^ ^ c
+8 ^ ^
+8 ^ ^ End of pattern
0: abc

/'ab(?C1)c'/hex,auto_callout
@@ -15437,7 +15437,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
+0 ^ a
+1 ^^ b
1 ^ ^ c
+8 ^ ^
+8 ^ ^ End of pattern
0: abc

# Perl accepts these, but gives a warning. We can't warn, so give an error.
@@ -16256,7 +16256,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
+2 ^ ^ b
+3 ^ ^ (
+4 ^ ^ c
+5 ^ ^
+5 ^ ^ End of pattern
0: a\b(c

/a\b(c/literal,auto_callout
@@ -16267,7 +16267,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
+2 ^ ^ b
+3 ^ ^ (
+4 ^ ^ c
+5 ^ ^
+5 ^ ^ End of pattern
0: a\b(c

/(*CR)abc/literal
@@ -16384,6 +16384,62 @@ Subject length lower bound = 1
0: ab
1: a

# JIT does not support callout_extra

/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
New match attempt
--->aac
+9 ^ (
+10 ^ a+
+12 ^ ^ )
+13 ^ ^ b
Backtrack
--->aac
+12 ^^ )
+13 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
+12 ^^ )
+13 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
No match

/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
New match attempt
Callout (15): 'XXX'
--->aac
^ ^ b
Backtrack
Callout (15): 'XXX'
--->aac
^^ b
Backtrack
No other matching paths
New match attempt
Callout (15): 'XXX'
--->aac
^^ b
No match

# End of testinput2
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data

@@ -3763,7 +3763,7 @@ No match
abcd
--->abcd
+0 ^ \w+
+3 ^ ^
+3 ^ ^ End of pattern
0: abcd

/[\p{N}]?+/B,no_auto_possess
@@ -4165,7 +4165,7 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
+0 ^ .
+0 ^ .
+1 ^ ^ .
+2 ^ ^
+2 ^ ^ End of pattern
0: \x{123}\x{123}

# This tests processing wide characters in extended mode.

@@ -726,7 +726,7 @@ No match
+4 ^ ^ c
+2 ^ ^ b
+3 ^ ^ |
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
0: ababab
@@ -745,12 +745,12 @@ No match
+4 ^ ^ c
+2 ^ ^ b
+3 ^ ^ |
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdabcd
1: abcdab
abcdcdcdcdcd
@@ -768,12 +768,12 @@ No match
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdcdcd
1: abcdcd

@@ -6610,14 +6610,14 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
abcxyz
--->abcxyz
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@@ -6634,7 +6634,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@@ -6668,7 +6668,7 @@ No match
+15 ^ x
+16 ^^ y
+17 ^ ^ z
+18 ^ ^
+18 ^ ^ End of pattern
0: xyz

/(?C)ab/
@@ -6684,7 +6684,7 @@ No match
--->ab
+0 ^ a
+1 ^^ b
+2 ^ ^
+2 ^ ^ End of pattern
0: ab
ab\=callout_none
0: ab
@@ -6717,7 +6717,7 @@ No match
+8 ^ [a]
+17 ^ ^ |
+22 ^ ^ $
+23 ^ ^
+23 ^ ^ End of pattern
0: "ab"
"ab"\=callout_none
0: "ab"