Add callout_flags to callout blocks, and set bits within it from pcre2_match() interpretation.
Philip.Hazel 2017-12-22 15:56:27 +00:00
parent 814cc96bc5
commit 94d5f4a050
24 changed files with 1896 additions and 1402 deletions

@@ -16,10 +16,10 @@ that is called by both pcre2_match() and pcre2_dfa_match().
4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.
5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is
defined (e.g. by --enable-never-backslash-C).
6. Defined public names for all the pcre2_compile() error numbers, and used
the public names in pcre2_convert.c.
7. Fixed a small memory leak in pcre2test (convert contexts).
@@ -30,8 +30,8 @@ the public names in pcre2_convert.c.
PCRE2GREP_RC to the exit status, because VMS does not distinguish between
exit(0) and exit(1).
10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain
about a bad option only if the following argument item does not start with a
hyphen.
11. pcre2grep was truncating components of file names to 128 characters when
@@ -39,20 +39,20 @@ processing files with the -r option, and also (some very odd code) truncating
path names to 512 characters. There is now a check on the absolute length of
full path file names, which may be up to 2047 characters long.
12. When an assertion contained (*ACCEPT) it caused all open capturing groups
to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to
misbehaviour for subsequent references to groups that started outside the
recursion. ACCEPT in an assertion now closes only those groups that were
started within that assertion. Fixes oss-fuzz issues 3852 and 3891.
13. Multiline matching in pcre2grep was misbehaving if the pattern matched
within a line, and then matched again at the end of the line and over into
subsequent lines. Behaviour was different with and without colouring, and
sometimes context lines were incorrectly printed and/or line endings were lost.
All these issues should now be fixed.
14. If --line-buffered was specified for pcre2grep when input was from a
compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be
ignored for compressed files.)
15. Although pcre2_jit_match checks whether the pattern is compiled
@@ -60,26 +60,26 @@ in a given mode, it was also expected that at least one mode is available.
This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION
when the pattern is not optimized by JIT at all.
16. The line number and related variables such as match counts in pcre2grep
were all int variables, causing overflow when files with more than 2147483647
lines were processed (assuming 32-bit ints). They have all been changed to
unsigned long ints.
17. If a backreference with a minimum repeat count of zero was first in a
pattern, apart from assertions, an incorrect first matching character could be
recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
as the first character of a match.
18. Characters in a leading positive assertion are considered for recording a
first character of a match when the rest of the pattern does not provide one.
However, a character in a non-assertive group within a leading assertion such
as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an
infelicity rather than an outright bug, because it did not affect the result of
a match, just its speed. (In fact, in this case, the starting 'a' was
subsequently picked up in the study.)
19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return"
instead of "RRETURN" saves unwinding the backtracks in these cases (only one
didn't).
20. Allocate a single callout block on the stack at the start of pcre2_match()
@@ -89,6 +89,12 @@ and set its never-changing fields once only.
compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
to retrieve them, and update pcre2test to show them.
22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks. The bits are set by pcre2_match(), but
not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
if the callout_extra subject modifier is set. These bits are provided to help
with tracking how a backtracking match is proceeding.
Version 10.30 14-August-2017
----------------------------

@@ -30,7 +30,13 @@ DESCRIPTION
<P>
This function matches a compiled regular expression against a given subject
string, using a matching algorithm that is similar to Perl's. It returns
offsets to captured substrings. Its arguments are:
offsets to what it has matched and to captured substrings via the
<b>match_data</b> block, which can be processed by functions with names that
start with <b>pcre2_get_ovector_...()</b> or <b>pcre2_substring_...()</b>. The
return from <b>pcre2_match()</b> is one more than the highest numbered capturing
pair that has been set (for example, 1 if there are no captures), zero if the
vector of offsets is too small, or a negative error code for no match and other
errors. The function arguments are:
<pre>
<i>code</i> Points to the compiled pattern
<i>subject</i> Points to the subject string

@@ -27,7 +27,7 @@ DESCRIPTION
<P>
This function returns information about a compiled pattern. Its arguments are:
<pre>
<i>code</i> Pointer to a compiled regular expression
<i>code</i> Pointer to a compiled regular expression pattern
<i>what</i> What information is required
<i>where</i> Where to put the information
</pre>
@@ -42,6 +42,8 @@ request are as follows:
PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
PCRE2_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE2_INFO_DEPTHLIMIT Backtracking depth limit if set, otherwise PCRE2_ERROR_UNSET
PCRE2_INFO_EXTRAOPTIONS Extra options that were passed in the
compile context
PCRE2_INFO_FIRSTBITMAP Bitmap of first code units, or NULL
PCRE2_INFO_FIRSTCODETYPE Type of start-of-match information
0 nothing set

@@ -920,11 +920,15 @@ The <i>offset_limit</i> parameter limits how far an unanchored search can
advance in the subject string. The default value is PCRE2_UNSET. The
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> functions return
PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given
offset is not found. For example, if the pattern /abc/ is matched against
"123abc" with an offset limit less than 3, the result is PCRE2_ERROR_NOMATCH.
A match can never be found if the <i>startoffset</i> argument of
<b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> is greater than the offset
limit.
offset is not found. The <b>pcre2_substitute()</b> function makes no more
substitutions.
</P>
<P>
For example, if the pattern /abc/ is matched against "123abc" with an offset
limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be
found if the <i>startoffset</i> argument of <b>pcre2_match()</b>,
<b>pcre2_dfa_match()</b>, or <b>pcre2_substitute()</b> is greater than the offset
limit set in the match context.
</P>
<P>
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
@@ -934,10 +938,11 @@ PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
</P>
<P>
The offset limit facility can be used to track progress when searching large
subject strings. See also the PCRE2_FIRSTLINE option, which requires a match to
start within the first line of the subject. If this is set with an offset
limit, a match must occur in the first line and also within the offset limit.
In other words, whichever limit comes first is used.
subject strings or to limit the extent of global substitutions. See also the
PCRE2_FIRSTLINE option, which requires a match to start within the first line
of the subject. If this is set with an offset limit, a match must occur in the
first line and also within the offset limit. In other words, whichever limit
comes first is used.
<br>
<br>
<b>int pcre2_set_heap_limit(pcre2_match_context *<i>mcontext</i>,</b>
@@ -1940,12 +1945,15 @@ are as follows:
<pre>
PCRE2_INFO_ALLOPTIONS
PCRE2_INFO_ARGOPTIONS
PCRE2_INFO_EXTRAOPTIONS
</pre>
Return a copy of the pattern's options. The third argument should point to a
Return copies of the pattern's options. The third argument should point to a
<b>uint32_t</b> variable. PCRE2_INFO_ARGOPTIONS returns exactly the options that
were passed to <b>pcre2_compile()</b>, whereas PCRE2_INFO_ALLOPTIONS returns
the compile options as modified by any top-level (*XXX) option settings such as
(*UTF) at the start of the pattern itself.
(*UTF) at the start of the pattern itself. PCRE2_INFO_EXTRAOPTIONS returns the
extra options that were set in the compile context by calling the
<b>pcre2_set_compile_extra_options()</b> function.
</P>
<P>
For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
@@ -3157,13 +3165,27 @@ options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
</P>
<P>
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
replacing every matching substring. If this is not set, only the first matching
substring is replaced. If any matched substring has zero length, after the
substitution has happened, an attempt to find a non-empty match at the same
position is performed. If this is not successful, the current position is
advanced by one character except when CRLF is a valid newline sequence and the
next two characters are CR, LF. In this case, the current position is advanced
by two characters.
replacing every matching substring. If this option is not set, only the first
matching substring is replaced. The search for matches takes place in the
original subject string (that is, previous replacements do not affect it).
Iteration is implemented by advancing the <i>startoffset</i> value for each
search, which is always passed the entire subject string. If an offset limit is
set in the match context, searching stops when that limit is reached.
</P>
<P>
You can restrict the effect of a global substitution to a portion of the
subject string by setting either or both of <i>startoffset</i> and an offset
limit. Here is a <b>pcre2test</b> example:
<pre>
/B/g,replace=!,use_offset_limit
ABC ABC ABC ABC\=offset=3,offset_limit=12
2: ABC A!C A!C ABC
</pre>
When continuing with global substitutions after matching a substring with zero
length, an attempt to find a non-empty match at the same offset is performed.
If this is not successful, the offset is advanced by one character except when
CRLF is a valid newline sequence and the next two characters are CR, LF. In
this case, the offset is advanced by two characters.
</P>
<P>
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
@@ -3398,7 +3420,7 @@ Here is an example of a simple call to <b>pcre2_dfa_match()</b>:
11, /* the length of the subject string */
0, /* start at offset 0 in the subject */
0, /* default options */
match_data, /* the match data block */
md, /* the match data block */
NULL, /* a match context; NULL means use defaults */
wspace, /* working space vector */
20); /* number of elements (NOT size in bytes) */
@@ -3567,7 +3589,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 13 October 2017
Last updated: 16 December 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@@ -206,18 +206,20 @@ callouts such as the example above are obeyed.
<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br>
<P>
During matching, when PCRE2 reaches a callout point, if an external function is
provided in the match context, it is called. This applies to both normal and
DFA matching. The first argument to the callout function is a pointer to a
<b>pcre2_callout</b> block. The second argument is the void * callout data that
was supplied when the callout was set up by calling <b>pcre2_set_callout()</b>
(see the
provided in the match context, it is called. This applies to normal, DFA, and
JIT matching. The first argument to the callout function is a pointer
to a <b>pcre2_callout</b> block. The second argument is the void * callout data
that was supplied when the callout was set up by calling
<b>pcre2_set_callout()</b> (see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation). The callout block structure contains the following fields:
documentation). The callout block structure contains the following fields, not
necessarily in this order:
<pre>
uint32_t <i>version</i>;
uint32_t <i>callout_number</i>;
uint32_t <i>capture_top</i>;
uint32_t <i>capture_last</i>;
uint32_t <i>callout_flags</i>;
PCRE2_SIZE *<i>offset_vector</i>;
PCRE2_SPTR <i>mark</i>;
PCRE2_SPTR <i>subject</i>;
@@ -231,11 +233,12 @@ documentation). The callout block structure contains the following fields:
PCRE2_SPTR <i>callout_string</i>;
</pre>
The <i>version</i> field contains the version number of the block format. The
current version is 1; the three callout string fields were added for this
version. If you are writing an application that might use an earlier release of
PCRE2, you should check the version number before accessing any of these
fields. The version number will increase in future if more fields are added,
but the intention is never to remove any of the existing fields.
current version is 2; the three callout string fields were added for version 1,
and the <i>callout_flags</i> field for version 2. If you are writing an
application that might use an earlier release of PCRE2, you should check the
version number before accessing any of these fields. The version number will
increase in future if more fields are added, but the intention is never to
remove any of the existing fields.
</P>
<br><b>
Fields for numerical callouts
@@ -358,6 +361,36 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from the DFA matching function this field always contains NULL.
</P>
<P>
The <i>callout_flags</i> field is always zero in callouts from
<b>pcre2_dfa_match()</b> or when JIT is being used. When <b>pcre2_match()</b>
without JIT is used, the following bits may be set:
<pre>
PCRE2_CALLOUT_STARTMATCH
</pre>
This is set for the first callout after the start of matching for each new
starting position in the subject.
<pre>
PCRE2_CALLOUT_BACKTRACK
</pre>
This is set if there has been a matching backtrack since the previous callout,
or since the start of matching if this is the first callout from a
<b>pcre2_match()</b> run.
</P>
<P>
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
position in the subject. Output from <b>pcre2test</b> does not indicate the
presence of these bits unless the <b>callout_extra</b> modifier is set.
</P>
<P>
The information in the <i>callout_flags</i> field is provided so that
applications can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in <b>pcre2_dfa_match()</b>
because there is no backtracking in DFA matching, and there is no support in
JIT because JIT is designed to maximize matching performance. In both these
cases the <i>callout_flags</i> field is always zero.
</P>
<br><a name="SEC5" href="#TOC1">RETURN VALUES FROM CALLOUTS</a><br>
<P>
The external callout function returns an integer to PCRE2. If the value is
@@ -428,7 +461,7 @@ Cambridge, England.
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 14 April 2017
Last updated: 22 December 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@@ -133,11 +133,13 @@ The <b>--locale</b> option can be used to override this.
<br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
<P>
It is possible to compile <b>pcre2grep</b> so that it uses <b>libz</b> or
<b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
respectively. You can find out whether your binary has support for one or both
of these file types by running it with the <b>--help</b> option. If the
appropriate support is not present, files are treated as plain text. The
standard input is always so treated.
<b>libbz2</b> to read compressed files whose names end in <b>.gz</b> or
<b>.bz2</b>, respectively. You can find out whether your <b>pcre2grep</b> binary
has support for one or both of these file types by running it with the
<b>--help</b> option. If the appropriate support is not present, all files are
treated as plain text. The standard input is always so treated. When input is
from a compressed .gz or .bz2 file, the <b>--line-buffered</b> option is
ignored.
</P>
<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
<P>
@@ -151,7 +153,7 @@ of changing the way binary files are handled.
<br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
<P>
The order in which some of the options appear can affect the output. For
example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
example, both the <b>-H</b> and <b>-l</b> options affect the printing of file
names. Whichever comes later in the command line will be the one that takes
effect. Similarly, except where noted below, if an option is given twice, the
later setting is used. Numerical values for options may be followed by K or M,
@@ -396,14 +398,16 @@ searching a single file. By default, the file name is not shown in this case.
For matching lines, the file name is followed by a colon; for context lines, a
hyphen separator is used. If a line number is also being output, it follows the
file name. When the <b>-M</b> option causes a pattern to match more than one
line, only the first is preceded by the file name.
line, only the first is preceded by the file name. This option overrides any
previous <b>-h</b>, <b>-l</b>, or <b>-L</b> options.
</P>
<P>
<b>-h</b>, <b>--no-filename</b>
Suppress the output file names when searching multiple files. By default,
file names are shown when multiple files are searched. For matching lines, the
file name is followed by a colon; for context lines, a hyphen separator is used.
If a line number is also being output, it follows the file name.
If a line number is also being output, it follows the file name. This option
overrides any previous <b>-H</b>, <b>-L</b>, or <b>-l</b> options.
</P>
<P>
<b>--heap-limit</b>=<i>number</i>
@@ -460,17 +464,19 @@ given any number of times. If a directory matches both <b>--include-dir</b> and
<b>-L</b>, <b>--files-without-match</b>
Instead of outputting lines from the files, just output the names of the files
that do not contain any lines that would have been output. Each file name is
output once, on a separate line.
output once, on a separate line. This option overrides any previous <b>-H</b>,
<b>-h</b>, or <b>-l</b> options.
</P>
<P>
<b>-l</b>, <b>--files-with-matches</b>
Instead of outputting lines from the files, just output the names of the files
containing lines that would have been output. Each file name is output
once, on a separate line. Searching normally stops as soon as a matching line
is found in a file. However, if the <b>-c</b> (count) option is also used,
matching continues in order to obtain the correct count, and those files that
have at least one match are listed along with their counts. Using this option
with <b>-c</b> is a way of suppressing the listing of files with no matches.
containing lines that would have been output. Each file name is output once, on
a separate line. Searching normally stops as soon as a matching line is found
in a file. However, if the <b>-c</b> (count) option is also used, matching
continues in order to obtain the correct count, and those files that have at
least one match are listed along with their counts. Using this option with
<b>-c</b> is a way of suppressing the listing of files with no matches. This
option overrides any previous <b>-H</b>, <b>-h</b>, or <b>-L</b> options.
</P>
<P>
<b>--label</b>=<i>name</i>
@@ -480,14 +486,16 @@ short form for this option.
</P>
<P>
<b>--line-buffered</b>
When this option is given, input is read and processed line by line, and the
output is flushed after each write. By default, input is read in large chunks,
unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
is currently possible only in Unix-like environments). Output to terminal is
normally automatically flushed by the operating system. This option can be
useful when the input or output is attached to a pipe and you do not want
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will
affect performance, and the <b>-M</b> (multiline) option ceases to work.
When this option is given, non-compressed input is read and processed line by
line, and the output is flushed after each write. By default, input is read in
large chunks, unless <b>pcre2grep</b> can determine that it is reading from a
terminal (which is currently possible only in Unix-like environments). Output
to terminal is normally automatically flushed by the operating system. This
option can be useful when the input or output is attached to a pipe and you do
not want <b>pcre2grep</b> to buffer up large amounts of data. However, its use
will affect performance, and the <b>-M</b> (multiline) option ceases to work.
When input is from a compressed .gz or .bz2 file, <b>--line-buffered</b> is
ignored.
</P>
<P>
<b>--line-offsets</b>
@@ -941,7 +949,7 @@ Cambridge, England.
</P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P>
Last updated: 11 October 2017
Last updated: 13 November 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@@ -159,6 +159,12 @@ Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
automatic callouts into every pattern that is compiled.
</P>
<P>
<b>-AC</b>
As for <b>-ac</b>, but in addition behave as if each subject line has the
<b>callout_extra</b> modifier, that is, show additional information from
callouts.
</P>
<P>
<b>-b</b>
Behave as if each pattern has the <b>fullbincode</b> modifier; the full
internal binary form of the pattern is output after compilation.
@@ -243,8 +249,8 @@ available, and the use of JIT is verified.
</P>
<P>
<b>-LM</b>
List modifiers: write a list of available pattern and subject modifiers to the
standard output, then exit with zero exit code. All other options are ignored.
If both -C and -LM are present, whichever is first is recognized.
</P>
<P>
@@ -1182,6 +1188,7 @@ pattern.
callout_capture show captures at callout time
callout_data=&#60;n&#62; set a value to pass via callouts
callout_error=&#60;n&#62;[:&#60;m&#62;] control callout error
callout_extra show extra callout information
callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
@@ -1694,49 +1701,10 @@ documentation.
<br><a name="SEC16" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcre2test</b>'s callout
function is called during matching unless <b>callout_none</b> is specified.
This works with both matching functions.
</P>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout.
</P>
<P>
If <b>callout_capture</b> is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
<b>callout_no_where</b> modifier is set.
</P>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
<P>
The output for callouts with numerical arguments and those with string
arguments is slightly different.
function is called during matching unless <b>callout_none</b> is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
</P>
<br><b>
Callouts with numerical arguments
@@ -1811,6 +1779,107 @@ example:
</PRE>
</P>
<br><b>
Callout modifiers
</b><br>
<P>
The callout function in <b>pcre2test</b> returns zero (carry on matching) by
default, but you can use a <b>callout_fail</b> modifier in a subject line to
change this and other parameters of the callout (see below).
</P>
<P>
If the <b>callout_capture</b> modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching, as
<b>pcre2_dfa_match()</b> does not support capturing, so no captures are ever
shown.
</P>
<P>
The normal callout output, showing the callout number or pattern offset (as
described above), is suppressed if the <b>callout_no_where</b> modifier is set.
</P>
<P>
When using the interpretive matching function <b>pcre2_match()</b> without JIT,
setting the <b>callout_extra</b> modifier causes additional output from
<b>pcre2test</b>'s callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or since the
start of matching if this is the first callout), "Backtrack" is output, followed by "No
other matching paths" if the backtrack ended the previous match attempt. For
example:
<pre>
re&#62; /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data&#62; aac\=callout_extra
New match attempt
---&#62;aac
 +0 ^       (
 +1 ^       a+
 +3 ^ ^     )
 +4 ^ ^     b
Backtrack
---&#62;aac
 +3 ^^      )
 +4 ^^      b
Backtrack
No other matching paths
New match attempt
---&#62;aac
 +0  ^      (
 +1  ^      a+
 +3  ^^     )
 +4  ^^     b
Backtrack
No other matching paths
New match attempt
---&#62;aac
 +0   ^     (
 +1   ^     a+
Backtrack
No other matching paths
New match attempt
---&#62;aac
 +0    ^    (
 +1    ^    a+
No match
</pre>
Notice that various optimizations must be turned off if you want all possible
matching paths to be scanned. If <b>no_start_optimize</b> is not used, there is
an immediate "no match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match. If <b>no_auto_possess</b> is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
</P>
<P>
The <b>callout_extra</b> modifier has no effect if used with the DFA matching
function, or with JIT.
</P>
<br><b>
Return values from callouts
</b><br>
<P>
The default return from the callout function is zero, which allows matching to
continue. The <b>callout_fail</b> modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;)
are given, 1 is returned when callout &#60;n&#62; is reached and there have been at
least &#60;m&#62; callouts. The <b>callout_error</b> modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
<b>callout_error</b> takes precedence. Note that callouts with string arguments
are always given the number zero.
</P>
<P>
The <b>callout_data</b> modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from <b>pcre2test</b>'s callout function.
</P>
<P>
Inserting callouts can be helpful when using <b>pcre2test</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
</P>
<br><a name="SEC17" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
<P>
When <b>pcre2test</b> is outputting text in the compiled version of a pattern,
@@ -1913,7 +1982,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 October 2017
Last updated: 21 December 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

File diff suppressed because it is too large

@@ -3185,7 +3185,7 @@ subject string by setting either or both of \fIstartoffset\fP and an offset
limit. Here is a \fBpcre2test\fP example:
.sp
/B/g,replace=!,use_offset_limit
ABC ABC ABC ABC\=offset=3,offset_limit=12
ABC ABC ABC ABC\e=offset=3,offset_limit=12
2: ABC A!C A!C ABC
.sp
When continuing with global substitutions after matching a substring with zero

@@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "14 April 2017" "PCRE2 10.30"
.TH PCRE2CALLOUT 3 "22 December 2017" "PCRE2 10.31"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -191,20 +191,22 @@ callouts such as the example above are obeyed.
.rs
.sp
During matching, when PCRE2 reaches a callout point, if an external function is
provided in the match context, it is called. This applies to both normal and
DFA matching. The first argument to the callout function is a pointer to a
\fBpcre2_callout\fP block. The second argument is the void * callout data that
was supplied when the callout was set up by calling \fBpcre2_set_callout()\fP
(see the
provided in the match context, it is called. This applies to normal, DFA, and
JIT matching. The first argument to the callout function is a pointer
to a \fBpcre2_callout\fP block. The second argument is the void * callout data
that was supplied when the callout was set up by calling
\fBpcre2_set_callout()\fP (see the
.\" HREF
\fBpcre2api\fP
.\"
documentation). The callout block structure contains the following fields:
documentation). The callout block structure contains the following fields, not
necessarily in this order:
.sp
uint32_t \fIversion\fP;
uint32_t \fIcallout_number\fP;
uint32_t \fIcapture_top\fP;
uint32_t \fIcapture_last\fP;
uint32_t \fIcallout_flags\fP;
PCRE2_SIZE *\fIoffset_vector\fP;
PCRE2_SPTR \fImark\fP;
PCRE2_SPTR \fIsubject\fP;
@ -218,11 +220,12 @@ documentation). The callout block structure contains the following fields:
PCRE2_SPTR \fIcallout_string\fP;
.sp
The \fIversion\fP field contains the version number of the block format. The
current version is 1; the three callout string fields were added for this
version. If you are writing an application that might use an earlier release of
PCRE2, you should check the version number before accessing any of these
fields. The version number will increase in future if more fields are added,
but the intention is never to remove any of the existing fields.
current version is 2; the three callout string fields were added for version 1,
and the \fIcallout_flags\fP field for version 2. If you are writing an
application that might use an earlier release of PCRE2, you should check the
version number before accessing any of these fields. The version number will
increase in future if more fields are added, but the intention is never to
remove any of the existing fields.
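A minimal sketch of the version check described above, assuming an application that may run against a PCRE2 library older than the header it was compiled with. To keep it compilable without `pcre2.h`, it uses a hypothetical `demo_callout_block` holding only `version` and `callout_flags`; real code would receive a `pcre2_callout_block` and test the same `version` field before touching `callout_flags`.

```c
#include <stdint.h>

/* Hypothetical stand-in holding only the two fields this sketch needs.
   The real pcre2_callout_block (declared in pcre2.h) has many more
   fields, with callout_flags added at the end for version 2. */
typedef struct {
  uint32_t version;
  uint32_t callout_flags;
} demo_callout_block;

/* Return callout_flags only when the block is version 2 or later;
   a version 1 block comes from a library that has no such field. */
static uint32_t callout_flags_if_present(const demo_callout_block *cb)
{
  return (cb->version >= 2) ? cb->callout_flags : 0;
}
```

Returning zero for an old block matches what DFA and JIT callouts report, so the rest of the application can treat "no flag information" in one way.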
.
.
.SS "Fields for numerical callouts"
@ -331,6 +334,33 @@ the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
(*THEN) item in the match, or NULL if no such items have been passed. Instances
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from the DFA matching function this field always contains NULL.
.P
The \fIcallout_flags\fP field is always zero in callouts from
\fBpcre2_dfa_match()\fP or when JIT is being used. When \fBpcre2_match()\fP
without JIT is used, the following bits may be set:
.sp
PCRE2_CALLOUT_STARTMATCH
.sp
This is set for the first callout after the start of matching for each new
starting position in the subject.
.sp
PCRE2_CALLOUT_BACKTRACK
.sp
This is set if there has been a matching backtrack since the previous callout,
or since the start of matching if this is the first callout from a
\fBpcre2_match()\fP run.
.P
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
position in the subject. Output from \fBpcre2test\fP does not indicate the
presence of these bits unless the \fBcallout_extra\fP modifier is set.
.P
The information in the \fIcallout_flags\fP field is provided so that
applications can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in \fBpcre2_dfa_match()\fP
because there is no backtracking in DFA matching, and there is no support in
JIT because JIT is all about maximizing matching performance. In both these
cases the \fIcallout_flags\fP field is always zero.
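A sketch of the kind of tracking the paragraph above has in mind. It assumes the two flag values added to `pcre2.h` by this change (repeated under `#ifndef` only so the block compiles standalone) and a hypothetical `match_trace_stats` structure. In a real application, `record_callout_flags()` would be called from the callout function with `cb->callout_flags`, with the statistics reached through the callout data pointer.

```c
#include <stdint.h>

#ifndef PCRE2_CALLOUT_STARTMATCH
/* Values as added to pcre2.h by this change. */
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u
#define PCRE2_CALLOUT_BACKTRACK  0x00000002u
#endif

/* Hypothetical counters an application might keep for one match run. */
typedef struct {
  unsigned long start_positions;  /* callouts with STARTMATCH set */
  unsigned long backtracks;       /* callouts with BACKTRACK set */
  unsigned long bumpalongs;       /* both bits: a backtrack led to a new start */
} match_trace_stats;

/* Update the counters from the callout_flags value of one callout. */
static void record_callout_flags(match_trace_stats *s, uint32_t flags)
{
  const uint32_t both = PCRE2_CALLOUT_STARTMATCH | PCRE2_CALLOUT_BACKTRACK;

  if ((flags & PCRE2_CALLOUT_STARTMATCH) != 0) s->start_positions++;
  if ((flags & PCRE2_CALLOUT_BACKTRACK) != 0) s->backtracks++;
  if ((flags & both) == both) s->bumpalongs++;
}
```

The counters are only meaningful for `pcre2_match()` without JIT; with `pcre2_dfa_match()` or JIT every callout reports zero flags, so all three counts stay at zero.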
.
.
.SH "RETURN VALUES FROM CALLOUTS"
@ -411,6 +441,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 14 April 2017
Last updated: 22 December 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

File diff suppressed because it is too large

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "17 October 2017" "PCRE 10.31"
.TH PCRE2TEST 1 "21 December 2017" "PCRE 10.31"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -129,6 +129,11 @@ has not been built, this option causes an error.
Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
automatic callouts into every pattern that is compiled.
.TP 10
\fB-AC\fP
As for \fB-ac\fP, but in addition behave as if each subject line has the
\fBcallout_extra\fP modifier, that is, show additional information from
callouts.
.TP 10
\fB-b\fP
Behave as if each pattern has the \fBfullbincode\fP modifier; the full
internal binary form of the pattern is output after compilation.
@ -203,8 +208,8 @@ successful compilation, each pattern is passed to the just-in-time compiler, if
available, and the use of JIT is verified.
.TP 10
\fB-LM\fP
List modifiers: write a list of available pattern and subject modifiers to the
standard output, then exit with zero exit code. All other options are ignored.
If both -C and -LM are present, whichever is first is recognized.
.TP 10
\fB-pattern\fP \fImodifier-list\fP
@ -1152,6 +1157,7 @@ pattern.
callout_capture show captures at callout time
callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error
callout_extra show extra callout information
callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
@ -1664,45 +1670,10 @@ documentation.
.rs
.sp
If the pattern contains any callout requests, \fBpcre2test\fP's callout
function is called during matching unless \fBcallout_none\fP is specified.
This works with both matching functions.
.P
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
change this and other parameters of the callout.
.P
If \fBcallout_capture\fP is set, the current captured groups are output when a
callout occurs. By default, the callout function then generates output that
indicates where the current match start and matching points are in the subject,
and what the next pattern item is. This output is suppressed if the
\fBcallout_no_where\fP modifier is set.
.P
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero. See
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.P
The output for callouts with numerical arguments and those with string
arguments is slightly different.
function is called during matching unless \fBcallout_none\fP is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical arguments and
those with string arguments is slightly different.
.
.
.SS "Callouts with numerical arguments"
@ -1776,6 +1747,103 @@ example:
.sp
.
.
.SS "Callout modifiers"
.rs
.sp
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line to
change this and other parameters of the callout (see below).
.P
If the \fBcallout_capture\fP modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching, as
\fBpcre2_dfa_match()\fP does not support capturing, so no captures are ever
shown.
.P
The normal callout output, showing the callout number or pattern offset (as
described above) is suppressed if the \fBcallout_no_where\fP modifier is set.
.P
When using the interpretive matching function \fBpcre2_match()\fP without JIT,
setting the \fBcallout_extra\fP modifier causes additional output from
\fBpcre2test\fP's callout function to be generated. For the first callout in a
match attempt at a new starting position in the subject, "New match attempt" is
output. If there has been a backtrack since the last callout (or start of
matching if this is the first callout), "Backtrack" is output, followed by "No
other matching paths" if the backtrack ended the previous match attempt. For
example:
.sp
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data> aac\e=callout_extra
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^ ^ )
+4 ^ ^ b
Backtrack
--->aac
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
No match
.sp
Notice that various optimizations must be turned off if you want all possible
matching paths to be scanned. If \fBno_start_optimize\fP is not used, there is
an immediate "no match", without any callouts, because the starting
optimization fails to find "b" in the subject, which it knows must be present
for any match. If \fBno_auto_possess\fP is not used, the "a+" item is turned
into "a++", which reduces the number of backtracks.
.P
The \fBcallout_extra\fP modifier has no effect if used with the DFA matching
function, or with JIT.
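The extra lines described above depend only on the value of \fIcallout_flags\fP. The sketch below restates that mapping in C, following the switch added to pcre2test's `callout_function()` in this commit: the exact flag value is matched, so the combined STARTMATCH|BACKTRACK case produces all three lines, in the order shown in the example output. The `SHOW_*` names and both helper functions are hypothetical; the flag values are repeated from `pcre2.h` only so the block compiles standalone.

```c
#include <stdint.h>
#include <stdio.h>

#ifndef PCRE2_CALLOUT_STARTMATCH
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u
#define PCRE2_CALLOUT_BACKTRACK  0x00000002u
#endif

/* Which extra lines to print (hypothetical names for this sketch). */
enum {
  SHOW_BACKTRACK      = 1,  /* "Backtrack" */
  SHOW_NO_OTHER_PATHS = 2,  /* "No other matching paths" */
  SHOW_NEW_ATTEMPT    = 4   /* "New match attempt" */
};

/* Decide which lines callout_extra produces for one callout. */
static unsigned callout_extra_lines(uint32_t flags)
{
  switch (flags)
    {
    case PCRE2_CALLOUT_BACKTRACK:
    return SHOW_BACKTRACK;

    case PCRE2_CALLOUT_STARTMATCH|PCRE2_CALLOUT_BACKTRACK:
    return SHOW_BACKTRACK | SHOW_NO_OTHER_PATHS | SHOW_NEW_ATTEMPT;

    case PCRE2_CALLOUT_STARTMATCH:
    return SHOW_NEW_ATTEMPT;

    default:                    /* no flags: nothing extra to print */
    return 0;
    }
}

/* Print the selected lines in the order pcre2test uses. */
static void print_callout_extra(FILE *f, uint32_t flags)
{
  unsigned lines = callout_extra_lines(flags);

  if ((lines & SHOW_BACKTRACK) != 0) fputs("Backtrack\n", f);
  if ((lines & SHOW_NO_OTHER_PATHS) != 0) fputs("No other matching paths\n", f);
  if ((lines & SHOW_NEW_ATTEMPT) != 0) fputs("New match attempt\n", f);
}
```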
.
.
.SS "Return values from callouts"
.rs
.sp
The default return from the callout function is zero, which allows matching to
continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If
there is only one number, 1 is returned instead of 0 (causing matching to
backtrack) when a callout of that number is reached. If two numbers (<n>:<m>)
are given, 1 is returned when callout <n> is reached and there have been at
least <m> callouts. The \fBcallout_error\fP modifier is similar, except that
PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
aborted. If both these modifiers are set for the same callout number,
\fBcallout_error\fP takes precedence. Note that callouts with string arguments
are always given the number zero.
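The \fBcallout_fail\fP and \fBcallout_error\fP rules above can be restated as a small decision function. This is a sketch of the documented behaviour, not the pcre2test source. The `callout_trigger` type and the function name are hypothetical, the callout count is assumed to include the current callout, and `PCRE2_ERROR_CALLOUT` is defined with its `pcre2.h` value only so the block compiles standalone. Remember that a callout with a string argument arrives with number zero.

```c
#include <stdint.h>

#ifndef PCRE2_ERROR_CALLOUT
#define PCRE2_ERROR_CALLOUT (-37)   /* value as in pcre2.h */
#endif

/* Hypothetical form of callout_fail=<n>[:<m>] or callout_error=<n>[:<m>].
   min_count is 0 when only <n> was given. */
typedef struct {
  int      set;        /* nonzero if the modifier was given */
  uint32_t number;     /* <n>: callout number to act on */
  uint32_t min_count;  /* <m>: callouts required so far */
} callout_trigger;

/* Value the callout function returns for one callout. callout_count is
   assumed to include the current callout. */
static int callout_return_value(uint32_t callout_number,
                                uint32_t callout_count,
                                const callout_trigger *fail,
                                const callout_trigger *error)
{
  /* callout_error takes precedence over callout_fail. */
  if (error->set && callout_number == error->number &&
      callout_count >= error->min_count)
    return PCRE2_ERROR_CALLOUT;   /* abort the whole match */

  if (fail->set && callout_number == fail->number &&
      callout_count >= fail->min_count)
    return 1;                     /* fail at this point and backtrack */

  return 0;                       /* carry on matching */
}
```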
.P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
This is set as the "user data" that is passed to the matching function, and
passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.
.
.
.SH "NON-PRINTING CHARACTERS"
.rs
@ -1894,6 +1962,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 October 2017
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

@ -120,6 +120,10 @@ COMMAND LINE OPTIONS
is, insert automatic callouts into every pattern that is com-
piled.
-AC As for -ac, but in addition behave as if each subject line
has the callout_extra modifier, that is, show additional
information from callouts.
-b Behave as if each pattern has the fullbincode modifier; the
full internal binary form of the pattern is output after com-
pilation.
@ -1056,6 +1060,7 @@ SUBJECT MODIFIERS
callout_capture show captures at callout time
callout_data=<n> set a value to pass via callouts
callout_error=<n>[:<m>] control callout error
callout_extra show extra callout information
callout_fail=<n>[:<m>] control callout failure
callout_no_where do not show position of a callout
callout_none do not supply a callout function
@ -1529,63 +1534,30 @@ RESTARTING AFTER A PARTIAL MATCH
CALLOUTS
If the pattern contains any callout requests, pcre2test's callout func-
tion is called during matching unless callout_none is specified. This
works with both matching functions.
The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout.
If callout_capture is set, the current captured groups are output when
a callout occurs. By default, the callout function then generates out-
put that indicates where the current match start and matching points
are in the subject, and what the next pattern item is. This output is
suppressed if the callout_no_where modifier is set.
The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (caus-
ing matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
ing the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero. See
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see
the pcre2callout documentation.
The output for callouts with numerical arguments and those with string
arguments is slightly different.
tion is called during matching unless callout_none is specified. This
works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical argu-
ments and those with string arguments is slightly different.
Callouts with numerical arguments
By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example:
--->pqrabcdef
0 ^ ^ \d
This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the auto_callout pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example:
re> /\d?[A-E]\*/auto_callout
@ -1598,7 +1570,7 @@ CALLOUTS
0: E*
If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For
example:
re> /a(*MARK:X)bc/auto_callout
@ -1612,17 +1584,17 @@ CALLOUTS
+12 ^ ^
0: abc
The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is
output.
Callouts with string arguments
The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is
reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/
@ -1636,6 +1608,100 @@ CALLOUTS
0: abcdef
Callout modifiers
The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout (see below).
If the callout_capture modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching,
as pcre2_dfa_match() does not support capturing, so no captures are
ever shown.
The normal callout output, showing the callout number or pattern offset
(as described above) is suppressed if the callout_no_where modifier is
set.
When using the interpretive matching function pcre2_match() without
JIT, setting the callout_extra modifier causes additional output from
pcre2test's callout function to be generated. For the first callout in
a match attempt at a new starting position in the subject, "New match
attempt" is output. If there has been a backtrack since the last call-
out (or start of matching if this is the first callout), "Backtrack" is
output, followed by "No other matching paths" if the backtrack ended
the previous match attempt. For example:
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
data> aac\=callout_extra
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^ ^ )
+4 ^ ^ b
Backtrack
--->aac
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
+3 ^^ )
+4 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+0 ^ (
+1 ^ a+
No match
Notice that various optimizations must be turned off if you want all
possible matching paths to be scanned. If no_start_optimize is not
used, there is an immediate "no match", without any callouts, because
the starting optimization fails to find "b" in the subject, which it
knows must be present for any match. If no_auto_possess is not used,
the "a+" item is turned into "a++", which reduces the number of back-
tracks.
The callout_extra modifier has no effect if used with the DFA matching
function, or with JIT.
Return values from callouts
The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (caus-
ing matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
ing the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number
zero.
The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout
function.
Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see
the pcre2callout documentation.
NON-PRINTING CHARACTERS
When pcre2test is outputting text in the compiled version of a pattern,
@ -1733,5 +1799,5 @@ AUTHOR
REVISION
Last updated: 17 October 2017
Last updated: 21 December 2017
Copyright (c) 1997-2017 University of Cambridge.

@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
without modification. Define the generic version in a macro; the width-specific
versions are generated from this macro below. */
/* Flags for the callout_flags field. These are cleared after a callout. */
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */
#define PCRE2_STRUCTURE_LIST \
typedef struct pcre2_callout_block { \
uint32_t version; /* Identifies version of block */ \
@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
/* ------------------- Added for Version 2 -------------------------- */ \
uint32_t callout_flags; /* See above for list */ \
/* ------------------------------------------------------------------ */ \
} pcre2_callout_block; \
\

@ -494,6 +494,11 @@ without changing the API of the function, thereby allowing old clients to work
without modification. Define the generic version in a macro; the width-specific
versions are generated from this macro below. */
/* Flags for the callout_flags field. These are cleared after a callout. */
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */
#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */
#define PCRE2_STRUCTURE_LIST \
typedef struct pcre2_callout_block { \
uint32_t version; /* Identifies version of block */ \
@ -513,6 +518,8 @@ typedef struct pcre2_callout_block { \
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
/* ------------------- Added for Version 2 -------------------------- */ \
uint32_t callout_flags; /* See above for list */ \
/* ------------------------------------------------------------------ */ \
} pcre2_callout_block; \
\

@ -2574,7 +2574,8 @@ for (;;)
if (mb->callout != NULL)
{
pcre2_callout_block cb;
cb.version = 1;
cb.version = 2;
cb.callout_flags = 0;
cb.capture_top = 1;
cb.capture_last = 0;
cb.offset_vector = offsets;
@ -2943,7 +2944,8 @@ for (;;)
if (mb->callout != NULL)
{
pcre2_callout_block cb;
cb.version = 1;
cb.version = 2;
cb.callout_flags = 0;
cb.capture_top = 1;
cb.capture_last = 0;
cb.offset_vector = offsets;

@ -7952,7 +7952,8 @@ oveccount = callout_block->capture_top;
SLJIT_ASSERT(oveccount >= 1);
callout_block->version = 1;
callout_block->version = 2;
callout_block->callout_flags = 0;
/* Offsets in subject. */
callout_block->subject_length = arguments->end - arguments->begin;

@ -321,6 +321,7 @@ callout_ovector[0] = callout_ovector[1] = PCRE2_UNSET;
rc = mb->callout(cb, mb->callout_data);
callout_ovector[0] = save0;
callout_ovector[1] = save1;
cb->callout_flags = 0;
return rc;
}
@ -5919,8 +5920,9 @@ in rrc. */
#define LBL(val) case val: goto L_RM##val;
RETURN_SWITCH:
if (Frdepth == 0) return rrc; /* Exit from the top level */
F = (heapframe *)((char *)F - Fback_frame); /* Back track */
mb->cb->callout_flags |= PCRE2_CALLOUT_BACKTRACK; /* Note for callouts */
#ifdef DEBUG_SHOW_RMATCH
fprintf(stderr, "++ RETURN %d to %d\n", rrc, Freturn_id);
@ -6171,13 +6173,14 @@ startline = (re->flags & PCRE2_STARTLINE) != 0;
bumpalong_limit = (mcontext->offset_limit == PCRE2_UNSET)?
end_subject : subject + mcontext->offset_limit;
/* Set up the fixed fields in the callout block, with a pointer in the
match block. */
/* Initialize and set up the fixed fields in the callout block, with a pointer
in the match block. */
mb->cb = &cb;
cb.version = 1;
cb.version = 2;
cb.subject = subject;
cb.subject_length = (PCRE2_SIZE)(end_subject - subject);
cb.callout_flags = 0;
/* Fill in the remaining fields in the match block. */
@ -6644,6 +6647,8 @@ for(;;)
first starting point for which a partial match was found. */
cb.start_match = (PCRE2_SIZE)(start_match - subject);
cb.callout_flags |= PCRE2_CALLOUT_STARTMATCH;
mb->start_used_ptr = start_match;
mb->last_used_ptr = start_match;
mb->match_call_count = 0;
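The interpretive-matcher hunks above together define the life cycle of callout_flags: it is cleared when the callout block is set up, PCRE2_CALLOUT_STARTMATCH is OR-ed in at each new starting position, PCRE2_CALLOUT_BACKTRACK is OR-ed in on every backtrack, and the field is cleared again after each callout. The toy model below (not PCRE2 code; the event letters and function name are invented for illustration) replays a sequence of those events and records the flags each callout would see.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef PCRE2_CALLOUT_STARTMATCH
#define PCRE2_CALLOUT_STARTMATCH 0x00000001u
#define PCRE2_CALLOUT_BACKTRACK  0x00000002u
#endif

/* Events: 'S' = new starting position, 'B' = backtrack, 'C' = callout.
   The flags seen by each callout are stored in seen[] (at most max
   entries); the number stored is returned. */
static size_t simulate_callout_flags(const char *events, uint32_t *seen,
                                     size_t max)
{
  uint32_t flags = 0;            /* cleared when matching is set up */
  size_t n = 0;

  for (; *events != '\0'; events++)
    {
    switch (*events)
      {
      case 'S': flags |= PCRE2_CALLOUT_STARTMATCH; break;
      case 'B': flags |= PCRE2_CALLOUT_BACKTRACK; break;
      case 'C':
      if (n < max) seen[n++] = flags;
      flags = 0;                 /* cleared after every callout */
      break;
      default: break;            /* ignore unknown event letters */
      }
    }
  return n;
}
```

For the sequence "SCCBCBSC" the four callouts see STARTMATCH, nothing, BACKTRACK, and finally both bits, the last being the "bumpalong after backtrack" case.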

@ -485,6 +485,7 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
#define CTL2_SUBJECT_LITERAL 0x00000010u
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
#define CTL2_CALLOUT_EXTRA 0x00000040u
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */
@ -598,6 +599,7 @@ static modstruct modlist[] = {
{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
{ "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) },
{ "callout_extra", MOD_DAT, MOD_CTL, CTL2_CALLOUT_EXTRA, DO(control2) },
{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
{ "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) },
@ -3971,7 +3973,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -3981,6 +3983,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_BINCODE) != 0)? " bincode" : "",
((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "",
((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "",
((controls2 & CTL2_CALLOUT_EXTRA) != 0)? " callout_extra" : "",
((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "",
((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "",
((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "",
@ -4409,7 +4412,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
pattern_info(PCRE2_INFO_ARGOPTIONS, &compile_options, FALSE);
pattern_info(PCRE2_INFO_ALLOPTIONS, &overall_options, FALSE);
pattern_info(PCRE2_INFO_EXTRAOPTIONS, &extra_options, FALSE);
/* Remove UTF/UCP if they were there only because of forbid_utf. This saves
cluttering up the verification output of non-UTF test files. */
@ -4436,9 +4439,9 @@ if ((pat_patctl.control & CTL_INFO) != 0)
show_compile_options(overall_options, "Overall options:", "\n");
}
}
if (extra_options != 0)
show_compile_extra_options(extra_options, "Extra options:", "\n");
if (jchanged) fprintf(outfile, "Duplicate name status changes\n");
@ -5842,17 +5845,43 @@ Return:
static int
callout_function(pcre2_callout_block_8 *cb, void *callout_data_ptr)
{
FILE *f, *fdefault;
uint32_t i, pre_start, post_start, subject_length;
PCRE2_SIZE current_position;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0;
BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0;
/* This FILE is used for echoing the subject. This is done only once in simple
cases. */
/* The FILE f is used for echoing the subject string if it is non-NULL. This
happens only once in simple cases, but we want to repeat after any additional
output caused by CALLOUT_EXTRA. */
FILE *f = (first_callout || callout_capture || cb->callout_string != NULL)?
outfile : NULL;
fdefault = (!first_callout && !callout_capture && cb->callout_string == NULL)?
NULL : outfile;
if ((dat_datctl.control2 & CTL2_CALLOUT_EXTRA) != 0)
{
f = outfile;
switch (cb->callout_flags)
{
case PCRE2_CALLOUT_BACKTRACK:
fprintf(f, "Backtrack\n");
break;
case PCRE2_CALLOUT_STARTMATCH|PCRE2_CALLOUT_BACKTRACK:
fprintf(f, "Backtrack\nNo other matching paths\n");
/* Fall through */
case PCRE2_CALLOUT_STARTMATCH:
fprintf(f, "New match attempt\n");
break;
default:
f = fdefault;
break;
}
}
else f = fdefault;
/* For a callout with a string argument, show the string first because there
isn't a tidy way to fit it in the rest of the data. */
@ -5902,7 +5931,6 @@ lengths of the substrings. */
if (callout_where)
{
if (f != NULL) fprintf(f, "--->");
/* The subject before the match start. */
@ -5931,9 +5959,10 @@ if (callout_where)
if (f != NULL) fprintf(f, "\n");
/* For automatic callouts, show the pattern offset. Otherwise, for a numerical
callout whose number has not already been shown with captured strings, show the
number here. A callout with a string argument has been displayed above. */
/* For automatic callouts, show the pattern offset. Otherwise, for a
numerical callout whose number has not already been shown with captured
strings, show the number here. A callout with a string argument has been
displayed above. */
if (cb->callout_number == 255)
{
@ -5963,6 +5992,8 @@ if (callout_where)
if (cb->next_item_length != 0)
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
pbuffer8 + cb->pattern_position);
else
fprintf(outfile, "End of pattern");
fprintf(outfile, "\n");
}
@ -7685,7 +7716,8 @@ printf(" -16 use the 16-bit library\n");
#ifdef SUPPORT_PCRE2_32
printf(" -32 use the 32-bit library\n");
#endif
printf(" -ac set default pattern option PCRE2_AUTO_CALLOUT\n");
printf(" -ac set default pattern modifier PCRE2_AUTO_CALLOUT\n");
printf(" -AC as -ac, but also set subject 'callout_extra' modifier\n");
printf(" -b set default pattern modifier 'fullbincode'\n");
printf(" -C show PCRE2 compile-time options and exit\n");
printf(" -C arg show a specific compile-time option and exit with its\n");
@ -8181,6 +8213,11 @@ while (argc > 1 && argv[op][0] == '-' && argv[op][1] != 0)
/* Set some common pattern and subject controls */
else if (strcmp(arg, "-AC") == 0)
{
def_patctl.options |= PCRE2_AUTO_CALLOUT;
def_datctl.control2 |= CTL2_CALLOUT_EXTRA;
}
else if (strcmp(arg, "-ac") == 0) def_patctl.options |= PCRE2_AUTO_CALLOUT;
else if (strcmp(arg, "-b") == 0) def_patctl.control |= CTL_FULLBINCODE;
else if (strcmp(arg, "-d") == 0) def_patctl.control |= CTL_DEBUG;

12
testdata/testinput2

@ -5383,6 +5383,16 @@ a)"xI
"(?=(a))\1?b"I
ab
aaab
aaab
# JIT does not support callout_extra
/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
# End of testinput2

testdata/testoutput15

@ -361,12 +361,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
@ -385,12 +385,12 @@ Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
1 ^ ^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^ ^ End of pattern
1 ^^ End of pattern
1 ^^ End of pattern
No match
# This test breaks the JIT stack limit

testdata/testoutput2

@ -3832,7 +3832,7 @@ Subject length lower bound = 2
\= Expect no match
abbbbbccc\=callout_data=1
--->abbbbbccc
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
No match
@ -3844,21 +3844,21 @@ Subject length lower bound = 2
\= Expect no match
abbbbbccc\=callout_data=1
--->abbbbbccc
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
1 ^ ^
1 ^ ^ End of pattern
Callout data = 1
No match
@ -4718,7 +4718,7 @@ Subject length lower bound = 5
+2 ^ ^ c
+3 ^ ^ d
+4 ^ ^ e
+5 ^ ^
+5 ^ ^ End of pattern
0: abcde
\= Expect no match
abcdfe
@ -4750,13 +4750,13 @@ Subject length lower bound = 1
--->ab
+0 ^ a*
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a*
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
aaaacb
--->aaaacb
@ -4770,7 +4770,7 @@ Subject length lower bound = 1
+2 ^^ b
+0 ^ a*
+2 ^ b
+3 ^^
+3 ^^ End of pattern
0: b
/a*b/IB,auto_callout
@ -4793,13 +4793,13 @@ Subject length lower bound = 1
--->ab
+0 ^ a*
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a*
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
aaaacb
--->aaaacb
@ -4813,7 +4813,7 @@ Subject length lower bound = 1
+2 ^^ b
+0 ^ a*
+2 ^ b
+3 ^^
+3 ^^ End of pattern
0: b
/a+b/IB,auto_callout
@ -4836,13 +4836,13 @@ Subject length lower bound = 2
--->ab
+0 ^ a+
+2 ^^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: ab
aaaab
--->aaaab
+0 ^ a+
+2 ^ ^ b
+3 ^ ^
+3 ^ ^ End of pattern
0: aaaab
\= Expect no match
aaaacb
@ -4897,7 +4897,7 @@ Subject length lower bound = 4
+3 ^ ^ c
+4 ^ ^ |
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: abcx
1: abc
defx
@ -4909,7 +4909,7 @@ Subject length lower bound = 4
+7 ^ ^ f
+8 ^ ^ )
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: defx
1: def
\= Expect no match
@ -4971,7 +4971,7 @@ Subject length lower bound = 4
+3 ^ ^ c
+4 ^ ^ |
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: abcx
1: abc
defx
@ -4983,7 +4983,7 @@ Subject length lower bound = 4
+7 ^ ^ f
+8 ^ ^ )
+9 ^ ^ x
+10 ^ ^
+10 ^ ^ End of pattern
0: defx
1: def
\= Expect no match
@ -5024,7 +5024,7 @@ Subject length lower bound = 6
+3 ^ ^ |
+1 ^ ^ a
+4 ^ ^ c
+12 ^ ^
+12 ^ ^ End of pattern
0: ababab
1: ab
abcdabcd
@ -5044,7 +5044,7 @@ Subject length lower bound = 6
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdabcd
1: cd
abcdcdcdcdcd
@ -5065,7 +5065,7 @@ Subject length lower bound = 6
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdcdcd
1: cd
@ -5276,7 +5276,7 @@ Subject length lower bound = 11
+21 ^ ^ 1
+22 ^ ^ 2
+23 ^ ^ 3
+24 ^ ^
+24 ^ ^ End of pattern
0: aacaacaacaacaac123
1: aac
@ -8900,7 +8900,7 @@ Subject length lower bound = 0
+7 ^ b
+11 ^ ^
+12 ^ )
+13 ^
+13 ^ End of pattern
0:
abc
--->abc
@ -8921,7 +8921,7 @@ Subject length lower bound = 0
+8 ^^ )
+9 ^ b
+10 ^^ |
+13 ^^
+13 ^^ End of pattern
0: b
/(?(?=b).*b|^d)/I
@ -8938,14 +8938,14 @@ Subject length lower bound = 1
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
abcxyz
--->abcxyz
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -8962,7 +8962,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -8996,7 +8996,7 @@ No match
+15 ^ x
+16 ^^ y
+17 ^ ^ z
+18 ^ ^
+18 ^ ^ End of pattern
0: xyz
/(*NO_AUTO_POSSESS)a+b/B
@ -9017,7 +9017,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
/^"((?(?=[a])[^"])|b)*"$/auto_callout
@ -9046,7 +9046,7 @@ No match
+17 ^ ^ |
+21 ^ ^ "
+22 ^ ^ $
+23 ^ ^
+23 ^ ^ End of pattern
0: "ab"
1:
@ -11136,7 +11136,7 @@ Latest Mark: A
+10 ^ ^ |
+18 ^ ^ z
+19 ^ ^ |
+24 ^ ^
+24 ^ ^ End of pattern
0: adz
1: adz
2: d
@ -11155,7 +11155,7 @@ Latest Mark: A
Latest Mark: B
+18 ^ ^ z
+19 ^ ^ |
+24 ^ ^
+24 ^ ^ End of pattern
0: aez
1: aez
2: e
@ -11177,7 +11177,7 @@ Latest Mark: B
+21 ^^ e
+22 ^ ^ q
+23 ^ ^ )
+24 ^ ^
+24 ^ ^ End of pattern
0: aeq
1: aeq
@ -11951,7 +11951,7 @@ Partial match: 123a
+11 ^ b
+12 ^^ b
+13 ^ ^ )
+14 ^ ^
+14 ^ ^ End of pattern
0: bb
/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/
@ -11964,7 +11964,7 @@ Partial match: 123a
8 ^ b
9 ^^ b
10 ^ ^ )
11 ^ ^
11 ^ ^ End of pattern
0: bb
# Perl seems to have a bug with this one.
@ -15144,7 +15144,7 @@ Subject length lower bound = 0
+0 ^ (
+1 ^ )\Q\E*
+7 ^ ]
+8 ^^
+8 ^^ End of pattern
0: ]
1:
@ -15428,7 +15428,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
+0 ^ a
+1 ^^ b
1 ^ ^ c
+8 ^ ^
+8 ^ ^ End of pattern
0: abc
/'ab(?C1)c'/hex,auto_callout
@ -15437,7 +15437,7 @@ Failed: error 125 at offset 13: lookbehind assertion is not fixed length
+0 ^ a
+1 ^^ b
1 ^ ^ c
+8 ^ ^
+8 ^ ^ End of pattern
0: abc
# Perl accepts these, but gives a warning. We can't warn, so give an error.
@ -16256,7 +16256,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
+2 ^ ^ b
+3 ^ ^ (
+4 ^ ^ c
+5 ^ ^
+5 ^ ^ End of pattern
0: a\b(c
/a\b(c/literal,auto_callout
@ -16267,7 +16267,7 @@ Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
+2 ^ ^ b
+3 ^ ^ (
+4 ^ ^ c
+5 ^ ^
+5 ^ ^ End of pattern
0: a\b(c
/(*CR)abc/literal
@ -16380,9 +16380,65 @@ Subject length lower bound = 1
ab
0: ab
1: a
aaab
aaab
0: ab
1: a
# JIT does not support callout_extra
/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
New match attempt
--->aac
+9 ^ (
+10 ^ a+
+12 ^ ^ )
+13 ^ ^ b
Backtrack
--->aac
+12 ^^ )
+13 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
+12 ^^ )
+13 ^^ b
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
Backtrack
No other matching paths
New match attempt
--->aac
+9 ^ (
+10 ^ a+
No match
/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess
\= Expect no match
aac\=callout_extra
New match attempt
Callout (15): 'XXX'
--->aac
^ ^ b
Backtrack
Callout (15): 'XXX'
--->aac
^^ b
Backtrack
No other matching paths
New match attempt
Callout (15): 'XXX'
--->aac
^^ b
No match
# End of testinput2
Error -65: PCRE2_ERROR_BADDATA (unknown error number)
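With callout_extra set, the output above contains "New match attempt" and "Backtrack" lines. These reflect the new callout_flags field, whose bits pcre2_match() sets. The sketch below shows one way a callout function could turn those bits into text, assuming one bit per condition. The DEMO_ constants and `describe_callout_flags()` are hypothetical stand-ins. Released PCRE2 names the bits PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK, but their values are not assumed here. The "No other matching paths" line is not modelled.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the callout_flags bits (values are assumptions). */
#define DEMO_CALLOUT_STARTMATCH 0x00000001u
#define DEMO_CALLOUT_BACKTRACK  0x00000002u

/* Fill buf with the extra lines shown before a callout's normal output.
   Returns the length of the resulting string. */
size_t describe_callout_flags(uint32_t flags, char *buf, size_t bufsize)
{
  if (bufsize == 0) return 0;
  buf[0] = '\0';
  if ((flags & DEMO_CALLOUT_STARTMATCH) != 0)
    strncat(buf, "New match attempt\n", bufsize - strlen(buf) - 1);
  if ((flags & DEMO_CALLOUT_BACKTRACK) != 0)
    strncat(buf, "Backtrack\n", bufsize - strlen(buf) - 1);
  return strlen(buf);
}
```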


@ -3763,7 +3763,7 @@ No match
abcd
--->abcd
+0 ^ \w+
+3 ^ ^
+3 ^ ^ End of pattern
0: abcd
/[\p{N}]?+/B,no_auto_possess
@ -4165,7 +4165,7 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
+0 ^ .
+0 ^ .
+1 ^ ^ .
+2 ^ ^
+2 ^ ^ End of pattern
0: \x{123}\x{123}
# This tests processing wide characters in extended mode.

testdata/testoutput6

@ -726,7 +726,7 @@ No match
+4 ^ ^ c
+2 ^ ^ b
+3 ^ ^ |
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
0: ababab
@ -745,12 +745,12 @@ No match
+4 ^ ^ c
+2 ^ ^ b
+3 ^ ^ |
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdabcd
1: abcdab
abcdcdcdcdcd
@ -768,12 +768,12 @@ No match
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
+1 ^ ^ a
+4 ^ ^ c
+5 ^ ^ d
+6 ^ ^ ){3,4}
+12 ^ ^
+12 ^ ^ End of pattern
0: abcdcdcd
1: abcdcd
@ -6610,14 +6610,14 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
abcxyz
--->abcxyz
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -6634,7 +6634,7 @@ No match
+0 ^ x
+1 ^^ y
+2 ^ ^ z
+3 ^ ^
+3 ^ ^ End of pattern
0: xyz
\= Expect no match
abc
@ -6668,7 +6668,7 @@ No match
+15 ^ x
+16 ^^ y
+17 ^ ^ z
+18 ^ ^
+18 ^ ^ End of pattern
0: xyz
/(?C)ab/
@ -6684,7 +6684,7 @@ No match
--->ab
+0 ^ a
+1 ^^ b
+2 ^ ^
+2 ^ ^ End of pattern
0: ab
ab\=callout_none
0: ab
@ -6717,7 +6717,7 @@ No match
+8 ^ [a]
+17 ^ ^ |
+22 ^ ^ $
+23 ^ ^
+23 ^ ^ End of pattern
0: "ab"
"ab"\=callout_none
0: "ab"