Documentation for callouts with string arguments.

Philip.Hazel 2015-03-15 17:49:03 +00:00
parent 15e034c9c2
commit 2ec7cbf9b5
6 changed files with 240 additions and 107 deletions

HACKING

@ -8,7 +8,7 @@ library is referred to as PCRE1 below. For information about testing PCRE2, see
the pcre2test documentation and the comment at the head of the RunTest file. the pcre2test documentation and the comment at the head of the RunTest file.
PCRE1 releases were up to 8.3x when PCRE2 was developed. The 8.xx series will PCRE1 releases were up to 8.3x when PCRE2 was developed. The 8.xx series will
continue for bugfixes if necessary. PCRE2 releases start at 10.0 to avoid continue for bugfixes if necessary. PCRE2 releases started at 10.00 to avoid
confusion with PCRE1. confusion with PCRE1.
@ -39,7 +39,7 @@ subsequently heavily modified for Perl) compiles the expression twice: once in
a dummy mode in order to find out how much store will be needed, and then for a dummy mode in order to find out how much store will be needed, and then for
real. (The Perl version probably doesn't do this any more; I'm talking about real. (The Perl version probably doesn't do this any more; I'm talking about
the original library.) The execution function operates by backtracking and the original library.) The execution function operates by backtracking and
maximizing (or, optionally, minimizing in Perl) the amount of the subject that maximizing (or, optionally, minimizing, in Perl) the amount of the subject that
matches individual wild portions of the pattern. This is an "NFA algorithm" in matches individual wild portions of the pattern. This is an "NFA algorithm" in
Friedl's terminology. Friedl's terminology.
@ -63,7 +63,7 @@ modes, creating up to three different libraries. In the description that
follows, the word "short" is used for a 16-bit data quantity, and the phrase follows, the word "short" is used for a 16-bit data quantity, and the phrase
"code unit" is used for a quantity that is a byte in 8-bit mode, a short in "code unit" is used for a quantity that is a byte in 8-bit mode, a short in
16-bit mode and a 32-bit word in 32-bit mode. The names of PCRE2 functions are 16-bit mode and a 32-bit word in 32-bit mode. The names of PCRE2 functions are
given in generic form, without a _8, _16, or _32 suffix. given in generic form, without the _8, _16, or _32 suffix.
Computing the memory requirement: how it was Computing the memory requirement: how it was
@ -100,8 +100,9 @@ issue, and in the event, nobody has commented on it.
At release 8.34, a limit on the nesting depth of parentheses was re-introduced At release 8.34, a limit on the nesting depth of parentheses was re-introduced
(default 250, settable at build time) so as to put a limit on the amount of (default 250, settable at build time) so as to put a limit on the amount of
system stack used by the compile function. This is a safety feature for system stack used by the compile function, which uses recursive function calls
environments with small stacks where the patterns are provided by users. for nested parenthesized groups. This is a safety feature for environments with
small stacks where the patterns are provided by users.
Traditional matching function Traditional matching function
@ -158,8 +159,9 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values, only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values,
and the 16-bit library can be compiled to use 4-byte values, though this and the 16-bit library can be compiled to use 4-byte values, though this
impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is
necessary only when patterns whose compiled length is greater than 64K are necessary only when patterns whose compiled length is greater than 64K code
going to be processed. units are going to be processed. When a LINK_SIZE value uses more than one code
unit, the most significant unit is first.
In this description, we assume the "normal" compilation options. Data values In this description, we assume the "normal" compilation options. Data values
that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
@ -343,7 +345,7 @@ For classes containing characters with values greater than 255 or that contain
code points are less than 256, followed by a list of pairs (for a range) and code points are less than 256, followed by a list of pairs (for a range) and
single characters. In caseless mode, both cases are explicitly listed. single characters. In caseless mode, both cases are explicitly listed.
OP_XCLASS is followed by a LINK_SIZE item containing the total length of the OP_XCLASS is followed by a LINK_SIZE value containing the total length of the
opcode and its data. This is followed by a code unit containing flag bits: opcode and its data. This is followed by a code unit containing flag bits:
XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
bit map is present. There follows the bit map, if XCL_MAP is set, and then a bit map is present. There follows the bit map, if XCL_MAP is set, and then a
@ -356,7 +358,7 @@ sequence of items coded as follows:
XCL_NOTPROP a Unicode property (type, value) follows XCL_NOTPROP a Unicode property (type, value) follows
If a range starts with a code point less than 256 and ends with one greater If a range starts with a code point less than 256 and ends with one greater
than 256, it is split into two ranges, with characters less than 256 being than 255, it is split into two ranges, with characters less than 256 being
indicated in the bit map, and the rest with XCL_RANGE. indicated in the bit map, and the rest with XCL_RANGE.
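The split described above can be sketched in C as follows. This is an illustration only, not code from the library: the type split_range and the function split_class_range() are hypothetical names invented for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the part of a range that would be recorded in
   the bit map (code points below 256) and the part that would be emitted as
   an XCL_RANGE item (code points above 255). */
typedef struct {
  bool     has_map_part;
  uint32_t map_first, map_last;
  bool     has_range_part;
  uint32_t range_first, range_last;
} split_range;

/* Split the inclusive range first..last at the 255/256 boundary. */
static split_range split_class_range(uint32_t first, uint32_t last)
{
  split_range r = { false, 0, 0, false, 0, 0 };
  if (first > last) return r;            /* empty range: nothing to record */
  if (first < 256) {
    r.has_map_part = true;
    r.map_first = first;
    r.map_last = (last < 256) ? last : 255;
  }
  if (last > 255) {
    r.has_range_part = true;
    r.range_first = (first > 255) ? first : 256;
    r.range_last = last;
  }
  return r;
}
```

For a range such as 0x41 to 0x17f, this gives a bit map part of 0x41 to 255 and an XCL_RANGE part of 256 to 0x17f.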
When XCL_NOT is set, the bit map, if present, contains bits for characters that When XCL_NOT is set, the bit map, if present, contains bits for characters that
@ -412,17 +414,17 @@ compile time, so alternation always happens in the context of brackets.
myself, can be round, square, curly, or pointy. Hence this usage rather than myself, can be round, square, curly, or pointy. Hence this usage rather than
"parentheses".] "parentheses".]
Non-capturing brackets use the opcode OP_BRA, capturing brackets use OP_CBRA. Non-capturing brackets use the opcode OP_BRA, capturing brackets use OP_CBRA. A
A bracket opcode is followed by LINK_SIZE bytes which give the offset to the bracket opcode is followed by a LINK_SIZE value which gives the offset to the
next alternative OP_ALT or, if there aren't any branches, to the matching next alternative OP_ALT or, if there aren't any branches, to the matching
OP_KET opcode. Each OP_ALT is followed by LINK_SIZE bytes giving the offset to OP_KET opcode. Each OP_ALT is followed by a LINK_SIZE value giving the offset
the next one, or to the OP_KET opcode. For capturing brackets, the bracket to the next one, or to the OP_KET opcode. For capturing brackets, the bracket
number is a count that immediately follows the offset. number is a count that immediately follows the offset.
OP_KET is used for subpatterns that do not repeat indefinitely, and OP_KETRMIN OP_KET is used for subpatterns that do not repeat indefinitely, and OP_KETRMIN
and OP_KETRMAX are used for indefinite repetitions, minimally or maximally and OP_KETRMAX are used for indefinite repetitions, minimally or maximally
respectively (see below for possessive repetitions). All three are followed by respectively (see below for possessive repetitions). All three are followed by
LINK_SIZE bytes giving (as a positive number) the offset back to the matching a LINK_SIZE value giving (as a positive number) the offset back to the matching
bracket opcode. bracket opcode.
If a subpattern is quantified such that it is permitted to match zero times, it If a subpattern is quantified such that it is permitted to match zero times, it
@ -520,8 +522,11 @@ tests the PCRE2 version number. This compiles into one of the opcodes OP_TRUE
or OP_FALSE. or OP_FALSE.
If a condition is not a back reference, recursion test, DEFINE, or VERSION, it If a condition is not a back reference, recursion test, DEFINE, or VERSION, it
must start with an assertion, whose opcode immediately follows OP_COND or must start with an assertion, whose opcode normally immediately follows OP_COND
OP_SCOND. or OP_SCOND. However, if automatic callouts are enabled, a callout is inserted
immediately before the assertion. It is also possible to insert a manual
callout at this point. Only assertion conditions may have callouts preceding
the condition.
Recursion Recursion
@ -529,22 +534,43 @@ Recursion
Recursion either matches the current pattern, or some subexpression. The opcode Recursion either matches the current pattern, or some subexpression. The opcode
OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
bracket from the start of the whole pattern. OP_RECURSE is automatically bracket from the start of the whole pattern. OP_RECURSE is also used for
wrapped inside OP_ONCE brackets, because otherwise some patterns broke it. "subroutine" calls, even though they are not strictly a recursion. Repeated
OP_RECURSE is also used for "subroutine" calls, even though they are not recursions are automatically wrapped inside OP_ONCE brackets, because otherwise
strictly a recursion. some patterns broke them. A non-repeated recursion is not wrapped in OP_ONCE
brackets, but it is nevertheless still treated as an atomic group.
Callout Callout
------- -------
OP_CALLOUT is followed by one code unit of data that holds a callout number in A callout can nowadays have either a numerical argument or a string argument.
the range 0 to 254 for manual callouts, or 255 for an automatic callout. In These use OP_CALLOUT or OP_CALLOUT_STR, respectively. In each case these are
both cases there follows a count giving the offset in the pattern string to the followed by two LINK_SIZE values giving the offset in the pattern string to the
start of the following item, and another count giving the length of this item. start of the following item, and another count giving the length of this item.
These values make it possible for pcre2test to output useful tracing These values make it possible for pcre2test to output useful tracing
information using automatic callouts. information using callouts.
In the case of a numeric callout, after these two values there is a single code
unit containing the callout number, in the range 0-255, with 255 being used for
callouts that are automatically inserted as a result of the PCRE2_AUTO_CALLOUT
option. Thus, this opcode item is of fixed length:
[OP_CALLOUT] [PATTERN_OFFSET] [PATTERN_LENGTH] [NUMBER]
For callouts with string arguments, OP_CALLOUT_STR has three more data items:
a LINK_SIZE value giving the complete length of the entire opcode item, a
LINK_SIZE item containing the offset within the pattern string to the start of
the string argument, and the string itself, preceded by its starting delimiter
and followed by a binary zero. When a callout function is called, a pointer to
the actual string is passed, but the delimiter can be accessed as string[-1] if
the application needs it. In the 8-bit library, the callout in /X(?C'abc')Y/ is
compiled as the following bytes (decimal numbers represent binary values):
  [OP_CALLOUT_STR]  [0] [10]  [0] [1]  [0] [14]  [0] [5] ['] [a] [b] [c] [0]
                    --------  -------  --------  -------
                       |         |        |         |
                       ------- LINK_SIZE items ------
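As a rough illustration of this layout, the following C sketch decodes a string callout item. It assumes the 8-bit library with LINK_SIZE set to 2, so each LINK_SIZE value is two bytes with the most significant byte first. The names SKETCH_LINK_SIZE, callout_str_item, get_link() and decode_callout_str() are hypothetical and do not exist in the PCRE2 source.

```c
#include <stdint.h>

/* Assumption for this sketch: 8-bit library, LINK_SIZE == 2. */
#define SKETCH_LINK_SIZE 2

/* Hypothetical holder for the decoded fields of one string callout item. */
typedef struct {
  unsigned pattern_offset;     /* offset in the pattern of the next item */
  unsigned next_item_length;   /* length of that item */
  unsigned total_length;       /* length of the whole opcode item */
  unsigned string_offset;      /* offset in the pattern of the string */
  uint8_t  delimiter;          /* starting delimiter, stored before string */
  const uint8_t *string;       /* zero-terminated callout string */
} callout_str_item;

/* Read one 2-byte LINK_SIZE value, most significant byte first. */
static unsigned get_link(const uint8_t *p)
{
  return ((unsigned)p[0] << 8) | p[1];
}

/* Decode the item that starts at the opcode byte pointed to by code. */
static callout_str_item decode_callout_str(const uint8_t *code)
{
  callout_str_item it;
  const uint8_t *p = code + 1;                       /* skip the opcode */
  it.pattern_offset   = get_link(p); p += SKETCH_LINK_SIZE;
  it.next_item_length = get_link(p); p += SKETCH_LINK_SIZE;
  it.total_length     = get_link(p); p += SKETCH_LINK_SIZE;
  it.string_offset    = get_link(p); p += SKETCH_LINK_SIZE;
  it.delimiter = *p++;                               /* string[-1] */
  it.string = p;
  return it;
}
```

Applied to the bytes shown above for /X(?C'abc')Y/, this yields a pattern offset of 10, a next item length of 1, a total length of 14 and a string offset of 5, with the string "abc" preceded by the delimiter.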
Opcode table checking Opcode table checking
--------------------- ---------------------
@ -554,4 +580,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
correct length, in order to catch updating errors. correct length, in order to catch updating errors.
Philip Hazel Philip Hazel
February 2015 March 2015


@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "02 January 2015" "PCRE2 10.00" .TH PCRE2CALLOUT 3 "15 March 2015" "PCRE2 10.20"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -15,18 +15,22 @@ PCRE2 - Perl-compatible regular expressions (revised API)
PCRE2 provides a feature called "callout", which is a means of temporarily PCRE2 provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE2 in the middle of pattern matching. The passing control to the caller of PCRE2 in the middle of pattern matching. The
caller of PCRE2 provides an external function by putting its entry point in caller of PCRE2 provides an external function by putting its entry point in
a match context (see \fBpcre2_set_callout()\fP) in the a match context (see \fBpcre2_set_callout()\fP in the
.\" HREF .\" HREF
\fBpcre2api\fP \fBpcre2api\fP
.\" .\"
documentation). documentation).
.P .P
Within a regular expression, (?C) indicates the points at which the external Within a regular expression, (?C<arg>) indicates a point at which the external
function is to be called. Different callout points can be identified by putting function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero. a number less than 256 after the letter C. The default value is zero.
For example, this pattern has two callout points: Alternatively, the argument may be a delimited string. The starting delimiter
must be one of ` ' " ^ % # $ { and the ending delimiter is the same as the
start, except for {, where the ending delimiter is }. If the ending delimiter
is needed within the string, it must be doubled. For example, this pattern has
two callout points:
.sp .sp
(?C1)abc(?C2)def (?C1)abc(?C"some ""arbitrary"" text")def
.sp .sp
If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
automatically inserts callouts, all with number 255, before each item in the automatically inserts callouts, all with number 255, before each item in the
@ -43,20 +47,19 @@ alternation bar. If the pattern contains a conditional group whose condition is
an assertion, an automatic callout is inserted immediately before the an assertion, an automatic callout is inserted immediately before the
condition. Such a callout may also be inserted explicitly, for example: condition. Such a callout may also be inserted explicitly, for example:
.sp .sp
(?(?C9)(?=a)ab|de) (?(?C9)(?=a)ab|de) (?(?C%text%)(?!=d)ab|de)
.sp .sp
This applies only to assertion conditions (because they are themselves This applies only to assertion conditions (because they are themselves
independent groups). independent groups).
.P .P
Automatic callouts can be used for tracking the progress of pattern matching. Callouts can be useful for tracking the progress of pattern matching. The
The
.\" HREF .\" HREF
\fBpcre2test\fP \fBpcre2test\fP
.\" .\"
program has a pattern qualifier (/auto_callout) that sets automatic callouts; program has a pattern qualifier (/auto_callout) that sets automatic callouts.
when it is used, the output indicates how the pattern is being matched. This is When any callouts are present, the output from \fBpcre2test\fP indicates how
useful information when you are trying to optimize the performance of a the pattern is being matched. This is useful information when you are trying to
particular pattern. optimize the performance of a particular pattern.
. .
. .
.SH "MISSING CALLOUTS" .SH "MISSING CALLOUTS"
@ -193,15 +196,52 @@ documentation). The callout block structure contains the following fields:
PCRE2_SIZE \fIcurrent_position\fP; PCRE2_SIZE \fIcurrent_position\fP;
PCRE2_SIZE \fIpattern_position\fP; PCRE2_SIZE \fIpattern_position\fP;
PCRE2_SIZE \fInext_item_length\fP; PCRE2_SIZE \fInext_item_length\fP;
PCRE2_SIZE \fIcallout_string_offset\fP;
PCRE2_SPTR \fIcallout_string\fP;
uint32_t \fIcallout_string_length\fP;
.sp .sp
The \fIversion\fP field contains the version number of the block format. The The \fIversion\fP field contains the version number of the block format. The
current version is 0. The version number will change in future if additional current version is 1; the three callout string fields were added for this
fields are added, but the intention is never to remove any of the existing version. If you are writing an application that might use an earlier release of
fields. PCRE2, you should check the version number before accessing any of these
fields. The version number will increase in future if more fields are added,
but the intention is never to remove any of the existing fields.
.
.
.SS "Fields for numerical callouts"
.rs
.sp
For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP
contains the number of the callout, in the range 0-255. This is the number
that follows (?C for manual callouts; it is 255 for automatically generated
callouts.
.
.
.SS "Fields for string callouts"
.rs
.sp
For callouts with string arguments, \fIcallout_number\fP is always zero, and
\fIcallout_string\fP points to the string that is contained within the compiled
pattern. Its length is given by \fIcallout_string_length\fP. Duplicated ending
delimiters that were present in the original pattern string have been turned
into single characters. An additional code unit containing binary zero is
present after the string, but is not included in the length. The delimiter that
was used to start the string is also stored within the pattern, immediately
before the string itself. You can therefore access this delimiter as
\fIcallout_string\fP[-1] if you need it.
.P .P
The \fIcallout_number\fP field contains the number of the callout, as compiled The \fIcallout_string_offset\fP field is the code unit offset to the start of
into the pattern (that is, the number after ?C for manual callouts, and 255 for the callout argument string within the original pattern string. This is
automatically generated callouts). provided for the benefit of applications such as script languages that might
need to report errors in the callout string within the pattern.
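A callout function might distinguish the two kinds of callout along the following lines. This is only a sketch: so that it stands alone, it uses a stand-in structure, sketch_callout_block, that copies just the fields discussed here for the 8-bit library. A real callout function receives a pointer to a pcre2_callout_block instead. The function names are also hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the relevant fields of the callout block. This is NOT the
   real declaration from pcre2.h; it exists only for this sketch. */
typedef struct {
  uint32_t version;
  uint32_t callout_number;
  size_t   callout_string_offset;
  const unsigned char *callout_string;
  uint32_t callout_string_length;
} sketch_callout_block;

/* The string fields exist only from version 1 of the block onwards, so the
   version is checked before callout_string is examined. */
static int is_string_callout(const sketch_callout_block *cb)
{
  return cb->version >= 1 && cb->callout_string != NULL;
}

/* Write a one-line description of the callout into buf. */
static int describe_callout(const sketch_callout_block *cb,
                            char *buf, size_t bufsize)
{
  if (is_string_callout(cb)) {
    /* The starting delimiter is stored just before the string. */
    unsigned char delimiter = cb->callout_string[-1];
    return snprintf(buf, bufsize,
                    "string callout (delimiter %c, offset %zu): %.*s",
                    delimiter, cb->callout_string_offset,
                    (int)cb->callout_string_length,
                    (const char *)cb->callout_string);
  }
  return snprintf(buf, bufsize, "numerical callout %u",
                  (unsigned)cb->callout_number);
}
```

The length field is used when printing rather than relying on the terminating zero, because the zero is not counted in callout_string_length.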
.
.
.SS "Fields for all callouts"
.rs
.sp
The remaining fields in the callout block are the same for both kinds of
callout.
.P .P
The \fIoffset_vector\fP field is a pointer to the vector of capturing offsets The \fIoffset_vector\fP field is a pointer to the vector of capturing offsets
(the "ovector") that was passed to the matching function in the match data (the "ovector") that was passed to the matching function in the match data
@ -246,7 +286,9 @@ of the entire subpattern.
.P .P
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
help in distinguishing between different automatic callouts, which all have the help in distinguishing between different automatic callouts, which all have the
same callout number. However, they are set for all callouts. same callout number. However, they are set for all callouts, and are used by
\fBpcre2test\fP to show the next item to be matched when displaying callout
information.
.P .P
In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to
the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
@ -285,6 +327,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 02 January 2015 Last updated: 15 March 2015
Copyright (c) 1997-2015 University of Cambridge. Copyright (c) 1997-2015 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2COMPAT 3 "28 September 2014" "PCRE2 10.0" .TH PCRE2COMPAT 3 "15 March 2015" "PCRE2 10.20"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL" .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -69,11 +69,11 @@ the
.\" .\"
documentation for details. documentation for details.
.P .P
8. Subpatterns that are called as subroutines (whether or not recursively) are 8. Subroutine calls (whether recursive or not) are treated as atomic groups.
always treated as atomic groups in PCRE2. This is like Python, but unlike Perl. Atomic recursion is like Python, but unlike Perl. Captured values that are set
Captured values that are set outside a subroutine call can be reference from outside a subroutine call can be referenced from inside in PCRE2, but not in
inside in PCRE2, but not in Perl. There is a discussion that explains these Perl. There is a discussion that explains these differences in more detail in
differences in more detail in the the
.\" HTML <a href="pcre2pattern.html#recursiondifference"> .\" HTML <a href="pcre2pattern.html#recursiondifference">
.\" </a> .\" </a>
section on recursion differences from Perl section on recursion differences from Perl
@ -185,6 +185,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 28 September 2014 Last updated: 15 March 2015
Copyright (c) 1997-2014 University of Cambridge. Copyright (c) 1997-2015 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "28 January 2015" "PCRE2 10.00" .TH PCRE2PATTERN 3 "15 March 2015" "PCRE2 10.20"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS" .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -2821,42 +2821,69 @@ same pair of parentheses when there is a repetition.
PCRE2 provides a similar feature, but of course it cannot obey arbitrary Perl PCRE2 provides a similar feature, but of course it cannot obey arbitrary Perl
code. The feature is called "callout". The caller of PCRE2 provides an external code. The feature is called "callout". The caller of PCRE2 provides an external
function by putting its entry point in a match context using the function function by putting its entry point in a match context using the function
\fBpcre2_set_callout()\fP and passing the context to \fBpcre2_match()\fP or \fBpcre2_set_callout()\fP, and then passing that context to \fBpcre2_match()\fP
\fBpcre2_dfa_match()\fP. If no match context is passed, or if the callout entry or \fBpcre2_dfa_match()\fP. If no match context is passed, or if the callout
point is set to NULL, callouts are disabled. entry point is set to NULL, callouts are disabled.
.P .P
Within a regular expression, (?C) indicates the points at which the external Within a regular expression, (?C<arg>) indicates a point at which the external
function is to be called. If you want to identify different callout points, you function is to be called. There are two kinds of callout: those with a
can put a number less than 256 after the letter C. The default value is zero. numerical argument and those with a string argument. (?C) on its own with no
For example, this pattern has two callout points: argument is treated as (?C0). A numerical argument allows the application to
distinguish between different callouts. String arguments were added for release
10.20 to make it possible for script languages that use PCRE2 to embed short
scripts within patterns in a similar way to Perl.
.P
During matching, when PCRE2 reaches a callout point, the external function is
called. It is provided with the number or string argument of the callout, the
position in the pattern, and one item of data that is also set in the match
block. The callout function may cause matching to proceed, to backtrack, or to
fail.
.P
By default, PCRE2 implements a number of optimizations at matching time, and
one side-effect is that sometimes callouts are skipped. If you need all
possible callouts to happen, you need to set options that disable the relevant
optimizations. More details, including a complete description of the
programming interface to the callout function, are given in the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.
.
.SS "Callouts with numerical arguments"
.rs
.sp
If you just want to have a means of identifying different callout points, put a
number less than 256 after the letter C. For example, this pattern has two
callout points:
.sp .sp
(?C1)abc(?C2)def (?C1)abc(?C2)def
.sp .sp
If the PCRE2_AUTO_CALLOUT flag is passed to \fBpcre2_compile()\fP, callouts are If the PCRE2_AUTO_CALLOUT flag is passed to \fBpcre2_compile()\fP, numerical
automatically installed before each item in the pattern. They are all numbered callouts are automatically installed before each item in the pattern. They are
255. If there is a conditional group in the pattern whose condition is an all numbered 255. If there is a conditional group in the pattern whose
assertion, an additional callout is inserted just before the condition. An condition is an assertion, an additional callout is inserted just before the
explicit callout may also be set at this position, as in this example: condition. An explicit callout may also be set at this position, as in this
example:
.sp .sp
(?(?C9)(?=a)abc|def) (?(?C9)(?=a)abc|def)
.sp .sp
Note that this applies only to assertion conditions, not to other types of Note that this applies only to assertion conditions, not to other types of
condition. condition.
.P .
During matching, when PCRE2 reaches a callout point, the external function is .
called. It is provided with the number of the callout, the position in the .SS "Callouts with string arguments"
pattern, and one item of data that is also set in the match block. The callout .rs
function may cause matching to proceed, to backtrack, or to fail. .sp
.P A delimited string may be used instead of a number as a callout argument. The
By default, PCRE2 implements a number of optimizations at matching time, and starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is
one side-effect is that sometimes callouts are skipped. If you need all the same as the start, except for {, where the ending delimiter is }. If the
possible callouts to happen, you need to set options that disable the relevant ending delimiter is needed within the string, it must be doubled. For
optimizations. More details, and a complete description of the interface to the example:
callout function, are given in the .sp
.\" HREF (?C'ab ''c'' d')xyz(?C{any text})pqr
\fBpcre2callout\fP .sp
.\" The doubling is removed before the string is passed to the callout function.
documentation.
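The delimiter rules for string arguments can be sketched in C as shown below. This is not the code that PCRE2 itself uses; closing_delimiter() and parse_callout_string() are hypothetical names, and the sketch works on ordinary zero-terminated char strings for simplicity.

```c
#include <stddef.h>

/* Return the ending delimiter that matches a starting delimiter, or 0 if
   the character is not one of the permitted starting delimiters. */
static char closing_delimiter(char open)
{
  switch (open) {
    case '`': case '\'': case '"': case '^':
    case '%': case '#': case '$':
      return open;
    case '{':
      return '}';
    default:
      return 0;
  }
}

/* Parse a delimited callout argument that starts at arg[0]. The text, with
   doubled ending delimiters reduced to single characters, is copied to out
   (capacity outsize, including the terminating zero). The result is the
   number of pattern characters consumed, including both delimiters, or 0
   if the argument is malformed or does not fit. */
static size_t parse_callout_string(const char *arg, char *out, size_t outsize)
{
  char close = closing_delimiter(arg[0]);
  size_t i = 1, n = 0;
  if (close == 0 || outsize == 0) return 0;
  for (;;) {
    char c = arg[i];
    if (c == '\0') return 0;              /* missing ending delimiter */
    if (c == close) {
      if (arg[i + 1] == close) i++;       /* doubled: keep one copy */
      else break;                         /* genuine end of the string */
    }
    if (n + 1 >= outsize) return 0;       /* output buffer too small */
    out[n++] = c;
    i++;
  }
  out[n] = '\0';
  return i + 1;
}
```

With the argument 'ab ''c'' d' from the example above, the text that results is ab 'c' d, and twelve pattern characters are consumed.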
. .
. .
.\" HTML <a name="backtrackcontrol"></a> .\" HTML <a name="backtrackcontrol"></a>
@ -3302,6 +3329,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 28 January 2015 Last updated: 15 March 2015
Copyright (c) 1997-2015 University of Cambridge. Copyright (c) 1997-2015 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "26 January 2015" "PCRE2 10.00" .TH PCRE2SYNTAX 3 "15 March 2015" "PCRE2 10.20"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -513,8 +513,13 @@ pattern is not anchored.
.SH "CALLOUTS" .SH "CALLOUTS"
.rs .rs
.sp .sp
(?C) callout (?C) callout (assumed number 0)
(?Cn) callout with data n (?Cn) callout with numerical data n
(?C"text") callout with string data
.sp
The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it.
. .
. .
.SH "SEE ALSO" .SH "SEE ALSO"
@ -538,6 +543,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 26 January 2015 Last updated: 15 March 2015
Copyright (c) 1997-2015 University of Cambridge. Copyright (c) 1997-2015 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "23 January 2015" "PCRE 10.10" .TH PCRE2TEST 1 "14 March 2015" "PCRE 10.20"
.SH NAME .SH NAME
pcre2test - a program for testing Perl-compatible regular expressions. pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
@ -875,11 +875,14 @@ set, the current captured groups are output when a callout occurs.
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
only one number, 1 is returned instead of 0 when a callout of that number is only one number, 1 is returned instead of 0 when a callout of that number is
reached. If two numbers are given, 1 is returned when callout <n> is reached reached. If two numbers are given, 1 is returned when callout <n> is reached
for the <m>th time. for the <m>th time. Note that callouts with string arguments are always given
the number zero. See "Callouts" below for a description of the output when a
callout is taken.
.P .P
The \fBcallout_data\fP modifier can be given an unsigned or a negative number. The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
Any value other than zero is used as a return from \fBpcre2test\fP's callout This is set as the "user data" that is passed to the matching function, and
function. passed back when the callout function is invoked. Any value other than zero is
used as a return from \fBpcre2test\fP's callout function.
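The behaviour of \fBcallout_fail\fP described above amounts to a small piece of counting logic. The following C sketch shows one way to express it. It is not taken from the \fBpcre2test\fP source; the structure fail_rule and the function callout_result() are hypothetical, and a count of zero is used here as an assumed way of representing the one-number form.

```c
/* Hypothetical state for a callout_fail setting. */
typedef struct {
  unsigned fail_number;   /* <n>: the callout number that triggers a fail */
  unsigned fail_count;    /* <m>: which occurrence fails; 0 = every one */
  unsigned seen;          /* how often callout <n> has been reached */
} fail_rule;

/* Return the value the callout function would yield for this callout:
   0 to carry on matching, 1 to make matching fail at this point. */
static int callout_result(fail_rule *rule, unsigned callout_number)
{
  if (callout_number != rule->fail_number) return 0;
  rule->seen++;
  if (rule->fail_count == 0 || rule->seen == rule->fail_count) return 1;
  return 0;
}
```

Because callouts with string arguments are always given the number zero, a rule whose number is zero applies to every one of them in this sketch.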
. .
. .
.SS "Finding all matches in a string" .SS "Finding all matches in a string"
@ -1231,10 +1234,31 @@ documentation.
.rs .rs
.sp .sp
If the pattern contains any callout requests, \fBpcre2test\fP's callout If the pattern contains any callout requests, \fBpcre2test\fP's callout
function is called during matching. This works with both matching functions. By function is called during matching unless \fBcallout_none\fP is specified.
default, the called function displays the callout number, the start and current This works with both matching functions.
positions in the text at the callout time, and the next pattern item to be .P
tested. For example: The callout function in \fBpcre2test\fP returns zero (carry on matching) by
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
described above) to change this and other parameters of the callout.
.P
Inserting callouts can be helpful when using \fBpcre2test\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcre2callout\fP
.\"
documentation.
.P
The output for callouts with numerical arguments and those with string
arguments is slightly different.
.
.
.SS "Callouts with numerical arguments"
.rs
.sp
By default, the callout function displays the callout number, the start and
current positions in the subject text at the callout time, and the next pattern
item to be tested. For example:
.sp .sp
--->pqrabcdef --->pqrabcdef
0 ^ ^ \ed 0 ^ ^ \ed
@ -1275,18 +1299,27 @@ a change of latest mark is passed to the callout function. For example:
The mark changes between matching "a" and "b", but stays the same for the rest The mark changes between matching "a" and "b", but stays the same for the rest
of the match, so nothing more is output. If, as a result of backtracking, the of the match, so nothing more is output. If, as a result of backtracking, the
mark reverts to being unset, the text "<unset>" is output. mark reverts to being unset, the text "<unset>" is output.
.P .
The callout function in \fBpcre2test\fP returns zero (carry on matching) by .
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as .SS "Callouts with string arguments"
described above) to change this and other parameters of the callout. .rs
.P .sp
Inserting callouts can be helpful when using \fBpcre2test\fP to check The output for a callout with a string argument is similar, except that instead
complicated regular expressions. For further information about callouts, see of outputting a callout number before the position indicators, the callout
the string and its offset in the pattern string are output before the reflection of
.\" HREF the subject string, and the subject string is reflected for each callout. For
\fBpcre2callout\fP example:
.\" .sp
documentation. re> /^ab(?C'first')cd(?C"second")ef/
data> abcdefg
Callout (7): 'first'
--->abcdefg
^ ^ c
Callout (20): "second"
--->abcdefg
^ ^ e
0: abcdef
.sp
. .
. .
. .
@ -1398,6 +1431,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 23 January 2015 Last updated: 14 March 2015
Copyright (c) 1997-2015 University of Cambridge. Copyright (c) 1997-2015 University of Cambridge.
.fi .fi