Documentation for callouts with string arguments.
parent
15e034c9c2
commit
2ec7cbf9b5
76
HACKING
|
@ -8,7 +8,7 @@ library is referred to as PCRE1 below. For information about testing PCRE2, see
|
|||
the pcre2test documentation and the comment at the head of the RunTest file.
|
||||
|
||||
PCRE1 releases were up to 8.3x when PCRE2 was developed. The 8.xx series will
|
||||
continue for bugfixes if necessary. PCRE2 releases start at 10.0 to avoid
|
||||
continue for bugfixes if necessary. PCRE2 releases started at 10.00 to avoid
|
||||
confusion with PCRE1.
|
||||
|
||||
|
||||
|
@ -39,7 +39,7 @@ subsequently heavily modified for Perl) compiles the expression twice: once in
|
|||
a dummy mode in order to find out how much store will be needed, and then for
|
||||
real. (The Perl version probably doesn't do this any more; I'm talking about
|
||||
the original library.) The execution function operates by backtracking and
|
||||
maximizing (or, optionally, minimizing in Perl) the amount of the subject that
|
||||
maximizing (or, optionally, minimizing, in Perl) the amount of the subject that
|
||||
matches individual wild portions of the pattern. This is an "NFA algorithm" in
|
||||
Friedl's terminology.
|
||||
|
||||
|
@ -63,7 +63,7 @@ modes, creating up to three different libraries. In the description that
|
|||
follows, the word "short" is used for a 16-bit data quantity, and the phrase
|
||||
"code unit" is used for a quantity that is a byte in 8-bit mode, a short in
|
||||
16-bit mode and a 32-bit word in 32-bit mode. The names of PCRE2 functions are
|
||||
given in generic form, without a _8, _16, or _32 suffix.
|
||||
given in generic form, without the _8, _16, or _32 suffix.
|
||||
|
||||
|
||||
Computing the memory requirement: how it was
|
||||
|
@ -100,8 +100,9 @@ issue, and in the event, nobody has commented on it.
|
|||
|
||||
At release 8.34, a limit on the nesting depth of parentheses was re-introduced
|
||||
(default 250, settable at build time) so as to put a limit on the amount of
|
||||
system stack used by the compile function. This is a safety feature for
|
||||
environments with small stacks where the patterns are provided by users.
|
||||
system stack used by the compile function, which uses recursive function calls
|
||||
for nested parenthesized groups. This is a safety feature for environments with
|
||||
small stacks where the patterns are provided by users.
|
||||
|
||||
|
||||
Traditional matching function
|
||||
|
@ -158,8 +159,9 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
|
|||
only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values,
|
||||
and the 16-bit library can be compiled to use 4-byte values, though this
|
||||
impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is
|
||||
necessary only when patterns whose compiled length is greater than 64K are
|
||||
going to be processed.
|
||||
necessary only when patterns whose compiled length is greater than 64K code
|
||||
units are going to be processed. When a LINK_SIZE value uses more than one code
|
||||
unit, the most significant unit is first.
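The byte order described above can be sketched for the 8-bit library with the default LINK_SIZE of 2. The helper names put_link2 and get_link2 are hypothetical and are not PCRE2's own macros; this is only an illustration of "most significant unit first".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not PCRE2's own macros): store and fetch a link
   value that occupies two code units in 8-bit mode. */

static void put_link2(uint8_t *code, size_t value)
{
    assert(value <= 0xFFFFu);            /* two bytes hold at most 65535 */
    code[0] = (uint8_t)(value >> 8);     /* most significant byte first */
    code[1] = (uint8_t)(value & 0xFFu);  /* least significant byte second */
}

static size_t get_link2(const uint8_t *code)
{
    return ((size_t)code[0] << 8) | (size_t)code[1];
}
```

With these helpers, the value 14 is stored as the two bytes 0 and 14, which is the form used in the callout example later in this file.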
|
||||
|
||||
In this description, we assume the "normal" compilation options. Data values
|
||||
that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
|
||||
|
@ -343,7 +345,7 @@ For classes containing characters with values greater than 255 or that contain
|
|||
code points are less than 256, followed by a list of pairs (for a range) and
|
||||
single characters. In caseless mode, both cases are explicitly listed.
|
||||
|
||||
OP_XCLASS is followed by a LINK_SIZE item containing the total length of the
|
||||
OP_XCLASS is followed by a LINK_SIZE value containing the total length of the
|
||||
opcode and its data. This is followed by a code unit containing flag bits:
|
||||
XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
|
||||
bit map is present. There follows the bit map, if XCL_MAP is set, and then a
|
||||
|
@ -356,7 +358,7 @@ sequence of items coded as follows:
|
|||
XCL_NOTPROP a Unicode property (type, value) follows
|
||||
|
||||
If a range starts with a code point less than 256 and ends with one greater
|
||||
than 256, it is split into two ranges, with characters less than 256 being
|
||||
than 255, it is split into two ranges, with characters less than 256 being
|
||||
indicated in the bit map, and the rest with XCL_RANGE.
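As an illustration of the split at the 255/256 boundary, here is a minimal sketch. The type split_range and the function split_class_range are hypothetical names invented for this example; the real compile code is organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which parts of a class range [lo, hi] go into
   the bit map and which need an XCL_RANGE item. */
typedef struct {
    int      has_map;          /* nonzero if part of the range is below 256 */
    uint32_t map_lo, map_hi;   /* portion recorded in the bit map */
    int      has_xcl;          /* nonzero if part of the range is above 255 */
    uint32_t xcl_lo, xcl_hi;   /* portion recorded with XCL_RANGE */
} split_range;

static split_range split_class_range(uint32_t lo, uint32_t hi)
{
    split_range r = { 0, 0, 0, 0, 0, 0 };
    assert(lo <= hi);                       /* caller passes a valid range */
    if (lo < 256) {
        r.has_map = 1;
        r.map_lo = lo;
        r.map_hi = (hi < 256) ? hi : 255;   /* clip at the bit map limit */
    }
    if (hi > 255) {
        r.has_xcl = 1;
        r.xcl_lo = (lo < 256) ? 256 : lo;   /* start just above the bit map */
        r.xcl_hi = hi;
    }
    return r;
}
```

For a range from 0x41 to 0x17F this gives a bit map portion of 0x41 to 255 and an XCL_RANGE portion of 256 to 0x17F.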
|
||||
|
||||
When XCL_NOT is set, the bit map, if present, contains bits for characters that
|
||||
|
@ -412,17 +414,17 @@ compile time, so alternation always happens in the context of brackets.
|
|||
myself, can be round, square, curly, or pointy. Hence this usage rather than
|
||||
"parentheses".]
|
||||
|
||||
Non-capturing brackets use the opcode OP_BRA, capturing brackets use OP_CBRA.
|
||||
A bracket opcode is followed by LINK_SIZE bytes which give the offset to the
|
||||
Non-capturing brackets use the opcode OP_BRA, capturing brackets use OP_CBRA. A
|
||||
bracket opcode is followed by a LINK_SIZE value which gives the offset to the
|
||||
next alternative OP_ALT or, if there aren't any branches, to the matching
|
||||
OP_KET opcode. Each OP_ALT is followed by LINK_SIZE bytes giving the offset to
|
||||
the next one, or to the OP_KET opcode. For capturing brackets, the bracket
|
||||
OP_KET opcode. Each OP_ALT is followed by a LINK_SIZE value giving the offset
|
||||
to the next one, or to the OP_KET opcode. For capturing brackets, the bracket
|
||||
number is a count that immediately follows the offset.
|
||||
|
||||
OP_KET is used for subpatterns that do not repeat indefinitely, and OP_KETRMIN
|
||||
and OP_KETRMAX are used for indefinite repetitions, minimally or maximally
|
||||
respectively (see below for possessive repetitions). All three are followed by
|
||||
LINK_SIZE bytes giving (as a positive number) the offset back to the matching
|
||||
a LINK_SIZE value giving (as a positive number) the offset back to the matching
|
||||
bracket opcode.
|
||||
|
||||
If a subpattern is quantified such that it is permitted to match zero times, it
|
||||
|
@ -520,8 +522,11 @@ tests the PCRE2 version number. This compiles into one of the opcodes OP_TRUE
|
|||
or OP_FALSE.
|
||||
|
||||
If a condition is not a back reference, recursion test, DEFINE, or VERSION, it
|
||||
must start with an assertion, whose opcode immediately follows OP_COND or
|
||||
OP_SCOND.
|
||||
must start with an assertion, whose opcode normally immediately follows OP_COND
|
||||
or OP_SCOND. However, if automatic callouts are enabled, a callout is inserted
|
||||
immediately before the assertion. It is also possible to insert a manual
|
||||
callout at this point. Only assertion conditions may have callouts preceding
|
||||
the condition.
|
||||
|
||||
|
||||
Recursion
|
||||
|
@ -529,22 +534,43 @@ Recursion
|
|||
|
||||
Recursion either matches the current pattern, or some subexpression. The opcode
|
||||
OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
|
||||
bracket from the start of the whole pattern. OP_RECURSE is automatically
|
||||
wrapped inside OP_ONCE brackets, because otherwise some patterns broke it.
|
||||
OP_RECURSE is also used for "subroutine" calls, even though they are not
|
||||
strictly a recursion.
|
||||
bracket from the start of the whole pattern. OP_RECURSE is also used for
|
||||
"subroutine" calls, even though they are not strictly a recursion. Repeated
|
||||
recursions are automatically wrapped inside OP_ONCE brackets, because otherwise
|
||||
some patterns broke them. A non-repeated recursion is not wrapped in OP_ONCE
|
||||
brackets, but it is nevertheless still treated as an atomic group.
|
||||
|
||||
|
||||
Callout
|
||||
-------
|
||||
|
||||
OP_CALLOUT is followed by one code unit of data that holds a callout number in
|
||||
the range 0 to 254 for manual callouts, or 255 for an automatic callout. In
|
||||
both cases there follows a count giving the offset in the pattern string to the
|
||||
A callout can nowadays have either a numerical argument or a string argument.
|
||||
These use OP_CALLOUT or OP_CALLOUT_STR, respectively. In each case these are
|
||||
followed by two LINK_SIZE values giving the offset in the pattern string to the
|
||||
start of the following item, and another count giving the length of this item.
|
||||
These values make it possible for pcre2test to output useful tracing
|
||||
information using automatic callouts.
|
||||
information using callouts.
|
||||
|
||||
In the case of a numeric callout, after these two values there is a single code
|
||||
unit containing the callout number, in the range 0-255, with 255 being used for
|
||||
callouts that are automatically inserted as a result of the PCRE2_AUTO_CALLOUT
|
||||
option. Thus, this opcode item is of fixed length:
|
||||
|
||||
[OP_CALLOUT] [PATTERN_OFFSET] [PATTERN_LENGTH] [NUMBER]
|
||||
|
||||
For callouts with string arguments, OP_CALLOUT_STR has three more data items:
|
||||
a LINK_SIZE value giving the complete length of the entire opcode item, a
|
||||
LINK_SIZE item containing the offset within the pattern string to the start of
|
||||
the string argument, and the string itself, preceded by its starting delimiter
|
||||
and followed by a binary zero. When a callout function is called, a pointer to
|
||||
the actual string is passed, but the delimiter can be accessed as string[-1] if
|
||||
the application needs it. In the 8-bit library, the callout in /X(?C'abc')Y/ is
|
||||
compiled as the following bytes (decimal numbers represent binary values):
|
||||
|
||||
  [OP_CALLOUT_STR]  [0] [10]  [0] [1]  [0] [14]  [0] [5] ['] [a] [b] [c] [0]
                    --------  -------  --------  -------
                       |         |        |         |
                       ------- LINK_SIZE items ------
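The layout of a string callout item can be read back with a short sketch. It assumes the 8-bit library with LINK_SIZE 2 and the field order given in the text above; the names callout_str_item, link2 and decode_callout_str are hypothetical, and the numeric value of the opcode byte is not examined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one compiled string callout item. */
typedef struct {
    size_t         pattern_offset;    /* offset in the pattern of the next item */
    size_t         next_item_length;  /* length of that item */
    size_t         total_length;      /* length of the whole opcode item */
    size_t         string_offset;     /* offset in the pattern of the string */
    uint8_t        delimiter;         /* starting delimiter, stored before the string */
    const uint8_t *string;            /* zero-terminated string argument */
} callout_str_item;

/* Read a two-byte link value, most significant byte first. */
static size_t link2(const uint8_t *p)
{
    return ((size_t)p[0] << 8) | (size_t)p[1];
}

/* "item" points at the opcode byte that starts the callout item. */
static callout_str_item decode_callout_str(const uint8_t *item)
{
    callout_str_item r;
    assert(item != NULL);
    r.pattern_offset   = link2(item + 1);
    r.next_item_length = link2(item + 3);
    r.total_length     = link2(item + 5);
    r.string_offset    = link2(item + 7);
    r.delimiter        = item[9];
    r.string           = item + 10;   /* so string[-1] is the delimiter */
    return r;
}
```

Applied to the 14 bytes compiled for /X(?C'abc')Y/, this yields pattern offset 10, next item length 1, total length 14 and string offset 5, with the string "abc" preceded by the delimiter '.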
|
||||
|
||||
Opcode table checking
|
||||
---------------------
|
||||
|
@ -554,4 +580,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
|
|||
correct length, in order to catch updating errors.
|
||||
|
||||
Philip Hazel
|
||||
February 2015
|
||||
March 2015
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2CALLOUT 3 "02 January 2015" "PCRE2 10.00"
|
||||
.TH PCRE2CALLOUT 3 "15 March 2015" "PCRE2 10.20"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -15,18 +15,22 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
|||
PCRE2 provides a feature called "callout", which is a means of temporarily
|
||||
passing control to the caller of PCRE2 in the middle of pattern matching. The
|
||||
caller of PCRE2 provides an external function by putting its entry point in
|
||||
a match context (see \fBpcre2_set_callout()\fP) in the
|
||||
a match context (see \fBpcre2_set_callout()\fP in the
|
||||
.\" HREF
|
||||
\fBpcre2api\fP
|
||||
.\"
|
||||
documentation).
|
||||
.P
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
Within a regular expression, (?C<arg>) indicates a point at which the external
|
||||
function is to be called. Different callout points can be identified by putting
|
||||
a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
Alternatively, the argument may be a delimited string. The starting delimiter
|
||||
must be one of ` ' " ^ % # $ { and the ending delimiter is the same as the
|
||||
start, except for {, where the ending delimiter is }. If the ending delimiter
|
||||
is needed within the string, it must be doubled. For example, this pattern has
|
||||
two callout points:
|
||||
.sp
|
||||
(?C1)abc(?C2)def
|
||||
(?C1)abc(?C"some ""arbitrary"" text")def
|
||||
.sp
|
||||
If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
|
||||
automatically inserts callouts, all with number 255, before each item in the
|
||||
|
@ -43,20 +47,19 @@ alternation bar. If the pattern contains a conditional group whose condition is
|
|||
an assertion, an automatic callout is inserted immediately before the
|
||||
condition. Such a callout may also be inserted explicitly, for example:
|
||||
.sp
|
||||
(?(?C9)(?=a)ab|de)
|
||||
(?(?C9)(?=a)ab|de) (?(?C%text%)(?!=d)ab|de)
|
||||
.sp
|
||||
This applies only to assertion conditions (because they are themselves
|
||||
independent groups).
|
||||
.P
|
||||
Automatic callouts can be used for tracking the progress of pattern matching.
|
||||
The
|
||||
Callouts can be useful for tracking the progress of pattern matching. The
|
||||
.\" HREF
|
||||
\fBpcre2test\fP
|
||||
.\"
|
||||
program has a pattern qualifier (/auto_callout) that sets automatic callouts;
|
||||
when it is used, the output indicates how the pattern is being matched. This is
|
||||
useful information when you are trying to optimize the performance of a
|
||||
particular pattern.
|
||||
program has a pattern qualifier (/auto_callout) that sets automatic callouts.
|
||||
When any callouts are present, the output from \fBpcre2test\fP indicates how
|
||||
the pattern is being matched. This is useful information when you are trying to
|
||||
optimize the performance of a particular pattern.
|
||||
.
|
||||
.
|
||||
.SH "MISSING CALLOUTS"
|
||||
|
@ -193,15 +196,52 @@ documentation). The callout block structure contains the following fields:
|
|||
PCRE2_SIZE \fIcurrent_position\fP;
|
||||
PCRE2_SIZE \fIpattern_position\fP;
|
||||
PCRE2_SIZE \fInext_item_length\fP;
|
||||
PCRE2_SIZE \fIcallout_string_offset\fP;
|
||||
PCRE2_SPTR \fIcallout_string\fP;
|
||||
uint32_t \fIcallout_string_length\fP;
|
||||
|
||||
.sp
|
||||
The \fIversion\fP field contains the version number of the block format. The
|
||||
current version is 0. The version number will change in future if additional
|
||||
fields are added, but the intention is never to remove any of the existing
|
||||
fields.
|
||||
current version is 1; the three callout string fields were added for this
|
||||
version. If you are writing an application that might use an earlier release of
|
||||
PCRE2, you should check the version number before accessing any of these
|
||||
fields. The version number will increase in future if more fields are added,
|
||||
but the intention is never to remove any of the existing fields.
|
||||
.
|
||||
.
|
||||
.SS "Fields for numerical callouts"
|
||||
.rs
|
||||
.sp
|
||||
For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP
|
||||
contains the number of the callout, in the range 0-255. This is the number
|
||||
that follows (?C for manual callouts; it is 255 for automatically generated
|
||||
callouts.
|
||||
.
|
||||
.
|
||||
.SS "Fields for string callouts"
|
||||
.rs
|
||||
.sp
|
||||
For callouts with string arguments, \fIcallout_number\fP is always zero, and
|
||||
\fIcallout_string\fP points to the string that is contained within the compiled
|
||||
pattern. Its length is given by \fIcallout_string_length\fP. Duplicated ending
|
||||
delimiters that were present in the original pattern string have been turned
|
||||
into single characters. An additional code unit containing binary zero is
|
||||
present after the string, but is not included in the length. The delimiter that
|
||||
was used to start the string is also stored within the pattern, immediately
|
||||
before the string itself. You can therefore access this delimiter as
|
||||
\fIcallout_string\fP[-1] if you need it.
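A callout function might combine the version check with the delimiter lookup as in the following sketch. The structure demo_callout_block is a stand-in that contains only fields named in this documentation, in an arbitrary order; it is not the real callout block, and callout_delimiter is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the callout block (assumption: simplified, 8-bit code units). */
typedef struct {
    uint32_t       version;
    uint32_t       callout_number;
    size_t         callout_string_offset;
    const uint8_t *callout_string;
    uint32_t       callout_string_length;
} demo_callout_block;

/* Returns the starting delimiter of a string callout, or 0 when the block
   describes a numerical callout or predates the string fields. */
static int callout_delimiter(const demo_callout_block *cb)
{
    assert(cb != NULL);
    if (cb->version < 1) return 0;             /* string fields not available */
    if (cb->callout_string == NULL) return 0;  /* numerical callout */
    return cb->callout_string[-1];             /* delimiter precedes the string */
}
```

The version test comes first so that the string fields are never read from a block that does not have them.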
|
||||
.P
|
||||
The \fIcallout_number\fP field contains the number of the callout, as compiled
|
||||
into the pattern (that is, the number after ?C for manual callouts, and 255 for
|
||||
automatically generated callouts).
|
||||
The \fIcallout_string_offset\fP field is the code unit offset to the start of
|
||||
the callout argument string within the original pattern string. This is
|
||||
provided for the benefit of applications such as script languages that might
|
||||
need to report errors in the callout string within the pattern.
|
||||
.
|
||||
.
|
||||
.SS "Fields for all callouts"
|
||||
.rs
|
||||
.sp
|
||||
The remaining fields in the callout block are the same for both kinds of
|
||||
callout.
|
||||
.P
|
||||
The \fIoffset_vector\fP field is a pointer to the vector of capturing offsets
|
||||
(the "ovector") that was passed to the matching function in the match data
|
||||
|
@ -246,7 +286,9 @@ of the entire subpattern.
|
|||
.P
|
||||
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
|
||||
help in distinguishing between different automatic callouts, which all have the
|
||||
same callout number. However, they are set for all callouts.
|
||||
same callout number. However, they are set for all callouts, and are used by
|
||||
\fBpcre2test\fP to show the next item to be matched when displaying callout
|
||||
information.
|
||||
.P
|
||||
In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to
|
||||
the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
|
||||
|
@ -285,6 +327,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 02 January 2015
|
||||
Last updated: 15 March 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2COMPAT 3 "28 September 2014" "PCRE2 10.0"
|
||||
.TH PCRE2COMPAT 3 "15 March 2015" "PCRE2 10.20"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
||||
|
@ -69,11 +69,11 @@ the
|
|||
.\"
|
||||
documentation for details.
|
||||
.P
|
||||
8. Subpatterns that are called as subroutines (whether or not recursively) are
|
||||
always treated as atomic groups in PCRE2. This is like Python, but unlike Perl.
|
||||
Captured values that are set outside a subroutine call can be reference from
|
||||
inside in PCRE2, but not in Perl. There is a discussion that explains these
|
||||
differences in more detail in the
|
||||
8. Subroutine calls (whether recursive or not) are treated as atomic groups.
|
||||
Atomic recursion is like Python, but unlike Perl. Captured values that are set
|
||||
outside a subroutine call can be referenced from inside in PCRE2, but not in
|
||||
Perl. There is a discussion that explains these differences in more detail in
|
||||
the
|
||||
.\" HTML <a href="pcre2pattern.html#recursiondifference">
|
||||
.\" </a>
|
||||
section on recursion differences from Perl
|
||||
|
@ -185,6 +185,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 28 September 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
Last updated: 15 March 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PATTERN 3 "28 January 2015" "PCRE2 10.00"
|
||||
.TH PCRE2PATTERN 3 "15 March 2015" "PCRE2 10.20"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||
|
@ -2821,42 +2821,69 @@ same pair of parentheses when there is a repetition.
|
|||
PCRE2 provides a similar feature, but of course it cannot obey arbitrary Perl
|
||||
code. The feature is called "callout". The caller of PCRE2 provides an external
|
||||
function by putting its entry point in a match context using the function
|
||||
\fBpcre2_set_callout()\fP and passing the context to \fBpcre2_match()\fP or
|
||||
\fBpcre2_dfa_match()\fP. If no match context is passed, or if the callout entry
|
||||
point is set to NULL, callouts are disabled.
|
||||
\fBpcre2_set_callout()\fP, and then passing that context to \fBpcre2_match()\fP
|
||||
or \fBpcre2_dfa_match()\fP. If no match context is passed, or if the callout
|
||||
entry point is set to NULL, callouts are disabled.
|
||||
.P
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
function is to be called. If you want to identify different callout points, you
|
||||
can put a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
Within a regular expression, (?C<arg>) indicates a point at which the external
|
||||
function is to be called. There are two kinds of callout: those with a
|
||||
numerical argument and those with a string argument. (?C) on its own with no
|
||||
argument is treated as (?C0). A numerical argument allows the application to
|
||||
distinguish between different callouts. String arguments were added for release
|
||||
10.20 to make it possible for script languages that use PCRE2 to embed short
|
||||
scripts within patterns in a similar way to Perl.
|
||||
.P
|
||||
During matching, when PCRE2 reaches a callout point, the external function is
|
||||
called. It is provided with the number or string argument of the callout, the
|
||||
position in the pattern, and one item of data that is also set in the match
|
||||
block. The callout function may cause matching to proceed, to backtrack, or to
|
||||
fail.
|
||||
.P
|
||||
By default, PCRE2 implements a number of optimizations at matching time, and
|
||||
one side-effect is that sometimes callouts are skipped. If you need all
|
||||
possible callouts to happen, you need to set options that disable the relevant
|
||||
optimizations. More details, including a complete description of the
|
||||
programming interface to the callout function, are given in the
|
||||
.\" HREF
|
||||
\fBpcre2callout\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SS "Callouts with numerical arguments"
|
||||
.rs
|
||||
.sp
|
||||
If you just want to have a means of identifying different callout points, put a
|
||||
number less than 256 after the letter C. For example, this pattern has two
|
||||
callout points:
|
||||
.sp
|
||||
(?C1)abc(?C2)def
|
||||
.sp
|
||||
If the PCRE2_AUTO_CALLOUT flag is passed to \fBpcre2_compile()\fP, callouts are
|
||||
automatically installed before each item in the pattern. They are all numbered
|
||||
255. If there is a conditional group in the pattern whose condition is an
|
||||
assertion, an additional callout is inserted just before the condition. An
|
||||
explicit callout may also be set at this position, as in this example:
|
||||
If the PCRE2_AUTO_CALLOUT flag is passed to \fBpcre2_compile()\fP, numerical
|
||||
callouts are automatically installed before each item in the pattern. They are
|
||||
all numbered 255. If there is a conditional group in the pattern whose
|
||||
condition is an assertion, an additional callout is inserted just before the
|
||||
condition. An explicit callout may also be set at this position, as in this
|
||||
example:
|
||||
.sp
|
||||
(?(?C9)(?=a)abc|def)
|
||||
.sp
|
||||
Note that this applies only to assertion conditions, not to other types of
|
||||
condition.
|
||||
.P
|
||||
During matching, when PCRE2 reaches a callout point, the external function is
|
||||
called. It is provided with the number of the callout, the position in the
|
||||
pattern, and one item of data that is also set in the match block. The callout
|
||||
function may cause matching to proceed, to backtrack, or to fail.
|
||||
.P
|
||||
By default, PCRE2 implements a number of optimizations at matching time, and
|
||||
one side-effect is that sometimes callouts are skipped. If you need all
|
||||
possible callouts to happen, you need to set options that disable the relevant
|
||||
optimizations. More details, and a complete description of the interface to the
|
||||
callout function, are given in the
|
||||
.\" HREF
|
||||
\fBpcre2callout\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SS "Callouts with string arguments"
|
||||
.rs
|
||||
.sp
|
||||
A delimited string may be used instead of a number as a callout argument. The
|
||||
starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is
|
||||
the same as the start, except for {, where the ending delimiter is }. If the
|
||||
ending delimiter is needed within the string, it must be doubled. For
|
||||
example:
|
||||
.sp
|
||||
(?C'ab ''c'' d')xyz(?C{any text})pqr
|
||||
.sp
|
||||
The doubling is removed before the string is passed to the callout function.
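The delimiter rules can be illustrated with a small parser. This is a sketch written for this description, not the code PCRE2 uses, and parse_callout_string is a hypothetical name. It expects src to point at the starting delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the callout string that starts at src[0] (the starting delimiter)
   into out, turning each doubled ending delimiter into a single character.
   Returns the number of source characters used, including both
   delimiters, or 0 if the argument is malformed or does not fit. */
static size_t parse_callout_string(const char *src, char *out, size_t outsize)
{
    static const char starts[] = "`'\"^%#${";   /* permitted starting delimiters */
    char end;
    size_t i = 1, n = 0;

    assert(src != NULL && out != NULL && outsize > 0);
    if (src[0] == '\0' || strchr(starts, src[0]) == NULL) return 0;
    end = (src[0] == '{') ? '}' : src[0];       /* only { has a different end */

    for (;;) {
        char c = src[i];
        if (c == '\0') return 0;                /* no ending delimiter found */
        if (c == end) {
            if (src[i + 1] == end) i++;         /* doubled: keep one copy */
            else break;                         /* real end of the string */
        }
        if (n + 1 >= outsize) return 0;         /* leave room for the zero */
        out[n++] = c;
        i++;
    }
    out[n] = '\0';
    return i + 1;                               /* count the ending delimiter */
}
```

For the first callout in the example above, the argument 'ab ''c'' d' uses 12 pattern characters and yields the string ab 'c' d.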
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="backtrackcontrol"></a>
|
||||
|
@ -3302,6 +3329,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 28 January 2015
|
||||
Last updated: 15 March 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2SYNTAX 3 "26 January 2015" "PCRE2 10.00"
|
||||
.TH PCRE2SYNTAX 3 "15 March 2015" "PCRE2 10.20"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||
|
@ -513,8 +513,13 @@ pattern is not anchored.
|
|||
.SH "CALLOUTS"
|
||||
.rs
|
||||
.sp
|
||||
(?C) callout
|
||||
(?Cn) callout with data n
|
||||
(?C) callout (assumed number 0)
|
||||
(?Cn) callout with numerical data n
|
||||
(?C"text") callout with string data
|
||||
.sp
|
||||
The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
|
||||
start and the end), and the starting delimiter { matched with the ending
|
||||
delimiter }. To encode the ending delimiter within the string, double it.
|
||||
.
|
||||
.
|
||||
.SH "SEE ALSO"
|
||||
|
@ -538,6 +543,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 26 January 2015
|
||||
Last updated: 15 March 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2TEST 1 "23 January 2015" "PCRE 10.10"
|
||||
.TH PCRE2TEST 1 "14 March 2015" "PCRE 10.20"
|
||||
.SH NAME
|
||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -875,11 +875,14 @@ set, the current captured groups are output when a callout occurs.
|
|||
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
|
||||
only one number, 1 is returned instead of 0 when a callout of that number is
|
||||
reached. If two numbers are given, 1 is returned when callout <n> is reached
|
||||
for the <m>th time.
|
||||
for the <m>th time. Note that callouts with string arguments are always given
|
||||
the number zero. See "Callouts" below for a description of the output when a
|
||||
callout is taken.
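One reading of the \fBcallout_fail\fP rule can be sketched as follows. This is not the code in \fBpcre2test\fP; the names fail_rule and callout_result are hypothetical, and using a count of zero to mean "only one number was given" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for one callout_fail setting. */
typedef struct {
    uint32_t fail_number;  /* <n>: the callout number to act on */
    uint32_t fail_count;   /* <m>: which occurrence fails; 0 if not given */
    uint32_t seen;         /* how many times callout <n> has been reached */
} fail_rule;

/* Returns the value the callout function would hand back: 0 to carry on
   matching, 1 when the rule says this callout should fail. */
static int callout_result(fail_rule *rule, uint32_t callout_number)
{
    assert(rule != NULL);
    if (callout_number != rule->fail_number) return 0;
    rule->seen++;
    if (rule->fail_count == 0) return 1;              /* one number: always */
    return (rule->seen == rule->fail_count) ? 1 : 0;  /* the <m>th time only */
}
```

Because callouts with string arguments always have the number zero, a rule whose number is 0 would apply to every string callout under this reading.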
|
||||
.P
|
||||
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
|
||||
Any value other than zero is used as a return from \fBpcre2test\fP's callout
|
||||
function.
|
||||
This is set as the "user data" that is passed to the matching function, and
|
||||
passed back when the callout function is invoked. Any value other than zero is
|
||||
used as a return from \fBpcre2test\fP's callout function.
|
||||
.
|
||||
.
|
||||
.SS "Finding all matches in a string"
|
||||
|
@ -1231,10 +1234,31 @@ documentation.
|
|||
.rs
|
||||
.sp
|
||||
If the pattern contains any callout requests, \fBpcre2test\fP's callout
|
||||
function is called during matching. This works with both matching functions. By
|
||||
default, the called function displays the callout number, the start and current
|
||||
positions in the text at the callout time, and the next pattern item to be
|
||||
tested. For example:
|
||||
function is called during matching unless \fBcallout_none\fP is specified.
|
||||
This works with both matching functions.
|
||||
.P
|
||||
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
|
||||
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
|
||||
described above) to change this and other parameters of the callout.
|
||||
.P
|
||||
Inserting callouts can be helpful when using \fBpcre2test\fP to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcre2callout\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
The output for callouts with numerical arguments and those with string
|
||||
arguments is slightly different.
|
||||
.
|
||||
.
|
||||
.SS "Callouts with numerical arguments"
|
||||
.rs
|
||||
.sp
|
||||
By default, the callout function displays the callout number, the start and
|
||||
current positions in the subject text at the callout time, and the next pattern
|
||||
item to be tested. For example:
|
||||
.sp
|
||||
--->pqrabcdef
|
||||
0 ^ ^ \ed
|
||||
|
@ -1275,18 +1299,27 @@ a change of latest mark is passed to the callout function. For example:
|
|||
The mark changes between matching "a" and "b", but stays the same for the rest
|
||||
of the match, so nothing more is output. If, as a result of backtracking, the
|
||||
mark reverts to being unset, the text "<unset>" is output.
|
||||
.P
|
||||
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
|
||||
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
|
||||
described above) to change this and other parameters of the callout.
|
||||
.P
|
||||
Inserting callouts can be helpful when using \fBpcre2test\fP to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcre2callout\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SS "Callouts with string arguments"
|
||||
.rs
|
||||
.sp
|
||||
The output for a callout with a string argument is similar, except that instead
|
||||
of outputting a callout number before the position indicators, the callout
|
||||
string and its offset in the pattern string are output before the reflection of
|
||||
the subject string, and the subject string is reflected for each callout. For
|
||||
example:
|
||||
.sp
|
||||
re> /^ab(?C'first')cd(?C"second")ef/
|
||||
data> abcdefg
|
||||
Callout (7): 'first'
|
||||
--->abcdefg
|
||||
^ ^ c
|
||||
Callout (20): "second"
|
||||
--->abcdefg
|
||||
^ ^ e
|
||||
0: abcdef
|
||||
.sp
|
||||
.
|
||||
.
|
||||
.
|
||||
|
@ -1398,6 +1431,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 23 January 2015
|
||||
Last updated: 14 March 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|