Documentation for callouts with string arguments.
parent
15e034c9c2
commit
2ec7cbf9b5
76
HACKING
|
@ -8,7 +8,7 @@ library is referred to as PCRE1 below. For information about testing PCRE2, see
|
||||||
the pcre2test documentation and the comment at the head of the RunTest file.
|
the pcre2test documentation and the comment at the head of the RunTest file.
|
||||||
|
|
||||||
PCRE1 releases were up to 8.3x when PCRE2 was developed. The 8.xx series will
|
PCRE1 releases were up to 8.3x when PCRE2 was developed. The 8.xx series will
|
||||||
continue for bugfixes if necessary. PCRE2 releases start at 10.0 to avoid
|
continue for bugfixes if necessary. PCRE2 releases started at 10.00 to avoid
|
||||||
confusion with PCRE1.
|
confusion with PCRE1.
|
||||||
|
|
||||||
|
|
||||||
|
@ -39,7 +39,7 @@ subsequently heavily modified for Perl) compiles the expression twice: once in
|
||||||
a dummy mode in order to find out how much store will be needed, and then for
|
a dummy mode in order to find out how much store will be needed, and then for
|
||||||
real. (The Perl version probably doesn't do this any more; I'm talking about
|
real. (The Perl version probably doesn't do this any more; I'm talking about
|
||||||
the original library.) The execution function operates by backtracking and
|
the original library.) The execution function operates by backtracking and
|
||||||
maximizing (or, optionally, minimizing in Perl) the amount of the subject that
|
maximizing (or, optionally, minimizing, in Perl) the amount of the subject that
|
||||||
matches individual wild portions of the pattern. This is an "NFA algorithm" in
|
matches individual wild portions of the pattern. This is an "NFA algorithm" in
|
||||||
Friedl's terminology.
|
Friedl's terminology.
|
||||||
|
|
||||||
|
@ -63,7 +63,7 @@ modes, creating up to three different libraries. In the description that
|
||||||
follows, the word "short" is used for a 16-bit data quantity, and the phrase
|
follows, the word "short" is used for a 16-bit data quantity, and the phrase
|
||||||
"code unit" is used for a quantity that is a byte in 8-bit mode, a short in
|
"code unit" is used for a quantity that is a byte in 8-bit mode, a short in
|
||||||
16-bit mode and a 32-bit word in 32-bit mode. The names of PCRE2 functions are
|
16-bit mode and a 32-bit word in 32-bit mode. The names of PCRE2 functions are
|
||||||
given in generic form, without a _8, _16, or _32 suffix.
|
given in generic form, without the _8, _16, or _32 suffix.
|
||||||
|
|
||||||
|
|
||||||
Computing the memory requirement: how it was
|
Computing the memory requirement: how it was
|
||||||
|
@ -100,8 +100,9 @@ issue, and in the event, nobody has commented on it.
|
||||||
|
|
||||||
At release 8.34, a limit on the nesting depth of parentheses was re-introduced
|
At release 8.34, a limit on the nesting depth of parentheses was re-introduced
|
||||||
(default 250, settable at build time) so as to put a limit on the amount of
|
(default 250, settable at build time) so as to put a limit on the amount of
|
||||||
system stack used by the compile function. This is a safety feature for
|
system stack used by the compile function, which uses recursive function calls
|
||||||
environments with small stacks where the patterns are provided by users.
|
for nested parenthesized groups. This is a safety feature for environments with
|
||||||
|
small stacks where the patterns are provided by users.
|
||||||
|
|
||||||
|
|
||||||
Traditional matching function
|
Traditional matching function
|
||||||
|
@ -158,8 +159,9 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
|
||||||
only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values,
|
only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values,
|
||||||
and the 16-bit library can be compiled to use 4-byte values, though this
|
and the 16-bit library can be compiled to use 4-byte values, though this
|
||||||
impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is
|
impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is
|
||||||
necessary only when patterns whose compiled length is greater than 64K are
|
necessary only when patterns whose compiled length is greater than 64K code
|
||||||
going to be processed.
|
units are going to be processed. When a LINK_SIZE value uses more than one code
|
||||||
|
unit, the most significant unit is first.
|
||||||
|
|
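The storage order described above (most significant unit first) can be sketched as follows. This is a minimal illustration for the 8-bit library only; the function name read_link_8 is hypothetical and is not taken from the PCRE2 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2: read a LINK_SIZE value that is
   stored in link_size consecutive 8-bit code units, most significant unit
   first. */
uint32_t read_link_8(const uint8_t *code, int link_size)
{
    uint32_t value = 0;
    assert(code != NULL);
    assert(link_size >= 2 && link_size <= 4);
    for (int i = 0; i < link_size; i++)
        value = (value << 8) | code[i];   /* earlier units are more significant */
    return value;
}
```

With a LINK_SIZE of 2 the largest value that fits is 65535, which is why longer compiled patterns need a larger LINK_SIZE.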
||||||
In this description, we assume the "normal" compilation options. Data values
|
In this description, we assume the "normal" compilation options. Data values
|
||||||
that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
|
that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode
|
||||||
|
@ -343,7 +345,7 @@ For classes containing characters with values greater than 255 or that contain
|
||||||
code points are less than 256, followed by a list of pairs (for a range) and
|
code points are less than 256, followed by a list of pairs (for a range) and
|
||||||
single characters. In caseless mode, both cases are explicitly listed.
|
single characters. In caseless mode, both cases are explicitly listed.
|
||||||
|
|
||||||
OP_XCLASS is followed by a LINK_SIZE item containing the total length of the
|
OP_XCLASS is followed by a LINK_SIZE value containing the total length of the
|
||||||
opcode and its data. This is followed by a code unit containing flag bits:
|
opcode and its data. This is followed by a code unit containing flag bits:
|
||||||
XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
|
XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
|
||||||
bit map is present. There follows the bit map, if XCL_MAP is set, and then a
|
bit map is present. There follows the bit map, if XCL_MAP is set, and then a
|
||||||
|
@ -356,7 +358,7 @@ sequence of items coded as follows:
|
||||||
XCL_NOTPROP a Unicode property (type, value) follows
|
XCL_NOTPROP a Unicode property (type, value) follows
|
||||||
|
|
||||||
If a range starts with a code point less than 256 and ends with one greater
|
If a range starts with a code point less than 256 and ends with one greater
|
||||||
than 256, it is split into two ranges, with characters less than 256 being
|
than 255, it is split into two ranges, with characters less than 256 being
|
||||||
indicated in the bit map, and the rest with XCL_RANGE.
|
indicated in the bit map, and the rest with XCL_RANGE.
|
||||||
|
|
||||||
When XCL_NOT is set, the bit map, if present, contains bits for characters that
|
When XCL_NOT is set, the bit map, if present, contains bits for characters that
|
||||||
|
@ -412,17 +414,17 @@ compile time, so alternation always happens in the context of brackets.
|
||||||
myself, can be round, square, curly, or pointy. Hence this usage rather than
|
myself, can be round, square, curly, or pointy. Hence this usage rather than
|
||||||
"parentheses".]
|
"parentheses".]
|
||||||
|
|
||||||
Non-capturing brackets use the opcode OP_BRA, capturing brackets use OP_CBRA.
|
Non-capturing brackets use the opcode OP_BRA, capturing brackets use OP_CBRA. A
|
||||||
A bracket opcode is followed by LINK_SIZE bytes which give the offset to the
|
bracket opcode is followed by a LINK_SIZE value which gives the offset to the
|
||||||
next alternative OP_ALT or, if there aren't any branches, to the matching
|
next alternative OP_ALT or, if there aren't any branches, to the matching
|
||||||
OP_KET opcode. Each OP_ALT is followed by LINK_SIZE bytes giving the offset to
|
OP_KET opcode. Each OP_ALT is followed by a LINK_SIZE value giving the offset
|
||||||
the next one, or to the OP_KET opcode. For capturing brackets, the bracket
|
to the next one, or to the OP_KET opcode. For capturing brackets, the bracket
|
||||||
number is a count that immediately follows the offset.
|
number is a count that immediately follows the offset.
|
||||||
|
|
||||||
OP_KET is used for subpatterns that do not repeat indefinitely, and OP_KETRMIN
|
OP_KET is used for subpatterns that do not repeat indefinitely, and OP_KETRMIN
|
||||||
and OP_KETRMAX are used for indefinite repetitions, minimally or maximally
|
and OP_KETRMAX are used for indefinite repetitions, minimally or maximally
|
||||||
respectively (see below for possessive repetitions). All three are followed by
|
respectively (see below for possessive repetitions). All three are followed by
|
||||||
LINK_SIZE bytes giving (as a positive number) the offset back to the matching
|
a LINK_SIZE value giving (as a positive number) the offset back to the matching
|
||||||
bracket opcode.
|
bracket opcode.
|
||||||
|
|
||||||
If a subpattern is quantified such that it is permitted to match zero times, it
|
If a subpattern is quantified such that it is permitted to match zero times, it
|
||||||
|
@ -520,8 +522,11 @@ tests the PCRE2 version number. This compiles into one of the opcodes OP_TRUE
|
||||||
or OP_FALSE.
|
or OP_FALSE.
|
||||||
|
|
||||||
If a condition is not a back reference, recursion test, DEFINE, or VERSION, it
|
If a condition is not a back reference, recursion test, DEFINE, or VERSION, it
|
||||||
must start with an assertion, whose opcode immediately follows OP_COND or
|
must start with an assertion, whose opcode normally immediately follows OP_COND
|
||||||
OP_SCOND.
|
or OP_SCOND. However, if automatic callouts are enabled, a callout is inserted
|
||||||
|
immediately before the assertion. It is also possible to insert a manual
|
||||||
|
callout at this point. Only assertion conditions may have callouts preceding
|
||||||
|
the condition.
|
||||||
|
|
||||||
|
|
||||||
Recursion
|
Recursion
|
||||||
|
@ -529,22 +534,43 @@ Recursion
|
||||||
|
|
||||||
Recursion either matches the current pattern, or some subexpression. The opcode
|
Recursion either matches the current pattern, or some subexpression. The opcode
|
||||||
OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
|
OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting
|
||||||
bracket from the start of the whole pattern. OP_RECURSE is automatically
|
bracket from the start of the whole pattern. OP_RECURSE is also used for
|
||||||
wrapped inside OP_ONCE brackets, because otherwise some patterns broke it.
|
"subroutine" calls, even though they are not strictly a recursion. Repeated
|
||||||
OP_RECURSE is also used for "subroutine" calls, even though they are not
|
recursions are automatically wrapped inside OP_ONCE brackets, because otherwise
|
||||||
strictly a recursion.
|
some patterns broke them. A non-repeated recursion is not wrapped in OP_ONCE
|
||||||
|
brackets, but it is nevertheless still treated as an atomic group.
|
||||||
|
|
||||||
|
|
||||||
Callout
|
Callout
|
||||||
-------
|
-------
|
||||||
|
|
||||||
OP_CALLOUT is followed by one code unit of data that holds a callout number in
|
A callout can nowadays have either a numerical argument or a string argument.
|
||||||
the range 0 to 254 for manual callouts, or 255 for an automatic callout. In
|
These use OP_CALLOUT or OP_CALLOUT_STR, respectively. In each case these are
|
||||||
both cases there follows a count giving the offset in the pattern string to the
|
followed by two LINK_SIZE values giving the offset in the pattern string to the
|
||||||
start of the following item, and another count giving the length of this item.
|
start of the following item, and another count giving the length of this item.
|
||||||
These values make it possible for pcre2test to output useful tracing
|
These values make it possible for pcre2test to output useful tracing
|
||||||
information using automatic callouts.
|
information using callouts.
|
||||||
|
|
||||||
|
In the case of a numeric callout, after these two values there is a single code
|
||||||
|
unit containing the callout number, in the range 0-255, with 255 being used for
|
||||||
|
callouts that are automatically inserted as a result of the PCRE2_AUTO_CALLOUT
|
||||||
|
option. Thus, this opcode item is of fixed length:
|
||||||
|
|
||||||
|
[OP_CALLOUT] [PATTERN_OFFSET] [PATTERN_LENGTH] [NUMBER]
|
||||||
|
|
||||||
|
For callouts with string arguments, OP_CALLOUT_STR has three more data items:
|
||||||
|
a LINK_SIZE value giving the complete length of the entire opcode item, a
|
||||||
|
LINK_SIZE item containing the offset within the pattern string to the start of
|
||||||
|
the string argument, and the string itself, preceded by its starting delimiter
|
||||||
|
and followed by a binary zero. When a callout function is called, a pointer to
|
||||||
|
the actual string is passed, but the delimiter can be accessed as string[-1] if
|
||||||
|
the application needs it. In the 8-bit library, the callout in /X(?C'abc')Y/ is
|
||||||
|
compiled as the following bytes (decimal numbers represent binary values):
|
||||||
|
|
||||||
|
[OP_CALLOUT_STR]  [0] [10]  [0] [1]  [0] [14]  [0] [5] ['] [a] [b] [c] [0]
|
||||||
|
                  --------  -------  --------  -------
|
||||||
|
                     |         |        |         |
|
||||||
|
                     ------- LINK_SIZE items ------
|
||||||
|
|
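As a rough check on the layout of a string callout item, the following sketch decodes the bytes of the /X(?C'abc')Y/ example. It assumes the 8-bit library with LINK_SIZE set to 2. The struct, its field names, and the function names are illustrative and do not come from the PCRE2 sources; the opcode value itself is not examined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: these field names are ours, not PCRE2's. */
typedef struct {
    uint32_t pattern_offset;    /* offset in the pattern of the following item */
    uint32_t next_item_length;  /* length of that item */
    uint32_t item_length;       /* total length of this opcode item */
    uint32_t string_offset;     /* offset in the pattern of the string argument */
    uint8_t delimiter;          /* starting delimiter, stored before the string */
    const uint8_t *string;      /* the string, followed by a binary zero */
    size_t string_length;       /* length, excluding the binary zero */
} callout_str_item;

/* Two-unit LINK_SIZE value, most significant unit first. */
static uint32_t get2(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

/* Decode one string callout item; item points at the opcode code unit and
   avail is the number of code units available. Returns 0 on success, or -1
   if the lengths are inconsistent. */
int decode_callout_str(const uint8_t *item, size_t avail, callout_str_item *out)
{
    /* opcode + four LINK_SIZE values + delimiter + terminating zero */
    const size_t fixed = 1 + 4 * 2 + 1 + 1;

    assert(item != NULL && out != NULL);
    if (avail < fixed) return -1;

    out->pattern_offset = get2(item + 1);
    out->next_item_length = get2(item + 3);
    out->item_length = get2(item + 5);
    out->string_offset = get2(item + 7);
    if (out->item_length < fixed || out->item_length > avail) return -1;
    if (item[out->item_length - 1] != 0) return -1;

    out->delimiter = item[9];
    out->string = item + 10;
    out->string_length = out->item_length - fixed;
    return 0;
}
```

For the example bytes this gives a pattern offset of 10 (the Y), a next item length of 1, a total length of 14, and a string offset of 5 (the a of abc).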
||||||
Opcode table checking
|
Opcode table checking
|
||||||
---------------------
|
---------------------
|
||||||
|
@ -554,4 +580,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
|
||||||
correct length, in order to catch updating errors.
|
correct length, in order to catch updating errors.
|
||||||
|
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
February 2015
|
March 2015
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2CALLOUT 3 "02 January 2015" "PCRE2 10.00"
|
.TH PCRE2CALLOUT 3 "15 March 2015" "PCRE2 10.20"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -15,18 +15,22 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
PCRE2 provides a feature called "callout", which is a means of temporarily
|
PCRE2 provides a feature called "callout", which is a means of temporarily
|
||||||
passing control to the caller of PCRE2 in the middle of pattern matching. The
|
passing control to the caller of PCRE2 in the middle of pattern matching. The
|
||||||
caller of PCRE2 provides an external function by putting its entry point in
|
caller of PCRE2 provides an external function by putting its entry point in
|
||||||
a match context (see \fBpcre2_set_callout()\fP) in the
|
a match context (see \fBpcre2_set_callout()\fP in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2api\fP
|
\fBpcre2api\fP
|
||||||
.\"
|
.\"
|
||||||
documentation).
|
documentation).
|
||||||
.P
|
.P
|
||||||
Within a regular expression, (?C) indicates the points at which the external
|
Within a regular expression, (?C<arg>) indicates a point at which the external
|
||||||
function is to be called. Different callout points can be identified by putting
|
function is to be called. Different callout points can be identified by putting
|
||||||
a number less than 256 after the letter C. The default value is zero.
|
a number less than 256 after the letter C. The default value is zero.
|
||||||
For example, this pattern has two callout points:
|
Alternatively, the argument may be a delimited string. The starting delimiter
|
||||||
|
must be one of ` ' " ^ % # $ { and the ending delimiter is the same as the
|
||||||
|
start, except for {, where the ending delimiter is }. If the ending delimiter
|
||||||
|
is needed within the string, it must be doubled. For example, this pattern has
|
||||||
|
two callout points:
|
||||||
.sp
|
.sp
|
||||||
(?C1)abc(?C2)def
|
(?C1)abc(?C"some ""arbitrary"" text")def
|
||||||
.sp
|
.sp
|
||||||
If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
|
If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
|
||||||
automatically inserts callouts, all with number 255, before each item in the
|
automatically inserts callouts, all with number 255, before each item in the
|
||||||
|
@ -43,20 +47,19 @@ alternation bar. If the pattern contains a conditional group whose condition is
|
||||||
an assertion, an automatic callout is inserted immediately before the
|
an assertion, an automatic callout is inserted immediately before the
|
||||||
condition. Such a callout may also be inserted explicitly, for example:
|
condition. Such a callout may also be inserted explicitly, for example:
|
||||||
.sp
|
.sp
|
||||||
(?(?C9)(?=a)ab|de)
|
(?(?C9)(?=a)ab|de) (?(?C%text%)(?!=d)ab|de)
|
||||||
.sp
|
.sp
|
||||||
This applies only to assertion conditions (because they are themselves
|
This applies only to assertion conditions (because they are themselves
|
||||||
independent groups).
|
independent groups).
|
||||||
.P
|
.P
|
||||||
Automatic callouts can be used for tracking the progress of pattern matching.
|
Callouts can be useful for tracking the progress of pattern matching. The
|
||||||
The
|
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2test\fP
|
\fBpcre2test\fP
|
||||||
.\"
|
.\"
|
||||||
program has a pattern qualifier (/auto_callout) that sets automatic callouts;
|
program has a pattern qualifier (/auto_callout) that sets automatic callouts.
|
||||||
when it is used, the output indicates how the pattern is being matched. This is
|
When any callouts are present, the output from \fBpcre2test\fP indicates how
|
||||||
useful information when you are trying to optimize the performance of a
|
the pattern is being matched. This is useful information when you are trying to
|
||||||
particular pattern.
|
optimize the performance of a particular pattern.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "MISSING CALLOUTS"
|
.SH "MISSING CALLOUTS"
|
||||||
|
@ -193,15 +196,52 @@ documentation). The callout block structure contains the following fields:
|
||||||
PCRE2_SIZE \fIcurrent_position\fP;
|
PCRE2_SIZE \fIcurrent_position\fP;
|
||||||
PCRE2_SIZE \fIpattern_position\fP;
|
PCRE2_SIZE \fIpattern_position\fP;
|
||||||
PCRE2_SIZE \fInext_item_length\fP;
|
PCRE2_SIZE \fInext_item_length\fP;
|
||||||
|
PCRE2_SIZE \fIcallout_string_offset\fP;
|
||||||
|
PCRE2_SPTR \fIcallout_string\fP;
|
||||||
|
uint32_t \fIcallout_string_length\fP;
|
||||||
|
|
||||||
.sp
|
.sp
|
||||||
The \fIversion\fP field contains the version number of the block format. The
|
The \fIversion\fP field contains the version number of the block format. The
|
||||||
current version is 0. The version number will change in future if additional
|
current version is 1; the three callout string fields were added for this
|
||||||
fields are added, but the intention is never to remove any of the existing
|
version. If you are writing an application that might use an earlier release of
|
||||||
fields.
|
PCRE2, you should check the version number before accessing any of these
|
||||||
|
fields. The version number will increase in future if more fields are added,
|
||||||
|
but the intention is never to remove any of the existing fields.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Fields for numerical callouts"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP
|
||||||
|
contains the number of the callout, in the range 0-255. This is the number
|
||||||
|
that follows (?C for manual callouts; it is 255 for automatically generated
|
||||||
|
callouts.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Fields for string callouts"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
For callouts with string arguments, \fIcallout_number\fP is always zero, and
|
||||||
|
\fIcallout_string\fP points to the string that is contained within the compiled
|
||||||
|
pattern. Its length is given by \fIcallout_string_length\fP. Duplicated ending
|
||||||
|
delimiters that were present in the original pattern string have been turned
|
||||||
|
into single characters. An additional code unit containing binary zero is
|
||||||
|
present after the string, but is not included in the length. The delimiter that
|
||||||
|
was used to start the string is also stored within the pattern, immediately
|
||||||
|
before the string itself. You can therefore access this delimiter as
|
||||||
|
\fIcallout_string\fP[-1] if you need it.
|
||||||
.P
|
.P
|
||||||
The \fIcallout_number\fP field contains the number of the callout, as compiled
|
The \fIcallout_string_offset\fP field is the code unit offset to the start of
|
||||||
into the pattern (that is, the number after ?C for manual callouts, and 255 for
|
the callout argument string within the original pattern string. This is
|
||||||
automatically generated callouts).
|
provided for the benefit of applications such as script languages that might
|
||||||
|
need to report errors in the callout string within the pattern.
|
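The distinction between the two kinds of callout, and the version check recommended above, might be handled along these lines. This is a sketch only: callout_block_sketch is a stand-in that holds just the fields discussed here, with 8-bit code units assumed, so that the fragment does not depend on pcre2.h. A real callout function receives a pcre2_callout_block from PCRE2. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the callout block (an assumption for this sketch): only the
   fields described in the text, not the real PCRE2 structure. */
typedef struct {
    uint32_t version;
    uint32_t callout_number;
    size_t callout_string_offset;
    const uint8_t *callout_string;
    uint32_t callout_string_length;
} callout_block_sketch;

typedef enum { CALLOUT_NUMERIC, CALLOUT_STRING } callout_kind;

/* Decide which kind of callout this is. */
callout_kind classify_callout(const callout_block_sketch *cb)
{
    assert(cb != NULL);
    /* The string fields exist only from version 1, so do not rely on them
       for an earlier block. */
    if (cb->version < 1 || cb->callout_string == NULL)
        return CALLOUT_NUMERIC;
    return CALLOUT_STRING;
}

/* Return the starting delimiter of a string callout, or 0 for a numerical
   callout. The delimiter is stored immediately before the string. */
uint8_t callout_delimiter(const callout_block_sketch *cb)
{
    if (classify_callout(cb) != CALLOUT_STRING) return 0;
    return cb->callout_string[-1];
}
```

Testing callout_string against NULL, rather than testing callout_number, is the safer way to tell the two kinds apart, because string callouts always have the number zero and so does a plain (?C).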
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Fields for all callouts"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
The remaining fields in the callout block are the same for both kinds of
|
||||||
|
callout.
|
||||||
.P
|
.P
|
||||||
The \fIoffset_vector\fP field is a pointer to the vector of capturing offsets
|
The \fIoffset_vector\fP field is a pointer to the vector of capturing offsets
|
||||||
(the "ovector") that was passed to the matching function in the match data
|
(the "ovector") that was passed to the matching function in the match data
|
||||||
|
@ -246,7 +286,9 @@ of the entire subpattern.
|
||||||
.P
|
.P
|
||||||
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
|
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
|
||||||
help in distinguishing between different automatic callouts, which all have the
|
help in distinguishing between different automatic callouts, which all have the
|
||||||
same callout number. However, they are set for all callouts.
|
same callout number. However, they are set for all callouts, and are used by
|
||||||
|
\fBpcre2test\fP to show the next item to be matched when displaying callout
|
||||||
|
information.
|
||||||
.P
|
.P
|
||||||
In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to
|
In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to
|
||||||
the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
|
the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
|
||||||
|
@ -285,6 +327,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 02 January 2015
|
Last updated: 15 March 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2COMPAT 3 "28 September 2014" "PCRE2 10.0"
|
.TH PCRE2COMPAT 3 "15 March 2015" "PCRE2 10.20"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
||||||
|
@ -69,11 +69,11 @@ the
|
||||||
.\"
|
.\"
|
||||||
documentation for details.
|
documentation for details.
|
||||||
.P
|
.P
|
||||||
8. Subpatterns that are called as subroutines (whether or not recursively) are
|
8. Subroutine calls (whether recursive or not) are treated as atomic groups.
|
||||||
always treated as atomic groups in PCRE2. This is like Python, but unlike Perl.
|
Atomic recursion is like Python, but unlike Perl. Captured values that are set
|
||||||
Captured values that are set outside a subroutine call can be reference from
|
outside a subroutine call can be referenced from inside in PCRE2, but not in
|
||||||
inside in PCRE2, but not in Perl. There is a discussion that explains these
|
Perl. There is a discussion that explains these differences in more detail in
|
||||||
differences in more detail in the
|
the
|
||||||
.\" HTML <a href="pcre2pattern.html#recursiondifference">
|
.\" HTML <a href="pcre2pattern.html#recursiondifference">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
section on recursion differences from Perl
|
section on recursion differences from Perl
|
||||||
|
@ -185,6 +185,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 28 September 2014
|
Last updated: 15 March 2015
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "28 January 2015" "PCRE2 10.00"
|
.TH PCRE2PATTERN 3 "15 March 2015" "PCRE2 10.20"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -2821,42 +2821,69 @@ same pair of parentheses when there is a repetition.
|
||||||
PCRE2 provides a similar feature, but of course it cannot obey arbitrary Perl
|
PCRE2 provides a similar feature, but of course it cannot obey arbitrary Perl
|
||||||
code. The feature is called "callout". The caller of PCRE2 provides an external
|
code. The feature is called "callout". The caller of PCRE2 provides an external
|
||||||
function by putting its entry point in a match context using the function
|
function by putting its entry point in a match context using the function
|
||||||
\fBpcre2_set_callout()\fP and passing the context to \fBpcre2_match()\fP or
|
\fBpcre2_set_callout()\fP, and then passing that context to \fBpcre2_match()\fP
|
||||||
\fBpcre2_dfa_match()\fP. If no match context is passed, or if the callout entry
|
or \fBpcre2_dfa_match()\fP. If no match context is passed, or if the callout
|
||||||
point is set to NULL, callouts are disabled.
|
entry point is set to NULL, callouts are disabled.
|
||||||
.P
|
.P
|
||||||
Within a regular expression, (?C) indicates the points at which the external
|
Within a regular expression, (?C<arg>) indicates a point at which the external
|
||||||
function is to be called. If you want to identify different callout points, you
|
function is to be called. There are two kinds of callout: those with a
|
||||||
can put a number less than 256 after the letter C. The default value is zero.
|
numerical argument and those with a string argument. (?C) on its own with no
|
||||||
For example, this pattern has two callout points:
|
argument is treated as (?C0). A numerical argument allows the application to
|
||||||
|
distinguish between different callouts. String arguments were added for release
|
||||||
|
10.20 to make it possible for script languages that use PCRE2 to embed short
|
||||||
|
scripts within patterns in a similar way to Perl.
|
||||||
|
.P
|
||||||
|
During matching, when PCRE2 reaches a callout point, the external function is
|
||||||
|
called. It is provided with the number or string argument of the callout, the
|
||||||
|
position in the pattern, and one item of data that is also set in the match
|
||||||
|
block. The callout function may cause matching to proceed, to backtrack, or to
|
||||||
|
fail.
|
||||||
|
.P
|
||||||
|
By default, PCRE2 implements a number of optimizations at matching time, and
|
||||||
|
one side-effect is that sometimes callouts are skipped. If you need all
|
||||||
|
possible callouts to happen, you need to set options that disable the relevant
|
||||||
|
optimizations. More details, including a complete description of the
|
||||||
|
programming interface to the callout function, are given in the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2callout\fP
|
||||||
|
.\"
|
||||||
|
documentation.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Callouts with numerical arguments"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
If you just want to have a means of identifying different callout points, put a
|
||||||
|
number less than 256 after the letter C. For example, this pattern has two
|
||||||
|
callout points:
|
||||||
.sp
|
.sp
|
||||||
(?C1)abc(?C2)def
|
(?C1)abc(?C2)def
|
||||||
.sp
|
.sp
|
||||||
If the PCRE2_AUTO_CALLOUT flag is passed to \fBpcre2_compile()\fP, callouts are
|
If the PCRE2_AUTO_CALLOUT flag is passed to \fBpcre2_compile()\fP, numerical
|
||||||
automatically installed before each item in the pattern. They are all numbered
|
callouts are automatically installed before each item in the pattern. They are
|
||||||
255. If there is a conditional group in the pattern whose condition is an
|
all numbered 255. If there is a conditional group in the pattern whose
|
||||||
assertion, an additional callout is inserted just before the condition. An
|
condition is an assertion, an additional callout is inserted just before the
|
||||||
explicit callout may also be set at this position, as in this example:
|
condition. An explicit callout may also be set at this position, as in this
|
||||||
|
example:
|
||||||
.sp
|
.sp
|
||||||
(?(?C9)(?=a)abc|def)
|
(?(?C9)(?=a)abc|def)
|
||||||
.sp
|
.sp
|
||||||
Note that this applies only to assertion conditions, not to other types of
|
Note that this applies only to assertion conditions, not to other types of
|
||||||
condition.
|
condition.
|
||||||
.P
|
.
|
||||||
During matching, when PCRE2 reaches a callout point, the external function is
|
.
|
||||||
called. It is provided with the number of the callout, the position in the
|
.SS "Callouts with string arguments"
|
||||||
pattern, and one item of data that is also set in the match block. The callout
|
.rs
|
||||||
function may cause matching to proceed, to backtrack, or to fail.
|
.sp
|
||||||
.P
|
A delimited string may be used instead of a number as a callout argument. The
|
||||||
By default, PCRE2 implements a number of optimizations at matching time, and
|
starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is
|
||||||
one side-effect is that sometimes callouts are skipped. If you need all
|
the same as the start, except for {, where the ending delimiter is }. If the
|
||||||
possible callouts to happen, you need to set options that disable the relevant
|
ending delimiter is needed within the string, it must be doubled. For
|
||||||
optimizations. More details, and a complete description of the interface to the
|
example:
|
||||||
callout function, are given in the
|
.sp
|
||||||
.\" HREF
|
(?C'ab ''c'' d')xyz(?C{any text})pqr
|
||||||
\fBpcre2callout\fP
|
.sp
|
||||||
.\"
|
The doubling is removed before the string is passed to the callout function.
|
||||||
documentation.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="backtrackcontrol"></a>
|
.\" HTML <a name="backtrackcontrol"></a>
|
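The delimiter and doubling rules for string arguments can be illustrated with a small stand-alone parser. This is a sketch under the rules stated above and is not the code PCRE2 uses; parse_callout_string is a hypothetical name, and the input is assumed to be an 8-bit zero-terminated string that starts at the delimiter following (?C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: src points at the starting delimiter of a callout
   string argument. The string is copied to dst (capacity dst_size, including
   the terminating zero) with doubled ending delimiters reduced to one.
   Returns the number of source characters consumed, up to and including the
   ending delimiter, or 0 on error. */
size_t parse_callout_string(const char *src, char *dst, size_t dst_size)
{
    static const char starts[] = "`'\"^%#${";   /* permitted starting delimiters */

    assert(src != NULL && dst != NULL);
    if (dst_size == 0) return 0;
    if (*src == '\0' || strchr(starts, *src) == NULL) return 0;

    /* The ending delimiter is the same as the start, except that { ends with } */
    char end = (*src == '{') ? '}' : *src;
    size_t in = 1, out = 0;

    for (;;) {
        char c = src[in];
        if (c == '\0') return 0;               /* no ending delimiter */
        if (c == end) {
            if (src[in + 1] != end) break;     /* single: end of the string */
            in++;                              /* doubled: keep one copy */
        }
        if (out + 1 >= dst_size) return 0;     /* no room for c and the zero */
        dst[out++] = c;
        in++;
    }
    dst[out] = '\0';
    return in + 1;
}
```

Applied to the first argument in the example above, 'ab ''c'' d', this yields the string ab 'c' d, which is the form in which the callout function sees it.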
||||||
|
@ -3302,6 +3329,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 28 January 2015
|
Last updated: 15 March 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "26 January 2015" "PCRE2 10.00"
|
.TH PCRE2SYNTAX 3 "15 March 2015" "PCRE2 10.20"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -513,8 +513,13 @@ pattern is not anchored.
|
||||||
.SH "CALLOUTS"
|
.SH "CALLOUTS"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
(?C) callout
|
(?C) callout (assumed number 0)
|
||||||
(?Cn) callout with data n
|
(?Cn) callout with numerical data n
|
||||||
|
(?C"text") callout with string data
|
||||||
|
.sp
|
||||||
|
The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
|
||||||
|
start and the end), and the starting delimiter { matched with the ending
|
||||||
|
delimiter }. To encode the ending delimiter within the string, double it.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "SEE ALSO"
|
.SH "SEE ALSO"
|
||||||
|
@ -538,6 +543,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 26 January 2015
|
Last updated: 15 March 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "23 January 2015" "PCRE 10.10"
|
.TH PCRE2TEST 1 "14 March 2015" "PCRE 10.20"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -875,11 +875,14 @@ set, the current captured groups are output when a callout occurs.
|
||||||
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
|
The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
|
||||||
only one number, 1 is returned instead of 0 when a callout of that number is
|
only one number, 1 is returned instead of 0 when a callout of that number is
|
||||||
reached. If two numbers are given, 1 is returned when callout <n> is reached
|
reached. If two numbers are given, 1 is returned when callout <n> is reached
|
||||||
for the <m>th time.
|
for the <m>th time. Note that callouts with string arguments are always given
|
||||||
|
the number zero. See "Callouts" below for a description of the output when a
|
||||||
|
callout is taken.
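The return-value rule for \fBcallout_fail\fP can be sketched as follows. This is an illustration of the documented behaviour, not \fBpcre2test\fP's actual code: the structure and function names are hypothetical, and the sketch assumes that the callout keeps returning 1 on visits after the <m>th, which the text above does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state; the names are illustrative only. */
typedef struct {
  unsigned int fail_number;  /* <n>: the callout number that should fail */
  unsigned int fail_count;   /* <m>: first visit that fails; 0 if not given */
  unsigned int visits;       /* times callout <n> has been reached so far */
} callout_fail_state;

/* Returns 0 (carry on matching) or 1 (fail at this point). */
static int callout_result(callout_fail_state *st, unsigned int callout_number)
{
  assert(st != NULL);
  if (callout_number != st->fail_number) return 0;
  st->visits++;
  /* Assumption: fail on the <m>th visit and on every later one. With only
     one number given, fail_count is 0, so every visit fails. */
  return (st->visits >= st->fail_count) ? 1 : 0;
}
```

Because callouts with string arguments are always given the number zero, a state with fail_number set to 0 would apply to all of them.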
|
||||||
.P
|
.P
|
||||||
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
|
The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
|
||||||
Any value other than zero is used as a return from \fBpcre2test\fP's callout
|
This is set as the "user data" that is passed to the matching function, and
|
||||||
function.
|
passed back when the callout function is invoked. Any value other than zero is
|
||||||
|
used as a return from \fBpcre2test\fP's callout function.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Finding all matches in a string"
|
.SS "Finding all matches in a string"
|
||||||
|
@ -1231,10 +1234,31 @@ documentation.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
If the pattern contains any callout requests, \fBpcre2test\fP's callout
|
If the pattern contains any callout requests, \fBpcre2test\fP's callout
|
||||||
function is called during matching. This works with both matching functions. By
|
function is called during matching unless \fBcallout_none\fP is specified.
|
||||||
default, the called function displays the callout number, the start and current
|
This works with both matching functions.
|
||||||
positions in the text at the callout time, and the next pattern item to be
|
.P
|
||||||
tested. For example:
|
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
|
||||||
|
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
|
||||||
|
described above) to change this and other parameters of the callout.
|
||||||
|
.P
|
||||||
|
Inserting callouts can be helpful when using \fBpcre2test\fP to check
|
||||||
|
complicated regular expressions. For further information about callouts, see
|
||||||
|
the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2callout\fP
|
||||||
|
.\"
|
||||||
|
documentation.
|
||||||
|
.P
|
||||||
|
The output for callouts with numerical arguments and those with string
|
||||||
|
arguments is slightly different.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Callouts with numerical arguments"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
By default, the callout function displays the callout number, the start and
|
||||||
|
current positions in the subject text at the callout time, and the next pattern
|
||||||
|
item to be tested. For example:
|
||||||
.sp
|
.sp
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \ed
|
0 ^ ^ \ed
|
||||||
|
@ -1275,18 +1299,27 @@ a change of latest mark is passed to the callout function. For example:
|
||||||
The mark changes between matching "a" and "b", but stays the same for the rest
|
The mark changes between matching "a" and "b", but stays the same for the rest
|
||||||
of the match, so nothing more is output. If, as a result of backtracking, the
|
of the match, so nothing more is output. If, as a result of backtracking, the
|
||||||
mark reverts to being unset, the text "<unset>" is output.
|
mark reverts to being unset, the text "<unset>" is output.
|
||||||
.P
|
.
|
||||||
The callout function in \fBpcre2test\fP returns zero (carry on matching) by
|
.
|
||||||
default, but you can use a \fBcallout_fail\fP modifier in a subject line (as
|
.SS "Callouts with string arguments"
|
||||||
described above) to change this and other parameters of the callout.
|
.rs
|
||||||
.P
|
.sp
|
||||||
Inserting callouts can be helpful when using \fBpcre2test\fP to check
|
The output for a callout with a string argument is similar, except that instead
|
||||||
complicated regular expressions. For further information about callouts, see
|
of outputting a callout number before the position indicators, the callout
|
||||||
the
|
string and its offset in the pattern string are output before the reflection of
|
||||||
.\" HREF
|
the subject string, and the subject string is reflected for each callout. For
|
||||||
\fBpcre2callout\fP
|
example:
|
||||||
.\"
|
.sp
|
||||||
documentation.
|
re> /^ab(?C'first')cd(?C"second")ef/
|
||||||
|
data> abcdefg
|
||||||
|
Callout (7): 'first'
|
||||||
|
--->abcdefg
|
||||||
|
^ ^ c
|
||||||
|
Callout (20): "second"
|
||||||
|
--->abcdefg
|
||||||
|
^ ^ e
|
||||||
|
0: abcdef
|
||||||
|
.sp
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -1398,6 +1431,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 23 January 2015
|
Last updated: 14 March 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|