Documentation update.
parent
0b64d9cfca
commit
e7a762ddff
|
@ -2841,22 +2841,23 @@ undefined.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
|
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
|
||||||
to match (PCRE2_ERROR_NOMATCH), a (*MARK), (*PRUNE), or (*THEN) name may be
|
to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function
|
||||||
available. The function <b>pcre2_get_mark()</b> can be called to access this
|
<b>pcre2_get_mark()</b> can be called to access this name, which can be
|
||||||
name. The same function applies to all three verbs. It returns a pointer to the
|
specified in the pattern by any of the backtracking control verbs, not just
|
||||||
zero-terminated name, which is within the compiled pattern. If no name is
|
(*MARK). The same function applies to all the verbs. It returns a pointer to
|
||||||
|
the zero-terminated name, which is within the compiled pattern. If no name is
|
||||||
available, NULL is returned. The length of the name (excluding the terminating
|
available, NULL is returned. The length of the name (excluding the terminating
|
||||||
zero) is stored in the code unit that precedes the name. You should use this
|
zero) is stored in the code unit that precedes the name. You should use this
|
||||||
length instead of relying on the terminating zero if the name might contain a
|
length instead of relying on the terminating zero if the name might contain a
|
||||||
binary zero.
|
binary zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
After a successful match, the name that is returned is the last (*MARK),
|
After a successful match, the name that is returned is the last mark name
|
||||||
(*PRUNE), or (*THEN) name encountered on the matching path through the pattern.
|
encountered on the matching path through the pattern. Instances of backtracking
|
||||||
Instances of (*PRUNE) and (*THEN) without names are ignored. Thus, for example,
|
verbs without names do not count. Thus, for example, if the matching path
|
||||||
if the matching path contains (*MARK:A)(*PRUNE), the name "A" is returned.
|
contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
|
||||||
After a "no match" or a partial match, the last encountered name is returned.
|
partial match, the last encountered name is returned. For example, consider
|
||||||
For example, consider this pattern:
|
this pattern:
|
||||||
<pre>
|
<pre>
|
||||||
^(*MARK:A)((*MARK:B)a|b)c
|
^(*MARK:A)((*MARK:B)a|b)c
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2871,7 +2872,7 @@ is removed from the pattern above, there is an initial check for the presence
|
||||||
of "c" in the subject before running the matching engine. This check fails for
|
of "c" in the subject before running the matching engine. This check fails for
|
||||||
"bx", causing a match failure without seeing any marks. You can disable the
|
"bx", causing a match failure without seeing any marks. You can disable the
|
||||||
start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
|
start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
|
||||||
<b>pcre2_compile()</b> or starting the pattern with (*NO_START_OPT).
|
<b>pcre2_compile()</b> or by starting the pattern with (*NO_START_OPT).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
After a successful match, a partial match, or one of the invalid UTF errors
|
After a successful match, a partial match, or one of the invalid UTF errors
|
||||||
|
@ -3286,13 +3287,12 @@ For example, if the pattern a(b)c is matched with "=abc=" and the replacement
|
||||||
string "+$1$0$1+", the result is "=+babcb+=".
|
string "+$1$0$1+", the result is "=+babcb+=".
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
|
$*MARK inserts the name from the last encountered backtracking control verb on
|
||||||
(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
|
the matching path that has a name. (*MARK) must always include a name, but the
|
||||||
must always include a name, but the other verbs need not. For example, in
|
other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name
|
||||||
the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
|
inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
|
||||||
(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
|
facility can be used to perform simple simultaneous substitutions, as this
|
||||||
perform simple simultaneous substitutions, as this <b>pcre2test</b> example
|
<b>pcre2test</b> example shows:
|
||||||
shows:
|
|
||||||
<pre>
|
<pre>
|
||||||
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
||||||
apple lemon
|
apple lemon
|
||||||
|
@ -3782,7 +3782,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 12 November 2018
|
Last updated: 27 November 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -871,9 +871,14 @@ only callouts with string arguments are useful.
|
||||||
Calling external programs or scripts
|
Calling external programs or scripts
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
This facility can be independently disabled when <b>pcre2grep</b> is built. If
|
This facility can be independently disabled when <b>pcre2grep</b> is built. It
|
||||||
the callout string does not start with a pipe (vertical bar) character, it is
|
is supported for Windows, where a call to <b>_spawnvp()</b> is used, for VMS,
|
||||||
parsed into a list of substrings separated by pipe characters. The first
|
where <b>lib$spawn()</b> is used, and for any other Unix-like environment where
|
||||||
|
<b>fork()</b> and <b>execv()</b> are available.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If the callout string does not start with a pipe (vertical bar) character, it
|
||||||
|
is parsed into a list of substrings separated by pipe characters. The first
|
||||||
substring must be an executable name, with the following substrings specifying
|
substring must be an executable name, with the following substrings specifying
|
||||||
arguments:
|
arguments:
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -900,7 +905,7 @@ a single dollar and $| is replaced by a pipe character. Here is an example:
|
||||||
Arg1: [1] [234] [4] Arg2: |1| ()
|
Arg1: [1] [234] [4] Arg2: |1| ()
|
||||||
12345
|
12345
|
||||||
</pre>
|
</pre>
|
||||||
The parameters for the <b>execv()</b> system call that is used to run the
|
The parameters for the system call that is used to run the
|
||||||
program or script are zero-terminated strings. This means that binary zero
|
program or script are zero-terminated strings. This means that binary zero
|
||||||
characters in the callout argument will cause premature termination of their
|
characters in the callout argument will cause premature termination of their
|
||||||
substrings, and therefore should not be present. Any syntax errors in the
|
substrings, and therefore should not be present. Any syntax errors in the
|
||||||
|
@ -966,7 +971,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 17 November 2018
|
Last updated: 24 November 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -2623,9 +2623,9 @@ can be used:
|
||||||
<pre>
|
<pre>
|
||||||
\s+(?=\p{Latin})(*sr:\S+)
|
\s+(?=\p{Latin})(*sr:\S+)
|
||||||
</pre>
|
</pre>
|
||||||
This works as long as the first character is expected to be a character in that
|
This works as long as the first character is expected to be a character in that
|
||||||
script, and not (for example) punctuation, which is allowed with any script. If
|
script, and not (for example) punctuation, which is allowed with any script. If
|
||||||
this is not the case, a more creative lookahead is needed. For example, if
|
this is not the case, a more creative lookahead is needed. For example, if
|
||||||
digits, underscore, and dots are permitted at the start:
|
digits, underscore, and dots are permitted at the start:
|
||||||
<pre>
|
<pre>
|
||||||
\s+(?=[0-9_.]*\p{Latin})(*sr:\S+)
|
\s+(?=[0-9_.]*\p{Latin})(*sr:\S+)
|
||||||
|
@ -3223,6 +3223,7 @@ There are a number of special "Backtracking Control Verbs" (to use Perl's
|
||||||
terminology) that modify the behaviour of backtracking during matching. They
|
terminology) that modify the behaviour of backtracking during matching. They
|
||||||
are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
|
are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
|
||||||
possibly behaving differently depending on whether or not a name is present.
|
possibly behaving differently depending on whether or not a name is present.
|
||||||
|
The names are not required to be unique within the pattern.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
By default, for compatibility with Perl, a name is any sequence of characters
|
By default, for compatibility with Perl, a name is any sequence of characters
|
||||||
|
@ -3331,8 +3332,8 @@ A match with the string "aaaa" always fails, but the callout is taken before
|
||||||
each backtrack happens (in this example, 10 times).
|
each backtrack happens (in this example, 10 times).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
|
(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
|
||||||
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
|
(*MARK:NAME)(*FAIL), respectively.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Recording which path was taken
|
Recording which path was taken
|
||||||
|
@ -3344,27 +3345,25 @@ starting point (see (*SKIP) below).
|
||||||
<pre>
|
<pre>
|
||||||
(*MARK:NAME) or (*:NAME)
|
(*MARK:NAME) or (*:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
A name is always required with this verb. There may be as many instances of
|
A name is always required with this verb. For all the other backtracking
|
||||||
(*MARK) as you like in a pattern, and their names do not have to be unique.
|
control verbs, a NAME argument is optional.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When a match succeeds, the name of the last-encountered (*MARK:NAME) on the
|
When a match succeeds, the name of the last-encountered mark name on the
|
||||||
matching path is passed back to the caller as described in the section entitled
|
matching path is passed back to the caller as described in the section entitled
|
||||||
<a href="pcre2api.html#matchotherdata">"Other information about the match"</a>
|
<a href="pcre2api.html#matchotherdata">"Other information about the match"</a>
|
||||||
in the
|
in the
|
||||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
documentation. This applies to all instances of (*MARK), including those inside
|
documentation. This applies to all instances of (*MARK) and other verbs,
|
||||||
assertions and atomic groups. (There are differences in those cases when
|
including those inside assertions and atomic groups. However, there are
|
||||||
(*MARK) is used in conjunction with (*SKIP) as described below.)
|
differences in those cases when (*MARK) is used in conjunction with (*SKIP) as
|
||||||
|
described below.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
|
The mark name that was last encountered on the matching path is passed back. A
|
||||||
associated NAME arguments. Whichever is last on the matching path is passed
|
verb without a NAME argument is ignored for this purpose. Here is an example of
|
||||||
back. See below for more details of these other verbs.
|
<b>pcre2test</b> output, where the "mark" modifier requests the retrieval and
|
||||||
</P>
|
outputting of (*MARK) data:
|
||||||
<P>
|
|
||||||
Here is an example of <b>pcre2test</b> output, where the "mark" modifier
|
|
||||||
requests the retrieval and outputting of (*MARK) data:
|
|
||||||
<pre>
|
<pre>
|
||||||
re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
|
re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
|
||||||
data> XY
|
data> XY
|
||||||
|
@ -3414,7 +3413,7 @@ to the left of the verb. However, when one of these verbs appears inside an
|
||||||
atomic group or in a lookaround assertion that is true, its effect is confined
|
atomic group or in a lookaround assertion that is true, its effect is confined
|
||||||
to that group, because once the group has been matched, there is never any
|
to that group, because once the group has been matched, there is never any
|
||||||
backtracking into it. Backtracking from beyond an assertion or an atomic group
|
backtracking into it. Backtracking from beyond an assertion or an atomic group
|
||||||
ignores the entire group, and seeks a preceeding backtracking point.
|
ignores the entire group, and seeks a preceding backtracking point.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
These verbs differ in exactly what kind of failure occurs when backtracking
|
These verbs differ in exactly what kind of failure occurs when backtracking
|
||||||
|
@ -3439,8 +3438,8 @@ dynamic anchor, or "I've started, so I must finish."
|
||||||
<P>
|
<P>
|
||||||
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
|
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names that are set with
|
||||||
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
(*MARK), ignoring those set by any of the other backtracking verbs.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If there is more than one backtracking verb in a pattern, a different one that
|
If there is more than one backtracking verb in a pattern, a different one that
|
||||||
|
@ -3484,7 +3483,7 @@ as (*COMMIT).
|
||||||
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
|
ignoring those set by other backtracking verbs.
|
||||||
<pre>
|
<pre>
|
||||||
(*SKIP)
|
(*SKIP)
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3539,7 +3538,7 @@ the second branch of the pattern.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
||||||
names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
|
names that are set by other backtracking verbs.
|
||||||
<pre>
|
<pre>
|
||||||
(*THEN) or (*THEN:NAME)
|
(*THEN) or (*THEN:NAME)
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -3561,7 +3560,7 @@ group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
|
||||||
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
|
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
ignoring those set by other backtracking verbs.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
A subpattern that does not contain a | character is just a part of the
|
A subpattern that does not contain a | character is just a part of the
|
||||||
|
@ -3656,10 +3655,10 @@ subpattern.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
||||||
without any further processing; captured strings and a (*MARK) name (if set)
|
without any further processing; captured strings and a mark name (if set) are
|
||||||
are retained. In a standalone negative assertion, (*ACCEPT) causes the
|
retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to
|
||||||
assertion to fail without any further processing; captured substrings and any
|
fail without any further processing; captured substrings and any mark name are
|
||||||
(*MARK) name are discarded.
|
discarded.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
||||||
|
@ -3731,7 +3730,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 12 October 2018
|
Last updated: 27 November 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
624
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "12 November 2018" "PCRE2 10.33"
|
.TH PCRE2API 3 "27 November 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -2842,21 +2842,22 @@ appropriate circumstances. If they are called at other times, the result is
|
||||||
undefined.
|
undefined.
|
||||||
.P
|
.P
|
||||||
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
|
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
|
||||||
to match (PCRE2_ERROR_NOMATCH), a (*MARK), (*PRUNE), or (*THEN) name may be
|
to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function
|
||||||
available. The function \fBpcre2_get_mark()\fP can be called to access this
|
\fBpcre2_get_mark()\fP can be called to access this name, which can be
|
||||||
name. The same function applies to all three verbs. It returns a pointer to the
|
specified in the pattern by any of the backtracking control verbs, not just
|
||||||
zero-terminated name, which is within the compiled pattern. If no name is
|
(*MARK). The same function applies to all the verbs. It returns a pointer to
|
||||||
|
the zero-terminated name, which is within the compiled pattern. If no name is
|
||||||
available, NULL is returned. The length of the name (excluding the terminating
|
available, NULL is returned. The length of the name (excluding the terminating
|
||||||
zero) is stored in the code unit that precedes the name. You should use this
|
zero) is stored in the code unit that precedes the name. You should use this
|
||||||
length instead of relying on the terminating zero if the name might contain a
|
length instead of relying on the terminating zero if the name might contain a
|
||||||
binary zero.
|
binary zero.
|
||||||
.P
|
.P
|
||||||
After a successful match, the name that is returned is the last (*MARK),
|
After a successful match, the name that is returned is the last mark name
|
||||||
(*PRUNE), or (*THEN) name encountered on the matching path through the pattern.
|
encountered on the matching path through the pattern. Instances of backtracking
|
||||||
Instances of (*PRUNE) and (*THEN) without names are ignored. Thus, for example,
|
verbs without names do not count. Thus, for example, if the matching path
|
||||||
if the matching path contains (*MARK:A)(*PRUNE), the name "A" is returned.
|
contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
|
||||||
After a "no match" or a partial match, the last encountered name is returned.
|
partial match, the last encountered name is returned. For example, consider
|
||||||
For example, consider this pattern:
|
this pattern:
|
||||||
.sp
|
.sp
|
||||||
^(*MARK:A)((*MARK:B)a|b)c
|
^(*MARK:A)((*MARK:B)a|b)c
|
||||||
.sp
|
.sp
|
||||||
|
@ -2870,7 +2871,7 @@ is removed from the pattern above, there is an initial check for the presence
|
||||||
of "c" in the subject before running the matching engine. This check fails for
|
of "c" in the subject before running the matching engine. This check fails for
|
||||||
"bx", causing a match failure without seeing any marks. You can disable the
|
"bx", causing a match failure without seeing any marks. You can disable the
|
||||||
start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
|
start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
|
||||||
\fBpcre2_compile()\fP or starting the pattern with (*NO_START_OPT).
|
\fBpcre2_compile()\fP or by starting the pattern with (*NO_START_OPT).
|
||||||
.P
|
.P
|
||||||
After a successful match, a partial match, or one of the invalid UTF errors
|
After a successful match, a partial match, or one of the invalid UTF errors
|
||||||
(for example, PCRE2_ERROR_UTF8_ERR5), \fBpcre2_get_startchar()\fP can be
|
(for example, PCRE2_ERROR_UTF8_ERR5), \fBpcre2_get_startchar()\fP can be
|
||||||
|
@ -3297,13 +3298,12 @@ number or name. The number may be zero to include the entire matched string.
|
||||||
For example, if the pattern a(b)c is matched with "=abc=" and the replacement
|
For example, if the pattern a(b)c is matched with "=abc=" and the replacement
|
||||||
string "+$1$0$1+", the result is "=+babcb+=".
|
string "+$1$0$1+", the result is "=+babcb+=".
|
||||||
.P
|
.P
|
||||||
$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
|
$*MARK inserts the name from the last encountered backtracking control verb on
|
||||||
(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
|
the matching path that has a name. (*MARK) must always include a name, but the
|
||||||
must always include a name, but the other verbs need not. For example, in
|
other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name
|
||||||
the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
|
inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
|
||||||
(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
|
facility can be used to perform simple simultaneous substitutions, as this
|
||||||
perform simple simultaneous substitutions, as this \fBpcre2test\fP example
|
\fBpcre2test\fP example shows:
|
||||||
shows:
|
|
||||||
.sp
|
.sp
|
||||||
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
|
||||||
apple lemon
|
apple lemon
|
||||||
|
@ -3790,6 +3790,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 12 November 2018
|
Last updated: 27 November 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -847,11 +847,15 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
|
|
||||||
Calling external programs or scripts
|
Calling external programs or scripts
|
||||||
|
|
||||||
This facility can be independently disabled when pcre2grep is built. If
|
This facility can be independently disabled when pcre2grep is built. It
|
||||||
the callout string does not start with a pipe (vertical bar) character,
|
is supported for Windows, where a call to _spawnvp() is used, for VMS,
|
||||||
it is parsed into a list of substrings separated by pipe characters.
|
where lib$spawn() is used, and for any other Unix-like environment
|
||||||
The first substring must be an executable name, with the following sub-
|
where fork() and execv() are available.
|
||||||
strings specifying arguments:
|
|
||||||
|
If the callout string does not start with a pipe (vertical bar) charac-
|
||||||
|
ter, it is parsed into a list of substrings separated by pipe charac-
|
||||||
|
ters. The first substring must be an executable name, with the follow-
|
||||||
|
ing substrings specifying arguments:
|
||||||
|
|
||||||
executable_name|arg1|arg2|...
|
executable_name|arg1|arg2|...
|
||||||
|
|
||||||
|
@ -877,15 +881,14 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
Arg1: [1] [234] [4] Arg2: |1| ()
|
Arg1: [1] [234] [4] Arg2: |1| ()
|
||||||
12345
|
12345
|
||||||
|
|
||||||
The parameters for the execv() system call that is used to run the pro-
|
The parameters for the system call that is used to run the program or
|
||||||
gram or script are zero-terminated strings. This means that binary zero
|
script are zero-terminated strings. This means that binary zero charac-
|
||||||
characters in the callout argument will cause premature termination of
|
ters in the callout argument will cause premature termination of their
|
||||||
their substrings, and therefore should not be present. Any syntax
|
substrings, and therefore should not be present. Any syntax errors in
|
||||||
errors in the string (for example, a dollar not followed by another
|
the string (for example, a dollar not followed by another character)
|
||||||
character) cause the callout to be ignored. If running the program
|
cause the callout to be ignored. If running the program fails for any
|
||||||
fails for any reason (including the non-existence of the executable), a
|
reason (including the non-existence of the executable), a local match-
|
||||||
local matching failure occurs and the matcher backtracks in the normal
|
ing failure occurs and the matcher backtracks in the normal way.
|
||||||
way.
|
|
||||||
|
|
||||||
Echoing a specific string
|
Echoing a specific string
|
||||||
|
|
||||||
|
@ -893,41 +896,41 @@ USING PCRE2'S CALLOUT FACILITY
|
||||||
pletely disabled when pcre2grep was built. If the callout string starts
|
pletely disabled when pcre2grep was built. If the callout string starts
|
||||||
with a pipe (vertical bar) character, the rest of the string is written
|
with a pipe (vertical bar) character, the rest of the string is written
|
||||||
to the output, having been passed through the same escape processing as
|
to the output, having been passed through the same escape processing as
|
||||||
text from the --output option. This provides a simple echoing facility
|
text from the --output option. This provides a simple echoing facility
|
||||||
that avoids calling an external program or script. No terminator is
|
that avoids calling an external program or script. No terminator is
|
||||||
added to the string, so if you want a newline, you must include it
|
added to the string, so if you want a newline, you must include it
|
||||||
explicitly. Matching continues normally after the string is output. If
|
explicitly. Matching continues normally after the string is output. If
|
||||||
you want to see only the callout output but not any output from an
|
you want to see only the callout output but not any output from an
|
||||||
actual match, you should end the relevant pattern with (*FAIL).
|
actual match, you should end the relevant pattern with (*FAIL).
|
||||||
|
|
||||||
|
|
||||||
MATCHING ERRORS
|
MATCHING ERRORS
|
||||||
|
|
||||||
It is possible to supply a regular expression that takes a very long
|
It is possible to supply a regular expression that takes a very long
|
||||||
time to fail to match certain lines. Such patterns normally involve
|
time to fail to match certain lines. Such patterns normally involve
|
||||||
nested indefinite repeats, for example: (a+)*\d when matched against a
|
nested indefinite repeats, for example: (a+)*\d when matched against a
|
||||||
line of a's with no final digit. The PCRE2 matching function has a
|
line of a's with no final digit. The PCRE2 matching function has a
|
||||||
resource limit that causes it to abort in these circumstances. If this
|
resource limit that causes it to abort in these circumstances. If this
|
||||||
happens, pcre2grep outputs an error message and the line that caused
|
happens, pcre2grep outputs an error message and the line that caused
|
||||||
the problem to the standard error stream. If there are more than 20
|
the problem to the standard error stream. If there are more than 20
|
||||||
such errors, pcre2grep gives up.
|
such errors, pcre2grep gives up.
|
||||||
|
|
||||||
The --match-limit option of pcre2grep can be used to set the overall
|
The --match-limit option of pcre2grep can be used to set the overall
|
||||||
resource limit. There are also other limits that affect the amount of
|
resource limit. There are also other limits that affect the amount of
|
||||||
memory used during matching; see the discussion of --heap-limit and
|
memory used during matching; see the discussion of --heap-limit and
|
||||||
--depth-limit above.
|
--depth-limit above.
|
||||||
|
|
||||||
|
|
||||||
DIAGNOSTICS
|
DIAGNOSTICS
|
||||||
|
|
||||||
Exit status is 0 if any matches were found, 1 if no matches were found,
|
Exit status is 0 if any matches were found, 1 if no matches were found,
|
||||||
and 2 for syntax errors, overlong lines, non-existent or inaccessible
|
and 2 for syntax errors, overlong lines, non-existent or inaccessible
|
||||||
files (even if matches were found in other files) or too many matching
|
files (even if matches were found in other files) or too many matching
|
||||||
errors. Using the -s option to suppress error messages about inaccessi-
|
errors. Using the -s option to suppress error messages about inaccessi-
|
||||||
ble files does not affect the return code.
|
ble files does not affect the return code.
|
||||||
|
|
||||||
When run under VMS, the return code is placed in the symbol
|
When run under VMS, the return code is placed in the symbol
|
||||||
PCRE2GREP_RC because VMS does not distinguish between exit(0) and
|
PCRE2GREP_RC because VMS does not distinguish between exit(0) and
|
||||||
exit(1).
|
exit(1).
|
||||||
|
|
||||||
|
|
||||||
|
@ -945,5 +948,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 17 November 2018
|
Last updated: 24 November 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "12 October 2018" "PCRE2 10.33"
|
.TH PCRE2PATTERN 3 "27 November 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -2640,9 +2640,9 @@ can be used:
|
||||||
.sp
|
.sp
|
||||||
\es+(?=\ep{Latin})(*sr:\eS+)
|
\es+(?=\ep{Latin})(*sr:\eS+)
|
||||||
.sp
|
.sp
|
||||||
This works as long as the first character is expected to be a character in that
|
This works as long as the first character is expected to be a character in that
|
||||||
script, and not (for example) punctuation, which is allowed with any script. If
|
script, and not (for example) punctuation, which is allowed with any script. If
|
||||||
this is not the case, a more creative lookahead is needed. For example, if
|
this is not the case, a more creative lookahead is needed. For example, if
|
||||||
digits, underscore, and dots are permitted at the start:
|
digits, underscore, and dots are permitted at the start:
|
||||||
.sp
|
.sp
|
||||||
\es+(?=[0-9_.]*\ep{Latin})(*sr:\eS+)
|
\es+(?=[0-9_.]*\ep{Latin})(*sr:\eS+)
|
||||||
|
@ -3262,6 +3262,7 @@ There are a number of special "Backtracking Control Verbs" (to use Perl's
|
||||||
terminology) that modify the behaviour of backtracking during matching. They
|
terminology) that modify the behaviour of backtracking during matching. They
|
||||||
are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
|
are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
|
||||||
possibly behaving differently depending on whether or not a name is present.
|
possibly behaving differently depending on whether or not a name is present.
|
||||||
|
The names are not required to be unique within the pattern.
|
||||||
.P
|
.P
|
||||||
By default, for compatibility with Perl, a name is any sequence of characters
|
By default, for compatibility with Perl, a name is any sequence of characters
|
||||||
that does not include a closing parenthesis. The name is not processed in
|
that does not include a closing parenthesis. The name is not processed in
|
||||||
|
@ -3376,8 +3377,8 @@ nearest equivalent is the callout feature, as for example in this pattern:
|
||||||
A match with the string "aaaa" always fails, but the callout is taken before
|
A match with the string "aaaa" always fails, but the callout is taken before
|
||||||
each backtrack happens (in this example, 10 times).
|
each backtrack happens (in this example, 10 times).
|
||||||
.P
|
.P
|
||||||
(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
|
(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
|
||||||
(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
|
(*MARK:NAME)(*FAIL), respectively.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Recording which path was taken"
|
.SS "Recording which path was taken"
|
||||||
|
@ -3389,10 +3390,10 @@ starting point (see (*SKIP) below).
|
||||||
.sp
|
.sp
|
||||||
(*MARK:NAME) or (*:NAME)
|
(*MARK:NAME) or (*:NAME)
|
||||||
.sp
|
.sp
|
||||||
A name is always required with this verb. There may be as many instances of
|
A name is always required with this verb. For all the other backtracking
|
||||||
(*MARK) as you like in a pattern, and their names do not have to be unique.
|
control verbs, a NAME argument is optional.
|
||||||
.P
|
.P
|
||||||
When a match succeeds, the name of the last-encountered (*MARK:NAME) on the
|
When a match succeeds, the name of the last-encountered mark name on the
|
||||||
matching path is passed back to the caller as described in the section entitled
|
matching path is passed back to the caller as described in the section entitled
|
||||||
.\" HTML <a href="pcre2api.html#matchotherdata">
|
.\" HTML <a href="pcre2api.html#matchotherdata">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
@ -3402,16 +3403,15 @@ in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2api\fP
|
\fBpcre2api\fP
|
||||||
.\"
|
.\"
|
||||||
documentation. This applies to all instances of (*MARK), including those inside
|
documentation. This applies to all instances of (*MARK) and other verbs,
|
||||||
assertions and atomic groups. (There are differences in those cases when
|
including those inside assertions and atomic groups. However, there are
|
||||||
(*MARK) is used in conjunction with (*SKIP) as described below.)
|
differences in those cases when (*MARK) is used in conjunction with (*SKIP) as
|
||||||
|
described below.
|
||||||
.P
|
.P
|
||||||
As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
|
The mark name that was last encountered on the matching path is passed back. A
|
||||||
associated NAME arguments. Whichever is last on the matching path is passed
|
verb without a NAME argument is ignored for this purpose. Here is an example of
|
||||||
back. See below for more details of these other verbs.
|
\fBpcre2test\fP output, where the "mark" modifier requests the retrieval and
|
||||||
.P
|
outputting of (*MARK) data:
|
||||||
Here is an example of \fBpcre2test\fP output, where the "mark" modifier
|
|
||||||
requests the retrieval and outputting of (*MARK) data:
|
|
||||||
.sp
|
.sp
|
||||||
re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
|
re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
|
||||||
data> XY
|
data> XY
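Editor's sketch (not part of the commit): the rule stated in the new text, that the last mark name on the matching path is passed back and that verbs without a NAME argument are ignored, can be modelled as follows. This is a toy model of the rule only, not PCRE2's implementation; PathEntry and last_mark_name are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

# One entry per backtracking verb met on the matching path:
# (verb, name), where name is None when the verb has no NAME argument.
PathEntry = Tuple[str, Optional[str]]


def last_mark_name(path: Sequence[PathEntry]) -> Optional[str]:
    """Return the name that would be passed back, or None if no verb had one."""
    result = None
    for _verb, name in path:
        if name is not None:  # verbs without a NAME argument are ignored
            result = name
    return result
```

For example, a path that passes (*MARK:A) and then an unnamed (*PRUNE) yields "A", while a path that passes (*MARK:A) and then (*THEN:B) yields "B".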
|
||||||
|
@ -3461,7 +3461,7 @@ to the left of the verb. However, when one of these verbs appears inside an
|
||||||
atomic group or in a lookaround assertion that is true, its effect is confined
|
atomic group or in a lookaround assertion that is true, its effect is confined
|
||||||
to that group, because once the group has been matched, there is never any
|
to that group, because once the group has been matched, there is never any
|
||||||
backtracking into it. Backtracking from beyond an assertion or an atomic group
|
backtracking into it. Backtracking from beyond an assertion or an atomic group
|
||||||
ignores the entire group, and seeks a preceeding backtracking point.
|
ignores the entire group, and seeks a preceding backtracking point.
|
||||||
.P
|
.P
|
||||||
These verbs differ in exactly what kind of failure occurs when backtracking
|
These verbs differ in exactly what kind of failure occurs when backtracking
|
||||||
reaches them. The behaviour described below is what happens when the verb is
|
reaches them. The behaviour described below is what happens when the verb is
|
||||||
|
@ -3484,8 +3484,8 @@ dynamic anchor, or "I've started, so I must finish."
|
||||||
.P
|
.P
|
||||||
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
|
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names that are set with
|
||||||
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
(*MARK), ignoring those set by any of the other backtracking verbs.
|
||||||
.P
|
.P
|
||||||
If there is more than one backtracking verb in a pattern, a different one that
|
If there is more than one backtracking verb in a pattern, a different one that
|
||||||
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
|
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
|
||||||
|
@ -3526,7 +3526,7 @@ as (*COMMIT).
|
||||||
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
|
ignoring those set by other backtracking verbs.
|
||||||
.sp
|
.sp
|
||||||
(*SKIP)
|
(*SKIP)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3579,7 +3579,7 @@ never seen because "a" does not match "b", so the matcher immediately jumps to
|
||||||
the second branch of the pattern.
|
the second branch of the pattern.
|
||||||
.P
|
.P
|
||||||
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
|
||||||
names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
|
names that are set by other backtracking verbs.
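Editor's sketch (not part of the commit): the lookup rule for (*SKIP:NAME) can be modelled as a backwards search that considers only (*MARK) entries. This is a toy model under stated assumptions, not PCRE2's code: the path is assumed to be a list of (verb, name, subject position) tuples, and skip_target is a hypothetical helper. Returning None stands for the case where no matching (*MARK) exists, in which case the (*SKIP) is understood to be ignored.

```python
from typing import Optional, Sequence, Tuple

# (verb, name, subject position at which the verb was passed)
PathEntry = Tuple[str, Optional[str], int]


def skip_target(path: Sequence[PathEntry], skip_name: str) -> Optional[int]:
    """Return the position of the most recent (*MARK:skip_name), else None."""
    for verb, name, position in reversed(path):
        # Names set by other backtracking verbs are not considered.
        if verb == "MARK" and name == skip_name:
            return position
    return None
```

With the path [("MARK", "A", 1), ("PRUNE", "B", 3), ("MARK", "A", 5)], a search for "A" finds position 5, and a search for "B" finds nothing because the name was set by (*PRUNE) rather than (*MARK).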
|
||||||
.sp
|
.sp
|
||||||
(*THEN) or (*THEN:NAME)
|
(*THEN) or (*THEN:NAME)
|
||||||
.sp
|
.sp
|
||||||
|
@ -3600,7 +3600,7 @@ group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
|
||||||
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
|
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
|
||||||
like (*MARK:NAME) in that the name is remembered for passing back to the
|
like (*MARK:NAME) in that the name is remembered for passing back to the
|
||||||
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
|
||||||
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
|
ignoring those set by other backtracking verbs.
|
||||||
.P
|
.P
|
||||||
A subpattern that does not contain a | character is just a part of the
|
A subpattern that does not contain a | character is just a part of the
|
||||||
enclosing alternative; it is not a nested alternation with only one
|
enclosing alternative; it is not a nested alternation with only one
|
||||||
|
@ -3693,10 +3693,10 @@ not the assertion is standalone or acting as the condition in a conditional
|
||||||
subpattern.
|
subpattern.
|
||||||
.P
|
.P
|
||||||
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed
|
||||||
without any further processing; captured strings and a (*MARK) name (if set)
|
without any further processing; captured strings and a mark name (if set) are
|
||||||
are retained. In a standalone negative assertion, (*ACCEPT) causes the
|
retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to
|
||||||
assertion to fail without any further processing; captured substrings and any
|
fail without any further processing; captured substrings and any mark name are
|
||||||
(*MARK) name are discarded.
|
discarded.
|
||||||
.P
|
.P
|
||||||
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
If the assertion is a condition, (*ACCEPT) causes the condition to be true for
|
||||||
a positive assertion and false for a negative one; captured substrings are
|
a positive assertion and false for a negative one; captured substrings are
|
||||||
|
@ -3767,6 +3767,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 12 October 2018
|
Last updated: 27 November 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|