Documentation update.
parent 0b64d9cfca
commit e7a762ddff
@@ -2841,22 +2841,23 @@ undefined.
 </P>
 <P>
 After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
-to match (PCRE2_ERROR_NOMATCH), a (*MARK), (*PRUNE), or (*THEN) name may be
-available. The function <b>pcre2_get_mark()</b> can be called to access this
-name. The same function applies to all three verbs. It returns a pointer to the
-zero-terminated name, which is within the compiled pattern. If no name is
+to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function
+<b>pcre2_get_mark()</b> can be called to access this name, which can be
+specified in the pattern by any of the backtracking control verbs, not just
+(*MARK). The same function applies to all the verbs. It returns a pointer to
+the zero-terminated name, which is within the compiled pattern. If no name is
 available, NULL is returned. The length of the name (excluding the terminating
 zero) is stored in the code unit that precedes the name. You should use this
 length instead of relying on the terminating zero if the name might contain a
 binary zero.
 </P>
 <P>
-After a successful match, the name that is returned is the last (*MARK),
-(*PRUNE), or (*THEN) name encountered on the matching path through the pattern.
-Instances of (*PRUNE) and (*THEN) without names are ignored. Thus, for example,
-if the matching path contains (*MARK:A)(*PRUNE), the name "A" is returned.
-After a "no match" or a partial match, the last encountered name is returned.
-For example, consider this pattern:
+After a successful match, the name that is returned is the last mark name
+encountered on the matching path through the pattern. Instances of backtracking
+verbs without names do not count. Thus, for example, if the matching path
+contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
+partial match, the last encountered name is returned. For example, consider
+this pattern:
 <pre>
 ^(*MARK:A)((*MARK:B)a|b)c
 </pre>
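The length-prefix rule described in this hunk can be illustrated with a small simulation. This is only a sketch, not PCRE2 code: it assumes 8-bit code units and a hypothetical in-memory layout of `[length][name][zero]`, and the helper names `read_mark_name` and `read_until_zero` are invented for illustration. In C, the equivalent step would be reading the code unit just before the pointer returned by `pcre2_get_mark()`.

```python
def read_mark_name(code_units: bytes, name_offset: int) -> bytes:
    """Return the name at name_offset, using the length stored in the
    code unit immediately before it (assumed layout, 8-bit units)."""
    if name_offset < 1:
        raise ValueError("no code unit precedes the name")
    length = code_units[name_offset - 1]
    name = code_units[name_offset:name_offset + length]
    if len(name) != length:
        raise ValueError("buffer is shorter than the stored length")
    return name


def read_until_zero(code_units: bytes, name_offset: int) -> bytes:
    """Naive alternative: stop at the first zero code unit."""
    end = code_units.index(0, name_offset)
    return code_units[name_offset:end]


# Hypothetical buffer: length 3, the name "A\0B", then the terminating zero.
buffer = bytes([3]) + b"A\x00B" + b"\x00"
```

With this buffer, `read_mark_name(buffer, 1)` recovers all three code units of the name, while `read_until_zero(buffer, 1)` stops at the embedded binary zero and returns only `b"A"`. That is why the text recommends using the stored length.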
@@ -2871,7 +2872,7 @@ is removed from the pattern above, there is an initial check for the presence
 of "c" in the subject before running the matching engine. This check fails for
 "bx", causing a match failure without seeing any marks. You can disable the
 start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
-<b>pcre2_compile()</b> or starting the pattern with (*NO_START_OPT).
+<b>pcre2_compile()</b> or by starting the pattern with (*NO_START_OPT).
 </P>
 <P>
 After a successful match, a partial match, or one of the invalid UTF errors
@@ -3286,13 +3287,12 @@ For example, if the pattern a(b)c is matched with "=abc=" and the replacement
 string "+$1$0$1+", the result is "=+babcb+=".
 </P>
 <P>
-$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
-(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
-must always include a name, but the other verbs need not. For example, in
-the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
-(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
-perform simple simultaneous substitutions, as this <b>pcre2test</b> example
-shows:
+$*MARK inserts the name from the last encountered backtracking control verb on
+the matching path that has a name. (*MARK) must always include a name, but the
+other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name
+inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
+facility can be used to perform simple simultaneous substitutions, as this
+<b>pcre2test</b> example shows:
 <pre>
 /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
 apple lemon
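The idea of a "simultaneous substitution" in this hunk can be sketched outside PCRE2. Python's `re` module has no (*MARK) verb, so the sketch below swaps in a different technique: a lookup table stands in for the mark names, and a replacement callback picks the entry for whichever alternative matched. The function name `simultaneous_substitute` and the table are assumptions made for illustration.

```python
import re
from typing import Mapping


def simultaneous_substitute(subject: str, marks: Mapping[str, str]) -> str:
    """Replace every key of `marks` found in `subject` by its value in a
    single pass, so one replacement never feeds into another."""
    if not marks:
        return subject
    # One alternation over all keys, analogous to apple|lemon in the pattern.
    pattern = re.compile("|".join(re.escape(key) for key in marks))
    # The table lookup plays the role that ${*MARK} plays in pcre2test.
    return pattern.sub(lambda match: marks[match.group(0)], subject)
```

Because all alternatives are tried in one scan, a table such as `{"apple": "lemon", "lemon": "apple"}` swaps the two words rather than collapsing them into one, which two sequential replacements would do.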
@@ -3782,7 +3782,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 November 2018
+Last updated: 27 November 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>
@@ -871,9 +871,14 @@ only callouts with string arguments are useful.
 Calling external programs or scripts
 </b><br>
 <P>
-This facility can be independently disabled when <b>pcre2grep</b> is built. If
-the callout string does not start with a pipe (vertical bar) character, it is
-parsed into a list of substrings separated by pipe characters. The first
+This facility can be independently disabled when <b>pcre2grep</b> is built. It
+is supported for Windows, where a call to <b>_spawnvp()</b> is used, for VMS,
+where <b>lib$spawn()</b> is used, and for any other Unix-like environment where
+<b>fork()</b> and <b>execv()</b> are available.
+</P>
+<P>
+If the callout string does not start with a pipe (vertical bar) character, it
+is parsed into a list of substrings separated by pipe characters. The first
 substring must be an executable name, with the following substrings specifying
 arguments:
 <pre>
@@ -900,7 +905,7 @@ a single dollar and $| is replaced by a pipe character. Here is an example:
 Arg1: [1] [234] [4] Arg2: |1| ()
 12345
 </pre>
-The parameters for the <b>execv()</b> system call that is used to run the
+The parameters for the system call that is used to run the
 program or script are zero-terminated strings. This means that binary zero
 characters in the callout argument will cause premature termination of their
 substrings, and therefore should not be present. Any syntax errors in the
@@ -966,7 +971,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC16" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 November 2018
+Last updated: 24 November 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>
@@ -2623,9 +2623,9 @@ can be used:
 <pre>
 \s+(?=\p{Latin})(*sr:\S+)
 </pre>
-This works as long as the first character is expected to be a character in that
+This works as long as the first character is expected to be a character in that
 script, and not (for example) punctuation, which is allowed with any script. If
-this is not the case, a more creative lookahead is needed. For example, if
+this is not the case, a more creative lookahead is needed. For example, if
 digits, underscore, and dots are permitted at the start:
 <pre>
 \s+(?=[0-9_.]*\p{Latin})(*sr:\S+)
@@ -3223,6 +3223,7 @@ There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
 are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
 possibly behaving differently depending on whether or not a name is present.
+The names are not required to be unique within the pattern.
 </P>
 <P>
 By default, for compatibility with Perl, a name is any sequence of characters
@@ -3331,8 +3332,8 @@ A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 </P>
 <P>
-(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
-(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL), respectively.
 </P>
 <br><b>
 Recording which path was taken
@@ -3344,27 +3345,25 @@ starting point (see (*SKIP) below).
 <pre>
 (*MARK:NAME) or (*:NAME)
 </pre>
-A name is always required with this verb. There may be as many instances of
-(*MARK) as you like in a pattern, and their names do not have to be unique.
+A name is always required with this verb. For all the other backtracking
+control verbs, a NAME argument is optional.
 </P>
 <P>
-When a match succeeds, the name of the last-encountered (*MARK:NAME) on the
+When a match succeeds, the name of the last-encountered mark name on the
 matching path is passed back to the caller as described in the section entitled
 <a href="pcre2api.html#matchotherdata">"Other information about the match"</a>
 in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-documentation. This applies to all instances of (*MARK), including those inside
-assertions and atomic groups. (There are differences in those cases when
-(*MARK) is used in conjunction with (*SKIP) as described below.)
+documentation. This applies to all instances of (*MARK) and other verbs,
+including those inside assertions and atomic groups. However, there are
+differences in those cases when (*MARK) is used in conjunction with (*SKIP) as
+described below.
 </P>
 <P>
-As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
-associated NAME arguments. Whichever is last on the matching path is passed
-back. See below for more details of these other verbs.
-</P>
-<P>
-Here is an example of <b>pcre2test</b> output, where the "mark" modifier
-requests the retrieval and outputting of (*MARK) data:
+The mark name that was last encountered on the matching path is passed back. A
+verb without a NAME argument is ignored for this purpose. Here is an example of
+<b>pcre2test</b> output, where the "mark" modifier requests the retrieval and
+outputting of (*MARK) data:
 <pre>
 re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
 data> XY
@@ -3414,7 +3413,7 @@ to the left of the verb. However, when one of these verbs appears inside an
 atomic group or in a lookaround assertion that is true, its effect is confined
 to that group, because once the group has been matched, there is never any
 backtracking into it. Backtracking from beyond an assertion or an atomic group
-ignores the entire group, and seeks a preceeding backtracking point.
+ignores the entire group, and seeks a preceding backtracking point.
 </P>
 <P>
 These verbs differ in exactly what kind of failure occurs when backtracking
@@ -3439,8 +3438,8 @@ dynamic anchor, or "I've started, so I must finish."
 <P>
 The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
-caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+caller. However, (*SKIP:NAME) searches only for names that are set with
+(*MARK), ignoring those set by any of the other backtracking verbs.
 </P>
 <P>
 If there is more than one backtracking verb in a pattern, a different one that
@@ -3484,7 +3483,7 @@ as (*COMMIT).
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
+ignoring those set by other backtracking verbs.
 <pre>
 (*SKIP)
 </pre>
@@ -3539,7 +3538,7 @@ the second branch of the pattern.
 </P>
 <P>
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by other backtracking verbs.
 <pre>
 (*THEN) or (*THEN:NAME)
 </pre>
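The distinction drawn in these hunks, that every named verb can supply the name passed back to the caller while (*SKIP:NAME) looks only at names set by (*MARK), can be modelled with a toy data structure. This is a hedged sketch, not the PCRE2 matcher: the `Event` tuple, both function names, and the assumption that (*SKIP:NAME) uses the most recent matching (*MARK) on the path are illustrative choices.

```python
from typing import List, Optional, Tuple

# (verb, name, subject position) for each named verb seen on a
# hypothetical matching path, in the order encountered.
Event = Tuple[str, str, int]


def name_passed_back(path: List[Event]) -> Optional[str]:
    """Last name on the path, whichever verb set it."""
    return path[-1][1] if path else None


def skip_target(path: List[Event], wanted: str) -> Optional[int]:
    """Subject position of the most recent (*MARK:wanted) on the path.
    Names set by other verbs are ignored, as the text describes."""
    for verb, name, position in reversed(path):
        if verb == "MARK" and name == wanted:
            return position
    return None
```

For a path such as `[("MARK", "A", 2), ("PRUNE", "B", 5)]`, the name passed back is "B", yet `skip_target(path, "B")` finds nothing because "B" was set by (*PRUNE), while `skip_target(path, "A")` yields position 2.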
@@ -3561,7 +3560,7 @@ group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+ignoring those set by other backtracking verbs.
 </P>
 <P>
 A subpattern that does not contain a | character is just a part of the
@@ -3656,10 +3655,10 @@ subpattern.
 </P>
 <P>
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings and a (*MARK) name (if set)
-are retained. In a standalone negative assertion, (*ACCEPT) causes the
-assertion to fail without any further processing; captured substrings and any
-(*MARK) name are discarded.
+without any further processing; captured strings and a mark name (if set) are
+retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to
+fail without any further processing; captured substrings and any mark name are
+discarded.
 </P>
 <P>
 If the assertion is a condition, (*ACCEPT) causes the condition to be true for
@@ -3731,7 +3730,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 October 2018
+Last updated: 27 November 2018
 <br>
 Copyright © 1997-2018 University of Cambridge.
 <br>
624
doc/pcre2.txt
File diff suppressed because it is too large
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "12 November 2018" "PCRE2 10.33"
+.TH PCRE2API 3 "27 November 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -2842,21 +2842,22 @@ appropriate circumstances. If they are called at other times, the result is
 undefined.
 .P
 After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
-to match (PCRE2_ERROR_NOMATCH), a (*MARK), (*PRUNE), or (*THEN) name may be
-available. The function \fBpcre2_get_mark()\fP can be called to access this
-name. The same function applies to all three verbs. It returns a pointer to the
-zero-terminated name, which is within the compiled pattern. If no name is
+to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function
+\fBpcre2_get_mark()\fP can be called to access this name, which can be
+specified in the pattern by any of the backtracking control verbs, not just
+(*MARK). The same function applies to all the verbs. It returns a pointer to
+the zero-terminated name, which is within the compiled pattern. If no name is
 available, NULL is returned. The length of the name (excluding the terminating
 zero) is stored in the code unit that precedes the name. You should use this
 length instead of relying on the terminating zero if the name might contain a
 binary zero.
 .P
-After a successful match, the name that is returned is the last (*MARK),
-(*PRUNE), or (*THEN) name encountered on the matching path through the pattern.
-Instances of (*PRUNE) and (*THEN) without names are ignored. Thus, for example,
-if the matching path contains (*MARK:A)(*PRUNE), the name "A" is returned.
-After a "no match" or a partial match, the last encountered name is returned.
-For example, consider this pattern:
+After a successful match, the name that is returned is the last mark name
+encountered on the matching path through the pattern. Instances of backtracking
+verbs without names do not count. Thus, for example, if the matching path
+contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
+partial match, the last encountered name is returned. For example, consider
+this pattern:
 .sp
 ^(*MARK:A)((*MARK:B)a|b)c
 .sp
@@ -2870,7 +2871,7 @@ is removed from the pattern above, there is an initial check for the presence
 of "c" in the subject before running the matching engine. This check fails for
 "bx", causing a match failure without seeing any marks. You can disable the
 start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
-\fBpcre2_compile()\fP or starting the pattern with (*NO_START_OPT).
+\fBpcre2_compile()\fP or by starting the pattern with (*NO_START_OPT).
 .P
 After a successful match, a partial match, or one of the invalid UTF errors
 (for example, PCRE2_ERROR_UTF8_ERR5), \fBpcre2_get_startchar()\fP can be
@@ -3297,13 +3298,12 @@ number or name. The number may be zero to include the entire matched string.
 For example, if the pattern a(b)c is matched with "=abc=" and the replacement
 string "+$1$0$1+", the result is "=+babcb+=".
 .P
-$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
-(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
-must always include a name, but the other verbs need not. For example, in
-the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
-(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
-perform simple simultaneous substitutions, as this \fBpcre2test\fP example
-shows:
+$*MARK inserts the name from the last encountered backtracking control verb on
+the matching path that has a name. (*MARK) must always include a name, but the
+other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name
+inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
+facility can be used to perform simple simultaneous substitutions, as this
+\fBpcre2test\fP example shows:
 .sp
 /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
 apple lemon
@@ -3790,6 +3790,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 12 November 2018
+Last updated: 27 November 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
@@ -847,11 +847,15 @@ USING PCRE2'S CALLOUT FACILITY

 Calling external programs or scripts

-This facility can be independently disabled when pcre2grep is built. If
-the callout string does not start with a pipe (vertical bar) character,
-it is parsed into a list of substrings separated by pipe characters.
-The first substring must be an executable name, with the following sub-
-strings specifying arguments:
+This facility can be independently disabled when pcre2grep is built. It
+is supported for Windows, where a call to _spawnvp() is used, for VMS,
+where lib$spawn() is used, and for any other Unix-like environment
+where fork() and execv() are available.
+
+If the callout string does not start with a pipe (vertical bar) charac-
+ter, it is parsed into a list of substrings separated by pipe charac-
+ters. The first substring must be an executable name, with the follow-
+ing substrings specifying arguments:

 executable_name|arg1|arg2|...

@@ -877,15 +881,14 @@ USING PCRE2'S CALLOUT FACILITY
 Arg1: [1] [234] [4] Arg2: |1| ()
 12345

-The parameters for the execv() system call that is used to run the pro-
-gram or script are zero-terminated strings. This means that binary zero
-characters in the callout argument will cause premature termination of
-their substrings, and therefore should not be present. Any syntax
-errors in the string (for example, a dollar not followed by another
-character) cause the callout to be ignored. If running the program
-fails for any reason (including the non-existence of the executable), a
-local matching failure occurs and the matcher backtracks in the normal
-way.
+The parameters for the system call that is used to run the program or
+script are zero-terminated strings. This means that binary zero charac-
+ters in the callout argument will cause premature termination of their
+substrings, and therefore should not be present. Any syntax errors in
+the string (for example, a dollar not followed by another character)
+cause the callout to be ignored. If running the program fails for any
+reason (including the non-existence of the executable), a local match-
+ing failure occurs and the matcher backtracks in the normal way.

 Echoing a specific string

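The callout-string rules in the two hunks above can be approximated in a few lines. The sketch below is not pcre2grep's parser: `parse_callout` is a hypothetical name, the `captures` mapping stands in for captured substrings, and the handling of `$<digits>` (replaced by the numbered capture, or by nothing if it is unset) and of a dollar followed by any other character (replaced by that character) reflects my reading of the pcre2grep documentation rather than anything shown in this diff. The braced `${n}` form is omitted.

```python
from typing import List, Mapping, Optional

DIGITS = "0123456789"


def parse_callout(callout: str, captures: Mapping[int, str]) -> Optional[List[str]]:
    """Split a callout string into [executable, arg1, ...].
    Returns None when the callout would be ignored (syntax error)."""
    if callout.startswith("|"):
        raise ValueError("a leading pipe selects the echo form, not handled here")
    parts: List[List[str]] = [[]]
    i = 0
    while i < len(callout):
        ch = callout[i]
        if ch == "|":                      # unescaped pipe: next substring
            parts.append([])
            i += 1
        elif ch == "$":
            if i + 1 >= len(callout):      # dollar not followed by a character
                return None
            nxt = callout[i + 1]
            if nxt in DIGITS:              # $<digits>: captured substring (assumed rule)
                j = i + 1
                while j < len(callout) and callout[j] in DIGITS:
                    j += 1
                parts[-1].append(captures.get(int(callout[i + 1:j]), ""))
                i = j
            else:                          # $$ gives "$", $| gives "|"
                parts[-1].append(nxt)
                i += 2
        else:
            parts[-1].append(ch)
            i += 1
    # Zero-terminated parameters: anything after a binary zero is lost.
    return ["".join(part).split("\0", 1)[0] for part in parts]
```

The final line imitates the "premature termination" the text warns about: an argument such as `a\0b` reaches the program as just `a`, which is why binary zeros should not appear in callout arguments.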
@@ -893,41 +896,41 @@ USING PCRE2'S CALLOUT FACILITY
 pletely disabled when pcre2grep was built. If the callout string starts
 with a pipe (vertical bar) character, the rest of the string is written
 to the output, having been passed through the same escape processing as
-text from the --output option. This provides a simple echoing facility
-that avoids calling an external program or script. No terminator is
-added to the string, so if you want a newline, you must include it
-explicitly. Matching continues normally after the string is output. If
-you want to see only the callout output but not any output from an
+text from the --output option. This provides a simple echoing facility
+that avoids calling an external program or script. No terminator is
+added to the string, so if you want a newline, you must include it
+explicitly. Matching continues normally after the string is output. If
+you want to see only the callout output but not any output from an
 actual match, you should end the relevant pattern with (*FAIL).


 MATCHING ERRORS

-It is possible to supply a regular expression that takes a very long
-time to fail to match certain lines. Such patterns normally involve
-nested indefinite repeats, for example: (a+)*\d when matched against a
-line of a's with no final digit. The PCRE2 matching function has a
-resource limit that causes it to abort in these circumstances. If this
-happens, pcre2grep outputs an error message and the line that caused
-the problem to the standard error stream. If there are more than 20
+It is possible to supply a regular expression that takes a very long
+time to fail to match certain lines. Such patterns normally involve
+nested indefinite repeats, for example: (a+)*\d when matched against a
+line of a's with no final digit. The PCRE2 matching function has a
+resource limit that causes it to abort in these circumstances. If this
+happens, pcre2grep outputs an error message and the line that caused
+the problem to the standard error stream. If there are more than 20
 such errors, pcre2grep gives up.

-The --match-limit option of pcre2grep can be used to set the overall
-resource limit. There are also other limits that affect the amount of
-memory used during matching; see the discussion of --heap-limit and
+The --match-limit option of pcre2grep can be used to set the overall
+resource limit. There are also other limits that affect the amount of
+memory used during matching; see the discussion of --heap-limit and
 --depth-limit above.


 DIAGNOSTICS

 Exit status is 0 if any matches were found, 1 if no matches were found,
-and 2 for syntax errors, overlong lines, non-existent or inaccessible
-files (even if matches were found in other files) or too many matching
+and 2 for syntax errors, overlong lines, non-existent or inaccessible
+files (even if matches were found in other files) or too many matching
 errors. Using the -s option to suppress error messages about inaccessi-
 ble files does not affect the return code.

-When run under VMS, the return code is placed in the symbol
-PCRE2GREP_RC because VMS does not distinguish between exit(0) and
+When run under VMS, the return code is placed in the symbol
+PCRE2GREP_RC because VMS does not distinguish between exit(0) and
 exit(1).


@@ -945,5 +948,5 @@ AUTHOR

 REVISION

-Last updated: 17 November 2018
+Last updated: 24 November 2018
 Copyright (c) 1997-2018 University of Cambridge.
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "12 October 2018" "PCRE2 10.33"
+.TH PCRE2PATTERN 3 "27 November 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -2640,9 +2640,9 @@ can be used:
 .sp
 \es+(?=\ep{Latin})(*sr:\eS+)
 .sp
-This works as long as the first character is expected to be a character in that
+This works as long as the first character is expected to be a character in that
 script, and not (for example) punctuation, which is allowed with any script. If
-this is not the case, a more creative lookahead is needed. For example, if
+this is not the case, a more creative lookahead is needed. For example, if
 digits, underscore, and dots are permitted at the start:
 .sp
 \es+(?=[0-9_.]*\ep{Latin})(*sr:\eS+)
@@ -3262,6 +3262,7 @@ There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
 are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
 possibly behaving differently depending on whether or not a name is present.
+The names are not required to be unique within the pattern.
 .P
 By default, for compatibility with Perl, a name is any sequence of characters
 that does not include a closing parenthesis. The name is not processed in
@@ -3376,8 +3377,8 @@ nearest equivalent is the callout feature, as for example in this pattern:
 A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 .P
-(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
-(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL), respectively.
 .
 .
 .SS "Recording which path was taken"
@@ -3389,10 +3390,10 @@ starting point (see (*SKIP) below).
 .sp
 (*MARK:NAME) or (*:NAME)
 .sp
-A name is always required with this verb. There may be as many instances of
-(*MARK) as you like in a pattern, and their names do not have to be unique.
+A name is always required with this verb. For all the other backtracking
+control verbs, a NAME argument is optional.
 .P
-When a match succeeds, the name of the last-encountered (*MARK:NAME) on the
+When a match succeeds, the name of the last-encountered mark name on the
 matching path is passed back to the caller as described in the section entitled
 .\" HTML <a href="pcre2api.html#matchotherdata">
 .\" </a>
@@ -3402,16 +3403,15 @@ in the
 .\" HREF
 \fBpcre2api\fP
 .\"
-documentation. This applies to all instances of (*MARK), including those inside
-assertions and atomic groups. (There are differences in those cases when
-(*MARK) is used in conjunction with (*SKIP) as described below.)
+documentation. This applies to all instances of (*MARK) and other verbs,
+including those inside assertions and atomic groups. However, there are
+differences in those cases when (*MARK) is used in conjunction with (*SKIP) as
+described below.
 .P
-As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
-associated NAME arguments. Whichever is last on the matching path is passed
-back. See below for more details of these other verbs.
-.P
-Here is an example of \fBpcre2test\fP output, where the "mark" modifier
-requests the retrieval and outputting of (*MARK) data:
+The mark name that was last encountered on the matching path is passed back. A
+verb without a NAME argument is ignored for this purpose. Here is an example of
+\fBpcre2test\fP output, where the "mark" modifier requests the retrieval and
+outputting of (*MARK) data:
 .sp
 re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
 data> XY
@@ -3461,7 +3461,7 @@ to the left of the verb. However, when one of these verbs appears inside an
 atomic group or in a lookaround assertion that is true, its effect is confined
 to that group, because once the group has been matched, there is never any
 backtracking into it. Backtracking from beyond an assertion or an atomic group
-ignores the entire group, and seeks a preceeding backtracking point.
+ignores the entire group, and seeks a preceding backtracking point.
 .P
 These verbs differ in exactly what kind of failure occurs when backtracking
 reaches them. The behaviour described below is what happens when the verb is
@@ -3484,8 +3484,8 @@ dynamic anchor, or "I've started, so I must finish."
 .P
 The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
-caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+caller. However, (*SKIP:NAME) searches only for names that are set with
+(*MARK), ignoring those set by any of the other backtracking verbs.
 .P
 If there is more than one backtracking verb in a pattern, a different one that
 follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
@@ -3526,7 +3526,7 @@ as (*COMMIT).
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
+ignoring those set by other backtracking verbs.
 .sp
 (*SKIP)
 .sp
@@ -3579,7 +3579,7 @@ never seen because "a" does not match "b", so the matcher immediately jumps to
 the second branch of the pattern.
 .P
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by other backtracking verbs.
 .sp
 (*THEN) or (*THEN:NAME)
 .sp
@@ -3600,7 +3600,7 @@ group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+ignoring those set by other backtracking verbs.
 .P
 A subpattern that does not contain a | character is just a part of the
 enclosing alternative; it is not a nested alternation with only one
@@ -3693,10 +3693,10 @@ not the assertion is standalone or acting as the condition in a conditional
 subpattern.
 .P
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings and a (*MARK) name (if set)
-are retained. In a standalone negative assertion, (*ACCEPT) causes the
-assertion to fail without any further processing; captured substrings and any
-(*MARK) name are discarded.
+without any further processing; captured strings and a mark name (if set) are
+retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to
+fail without any further processing; captured substrings and any mark name are
+discarded.
 .P
 If the assertion is a condition, (*ACCEPT) causes the condition to be true for
 a positive assertion and false for a negative one; captured substrings are
@@ -3767,6 +3767,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 12 October 2018
+Last updated: 27 November 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi