Documentation update.

Philip.Hazel 2018-11-27 16:41:20 +00:00
parent 0b64d9cfca
commit e7a762ddff
7 changed files with 451 additions and 448 deletions

@@ -2841,22 +2841,23 @@ undefined.
 </P>
 <P>
 After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
-to match (PCRE2_ERROR_NOMATCH), a (*MARK), (*PRUNE), or (*THEN) name may be
-available. The function <b>pcre2_get_mark()</b> can be called to access this
-name. The same function applies to all three verbs. It returns a pointer to the
-zero-terminated name, which is within the compiled pattern. If no name is
+to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function
+<b>pcre2_get_mark()</b> can be called to access this name, which can be
+specified in the pattern by any of the backtracking control verbs, not just
+(*MARK). The same function applies to all the verbs. It returns a pointer to
+the zero-terminated name, which is within the compiled pattern. If no name is
 available, NULL is returned. The length of the name (excluding the terminating
 zero) is stored in the code unit that precedes the name. You should use this
 length instead of relying on the terminating zero if the name might contain a
 binary zero.
 </P>
 <P>
-After a successful match, the name that is returned is the last (*MARK),
-(*PRUNE), or (*THEN) name encountered on the matching path through the pattern.
-Instances of (*PRUNE) and (*THEN) without names are ignored. Thus, for example,
-if the matching path contains (*MARK:A)(*PRUNE), the name "A" is returned.
-After a "no match" or a partial match, the last encountered name is returned.
-For example, consider this pattern:
+After a successful match, the name that is returned is the last mark name
+encountered on the matching path through the pattern. Instances of backtracking
+verbs without names do not count. Thus, for example, if the matching path
+contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
+partial match, the last encountered name is returned. For example, consider
+this pattern:
 <pre>
 ^(*MARK:A)((*MARK:B)a|b)c
 </pre>
@@ -2871,7 +2872,7 @@ is removed from the pattern above, there is an initial check for the presence
 of "c" in the subject before running the matching engine. This check fails for
 "bx", causing a match failure without seeing any marks. You can disable the
 start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
-<b>pcre2_compile()</b> or starting the pattern with (*NO_START_OPT).
+<b>pcre2_compile()</b> or by starting the pattern with (*NO_START_OPT).
 </P>
 <P>
 After a successful match, a partial match, or one of the invalid UTF errors
@@ -3286,13 +3287,12 @@ For example, if the pattern a(b)c is matched with "=abc=" and the replacement
 string "+$1$0$1+", the result is "=+babcb+=".
 </P>
 <P>
-$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
-(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
-must always include a name, but the other verbs need not. For example, in
-the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
-(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
-perform simple simultaneous substitutions, as this <b>pcre2test</b> example
-shows:
+$*MARK inserts the name from the last encountered backtracking control verb on
+the matching path that has a name. (*MARK) must always include a name, but the
+other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name
+inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
+facility can be used to perform simple simultaneous substitutions, as this
+<b>pcre2test</b> example shows:
 <pre>
 /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
 apple lemon
@@ -3782,7 +3782,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 November 2018
+Last updated: 27 November 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

@@ -871,9 +871,14 @@ only callouts with string arguments are useful.
 Calling external programs or scripts
 </b><br>
 <P>
-This facility can be independently disabled when <b>pcre2grep</b> is built. If
-the callout string does not start with a pipe (vertical bar) character, it is
-parsed into a list of substrings separated by pipe characters. The first
+This facility can be independently disabled when <b>pcre2grep</b> is built. It
+is supported for Windows, where a call to <b>_spawnvp()</b> is used, for VMS,
+where <b>lib$spawn()</b> is used, and for any other Unix-like environment where
+<b>fork()</b> and <b>execv()</b> are available.
+</P>
+<P>
+If the callout string does not start with a pipe (vertical bar) character, it
+is parsed into a list of substrings separated by pipe characters. The first
 substring must be an executable name, with the following substrings specifying
 arguments:
 <pre>
@@ -900,7 +905,7 @@ a single dollar and $| is replaced by a pipe character. Here is an example:
 Arg1: [1] [234] [4] Arg2: |1| ()
 12345
 </pre>
-The parameters for the <b>execv()</b> system call that is used to run the
+The parameters for the system call that is used to run the
 program or script are zero-terminated strings. This means that binary zero
 characters in the callout argument will cause premature termination of their
 substrings, and therefore should not be present. Any syntax errors in the
@@ -966,7 +971,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC16" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 17 November 2018
+Last updated: 24 November 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

@@ -2623,9 +2623,9 @@ can be used:
 <pre>
 \s+(?=\p{Latin})(*sr:\S+)
 </pre>
 This works as long as the first character is expected to be a character in that
 script, and not (for example) punctuation, which is allowed with any script. If
 this is not the case, a more creative lookahead is needed. For example, if
 digits, underscore, and dots are permitted at the start:
 <pre>
 \s+(?=[0-9_.]*\p{Latin})(*sr:\S+)
@@ -3223,6 +3223,7 @@ There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
 are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
 possibly behaving differently depending on whether or not a name is present.
+The names are not required to be unique within the pattern.
 </P>
 <P>
 By default, for compatibility with Perl, a name is any sequence of characters
@@ -3331,8 +3332,8 @@ A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 </P>
 <P>
-(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
-(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL), respectively.
 </P>
 <br><b>
 Recording which path was taken
@@ -3344,27 +3345,25 @@ starting point (see (*SKIP) below).
 <pre>
 (*MARK:NAME) or (*:NAME)
 </pre>
-A name is always required with this verb. There may be as many instances of
-(*MARK) as you like in a pattern, and their names do not have to be unique.
+A name is always required with this verb. For all the other backtracking
+control verbs, a NAME argument is optional.
 </P>
 <P>
-When a match succeeds, the name of the last-encountered (*MARK:NAME) on the
+When a match succeeds, the name of the last-encountered mark name on the
 matching path is passed back to the caller as described in the section entitled
 <a href="pcre2api.html#matchotherdata">"Other information about the match"</a>
 in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
-documentation. This applies to all instances of (*MARK), including those inside
-assertions and atomic groups. (There are differences in those cases when
-(*MARK) is used in conjunction with (*SKIP) as described below.)
+documentation. This applies to all instances of (*MARK) and other verbs,
+including those inside assertions and atomic groups. However, there are
+differences in those cases when (*MARK) is used in conjunction with (*SKIP) as
+described below.
 </P>
 <P>
-As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
-associated NAME arguments. Whichever is last on the matching path is passed
-back. See below for more details of these other verbs.
-</P>
-<P>
-Here is an example of <b>pcre2test</b> output, where the "mark" modifier
-requests the retrieval and outputting of (*MARK) data:
+The mark name that was last encountered on the matching path is passed back. A
+verb without a NAME argument is ignored for this purpose. Here is an example of
+<b>pcre2test</b> output, where the "mark" modifier requests the retrieval and
+outputting of (*MARK) data:
 <pre>
 re&#62; /X(*MARK:A)Y|X(*MARK:B)Z/mark
 data&#62; XY
@@ -3414,7 +3413,7 @@ to the left of the verb. However, when one of these verbs appears inside an
 atomic group or in a lookaround assertion that is true, its effect is confined
 to that group, because once the group has been matched, there is never any
 backtracking into it. Backtracking from beyond an assertion or an atomic group
-ignores the entire group, and seeks a preceeding backtracking point.
+ignores the entire group, and seeks a preceding backtracking point.
 </P>
 <P>
 These verbs differ in exactly what kind of failure occurs when backtracking
@@ -3439,8 +3438,8 @@ dynamic anchor, or "I've started, so I must finish."
 <P>
 The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
-caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+caller. However, (*SKIP:NAME) searches only for names that are set with
+(*MARK), ignoring those set by any of the other backtracking verbs.
 </P>
 <P>
 If there is more than one backtracking verb in a pattern, a different one that
@@ -3484,7 +3483,7 @@ as (*COMMIT).
 The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) or (*THEN).
+ignoring those set by other backtracking verbs.
 <pre>
 (*SKIP)
 </pre>
@@ -3539,7 +3538,7 @@ the second branch of the pattern.
 </P>
 <P>
 Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
-names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME).
+names that are set by other backtracking verbs.
 <pre>
 (*THEN) or (*THEN:NAME)
 </pre>
@@ -3561,7 +3560,7 @@ group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
 caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
-ignoring those set by (*COMMIT), (*PRUNE) and (*THEN).
+ignoring those set by other backtracking verbs.
 </P>
 <P>
 A subpattern that does not contain a | character is just a part of the
@@ -3656,10 +3655,10 @@ subpattern.
 </P>
 <P>
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
-without any further processing; captured strings and a (*MARK) name (if set)
-are retained. In a standalone negative assertion, (*ACCEPT) causes the
-assertion to fail without any further processing; captured substrings and any
-(*MARK) name are discarded.
+without any further processing; captured strings and a mark name (if set) are
+retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to
+fail without any further processing; captured substrings and any mark name are
+discarded.
 </P>
 <P>
 If the assertion is a condition, (*ACCEPT) causes the condition to be true for
@@ -3731,7 +3730,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 12 October 2018
+Last updated: 27 November 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

File diff suppressed because it is too large.

@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "12 November 2018" "PCRE2 10.33"
+.TH PCRE2API 3 "27 November 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -2842,21 +2842,22 @@ appropriate circumstances. If they are called at other times, the result is
 undefined.
 .P
 After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
-to match (PCRE2_ERROR_NOMATCH), a (*MARK), (*PRUNE), or (*THEN) name may be
-available. The function \fBpcre2_get_mark()\fP can be called to access this
-name. The same function applies to all three verbs. It returns a pointer to the
-zero-terminated name, which is within the compiled pattern. If no name is
+to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function
+\fBpcre2_get_mark()\fP can be called to access this name, which can be
+specified in the pattern by any of the backtracking control verbs, not just
+(*MARK). The same function applies to all the verbs. It returns a pointer to
+the zero-terminated name, which is within the compiled pattern. If no name is
 available, NULL is returned. The length of the name (excluding the terminating
 zero) is stored in the code unit that precedes the name. You should use this
 length instead of relying on the terminating zero if the name might contain a
 binary zero.
 .P
-After a successful match, the name that is returned is the last (*MARK),
-(*PRUNE), or (*THEN) name encountered on the matching path through the pattern.
-Instances of (*PRUNE) and (*THEN) without names are ignored. Thus, for example,
-if the matching path contains (*MARK:A)(*PRUNE), the name "A" is returned.
-After a "no match" or a partial match, the last encountered name is returned.
-For example, consider this pattern:
+After a successful match, the name that is returned is the last mark name
+encountered on the matching path through the pattern. Instances of backtracking
+verbs without names do not count. Thus, for example, if the matching path
+contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
+partial match, the last encountered name is returned. For example, consider
+this pattern:
 .sp
 ^(*MARK:A)((*MARK:B)a|b)c
 .sp
@@ -2870,7 +2871,7 @@ is removed from the pattern above, there is an initial check for the presence
 of "c" in the subject before running the matching engine. This check fails for
 "bx", causing a match failure without seeing any marks. You can disable the
 start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for
-\fBpcre2_compile()\fP or starting the pattern with (*NO_START_OPT).
+\fBpcre2_compile()\fP or by starting the pattern with (*NO_START_OPT).
 .P
 After a successful match, a partial match, or one of the invalid UTF errors
 (for example, PCRE2_ERROR_UTF8_ERR5), \fBpcre2_get_startchar()\fP can be
@@ -3297,13 +3298,12 @@ number or name. The number may be zero to include the entire matched string.
 For example, if the pattern a(b)c is matched with "=abc=" and the replacement
 string "+$1$0$1+", the result is "=+babcb+=".
 .P
-$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
-(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
-must always include a name, but the other verbs need not. For example, in
-the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
-(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
-perform simple simultaneous substitutions, as this \fBpcre2test\fP example
-shows:
+$*MARK inserts the name from the last encountered backtracking control verb on
+the matching path that has a name. (*MARK) must always include a name, but the
+other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name
+inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
+facility can be used to perform simple simultaneous substitutions, as this
+\fBpcre2test\fP example shows:
 .sp
 /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
 apple lemon
@@ -3790,6 +3790,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 12 November 2018
+Last updated: 27 November 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi

@@ -847,11 +847,15 @@ USING PCRE2'S CALLOUT FACILITY
 Calling external programs or scripts
-This facility can be independently disabled when pcre2grep is built. If
-the callout string does not start with a pipe (vertical bar) character,
-it is parsed into a list of substrings separated by pipe characters.
-The first substring must be an executable name, with the following sub-
-strings specifying arguments:
+This facility can be independently disabled when pcre2grep is built. It
+is supported for Windows, where a call to _spawnvp() is used, for VMS,
+where lib$spawn() is used, and for any other Unix-like environment
+where fork() and execv() are available.
+If the callout string does not start with a pipe (vertical bar) charac-
+ter, it is parsed into a list of substrings separated by pipe charac-
+ters. The first substring must be an executable name, with the follow-
+ing substrings specifying arguments:
 executable_name|arg1|arg2|...
@@ -877,15 +881,14 @@ USING PCRE2'S CALLOUT FACILITY
 Arg1: [1] [234] [4] Arg2: |1| ()
 12345
-The parameters for the execv() system call that is used to run the pro-
-gram or script are zero-terminated strings. This means that binary zero
-characters in the callout argument will cause premature termination of
-their substrings, and therefore should not be present. Any syntax
-errors in the string (for example, a dollar not followed by another
-character) cause the callout to be ignored. If running the program
-fails for any reason (including the non-existence of the executable), a
-local matching failure occurs and the matcher backtracks in the normal
-way.
+The parameters for the system call that is used to run the program or
+script are zero-terminated strings. This means that binary zero charac-
+ters in the callout argument will cause premature termination of their
+substrings, and therefore should not be present. Any syntax errors in
+the string (for example, a dollar not followed by another character)
+cause the callout to be ignored. If running the program fails for any
+reason (including the non-existence of the executable), a local match-
+ing failure occurs and the matcher backtracks in the normal way.
 Echoing a specific string
@@ -893,41 +896,41 @@ USING PCRE2'S CALLOUT FACILITY
 pletely disabled when pcre2grep was built. If the callout string starts
 with a pipe (vertical bar) character, the rest of the string is written
 to the output, having been passed through the same escape processing as
 text from the --output option. This provides a simple echoing facility
 that avoids calling an external program or script. No terminator is
 added to the string, so if you want a newline, you must include it
 explicitly. Matching continues normally after the string is output. If
 you want to see only the callout output but not any output from an
 actual match, you should end the relevant pattern with (*FAIL).
 MATCHING ERRORS
 It is possible to supply a regular expression that takes a very long
 time to fail to match certain lines. Such patterns normally involve
 nested indefinite repeats, for example: (a+)*\d when matched against a
 line of a's with no final digit. The PCRE2 matching function has a
 resource limit that causes it to abort in these circumstances. If this
 happens, pcre2grep outputs an error message and the line that caused
 the problem to the standard error stream. If there are more than 20
 such errors, pcre2grep gives up.
 The --match-limit option of pcre2grep can be used to set the overall
 resource limit. There are also other limits that affect the amount of
 memory used during matching; see the discussion of --heap-limit and
 --depth-limit above.
 DIAGNOSTICS
 Exit status is 0 if any matches were found, 1 if no matches were found,
 and 2 for syntax errors, overlong lines, non-existent or inaccessible
 files (even if matches were found in other files) or too many matching
 errors. Using the -s option to suppress error messages about inaccessi-
 ble files does not affect the return code.
 When run under VMS, the return code is placed in the symbol
 PCRE2GREP_RC because VMS does not distinguish between exit(0) and
 exit(1).
@@ -945,5 +948,5 @@ AUTHOR
 REVISION
-Last updated: 17 November 2018
+Last updated: 24 November 2018
 Copyright (c) 1997-2018 University of Cambridge.

@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "12 October 2018" "PCRE2 10.33"
+.TH PCRE2PATTERN 3 "27 November 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -2640,9 +2640,9 @@ can be used:
 .sp
 \es+(?=\ep{Latin})(*sr:\eS+)
 .sp
 This works as long as the first character is expected to be a character in that
 script, and not (for example) punctuation, which is allowed with any script. If
 this is not the case, a more creative lookahead is needed. For example, if
 digits, underscore, and dots are permitted at the start:
 .sp
 \es+(?=[0-9_.]*\ep{Latin})(*sr:\eS+)
@@ -3262,6 +3262,7 @@ There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
 are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
 possibly behaving differently depending on whether or not a name is present.
+The names are not required to be unique within the pattern.
 .P
 By default, for compatibility with Perl, a name is any sequence of characters
 that does not include a closing parenthesis. The name is not processed in
@@ -3376,8 +3377,8 @@ nearest equivalent is the callout feature, as for example in this pattern:
 A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 .P
-(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
-(*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
+(*ACCEPT:NAME) and (*FAIL:NAME) are treated as (*MARK:NAME)(*ACCEPT) and
+(*MARK:NAME)(*FAIL), respectively.
 .
 .
 .SS "Recording which path was taken"
@@ -3389,10 +3390,10 @@ starting point (see (*SKIP) below).
 .sp
 (*MARK:NAME) or (*:NAME)
 .sp
-A name is always required with this verb. There may be as many instances of
-(*MARK) as you like in a pattern, and their names do not have to be unique.
+A name is always required with this verb. For all the other backtracking
+control verbs, a NAME argument is optional.
 .P
-When a match succeeds, the name of the last-encountered (*MARK:NAME) on the
+When a match succeeds, the name of the last-encountered mark name on the
 matching path is passed back to the caller as described in the section entitled
 .\" HTML <a href="pcre2api.html#matchotherdata">
 .\" </a>
@@ -3402,16 +3403,15 @@ in the
 .\" HREF
 \fBpcre2api\fP
 .\"
-documentation. This applies to all instances of (*MARK), including those inside
-assertions and atomic groups. (There are differences in those cases when
-(*MARK) is used in conjunction with (*SKIP) as described below.)
+documentation. This applies to all instances of (*MARK) and other verbs,
+including those inside assertions and atomic groups. However, there are
+differences in those cases when (*MARK) is used in conjunction with (*SKIP) as
+described below.
 .P
-As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
-associated NAME arguments. Whichever is last on the matching path is passed
-back. See below for more details of these other verbs.
-.P
-Here is an example of \fBpcre2test\fP output, where the "mark" modifier
-requests the retrieval and outputting of (*MARK) data:
+The mark name that was last encountered on the matching path is passed back. A
+verb without a NAME argument is ignored for this purpose. Here is an example of
+\fBpcre2test\fP output, where the "mark" modifier requests the retrieval and
+outputting of (*MARK) data:
 .sp
 re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
 data> XY
@@ -3461,7 +3461,7 @@ to the left of the verb. However, when one of these verbs appears inside an
atomic group or in a lookaround assertion that is true, its effect is confined atomic group or in a lookaround assertion that is true, its effect is confined
to that group, because once the group has been matched, there is never any to that group, because once the group has been matched, there is never any
backtracking into it. Backtracking from beyond an assertion or an atomic group backtracking into it. Backtracking from beyond an assertion or an atomic group
ignores the entire group, and seeks a preceeding backtracking point. ignores the entire group, and seeks a preceding backtracking point.
.P .P
These verbs differ in exactly what kind of failure occurs when backtracking These verbs differ in exactly what kind of failure occurs when backtracking
reaches them. The behaviour described below is what happens when the verb is reaches them. The behaviour described below is what happens when the verb is
@@ -3484,8 +3484,8 @@ dynamic anchor, or "I've started, so I must finish."
.P .P
The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names that are set with
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN). (*MARK), ignoring those set by any of the other backtracking verbs.
.P .P
If there is more than one backtracking verb in a pattern, a different one that If there is more than one backtracking verb in a pattern, a different one that
follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
@@ -3526,7 +3526,7 @@ as (*COMMIT).
The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*COMMIT), (*PRUNE) or (*THEN). ignoring those set by other backtracking verbs.
.sp .sp
(*SKIP) (*SKIP)
.sp .sp
@@ -3579,7 +3579,7 @@ never seen because "a" does not match "b", so the matcher immediately jumps to
the second branch of the pattern. the second branch of the pattern.
.P .P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*COMMIT:NAME), (*PRUNE:NAME) or (*THEN:NAME). names that are set by other backtracking verbs.
.sp .sp
(*THEN) or (*THEN:NAME) (*THEN) or (*THEN:NAME)
.sp .sp
@@ -3600,7 +3600,7 @@ group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is
like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*COMMIT), (*PRUNE) and (*THEN). ignoring those set by other backtracking verbs.
.P .P
A subpattern that does not contain a | character is just a part of the A subpattern that does not contain a | character is just a part of the
enclosing alternative; it is not a nested alternation with only one enclosing alternative; it is not a nested alternation with only one
@@ -3693,10 +3693,10 @@ not the assertion is standalone or acting as the condition in a conditional
subpattern. subpattern.
.P .P
(*ACCEPT) in a standalone positive assertion causes the assertion to succeed (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
without any further processing; captured strings and a (*MARK) name (if set) without any further processing; captured strings and a mark name (if set) are
are retained. In a standalone negative assertion, (*ACCEPT) causes the retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to
assertion to fail without any further processing; captured substrings and any fail without any further processing; captured substrings and any mark name are
(*MARK) name are discarded. discarded.
.P .P
If the assertion is a condition, (*ACCEPT) causes the condition to be true for If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are a positive assertion and false for a negative one; captured substrings are
@@ -3767,6 +3767,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 12 October 2018 Last updated: 27 November 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi