Documentation update.

Philip.Hazel 2018-08-03 16:56:54 +00:00
parent b196143523
commit c722bf2399
3 changed files with 889 additions and 809 deletions

@@ -1813,41 +1813,68 @@ duplicate named subpatterns, as described in the next section.
<br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
<P>
Identifying capturing parentheses by number is simple, but it can be very hard
to keep track of the numbers in complicated regular expressions. Furthermore,
if an expression is modified, the numbers may change. To help with this
difficulty, PCRE2 supports the naming of subpatterns. This feature was not
added to Perl until release 5.10. Python had the feature earlier, and PCRE1
to keep track of the numbers in complicated patterns. Furthermore, if an
expression is modified, the numbers may change. To help with this difficulty,
PCRE2 supports the naming of capturing subpatterns. This feature was not added
to Perl until release 5.10. Python had the feature earlier, and PCRE1
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
Perl and the Python syntax. Perl allows identically numbered subpatterns to
have different names, but PCRE2 does not.
Perl and the Python syntax.
</P>
<P>
In PCRE2, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. References to capturing
parentheses from other parts of the pattern, such as
In PCRE2, a capturing subpattern can be named in one of three ways:
(?&#60;name&#62;...) or (?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names
consist of up to 32 alphanumeric characters and underscores, but must start
with a non-digit. References to capturing parentheses from other parts of the
pattern, such as
<a href="#backreferences">backreferences,</a>
<a href="#recursion">recursion,</a>
and
<a href="#conditions">conditions,</a>
can be made by name as well as by number.
can all be made by name as well as by number.
</P>
<P>
Names consist of up to 32 alphanumeric characters and underscores, but must
start with a non-digit. Named capturing parentheses are still allocated numbers
as well as names, exactly as if the names were not present. The PCRE2 API
provides function calls for extracting the name-to-number translation table
from a compiled pattern. There are also convenience functions for extracting a
captured substring by name.
Named capturing parentheses are allocated numbers as well as names, exactly as
if the names were not present. In both PCRE2 and Perl, capturing subpatterns
are primarily identified by numbers; any names are just aliases for these
numbers. The PCRE2 API provides function calls for extracting the complete
name-to-number translation table from a compiled pattern, as well as
convenience functions for extracting captured substrings by name.
</P>
<P>
By default, a name must be unique within a pattern, but it is possible to relax
this constraint by setting the PCRE2_DUPNAMES option at compile time.
(Duplicate names are also always permitted for subpatterns with the same
number, set up as described in the previous section.) Duplicate names can be
useful for patterns where only one instance of the named parentheses can match.
Suppose you want to match the name of a weekday, either as a 3-letter
abbreviation or as the full name, and in both cases you want to extract the
abbreviation. This pattern (ignoring the line breaks) does the job:
<b>Warning:</b> When more than one subpattern has the same number, as described
in the previous section, a name given to one of them applies to all of them.
Perl allows identically numbered subpatterns to have different names. Consider
this pattern, where there are two capturing subpatterns, both numbered 1:
<pre>
(?|(?&#60;AA&#62;aa)|(?&#60;BB&#62;bb))
</pre>
Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
a successful match, both names yield the same value (either "aa" or "bb").
</P>
<P>
In an attempt to reduce confusion, PCRE2 does not allow the same group number
to be associated with more than one name. The example above provokes a
compile-time error. However, there is still scope for confusion. Consider this
pattern:
<pre>
(?|(?&#60;AA&#62;aa)|(bb))
</pre>
Although the second subpattern number 1 is not explicitly named, the name AA is
still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
reference by name to group AA yields the matched string.
</P>
<P>
By default, a name must be unique within a pattern, except that duplicate names
are permitted for subpatterns with the same number, for example:
<pre>
(?|(?&#60;AA&#62;aa)|(?&#60;AA&#62;bb))
</pre>
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
option at compile time, or by the use of (?J) within the pattern. Duplicate
names can be useful for patterns where only one instance of the named
parentheses can match. Suppose you want to match the name of a weekday, either
as a 3-letter abbreviation or as the full name, and in both cases you want to
extract the abbreviation. This pattern (ignoring the line breaks) does the job:
<pre>
(?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
(?&#60;DN&#62;Tue)(?:sday)?|
@@ -1856,13 +1883,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
(?&#60;DN&#62;Sat)(?:urday)?
</pre>
There are five capturing substrings, but only one is ever set after a match.
(An alternative way of solving this problem is to use a "branch reset"
subpattern, as described in the previous section.)
</P>
<P>
The convenience functions for extracting the data by name returns the substring
for the first (and in this example, the only) subpattern of that name that
matched. This saves searching to find which numbered subpattern it was.
matched. This saves searching to find which numbered subpattern it was. (An
alternative way of solving this problem is to use a "branch reset" subpattern,
as described in the previous section.)
</P>
<P>
If you make a backreference to a non-unique named subpattern from elsewhere in
@@ -1878,8 +1903,7 @@ for the reference. For example, this pattern matches both "foofoo" and
<P>
If you make a subroutine call to a non-unique named subpattern, the one that
corresponds to the first occurrence of the name is used. In the absence of
duplicate numbers (see the previous section) this is the one with the lowest
number.
duplicate numbers this is the one with the lowest number.
</P>
<P>
If you use a named reference in a condition
@@ -1893,14 +1917,6 @@ handling named subpatterns, see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
<b>Warning:</b> You cannot use different names to distinguish between two
subpatterns with the same number because PCRE2 uses only the numbers when
matching. For this reason, an error is given at compile time if different names
are given to subpatterns with the same number. However, you can always give the
same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
set.
</P>
<br><a name="SEC17" href="#TOC1">REPETITION</a><br>
<P>
Repetition is specified by quantifiers, which can follow any of the following
@@ -2957,10 +2973,12 @@ later versions (I tried 5.024) it now works.
<br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
<P>
If the syntax for a recursive subpattern call (either by number or by
name) is used outside the parentheses to which it refers, it operates like a
subroutine in a programming language. The called subpattern may be defined
before or after the reference. A numbered reference can be absolute or
relative, as in these examples:
name) is used outside the parentheses to which it refers, it operates a bit
like a subroutine in a programming language. More accurately, PCRE2 treats the
referenced subpattern as an independent subpattern which it tries to match at
the current matching position. The called subpattern may be defined before or
after the reference. A numbered reference can be absolute or relative, as in
these examples:
<pre>
(...(absolute)...)...(?2)...
(...(relative)...)...(?-1)...
@@ -2993,6 +3011,13 @@ different calls. For example, consider this pattern:
</pre>
It matches "abcabc". It does not match "abcABC" because the change of
processing option does not affect the called subpattern.
</P>
<P>
The behaviour of
<a href="#backtrackcontrol">backtracking control verbs</a>
in subpatterns when called as subroutines is described in the section entitled
<a href="#btsub">"Backtracking verbs in subroutines"</a>
below.
<a name="onigurumasubroutines"></a></P>
<br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
<P>

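The updated text stresses that names are only aliases for group numbers, and that the API exposes a name-to-number translation table. The following is a minimal C sketch of reading such a table, assuming the 8-bit layout used for PCRE2_INFO_NAMETABLE: each entry is a two-byte group number (most significant byte first) followed by the zero-terminated name, padded to a fixed entry size. The tables are hand-built stand-ins, not output from pcre2_pattern_info(), and the helpers entry_number(), entry_name() and first_number_for_name() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Entry size for names of at most two characters:
   2 bytes of group number + 2 name bytes + terminating zero. */
#define ENTRY_SIZE 5

/* Hand-built stand-in for the name table of the weekday pattern compiled
   with PCRE2_DUPNAMES: five groups, numbered 1 to 5, all called DN. */
static const unsigned char weekday_table[] = {
    0x00, 0x01, 'D', 'N', 0,
    0x00, 0x02, 'D', 'N', 0,
    0x00, 0x03, 'D', 'N', 0,
    0x00, 0x04, 'D', 'N', 0,
    0x00, 0x05, 'D', 'N', 0,
};

/* Hand-built stand-in for (?|(?<AA>aa)|(bb)): both branches are group 1,
   so the single name AA is an alias for group 1. */
static const unsigned char branch_reset_table[] = {
    0x00, 0x01, 'A', 'A', 0,
};

/* Group number held in entry i (16-bit value, high byte first). */
static int entry_number(const unsigned char *table, size_t entry_size, size_t i)
{
    const unsigned char *entry = table + i * entry_size;
    return (entry[0] << 8) | entry[1];
}

/* Name held in entry i; it starts straight after the two number bytes. */
static const char *entry_name(const unsigned char *table, size_t entry_size, size_t i)
{
    assert(entry_size > 2);
    return (const char *)(table + i * entry_size + 2);
}

/* Number of the first entry called name, or -1 if the name is absent. */
static int first_number_for_name(const unsigned char *table, size_t count,
                                 size_t entry_size, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entry_name(table, entry_size, i), name) == 0)
            return entry_number(table, entry_size, i);
    }
    return -1;
}
```

In real code the table pointer, entry count and entry size would come from pcre2_pattern_info() with PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT and PCRE2_INFO_NAMEENTRYSIZE. Assuming duplicate names are listed in the order they occur in the pattern, the first matching entry is the group that a subroutine call by that name would use.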

@@ -7393,32 +7393,60 @@ DUPLICATE SUBPATTERN NUMBERS
NAMED SUBPATTERNS
Identifying capturing parentheses by number is simple, but it can be
very hard to keep track of the numbers in complicated regular expres-
sions. Furthermore, if an expression is modified, the numbers may
change. To help with this difficulty, PCRE2 supports the naming of sub-
patterns. This feature was not added to Perl until release 5.10. Python
very hard to keep track of the numbers in complicated patterns. Fur-
thermore, if an expression is modified, the numbers may change. To help
with this difficulty, PCRE2 supports the naming of capturing subpat-
terns. This feature was not added to Perl until release 5.10. Python
had the feature earlier, and PCRE1 introduced it at release 4.0, using
the Python syntax. PCRE2 supports both the Perl and the Python syntax.
Perl allows identically numbered subpatterns to have different names,
but PCRE2 does not.
In PCRE2, a subpattern can be named in one of three ways: (?<name>...)
or (?'name'...) as in Perl, or (?P<name>...) as in Python. References
to capturing parentheses from other parts of the pattern, such as back-
references, recursion, and conditions, can be made by name as well as
by number.
In PCRE2, a capturing subpattern can be named in one of three ways:
(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python.
Names consist of up to 32 alphanumeric characters and underscores, but
must start with a non-digit. Named capturing parentheses are still
allocated numbers as well as names, exactly as if the names were not
present. The PCRE2 API provides function calls for extracting the name-
to-number translation table from a compiled pattern. There are also
convenience functions for extracting a captured substring by name.
must start with a non-digit. References to capturing parentheses from
other parts of the pattern, such as backreferences, recursion, and con-
ditions, can all be made by name as well as by number.
By default, a name must be unique within a pattern, but it is possible
to relax this constraint by setting the PCRE2_DUPNAMES option at com-
pile time. (Duplicate names are also always permitted for subpatterns
with the same number, set up as described in the previous section.)
Named capturing parentheses are allocated numbers as well as names,
exactly as if the names were not present. In both PCRE2 and Perl, cap-
turing subpatterns are primarily identified by numbers; any names are
just aliases for these numbers. The PCRE2 API provides function calls
for extracting the complete name-to-number translation table from a
compiled pattern, as well as convenience functions for extracting cap-
tured substrings by name.
Warning: When more than one subpattern has the same number, as
described in the previous section, a name given to one of them applies
to all of them. Perl allows identically numbered subpatterns to have
different names. Consider this pattern, where there are two capturing
subpatterns, both numbered 1:
(?|(?<AA>aa)|(?<BB>bb))
Perl allows this, with both names AA and BB as aliases of group 1.
Thus, after a successful match, both names yield the same value (either
"aa" or "bb").
In an attempt to reduce confusion, PCRE2 does not allow the same group
number to be associated with more than one name. The example above pro-
vokes a compile-time error. However, there is still scope for confu-
sion. Consider this pattern:
(?|(?<AA>aa)|(bb))
Although the second subpattern number 1 is not explicitly named, the
name AA is still an alias for subpattern 1. Whether the pattern matches
"aa" or "bb", a reference by name to group AA yields the matched
string.
By default, a name must be unique within a pattern, except that dupli-
cate names are permitted for subpatterns with the same number, for
example:
(?|(?<AA>aa)|(?<AA>bb))
The duplicate name constraint can be disabled by setting the PCRE2_DUP-
NAMES option at compile time, or by the use of (?J) within the pattern.
Duplicate names can be useful for patterns where only one instance of
the named parentheses can match. Suppose you want to match the name of
a weekday, either as a 3-letter abbreviation or as the full name, and
@@ -7432,13 +7460,12 @@ NAMED SUBPATTERNS
(?<DN>Sat)(?:urday)?
There are five capturing substrings, but only one is ever set after a
match. (An alternative way of solving this problem is to use a "branch
reset" subpattern, as described in the previous section.)
The convenience functions for extracting the data by name returns the
substring for the first (and in this example, the only) subpattern of
that name that matched. This saves searching to find which numbered
subpattern it was.
match. The convenience functions for extracting the data by name
returns the substring for the first (and in this example, the only)
subpattern of that name that matched. This saves searching to find
which numbered subpattern it was. (An alternative way of solving this
problem is to use a "branch reset" subpattern, as described in the pre-
vious section.)
If you make a backreference to a non-unique named subpattern from else-
where in the pattern, the subpatterns to which the name refers are
@@ -7451,8 +7478,7 @@ NAMED SUBPATTERNS
If you make a subroutine call to a non-unique named subpattern, the one
that corresponds to the first occurrence of the name is used. In the
absence of duplicate numbers (see the previous section) this is the one
with the lowest number.
absence of duplicate numbers this is the one with the lowest number.
If you use a named reference in a condition test (see the section about
conditions below), either to check whether a subpattern has matched, or
@@ -7462,13 +7488,6 @@ NAMED SUBPATTERNS
details of the interfaces for handling named subpatterns, see the
pcre2api documentation.
Warning: You cannot use different names to distinguish between two sub-
patterns with the same number because PCRE2 uses only the numbers when
matching. For this reason, an error is given at compile time if differ-
ent names are given to subpatterns with the same number. However, you
can always give the same name to subpatterns with the same number, even
when PCRE2_DUPNAMES is not set.
REPETITION
@@ -8472,10 +8491,12 @@ RECURSIVE PATTERNS
SUBPATTERNS AS SUBROUTINES
If the syntax for a recursive subpattern call (either by number or by
name) is used outside the parentheses to which it refers, it operates
like a subroutine in a programming language. The called subpattern may
be defined before or after the reference. A numbered reference can be
absolute or relative, as in these examples:
name) is used outside the parentheses to which it refers, it operates a
bit like a subroutine in a programming language. More accurately, PCRE2
treats the referenced subpattern as an independent subpattern which it
tries to match at the current matching position. The called subpattern
may be defined before or after the reference. A numbered reference can
be absolute or relative, as in these examples:
(...(absolute)...)...(?2)...
(...(relative)...)...(?-1)...
@@ -8508,6 +8529,10 @@ SUBPATTERNS AS SUBROUTINES
It matches "abcabc". It does not match "abcABC" because the change of
processing option does not affect the called subpattern.
The behaviour of backtracking control verbs in subpatterns when called
as subroutines is described in the section entitled "Backtracking verbs
in subroutines" below.
ONIGURUMA SUBROUTINE SYNTAX

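The rewritten section states two separate constraints: one group number may not carry two different names (so (?|(?<AA>aa)|(?<BB>bb)) is a compile-time error), while one name may be shared by different group numbers only when PCRE2_DUPNAMES or (?J) is in force. Giving the same name to groups with the same number is always allowed. The C sketch below is a toy model of those two rules over a hypothetical list of (name, number) pairs; it is not how PCRE2's compiler is implemented, and check_names() and its result codes are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One named capturing group as it appears in a pattern (hypothetical). */
struct named_group {
    const char *name;
    int number;
};

enum name_check {
    NAMES_OK,
    NAMES_NUMBER_HAS_TWO_NAMES, /* e.g. (?|(?<AA>aa)|(?<BB>bb)) */
    NAMES_DUPLICATE_NAME        /* same name, different numbers, no DUPNAMES */
};

/* Check every pair of named groups against the two rules in the text.
   dupnames models PCRE2_DUPNAMES being set or (?J) appearing in the pattern. */
static enum name_check check_names(const struct named_group *g, size_t n,
                                   bool dupnames)
{
    assert(g != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            bool same_name = strcmp(g[i].name, g[j].name) == 0;
            bool same_number = g[i].number == g[j].number;
            /* Never allowed, even with dupnames. */
            if (same_number && !same_name)
                return NAMES_NUMBER_HAS_TWO_NAMES;
            /* Allowed only when duplicate names are enabled. */
            if (same_name && !same_number && !dupnames)
                return NAMES_DUPLICATE_NAME;
        }
    }
    return NAMES_OK;
}
```

For example, the pair list for (?|(?<AA>aa)|(?<AA>bb)) passes without dupnames, whereas the weekday pattern's five DN groups pass only with it.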

@@ -1815,17 +1815,18 @@ duplicate named subpatterns, as described in the next section.
.rs
.sp
Identifying capturing parentheses by number is simple, but it can be very hard
to keep track of the numbers in complicated regular expressions. Furthermore,
if an expression is modified, the numbers may change. To help with this
difficulty, PCRE2 supports the naming of subpatterns. This feature was not
added to Perl until release 5.10. Python had the feature earlier, and PCRE1
to keep track of the numbers in complicated patterns. Furthermore, if an
expression is modified, the numbers may change. To help with this difficulty,
PCRE2 supports the naming of capturing subpatterns. This feature was not added
to Perl until release 5.10. Python had the feature earlier, and PCRE1
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
Perl and the Python syntax. Perl allows identically numbered subpatterns to
have different names, but PCRE2 does not.
Perl and the Python syntax.
.P
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
parentheses from other parts of the pattern, such as
In PCRE2, a capturing subpattern can be named in one of three ways:
(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. Names
consist of up to 32 alphanumeric characters and underscores, but must start
with a non-digit. References to capturing parentheses from other parts of the
pattern, such as
.\" HTML <a href="#backreferences">
.\" </a>
backreferences,
@@ -1839,23 +1840,47 @@ and
.\" </a>
conditions,
.\"
can be made by name as well as by number.
can all be made by name as well as by number.
.P
Names consist of up to 32 alphanumeric characters and underscores, but must
start with a non-digit. Named capturing parentheses are still allocated numbers
as well as names, exactly as if the names were not present. The PCRE2 API
provides function calls for extracting the name-to-number translation table
from a compiled pattern. There are also convenience functions for extracting a
captured substring by name.
Named capturing parentheses are allocated numbers as well as names, exactly as
if the names were not present. In both PCRE2 and Perl, capturing subpatterns
are primarily identified by numbers; any names are just aliases for these
numbers. The PCRE2 API provides function calls for extracting the complete
name-to-number translation table from a compiled pattern, as well as
convenience functions for extracting captured substrings by name.
.P
By default, a name must be unique within a pattern, but it is possible to relax
this constraint by setting the PCRE2_DUPNAMES option at compile time.
(Duplicate names are also always permitted for subpatterns with the same
number, set up as described in the previous section.) Duplicate names can be
useful for patterns where only one instance of the named parentheses can match.
Suppose you want to match the name of a weekday, either as a 3-letter
abbreviation or as the full name, and in both cases you want to extract the
abbreviation. This pattern (ignoring the line breaks) does the job:
\fBWarning:\fP When more than one subpattern has the same number, as described
in the previous section, a name given to one of them applies to all of them.
Perl allows identically numbered subpatterns to have different names. Consider
this pattern, where there are two capturing subpatterns, both numbered 1:
.sp
(?|(?<AA>aa)|(?<BB>bb))
.sp
Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
a successful match, both names yield the same value (either "aa" or "bb").
.P
In an attempt to reduce confusion, PCRE2 does not allow the same group number
to be associated with more than one name. The example above provokes a
compile-time error. However, there is still scope for confusion. Consider this
pattern:
.sp
(?|(?<AA>aa)|(bb))
.sp
Although the second subpattern number 1 is not explicitly named, the name AA is
still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
reference by name to group AA yields the matched string.
.P
By default, a name must be unique within a pattern, except that duplicate names
are permitted for subpatterns with the same number, for example:
.sp
(?|(?<AA>aa)|(?<AA>bb))
.sp
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
option at compile time, or by the use of (?J) within the pattern. Duplicate
names can be useful for patterns where only one instance of the named
parentheses can match. Suppose you want to match the name of a weekday, either
as a 3-letter abbreviation or as the full name, and in both cases you want to
extract the abbreviation. This pattern (ignoring the line breaks) does the job:
.sp
(?<DN>Mon|Fri|Sun)(?:day)?|
(?<DN>Tue)(?:sday)?|
@@ -1864,12 +1889,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
(?<DN>Sat)(?:urday)?
.sp
There are five capturing substrings, but only one is ever set after a match.
(An alternative way of solving this problem is to use a "branch reset"
subpattern, as described in the previous section.)
.P
The convenience functions for extracting the data by name returns the substring
for the first (and in this example, the only) subpattern of that name that
matched. This saves searching to find which numbered subpattern it was.
matched. This saves searching to find which numbered subpattern it was. (An
alternative way of solving this problem is to use a "branch reset" subpattern,
as described in the previous section.)
.P
If you make a backreference to a non-unique named subpattern from elsewhere in
the pattern, the subpatterns to which the name refers are checked in the order
@@ -1882,8 +1906,7 @@ for the reference. For example, this pattern matches both "foofoo" and
.P
If you make a subroutine call to a non-unique named subpattern, the one that
corresponds to the first occurrence of the name is used. In the absence of
duplicate numbers (see the previous section) this is the one with the lowest
number.
duplicate numbers this is the one with the lowest number.
.P
If you use a named reference in a condition
test (see the
@@ -1901,13 +1924,6 @@ handling named subpatterns, see the
\fBpcre2api\fP
.\"
documentation.
.P
\fBWarning:\fP You cannot use different names to distinguish between two
subpatterns with the same number because PCRE2 uses only the numbers when
matching. For this reason, an error is given at compile time if different names
are given to subpatterns with the same number. However, you can always give the
same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
set.
.
.
.SH REPETITION
@@ -2982,10 +2998,12 @@ later versions (I tried 5.024) it now works.
.rs
.sp
If the syntax for a recursive subpattern call (either by number or by
name) is used outside the parentheses to which it refers, it operates like a
subroutine in a programming language. The called subpattern may be defined
before or after the reference. A numbered reference can be absolute or
relative, as in these examples:
name) is used outside the parentheses to which it refers, it operates a bit
like a subroutine in a programming language. More accurately, PCRE2 treats the
referenced subpattern as an independent subpattern which it tries to match at
the current matching position. The called subpattern may be defined before or
after the reference. A numbered reference can be absolute or relative, as in
these examples:
.sp
(...(absolute)...)...(?2)...
(...(relative)...)...(?-1)...
@@ -3016,6 +3034,18 @@ different calls. For example, consider this pattern:
.sp
It matches "abcabc". It does not match "abcABC" because the change of
processing option does not affect the called subpattern.
.P
The behaviour of
.\" HTML <a href="#backtrackcontrol">
.\" </a>
backtracking control verbs
.\"
in subpatterns when called as subroutines is described in the section entitled
.\" HTML <a href="#btsub">
.\" </a>
"Backtracking verbs in subroutines"
.\"
below.
.
.
.\" HTML <a name="onigurumasubroutines"></a>
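The section also describes the convenience functions for extracting a substring by name: when names are duplicated, they return the substring for the first group of that name that actually matched. The C sketch below models that lookup for the weekday pattern. It assumes an ovector laid out as in PCRE2 (a start/end offset pair per group, group 0 first) and uses SIZE_MAX as a stand-in for PCRE2_UNSET. The struct name_entry type and copy_first_set_by_name() are hypothetical; the library's own calls, such as pcre2_substring_copy_byname(), are documented in pcre2api.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET: the offset pair of a group that did not match. */
#define UNSET SIZE_MAX

/* One name-table entry, listed in the order the names occur in the pattern. */
struct name_entry {
    const char *name;
    int number;
};

/* Copy the substring captured by the first group called name that matched.
   ovector holds (start, end) offsets per group, indexed by group number.
   Returns the group number used, -1 if no group of that name is set,
   or -2 if buf is too small for the substring plus its terminator. */
static int copy_first_set_by_name(const struct name_entry *names, size_t count,
                                  const char *subject, const size_t *ovector,
                                  const char *name, char *buf, size_t bufsize)
{
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i].name, name) != 0)
            continue;
        size_t start = ovector[2 * names[i].number];
        size_t end = ovector[2 * names[i].number + 1];
        if (start == UNSET)
            continue; /* this group took no part in the match */
        size_t len = end - start;
        if (len + 1 > bufsize)
            return -2;
        memcpy(buf, subject + start, len);
        buf[len] = '\0';
        return names[i].number;
    }
    return -1;
}
```

For the subject "Tuesday", only the second DN group is set, so a lookup of DN skips group 1 and copies "Tue" from group 2. The caller never has to work out which numbered group matched.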