Documentation update.

Philip.Hazel 2018-08-03 16:56:54 +00:00
parent b196143523
commit c722bf2399
3 changed files with 889 additions and 809 deletions

@@ -1813,41 +1813,68 @@ duplicate named subpatterns, as described in the next section.
<br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
<P>
Identifying capturing parentheses by number is simple, but it can be very hard
-to keep track of the numbers in complicated regular expressions. Furthermore,
-if an expression is modified, the numbers may change. To help with this
-difficulty, PCRE2 supports the naming of subpatterns. This feature was not
-added to Perl until release 5.10. Python had the feature earlier, and PCRE1
+to keep track of the numbers in complicated patterns. Furthermore, if an
+expression is modified, the numbers may change. To help with this difficulty,
+PCRE2 supports the naming of capturing subpatterns. This feature was not added
+to Perl until release 5.10. Python had the feature earlier, and PCRE1
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
-Perl and the Python syntax. Perl allows identically numbered subpatterns to
-have different names, but PCRE2 does not.
+Perl and the Python syntax.
</P>
<P>
-In PCRE2, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
-(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. References to capturing
-parentheses from other parts of the pattern, such as
+In PCRE2, a capturing subpattern can be named in one of three ways:
+(?&#60;name&#62;...) or (?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names
+consist of up to 32 alphanumeric characters and underscores, but must start
+with a non-digit. References to capturing parentheses from other parts of the
+pattern, such as
<a href="#backreferences">backreferences,</a>
<a href="#recursion">recursion,</a>
and
<a href="#conditions">conditions,</a>
-can be made by name as well as by number.
+can all be made by name as well as by number.
</P>
<P>
-Names consist of up to 32 alphanumeric characters and underscores, but must
-start with a non-digit. Named capturing parentheses are still allocated numbers
-as well as names, exactly as if the names were not present. The PCRE2 API
-provides function calls for extracting the name-to-number translation table
-from a compiled pattern. There are also convenience functions for extracting a
-captured substring by name.
+Named capturing parentheses are allocated numbers as well as names, exactly as
+if the names were not present. In both PCRE2 and Perl, capturing subpatterns
+are primarily identified by numbers; any names are just aliases for these
+numbers. The PCRE2 API provides function calls for extracting the complete
+name-to-number translation table from a compiled pattern, as well as
+convenience functions for extracting captured substrings by name.
</P>
<P>
-By default, a name must be unique within a pattern, but it is possible to relax
-this constraint by setting the PCRE2_DUPNAMES option at compile time.
-(Duplicate names are also always permitted for subpatterns with the same
-number, set up as described in the previous section.) Duplicate names can be
-useful for patterns where only one instance of the named parentheses can match.
-Suppose you want to match the name of a weekday, either as a 3-letter
-abbreviation or as the full name, and in both cases you want to extract the
-abbreviation. This pattern (ignoring the line breaks) does the job:
+<b>Warning:</b> When more than one subpattern has the same number, as described
+in the previous section, a name given to one of them applies to all of them.
+Perl allows identically numbered subpatterns to have different names. Consider
+this pattern, where there are two capturing subpatterns, both numbered 1:
+<pre>
+  (?|(?&#60;AA&#62;aa)|(?&#60;BB&#62;bb))
+</pre>
+Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
+a successful match, both names yield the same value (either "aa" or "bb").
+</P>
+<P>
+In an attempt to reduce confusion, PCRE2 does not allow the same group number
+to be associated with more than one name. The example above provokes a
+compile-time error. However, there is still scope for confusion. Consider this
+pattern:
+<pre>
+  (?|(?&#60;AA&#62;aa)|(bb))
+</pre>
+Although the second subpattern number 1 is not explicitly named, the name AA is
+still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
+reference by name to group AA yields the matched string.
+</P>
+<P>
+By default, a name must be unique within a pattern, except that duplicate names
+are permitted for subpatterns with the same number, for example:
+<pre>
+  (?|(?&#60;AA&#62;aa)|(?&#60;AA&#62;bb))
+</pre>
+The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
+option at compile time, or by the use of (?J) within the pattern. Duplicate
+names can be useful for patterns where only one instance of the named
+parentheses can match. Suppose you want to match the name of a weekday, either
+as a 3-letter abbreviation or as the full name, and in both cases you want to
+extract the abbreviation. This pattern (ignoring the line breaks) does the job:
<pre>
(?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
(?&#60;DN&#62;Tue)(?:sday)?|
@@ -1856,13 +1883,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
(?&#60;DN&#62;Sat)(?:urday)?
</pre>
There are five capturing substrings, but only one is ever set after a match.
-(An alternative way of solving this problem is to use a "branch reset"
-subpattern, as described in the previous section.)
-</P>
-<P>
The convenience functions for extracting the data by name return the substring
for the first (and in this example, the only) subpattern of that name that
-matched. This saves searching to find which numbered subpattern it was.
+matched. This saves searching to find which numbered subpattern it was. (An
+alternative way of solving this problem is to use a "branch reset" subpattern,
+as described in the previous section.)
</P>
<P>
If you make a backreference to a non-unique named subpattern from elsewhere in
@@ -1878,8 +1903,7 @@ for the reference. For example, this pattern matches both "foofoo" and
<P>
If you make a subroutine call to a non-unique named subpattern, the one that
corresponds to the first occurrence of the name is used. In the absence of
-duplicate numbers (see the previous section) this is the one with the lowest
-number.
+duplicate numbers this is the one with the lowest number.
</P>
<P>
If you use a named reference in a condition
@@ -1893,14 +1917,6 @@ handling named subpatterns, see the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
-<P>
-<b>Warning:</b> You cannot use different names to distinguish between two
-subpatterns with the same number because PCRE2 uses only the numbers when
-matching. For this reason, an error is given at compile time if different names
-are given to subpatterns with the same number. However, you can always give the
-same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
-set.
-</P>
<br><a name="SEC17" href="#TOC1">REPETITION</a><br>
<P>
Repetition is specified by quantifiers, which can follow any of the following
@@ -2957,10 +2973,12 @@ later versions (I tried 5.024) it now works.
<br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
<P>
If the syntax for a recursive subpattern call (either by number or by
-name) is used outside the parentheses to which it refers, it operates like a
-subroutine in a programming language. The called subpattern may be defined
-before or after the reference. A numbered reference can be absolute or
-relative, as in these examples:
+name) is used outside the parentheses to which it refers, it operates a bit
+like a subroutine in a programming language. More accurately, PCRE2 treats the
+referenced subpattern as an independent subpattern which it tries to match at
+the current matching position. The called subpattern may be defined before or
+after the reference. A numbered reference can be absolute or relative, as in
+these examples:
<pre>
(...(absolute)...)...(?2)...
(...(relative)...)...(?-1)...
@@ -2993,6 +3011,13 @@ different calls. For example, consider this pattern:
</pre>
It matches "abcabc". It does not match "abcABC" because the change of
processing option does not affect the called subpattern.
+</P>
+<P>
+The behaviour of
+<a href="#backtrackcontrol">backtracking control verbs</a>
+in subpatterns when called as subroutines is described in the section entitled
+<a href="#btsub">"Backtracking verbs in subroutines"</a>
+below.
<a name="onigurumasubroutines"></a></P>
<br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
<P>
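The named-subpattern text changed above can be tried out with an engine other than PCRE2. The following is a minimal sketch, assuming Python 3 and its standard `re` module, which accepts the Python-style (?P<name>...) syntax the text mentions; it does not use the PCRE2 API. It shows a named group keeping its number (the name is only an alias), `Pattern.groupindex` acting as a name-to-number table comparable in spirit to the one the PCRE2 API exposes, and a backreference made by name. The names `DATE`, `REPEAT` and `describe` are illustrative only.

```python
import re

# Two named capturing groups in the Python-style (?P<name>...) syntax.
# The names are aliases for numbers: "year" is group 1, "month" is group 2.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d\d)")

# A backreference by name: (?P=word) must repeat whatever "word" captured.
REPEAT = re.compile(r"(?P<word>\w+) (?P=word)")


def describe(text):
    """Return (by_name, by_number, table) for the first date in text,
    or None if there is no match."""
    m = DATE.search(text)
    if m is None:
        return None
    by_name = (m.group("year"), m.group("month"))
    by_number = (m.group(1), m.group(2))
    # groupindex maps each group name to its group number.
    table = dict(DATE.groupindex)
    return by_name, by_number, table
```

For example, `describe("released 2018-08")` yields the same pair of strings whether the groups are read by name or by number, together with the table `{"year": 1, "month": 2}`.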

File diff suppressed because it is too large

@@ -1815,17 +1815,18 @@ duplicate named subpatterns, as described in the next section.
.rs
.sp
Identifying capturing parentheses by number is simple, but it can be very hard
-to keep track of the numbers in complicated regular expressions. Furthermore,
-if an expression is modified, the numbers may change. To help with this
-difficulty, PCRE2 supports the naming of subpatterns. This feature was not
-added to Perl until release 5.10. Python had the feature earlier, and PCRE1
+to keep track of the numbers in complicated patterns. Furthermore, if an
+expression is modified, the numbers may change. To help with this difficulty,
+PCRE2 supports the naming of capturing subpatterns. This feature was not added
+to Perl until release 5.10. Python had the feature earlier, and PCRE1
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
-Perl and the Python syntax. Perl allows identically numbered subpatterns to
-have different names, but PCRE2 does not.
+Perl and the Python syntax.
.P
-In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
-(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
-parentheses from other parts of the pattern, such as
+In PCRE2, a capturing subpattern can be named in one of three ways:
+(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. Names
+consist of up to 32 alphanumeric characters and underscores, but must start
+with a non-digit. References to capturing parentheses from other parts of the
+pattern, such as
.\" HTML <a href="#backreferences">
.\" </a>
backreferences,
@@ -1839,23 +1840,47 @@ and
.\" </a>
conditions,
.\"
-can be made by name as well as by number.
+can all be made by name as well as by number.
.P
-Names consist of up to 32 alphanumeric characters and underscores, but must
-start with a non-digit. Named capturing parentheses are still allocated numbers
-as well as names, exactly as if the names were not present. The PCRE2 API
-provides function calls for extracting the name-to-number translation table
-from a compiled pattern. There are also convenience functions for extracting a
-captured substring by name.
+Named capturing parentheses are allocated numbers as well as names, exactly as
+if the names were not present. In both PCRE2 and Perl, capturing subpatterns
+are primarily identified by numbers; any names are just aliases for these
+numbers. The PCRE2 API provides function calls for extracting the complete
+name-to-number translation table from a compiled pattern, as well as
+convenience functions for extracting captured substrings by name.
.P
-By default, a name must be unique within a pattern, but it is possible to relax
-this constraint by setting the PCRE2_DUPNAMES option at compile time.
-(Duplicate names are also always permitted for subpatterns with the same
-number, set up as described in the previous section.) Duplicate names can be
-useful for patterns where only one instance of the named parentheses can match.
-Suppose you want to match the name of a weekday, either as a 3-letter
-abbreviation or as the full name, and in both cases you want to extract the
-abbreviation. This pattern (ignoring the line breaks) does the job:
+\fBWarning:\fP When more than one subpattern has the same number, as described
+in the previous section, a name given to one of them applies to all of them.
+Perl allows identically numbered subpatterns to have different names. Consider
+this pattern, where there are two capturing subpatterns, both numbered 1:
+.sp
+  (?|(?<AA>aa)|(?<BB>bb))
+.sp
+Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
+a successful match, both names yield the same value (either "aa" or "bb").
+.P
+In an attempt to reduce confusion, PCRE2 does not allow the same group number
+to be associated with more than one name. The example above provokes a
+compile-time error. However, there is still scope for confusion. Consider this
+pattern:
+.sp
+  (?|(?<AA>aa)|(bb))
+.sp
+Although the second subpattern number 1 is not explicitly named, the name AA is
+still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
+reference by name to group AA yields the matched string.
+.P
+By default, a name must be unique within a pattern, except that duplicate names
+are permitted for subpatterns with the same number, for example:
+.sp
+  (?|(?<AA>aa)|(?<AA>bb))
+.sp
+The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
+option at compile time, or by the use of (?J) within the pattern. Duplicate
+names can be useful for patterns where only one instance of the named
+parentheses can match. Suppose you want to match the name of a weekday, either
+as a 3-letter abbreviation or as the full name, and in both cases you want to
+extract the abbreviation. This pattern (ignoring the line breaks) does the job:
.sp
(?<DN>Mon|Fri|Sun)(?:day)?|
(?<DN>Tue)(?:sday)?|
@@ -1864,12 +1889,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
(?<DN>Sat)(?:urday)?
.sp
There are five capturing substrings, but only one is ever set after a match.
-(An alternative way of solving this problem is to use a "branch reset"
-subpattern, as described in the previous section.)
-.P
The convenience functions for extracting the data by name return the substring
for the first (and in this example, the only) subpattern of that name that
-matched. This saves searching to find which numbered subpattern it was.
+matched. This saves searching to find which numbered subpattern it was. (An
+alternative way of solving this problem is to use a "branch reset" subpattern,
+as described in the previous section.)
.P
If you make a backreference to a non-unique named subpattern from elsewhere in
the pattern, the subpatterns to which the name refers are checked in the order
@@ -1882,8 +1906,7 @@ for the reference. For example, this pattern matches both "foofoo" and
.P
If you make a subroutine call to a non-unique named subpattern, the one that
corresponds to the first occurrence of the name is used. In the absence of
-duplicate numbers (see the previous section) this is the one with the lowest
-number.
+duplicate numbers this is the one with the lowest number.
.P
If you use a named reference in a condition
test (see the
@@ -1901,13 +1924,6 @@ handling named subpatterns, see the
\fBpcre2api\fP
.\"
documentation.
-.P
-\fBWarning:\fP You cannot use different names to distinguish between two
-subpatterns with the same number because PCRE2 uses only the numbers when
-matching. For this reason, an error is given at compile time if different names
-are given to subpatterns with the same number. However, you can always give the
-same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
-set.
.
.
.SH REPETITION
@@ -2982,10 +2998,12 @@ later versions (I tried 5.024) it now works.
.rs
.sp
If the syntax for a recursive subpattern call (either by number or by
-name) is used outside the parentheses to which it refers, it operates like a
-subroutine in a programming language. The called subpattern may be defined
-before or after the reference. A numbered reference can be absolute or
-relative, as in these examples:
+name) is used outside the parentheses to which it refers, it operates a bit
+like a subroutine in a programming language. More accurately, PCRE2 treats the
+referenced subpattern as an independent subpattern which it tries to match at
+the current matching position. The called subpattern may be defined before or
+after the reference. A numbered reference can be absolute or relative, as in
+these examples:
.sp
(...(absolute)...)...(?2)...
(...(relative)...)...(?-1)...
@@ -3016,6 +3034,18 @@ different calls. For example, consider this pattern:
.sp
It matches "abcabc". It does not match "abcABC" because the change of
processing option does not affect the called subpattern.
+.P
+The behaviour of
+.\" HTML <a href="#backtrackcontrol">
+.\" </a>
+backtracking control verbs
+.\"
+in subpatterns when called as subroutines is described in the section entitled
+.\" HTML <a href="#btsub">
+.\" </a>
+"Backtracking verbs in subroutines"
+.\"
+below.
.
.
.\" HTML <a name="onigurumasubroutines"></a>
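Python's `re` module rejects duplicate group names, so the weekday pattern in the diff cannot be copied there directly. As a hedged sketch (Python 3, standard `re` only, not the PCRE2 API), the version below uses five plain numbered groups, and a hypothetical helper `first_set` imitates the behaviour the text describes for the by-name convenience functions: return the substring of the first group in a set that actually took part in the match. The tuple `DN_GROUPS` merely stands in for the shared name DN; none of these names come from PCRE2.

```python
import re

# The weekday pattern with the five DN groups written as plain numbered
# groups, because Python's re module does not accept duplicate group names.
WEEKDAY = re.compile(
    r"(Mon|Fri|Sun)(?:day)?|"
    r"(Tue)(?:sday)?|"
    r"(Wed)(?:nesday)?|"
    r"(Thu)(?:rsday)?|"
    r"(Sat)(?:urday)?"
)

# Hypothetical stand-in for the name DN: the numbers of the groups that
# would all carry that name in the PCRE2 version of the pattern.
DN_GROUPS = (1, 2, 3, 4, 5)


def first_set(match, groups):
    """Return the substring of the first listed group that took part in
    the match, or None if none of them did."""
    for n in groups:
        value = match.group(n)
        if value is not None:
            return value
    return None


def abbreviation(day):
    """Return the 3-letter abbreviation for a weekday name, or None."""
    m = WEEKDAY.fullmatch(day)
    return None if m is None else first_set(m, DN_GROUPS)
```

For instance, `abbreviation("Tuesday")` returns "Tue" and `abbreviation("Sat")` returns "Sat", while a string that is not a weekday gives None. Only one of the five groups is ever set, so "first" here is also "only", matching the example in the text.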