Documentation update.
parent
b196143523
commit
c722bf2399
|
@ -1813,41 +1813,68 @@ duplicate named subpatterns, as described in the next section.
|
|||
<br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
|
||||
<P>
|
||||
Identifying capturing parentheses by number is simple, but it can be very hard
|
||||
to keep track of the numbers in complicated regular expressions. Furthermore,
|
||||
if an expression is modified, the numbers may change. To help with this
|
||||
difficulty, PCRE2 supports the naming of subpatterns. This feature was not
|
||||
added to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
||||
to keep track of the numbers in complicated patterns. Furthermore, if an
|
||||
expression is modified, the numbers may change. To help with this difficulty,
|
||||
PCRE2 supports the naming of capturing subpatterns. This feature was not added
|
||||
to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
||||
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
|
||||
Perl and the Python syntax. Perl allows identically numbered subpatterns to
|
||||
have different names, but PCRE2 does not.
|
||||
Perl and the Python syntax.
|
||||
</P>
|
||||
<P>
|
||||
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
|
||||
(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
|
||||
parentheses from other parts of the pattern, such as
|
||||
In PCRE2, a capturing subpattern can be named in one of three ways:
|
||||
(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. Names
|
||||
consist of up to 32 alphanumeric characters and underscores, but must start
|
||||
with a non-digit. References to capturing parentheses from other parts of the
|
||||
pattern, such as
|
||||
<a href="#backreferences">backreferences,</a>
|
||||
<a href="#recursion">recursion,</a>
|
||||
and
|
||||
<a href="#conditions">conditions,</a>
|
||||
can be made by name as well as by number.
|
||||
can all be made by name as well as by number.
|
||||
</P>
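The naming rule quoted in the paragraph above (up to 32 alphanumeric characters and underscores, not starting with a digit) can be sketched as a small check. This is only an illustration: the helper name `is_valid_group_name` is hypothetical and not part of the PCRE2 API, and the sketch assumes ASCII letters and digits.

```python
import re

# Hypothetical helper, not part of PCRE2: checks a candidate subpattern name
# against the rule quoted above, assuming ASCII alphanumerics only.
_NAME_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_group_name(name: str) -> bool:
    """Return True if name has 1 to 32 allowed characters and no leading digit."""
    return _NAME_RULE.fullmatch(name) is not None
```

With this sketch, a name such as DN is accepted, while a name that starts with a digit or is longer than 32 characters is rejected.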
|
||||
<P>
|
||||
Names consist of up to 32 alphanumeric characters and underscores, but must
|
||||
start with a non-digit. Named capturing parentheses are still allocated numbers
|
||||
as well as names, exactly as if the names were not present. The PCRE2 API
|
||||
provides function calls for extracting the name-to-number translation table
|
||||
from a compiled pattern. There are also convenience functions for extracting a
|
||||
captured substring by name.
|
||||
Named capturing parentheses are allocated numbers as well as names, exactly as
|
||||
if the names were not present. In both PCRE2 and Perl, capturing subpatterns
|
||||
are primarily identified by numbers; any names are just aliases for these
|
||||
numbers. The PCRE2 API provides function calls for extracting the complete
|
||||
name-to-number translation table from a compiled pattern, as well as
|
||||
convenience functions for extracting captured substrings by name.
|
||||
</P>
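As a rough illustration of names acting as aliases for group numbers, the sketch below uses Python's standard `re` module rather than the PCRE2 API, since it accepts the same (?P<name>...) syntax. The pattern, the group names, and the subject string are made up for the example. In PCRE2 itself the corresponding facilities are pcre2_pattern_info() with PCRE2_INFO_NAMETABLE and pcre2_substring_get_byname(); the pcre2api documentation describes their exact use.

```python
import re

# Illustrative pattern with two named capturing groups (Python syntax).
pattern = re.compile(r"(?P<key>\w+)=(?P<value>\w+)")

# Python's analogue of a name-to-number translation table.
name_to_number = dict(pattern.groupindex)

match = pattern.fullmatch("colour=red")
assert match is not None

# A named group can be read by its name or by the number it was allocated.
by_name = match.group("value")
by_number = match.group(name_to_number["value"])
```

Both lookups return the same substring, because the name is simply another way of referring to the numbered group.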
|
||||
<P>
|
||||
By default, a name must be unique within a pattern, but it is possible to relax
|
||||
this constraint by setting the PCRE2_DUPNAMES option at compile time.
|
||||
(Duplicate names are also always permitted for subpatterns with the same
|
||||
number, set up as described in the previous section.) Duplicate names can be
|
||||
useful for patterns where only one instance of the named parentheses can match.
|
||||
Suppose you want to match the name of a weekday, either as a 3-letter
|
||||
abbreviation or as the full name, and in both cases you want to extract the
|
||||
abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||
<b>Warning:</b> When more than one subpattern has the same number, as described
|
||||
in the previous section, a name given to one of them applies to all of them.
|
||||
Perl allows identically numbered subpatterns to have different names. Consider
|
||||
this pattern, where there are two capturing subpatterns, both numbered 1:
|
||||
<pre>
|
||||
(?|(?<AA>aa)|(?<BB>bb))
|
||||
</pre>
|
||||
Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
|
||||
a successful match, both names yield the same value (either "aa" or "bb").
|
||||
</P>
|
||||
<P>
|
||||
In an attempt to reduce confusion, PCRE2 does not allow the same group number
|
||||
to be associated with more than one name. The example above provokes a
|
||||
compile-time error. However, there is still scope for confusion. Consider this
|
||||
pattern:
|
||||
<pre>
|
||||
(?|(?<AA>aa)|(bb))
|
||||
</pre>
|
||||
Although the second subpattern number 1 is not explicitly named, the name AA is
|
||||
still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
|
||||
reference by name to group AA yields the matched string.
|
||||
</P>
|
||||
<P>
|
||||
By default, a name must be unique within a pattern, except that duplicate names
|
||||
are permitted for subpatterns with the same number, for example:
|
||||
<pre>
|
||||
(?|(?<AA>aa)|(?<AA>bb))
|
||||
</pre>
|
||||
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
||||
option at compile time, or by the use of (?J) within the pattern. Duplicate
|
||||
names can be useful for patterns where only one instance of the named
|
||||
parentheses can match. Suppose you want to match the name of a weekday, either
|
||||
as a 3-letter abbreviation or as the full name, and in both cases you want to
|
||||
extract the abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||
<pre>
|
||||
(?<DN>Mon|Fri|Sun)(?:day)?|
|
||||
(?<DN>Tue)(?:sday)?|
|
||||
|
@ -1856,13 +1883,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
|
|||
(?<DN>Sat)(?:urday)?
|
||||
</pre>
|
||||
There are five capturing substrings, but only one is ever set after a match.
|
||||
(An alternative way of solving this problem is to use a "branch reset"
|
||||
subpattern, as described in the previous section.)
|
||||
</P>
|
||||
<P>
|
||||
The convenience functions for extracting the data by name return the substring
|
||||
for the first (and in this example, the only) subpattern of that name that
|
||||
matched. This saves searching to find which numbered subpattern it was.
|
||||
matched. This saves searching to find which numbered subpattern it was. (An
|
||||
alternative way of solving this problem is to use a "branch reset" subpattern,
|
||||
as described in the previous section.)
|
||||
</P>
|
||||
<P>
|
||||
If you make a backreference to a non-unique named subpattern from elsewhere in
|
||||
|
@ -1878,8 +1903,7 @@ for the reference. For example, this pattern matches both "foofoo" and
|
|||
<P>
|
||||
If you make a subroutine call to a non-unique named subpattern, the one that
|
||||
corresponds to the first occurrence of the name is used. In the absence of
|
||||
duplicate numbers (see the previous section) this is the one with the lowest
|
||||
number.
|
||||
duplicate numbers this is the one with the lowest number.
|
||||
</P>
|
||||
<P>
|
||||
If you use a named reference in a condition
|
||||
|
@ -1893,14 +1917,6 @@ handling named subpatterns, see the
|
|||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
<b>Warning:</b> You cannot use different names to distinguish between two
|
||||
subpatterns with the same number because PCRE2 uses only the numbers when
|
||||
matching. For this reason, an error is given at compile time if different names
|
||||
are given to subpatterns with the same number. However, you can always give the
|
||||
same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
|
||||
set.
|
||||
</P>
|
||||
<br><a name="SEC17" href="#TOC1">REPETITION</a><br>
|
||||
<P>
|
||||
Repetition is specified by quantifiers, which can follow any of the following
|
||||
|
@ -2957,10 +2973,12 @@ later versions (I tried 5.024) it now works.
|
|||
<br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
|
||||
<P>
|
||||
If the syntax for a recursive subpattern call (either by number or by
|
||||
name) is used outside the parentheses to which it refers, it operates like a
|
||||
subroutine in a programming language. The called subpattern may be defined
|
||||
before or after the reference. A numbered reference can be absolute or
|
||||
relative, as in these examples:
|
||||
name) is used outside the parentheses to which it refers, it operates a bit
|
||||
like a subroutine in a programming language. More accurately, PCRE2 treats the
|
||||
referenced subpattern as an independent subpattern which it tries to match at
|
||||
the current matching position. The called subpattern may be defined before or
|
||||
after the reference. A numbered reference can be absolute or relative, as in
|
||||
these examples:
|
||||
<pre>
|
||||
(...(absolute)...)...(?2)...
|
||||
(...(relative)...)...(?-1)...
|
||||
|
@ -2993,6 +3011,13 @@ different calls. For example, consider this pattern:
|
|||
</pre>
|
||||
It matches "abcabc". It does not match "abcABC" because the change of
|
||||
processing option does not affect the called subpattern.
|
||||
</P>
|
||||
<P>
|
||||
The behaviour of
|
||||
<a href="#backtrackcontrol">backtracking control verbs</a>
|
||||
in subpatterns when called as subroutines is described in the section entitled
|
||||
<a href="#btsub">"Backtracking verbs in subroutines"</a>
|
||||
below.
|
||||
<a name="onigurumasubroutines"></a></P>
|
||||
<br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
|
||||
<P>
|
||||
|
|
1361
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1815,17 +1815,18 @@ duplicate named subpatterns, as described in the next section.
|
|||
.rs
|
||||
.sp
|
||||
Identifying capturing parentheses by number is simple, but it can be very hard
|
||||
to keep track of the numbers in complicated regular expressions. Furthermore,
|
||||
if an expression is modified, the numbers may change. To help with this
|
||||
difficulty, PCRE2 supports the naming of subpatterns. This feature was not
|
||||
added to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
||||
to keep track of the numbers in complicated patterns. Furthermore, if an
|
||||
expression is modified, the numbers may change. To help with this difficulty,
|
||||
PCRE2 supports the naming of capturing subpatterns. This feature was not added
|
||||
to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
||||
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
|
||||
Perl and the Python syntax. Perl allows identically numbered subpatterns to
|
||||
have different names, but PCRE2 does not.
|
||||
Perl and the Python syntax.
|
||||
.P
|
||||
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
|
||||
(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
|
||||
parentheses from other parts of the pattern, such as
|
||||
In PCRE2, a capturing subpattern can be named in one of three ways:
|
||||
(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. Names
|
||||
consist of up to 32 alphanumeric characters and underscores, but must start
|
||||
with a non-digit. References to capturing parentheses from other parts of the
|
||||
pattern, such as
|
||||
.\" HTML <a href="#backreferences">
|
||||
.\" </a>
|
||||
backreferences,
|
||||
|
@ -1839,23 +1840,47 @@ and
|
|||
.\" </a>
|
||||
conditions,
|
||||
.\"
|
||||
can be made by name as well as by number.
|
||||
can all be made by name as well as by number.
|
||||
.P
|
||||
Names consist of up to 32 alphanumeric characters and underscores, but must
|
||||
start with a non-digit. Named capturing parentheses are still allocated numbers
|
||||
as well as names, exactly as if the names were not present. The PCRE2 API
|
||||
provides function calls for extracting the name-to-number translation table
|
||||
from a compiled pattern. There are also convenience functions for extracting a
|
||||
captured substring by name.
|
||||
Named capturing parentheses are allocated numbers as well as names, exactly as
|
||||
if the names were not present. In both PCRE2 and Perl, capturing subpatterns
|
||||
are primarily identified by numbers; any names are just aliases for these
|
||||
numbers. The PCRE2 API provides function calls for extracting the complete
|
||||
name-to-number translation table from a compiled pattern, as well as
|
||||
convenience functions for extracting captured substrings by name.
|
||||
.P
|
||||
By default, a name must be unique within a pattern, but it is possible to relax
|
||||
this constraint by setting the PCRE2_DUPNAMES option at compile time.
|
||||
(Duplicate names are also always permitted for subpatterns with the same
|
||||
number, set up as described in the previous section.) Duplicate names can be
|
||||
useful for patterns where only one instance of the named parentheses can match.
|
||||
Suppose you want to match the name of a weekday, either as a 3-letter
|
||||
abbreviation or as the full name, and in both cases you want to extract the
|
||||
abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||
\fBWarning:\fP When more than one subpattern has the same number, as described
|
||||
in the previous section, a name given to one of them applies to all of them.
|
||||
Perl allows identically numbered subpatterns to have different names. Consider
|
||||
this pattern, where there are two capturing subpatterns, both numbered 1:
|
||||
.sp
|
||||
(?|(?<AA>aa)|(?<BB>bb))
|
||||
.sp
|
||||
Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
|
||||
a successful match, both names yield the same value (either "aa" or "bb").
|
||||
.P
|
||||
In an attempt to reduce confusion, PCRE2 does not allow the same group number
|
||||
to be associated with more than one name. The example above provokes a
|
||||
compile-time error. However, there is still scope for confusion. Consider this
|
||||
pattern:
|
||||
.sp
|
||||
(?|(?<AA>aa)|(bb))
|
||||
.sp
|
||||
Although the second subpattern number 1 is not explicitly named, the name AA is
|
||||
still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
|
||||
reference by name to group AA yields the matched string.
|
||||
.P
|
||||
By default, a name must be unique within a pattern, except that duplicate names
|
||||
are permitted for subpatterns with the same number, for example:
|
||||
.sp
|
||||
(?|(?<AA>aa)|(?<AA>bb))
|
||||
.sp
|
||||
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
||||
option at compile time, or by the use of (?J) within the pattern. Duplicate
|
||||
names can be useful for patterns where only one instance of the named
|
||||
parentheses can match. Suppose you want to match the name of a weekday, either
|
||||
as a 3-letter abbreviation or as the full name, and in both cases you want to
|
||||
extract the abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||
.sp
|
||||
(?<DN>Mon|Fri|Sun)(?:day)?|
|
||||
(?<DN>Tue)(?:sday)?|
|
||||
|
@ -1864,12 +1889,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
|
|||
(?<DN>Sat)(?:urday)?
|
||||
.sp
|
||||
There are five capturing substrings, but only one is ever set after a match.
|
||||
(An alternative way of solving this problem is to use a "branch reset"
|
||||
subpattern, as described in the previous section.)
|
||||
.P
|
||||
The convenience functions for extracting the data by name return the substring
|
||||
for the first (and in this example, the only) subpattern of that name that
|
||||
matched. This saves searching to find which numbered subpattern it was.
|
||||
matched. This saves searching to find which numbered subpattern it was. (An
|
||||
alternative way of solving this problem is to use a "branch reset" subpattern,
|
||||
as described in the previous section.)
|
||||
.P
|
||||
If you make a backreference to a non-unique named subpattern from elsewhere in
|
||||
the pattern, the subpatterns to which the name refers are checked in the order
|
||||
|
@ -1882,8 +1906,7 @@ for the reference. For example, this pattern matches both "foofoo" and
|
|||
.P
|
||||
If you make a subroutine call to a non-unique named subpattern, the one that
|
||||
corresponds to the first occurrence of the name is used. In the absence of
|
||||
duplicate numbers (see the previous section) this is the one with the lowest
|
||||
number.
|
||||
duplicate numbers this is the one with the lowest number.
|
||||
.P
|
||||
If you use a named reference in a condition
|
||||
test (see the
|
||||
|
@ -1901,13 +1924,6 @@ handling named subpatterns, see the
|
|||
\fBpcre2api\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
\fBWarning:\fP You cannot use different names to distinguish between two
|
||||
subpatterns with the same number because PCRE2 uses only the numbers when
|
||||
matching. For this reason, an error is given at compile time if different names
|
||||
are given to subpatterns with the same number. However, you can always give the
|
||||
same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
|
||||
set.
|
||||
.
|
||||
.
|
||||
.SH REPETITION
|
||||
|
@ -2982,10 +2998,12 @@ later versions (I tried 5.024) it now works.
|
|||
.rs
|
||||
.sp
|
||||
If the syntax for a recursive subpattern call (either by number or by
|
||||
name) is used outside the parentheses to which it refers, it operates like a
|
||||
subroutine in a programming language. The called subpattern may be defined
|
||||
before or after the reference. A numbered reference can be absolute or
|
||||
relative, as in these examples:
|
||||
name) is used outside the parentheses to which it refers, it operates a bit
|
||||
like a subroutine in a programming language. More accurately, PCRE2 treats the
|
||||
referenced subpattern as an independent subpattern which it tries to match at
|
||||
the current matching position. The called subpattern may be defined before or
|
||||
after the reference. A numbered reference can be absolute or relative, as in
|
||||
these examples:
|
||||
.sp
|
||||
(...(absolute)...)...(?2)...
|
||||
(...(relative)...)...(?-1)...
|
||||
|
@ -3016,6 +3034,18 @@ different calls. For example, consider this pattern:
|
|||
.sp
|
||||
It matches "abcabc". It does not match "abcABC" because the change of
|
||||
processing option does not affect the called subpattern.
|
||||
.P
|
||||
The behaviour of
|
||||
.\" HTML <a href="#backtrackcontrol">
|
||||
.\" </a>
|
||||
backtracking control verbs
|
||||
.\"
|
||||
in subpatterns when called as subroutines is described in the section entitled
|
||||
.\" HTML <a href="#btsub">
|
||||
.\" </a>
|
||||
"Backtracking verbs in subroutines"
|
||||
.\"
|
||||
below.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="onigurumasubroutines"></a>
|
||||
|
|