Documentation update.
parent b196143523
commit c722bf2399
|
@ -1813,41 +1813,68 @@ duplicate named subpatterns, as described in the next section.
|
||||||
<br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
|
<br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
|
||||||
<P>
|
<P>
|
||||||
Identifying capturing parentheses by number is simple, but it can be very hard
|
Identifying capturing parentheses by number is simple, but it can be very hard
|
||||||
to keep track of the numbers in complicated regular expressions. Furthermore,
|
to keep track of the numbers in complicated patterns. Furthermore, if an
|
||||||
if an expression is modified, the numbers may change. To help with this
|
expression is modified, the numbers may change. To help with this difficulty,
|
||||||
difficulty, PCRE2 supports the naming of subpatterns. This feature was not
|
PCRE2 supports the naming of capturing subpatterns. This feature was not added
|
||||||
added to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
||||||
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
|
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
|
||||||
Perl and the Python syntax. Perl allows identically numbered subpatterns to
|
Perl and the Python syntax.
|
||||||
have different names, but PCRE2 does not.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
|
In PCRE2, a capturing subpattern can be named in one of three ways:
|
||||||
(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
|
(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. Names
|
||||||
parentheses from other parts of the pattern, such as
|
consist of up to 32 alphanumeric characters and underscores, but must start
|
||||||
|
with a non-digit. References to capturing parentheses from other parts of the
|
||||||
|
pattern, such as
|
||||||
<a href="#backreferences">backreferences,</a>
|
<a href="#backreferences">backreferences,</a>
|
||||||
<a href="#recursion">recursion,</a>
|
<a href="#recursion">recursion,</a>
|
||||||
and
|
and
|
||||||
<a href="#conditions">conditions,</a>
|
<a href="#conditions">conditions,</a>
|
||||||
can be made by name as well as by number.
|
can all be made by name as well as by number.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Names consist of up to 32 alphanumeric characters and underscores, but must
|
Named capturing parentheses are allocated numbers as well as names, exactly as
|
||||||
start with a non-digit. Named capturing parentheses are still allocated numbers
|
if the names were not present. In both PCRE2 and Perl, capturing subpatterns
|
||||||
as well as names, exactly as if the names were not present. The PCRE2 API
|
are primarily identified by numbers; any names are just aliases for these
|
||||||
provides function calls for extracting the name-to-number translation table
|
numbers. The PCRE2 API provides function calls for extracting the complete
|
||||||
from a compiled pattern. There are also convenience functions for extracting a
|
name-to-number translation table from a compiled pattern, as well as
|
||||||
captured substring by name.
|
convenience functions for extracting captured substrings by name.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
By default, a name must be unique within a pattern, but it is possible to relax
|
<b>Warning:</b> When more than one subpattern has the same number, as described
|
||||||
this constraint by setting the PCRE2_DUPNAMES option at compile time.
|
in the previous section, a name given to one of them applies to all of them.
|
||||||
(Duplicate names are also always permitted for subpatterns with the same
|
Perl allows identically numbered subpatterns to have different names. Consider
|
||||||
number, set up as described in the previous section.) Duplicate names can be
|
this pattern, where there are two capturing subpatterns, both numbered 1:
|
||||||
useful for patterns where only one instance of the named parentheses can match.
|
<pre>
|
||||||
Suppose you want to match the name of a weekday, either as a 3-letter
|
(?|(?<AA>aa)|(?<BB>bb))
|
||||||
abbreviation or as the full name, and in both cases you want to extract the
|
</pre>
|
||||||
abbreviation. This pattern (ignoring the line breaks) does the job:
|
Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
|
||||||
|
a successful match, both names yield the same value (either "aa" or "bb").
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
In an attempt to reduce confusion, PCRE2 does not allow the same group number
|
||||||
|
to be associated with more than one name. The example above provokes a
|
||||||
|
compile-time error. However, there is still scope for confusion. Consider this
|
||||||
|
pattern:
|
||||||
|
<pre>
|
||||||
|
(?|(?<AA>aa)|(bb))
|
||||||
|
</pre>
|
||||||
|
Although the second subpattern number 1 is not explicitly named, the name AA is
|
||||||
|
still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
|
||||||
|
reference by name to group AA yields the matched string.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
By default, a name must be unique within a pattern, except that duplicate names
|
||||||
|
are permitted for subpatterns with the same number, for example:
|
||||||
|
<pre>
|
||||||
|
(?|(?<AA>aa)|(?<AA>bb))
|
||||||
|
</pre>
|
||||||
|
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
||||||
|
option at compile time, or by the use of (?J) within the pattern. Duplicate
|
||||||
|
names can be useful for patterns where only one instance of the named
|
||||||
|
parentheses can match. Suppose you want to match the name of a weekday, either
|
||||||
|
as a 3-letter abbreviation or as the full name, and in both cases you want to
|
||||||
|
extract the abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||||
<pre>
|
<pre>
|
||||||
(?<DN>Mon|Fri|Sun)(?:day)?|
|
(?<DN>Mon|Fri|Sun)(?:day)?|
|
||||||
(?<DN>Tue)(?:sday)?|
|
(?<DN>Tue)(?:sday)?|
|
||||||
|
@ -1856,13 +1883,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||||
(?<DN>Sat)(?:urday)?
|
(?<DN>Sat)(?:urday)?
|
||||||
</pre>
|
</pre>
|
||||||
There are five capturing substrings, but only one is ever set after a match.
|
There are five capturing substrings, but only one is ever set after a match.
|
||||||
(An alternative way of solving this problem is to use a "branch reset"
|
|
||||||
subpattern, as described in the previous section.)
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
The convenience functions for extracting the data by name return the substring
|
The convenience functions for extracting the data by name return the substring
|
||||||
for the first (and in this example, the only) subpattern of that name that
|
for the first (and in this example, the only) subpattern of that name that
|
||||||
matched. This saves searching to find which numbered subpattern it was.
|
matched. This saves searching to find which numbered subpattern it was. (An
|
||||||
|
alternative way of solving this problem is to use a "branch reset" subpattern,
|
||||||
|
as described in the previous section.)
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If you make a backreference to a non-unique named subpattern from elsewhere in
|
If you make a backreference to a non-unique named subpattern from elsewhere in
|
||||||
|
@ -1878,8 +1903,7 @@ for the reference. For example, this pattern matches both "foofoo" and
|
||||||
<P>
|
<P>
|
||||||
If you make a subroutine call to a non-unique named subpattern, the one that
|
If you make a subroutine call to a non-unique named subpattern, the one that
|
||||||
corresponds to the first occurrence of the name is used. In the absence of
|
corresponds to the first occurrence of the name is used. In the absence of
|
||||||
duplicate numbers (see the previous section) this is the one with the lowest
|
duplicate numbers this is the one with the lowest number.
|
||||||
number.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If you use a named reference in a condition
|
If you use a named reference in a condition
|
||||||
|
@ -1893,14 +1917,6 @@ handling named subpatterns, see the
|
||||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
|
||||||
<b>Warning:</b> You cannot use different names to distinguish between two
|
|
||||||
subpatterns with the same number because PCRE2 uses only the numbers when
|
|
||||||
matching. For this reason, an error is given at compile time if different names
|
|
||||||
are given to subpatterns with the same number. However, you can always give the
|
|
||||||
same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
|
|
||||||
set.
|
|
||||||
</P>
|
|
||||||
<br><a name="SEC17" href="#TOC1">REPETITION</a><br>
|
<br><a name="SEC17" href="#TOC1">REPETITION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Repetition is specified by quantifiers, which can follow any of the following
|
Repetition is specified by quantifiers, which can follow any of the following
|
||||||
|
@ -2957,10 +2973,12 @@ later versions (I tried 5.024) it now works.
|
||||||
<br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
|
<br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
|
||||||
<P>
|
<P>
|
||||||
If the syntax for a recursive subpattern call (either by number or by
|
If the syntax for a recursive subpattern call (either by number or by
|
||||||
name) is used outside the parentheses to which it refers, it operates like a
|
name) is used outside the parentheses to which it refers, it operates a bit
|
||||||
subroutine in a programming language. The called subpattern may be defined
|
like a subroutine in a programming language. More accurately, PCRE2 treats the
|
||||||
before or after the reference. A numbered reference can be absolute or
|
referenced subpattern as an independent subpattern which it tries to match at
|
||||||
relative, as in these examples:
|
the current matching position. The called subpattern may be defined before or
|
||||||
|
after the reference. A numbered reference can be absolute or relative, as in
|
||||||
|
these examples:
|
||||||
<pre>
|
<pre>
|
||||||
(...(absolute)...)...(?2)...
|
(...(absolute)...)...(?2)...
|
||||||
(...(relative)...)...(?-1)...
|
(...(relative)...)...(?-1)...
|
||||||
|
@ -2993,6 +3011,13 @@ different calls. For example, consider this pattern:
|
||||||
</pre>
|
</pre>
|
||||||
It matches "abcabc". It does not match "abcABC" because the change of
|
It matches "abcabc". It does not match "abcABC" because the change of
|
||||||
processing option does not affect the called subpattern.
|
processing option does not affect the called subpattern.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The behaviour of
|
||||||
|
<a href="#backtrackcontrol">backtracking control verbs</a>
|
||||||
|
in subpatterns when called as subroutines is described in the section entitled
|
||||||
|
<a href="#btsub">"Backtracking verbs in subroutines"</a>
|
||||||
|
below.
|
||||||
<a name="onigurumasubroutines"></a></P>
|
<a name="onigurumasubroutines"></a></P>
|
||||||
<br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
|
<br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
|
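Note on the NAMED SUBPATTERNS change above: the updated text says that names are just aliases for group numbers and that a name-to-number translation table can be extracted from a compiled pattern. The following is a minimal sketch of that idea using Python's standard `re` module as a stand-in, not the PCRE2 API itself. It relies only on the `(?P<name>...)` form, which the text says PCRE2 also accepts. The date pattern and variable names are made up for illustration.

```python
import re

# Python's re module stands in for PCRE2 here (an assumption for illustration);
# it accepts the (?P<name>...) syntax that the text describes as the Python form.
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})")

# Name-to-number translation table: each name is an alias for a group number.
name_table = dict(pattern.groupindex)

match = pattern.fullmatch("2016-02-29")
assert match is not None

# The same captured substring is reachable by name or by number.
by_name = match.group("month")
by_number = match.group(name_table["month"])

# The unnamed third group is numbered exactly as if no names were present.
day = match.group(3)
```

In PCRE2 itself the corresponding facilities are part of the C API described in the pcre2api documentation; this sketch only mirrors the concept.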
1361
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1815,17 +1815,18 @@ duplicate named subpatterns, as described in the next section.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
Identifying capturing parentheses by number is simple, but it can be very hard
|
Identifying capturing parentheses by number is simple, but it can be very hard
|
||||||
to keep track of the numbers in complicated regular expressions. Furthermore,
|
to keep track of the numbers in complicated patterns. Furthermore, if an
|
||||||
if an expression is modified, the numbers may change. To help with this
|
expression is modified, the numbers may change. To help with this difficulty,
|
||||||
difficulty, PCRE2 supports the naming of subpatterns. This feature was not
|
PCRE2 supports the naming of capturing subpatterns. This feature was not added
|
||||||
added to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
to Perl until release 5.10. Python had the feature earlier, and PCRE1
|
||||||
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
|
introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
|
||||||
Perl and the Python syntax. Perl allows identically numbered subpatterns to
|
Perl and the Python syntax.
|
||||||
have different names, but PCRE2 does not.
|
|
||||||
.P
|
.P
|
||||||
In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
|
In PCRE2, a capturing subpattern can be named in one of three ways:
|
||||||
(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
|
(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. Names
|
||||||
parentheses from other parts of the pattern, such as
|
consist of up to 32 alphanumeric characters and underscores, but must start
|
||||||
|
with a non-digit. References to capturing parentheses from other parts of the
|
||||||
|
pattern, such as
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
backreferences,
|
backreferences,
|
||||||
|
@ -1839,23 +1840,47 @@ and
|
||||||
.\" </a>
|
.\" </a>
|
||||||
conditions,
|
conditions,
|
||||||
.\"
|
.\"
|
||||||
can be made by name as well as by number.
|
can all be made by name as well as by number.
|
||||||
.P
|
.P
|
||||||
Names consist of up to 32 alphanumeric characters and underscores, but must
|
Named capturing parentheses are allocated numbers as well as names, exactly as
|
||||||
start with a non-digit. Named capturing parentheses are still allocated numbers
|
if the names were not present. In both PCRE2 and Perl, capturing subpatterns
|
||||||
as well as names, exactly as if the names were not present. The PCRE2 API
|
are primarily identified by numbers; any names are just aliases for these
|
||||||
provides function calls for extracting the name-to-number translation table
|
numbers. The PCRE2 API provides function calls for extracting the complete
|
||||||
from a compiled pattern. There are also convenience functions for extracting a
|
name-to-number translation table from a compiled pattern, as well as
|
||||||
captured substring by name.
|
convenience functions for extracting captured substrings by name.
|
||||||
.P
|
.P
|
||||||
By default, a name must be unique within a pattern, but it is possible to relax
|
\fBWarning:\fP When more than one subpattern has the same number, as described
|
||||||
this constraint by setting the PCRE2_DUPNAMES option at compile time.
|
in the previous section, a name given to one of them applies to all of them.
|
||||||
(Duplicate names are also always permitted for subpatterns with the same
|
Perl allows identically numbered subpatterns to have different names. Consider
|
||||||
number, set up as described in the previous section.) Duplicate names can be
|
this pattern, where there are two capturing subpatterns, both numbered 1:
|
||||||
useful for patterns where only one instance of the named parentheses can match.
|
.sp
|
||||||
Suppose you want to match the name of a weekday, either as a 3-letter
|
(?|(?<AA>aa)|(?<BB>bb))
|
||||||
abbreviation or as the full name, and in both cases you want to extract the
|
.sp
|
||||||
abbreviation. This pattern (ignoring the line breaks) does the job:
|
Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
|
||||||
|
a successful match, both names yield the same value (either "aa" or "bb").
|
||||||
|
.P
|
||||||
|
In an attempt to reduce confusion, PCRE2 does not allow the same group number
|
||||||
|
to be associated with more than one name. The example above provokes a
|
||||||
|
compile-time error. However, there is still scope for confusion. Consider this
|
||||||
|
pattern:
|
||||||
|
.sp
|
||||||
|
(?|(?<AA>aa)|(bb))
|
||||||
|
.sp
|
||||||
|
Although the second subpattern number 1 is not explicitly named, the name AA is
|
||||||
|
still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
|
||||||
|
reference by name to group AA yields the matched string.
|
||||||
|
.P
|
||||||
|
By default, a name must be unique within a pattern, except that duplicate names
|
||||||
|
are permitted for subpatterns with the same number, for example:
|
||||||
|
.sp
|
||||||
|
(?|(?<AA>aa)|(?<AA>bb))
|
||||||
|
.sp
|
||||||
|
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
||||||
|
option at compile time, or by the use of (?J) within the pattern. Duplicate
|
||||||
|
names can be useful for patterns where only one instance of the named
|
||||||
|
parentheses can match. Suppose you want to match the name of a weekday, either
|
||||||
|
as a 3-letter abbreviation or as the full name, and in both cases you want to
|
||||||
|
extract the abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||||
.sp
|
.sp
|
||||||
(?<DN>Mon|Fri|Sun)(?:day)?|
|
(?<DN>Mon|Fri|Sun)(?:day)?|
|
||||||
(?<DN>Tue)(?:sday)?|
|
(?<DN>Tue)(?:sday)?|
|
||||||
|
@ -1864,12 +1889,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
|
||||||
(?<DN>Sat)(?:urday)?
|
(?<DN>Sat)(?:urday)?
|
||||||
.sp
|
.sp
|
||||||
There are five capturing substrings, but only one is ever set after a match.
|
There are five capturing substrings, but only one is ever set after a match.
|
||||||
(An alternative way of solving this problem is to use a "branch reset"
|
|
||||||
subpattern, as described in the previous section.)
|
|
||||||
.P
|
|
||||||
The convenience functions for extracting the data by name return the substring
|
The convenience functions for extracting the data by name return the substring
|
||||||
for the first (and in this example, the only) subpattern of that name that
|
for the first (and in this example, the only) subpattern of that name that
|
||||||
matched. This saves searching to find which numbered subpattern it was.
|
matched. This saves searching to find which numbered subpattern it was. (An
|
||||||
|
alternative way of solving this problem is to use a "branch reset" subpattern,
|
||||||
|
as described in the previous section.)
|
||||||
.P
|
.P
|
||||||
If you make a backreference to a non-unique named subpattern from elsewhere in
|
If you make a backreference to a non-unique named subpattern from elsewhere in
|
||||||
the pattern, the subpatterns to which the name refers are checked in the order
|
the pattern, the subpatterns to which the name refers are checked in the order
|
||||||
|
@ -1882,8 +1906,7 @@ for the reference. For example, this pattern matches both "foofoo" and
|
||||||
.P
|
.P
|
||||||
If you make a subroutine call to a non-unique named subpattern, the one that
|
If you make a subroutine call to a non-unique named subpattern, the one that
|
||||||
corresponds to the first occurrence of the name is used. In the absence of
|
corresponds to the first occurrence of the name is used. In the absence of
|
||||||
duplicate numbers (see the previous section) this is the one with the lowest
|
duplicate numbers this is the one with the lowest number.
|
||||||
number.
|
|
||||||
.P
|
.P
|
||||||
If you use a named reference in a condition
|
If you use a named reference in a condition
|
||||||
test (see the
|
test (see the
|
||||||
|
@ -1901,13 +1924,6 @@ handling named subpatterns, see the
|
||||||
\fBpcre2api\fP
|
\fBpcre2api\fP
|
||||||
.\"
|
.\"
|
||||||
documentation.
|
documentation.
|
||||||
.P
|
|
||||||
\fBWarning:\fP You cannot use different names to distinguish between two
|
|
||||||
subpatterns with the same number because PCRE2 uses only the numbers when
|
|
||||||
matching. For this reason, an error is given at compile time if different names
|
|
||||||
are given to subpatterns with the same number. However, you can always give the
|
|
||||||
same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
|
|
||||||
set.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH REPETITION
|
.SH REPETITION
|
||||||
|
@ -2982,10 +2998,12 @@ later versions (I tried 5.024) it now works.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
If the syntax for a recursive subpattern call (either by number or by
|
If the syntax for a recursive subpattern call (either by number or by
|
||||||
name) is used outside the parentheses to which it refers, it operates like a
|
name) is used outside the parentheses to which it refers, it operates a bit
|
||||||
subroutine in a programming language. The called subpattern may be defined
|
like a subroutine in a programming language. More accurately, PCRE2 treats the
|
||||||
before or after the reference. A numbered reference can be absolute or
|
referenced subpattern as an independent subpattern which it tries to match at
|
||||||
relative, as in these examples:
|
the current matching position. The called subpattern may be defined before or
|
||||||
|
after the reference. A numbered reference can be absolute or relative, as in
|
||||||
|
these examples:
|
||||||
.sp
|
.sp
|
||||||
(...(absolute)...)...(?2)...
|
(...(absolute)...)...(?2)...
|
||||||
(...(relative)...)...(?-1)...
|
(...(relative)...)...(?-1)...
|
||||||
|
@ -3016,6 +3034,18 @@ different calls. For example, consider this pattern:
|
||||||
.sp
|
.sp
|
||||||
It matches "abcabc". It does not match "abcABC" because the change of
|
It matches "abcabc". It does not match "abcABC" because the change of
|
||||||
processing option does not affect the called subpattern.
|
processing option does not affect the called subpattern.
|
||||||
|
.P
|
||||||
|
The behaviour of
|
||||||
|
.\" HTML <a href="#backtrackcontrol">
|
||||||
|
.\" </a>
|
||||||
|
backtracking control verbs
|
||||||
|
.\"
|
||||||
|
in subpatterns when called as subroutines is described in the section entitled
|
||||||
|
.\" HTML <a href="#btsub">
|
||||||
|
.\" </a>
|
||||||
|
"Backtracking verbs in subroutines"
|
||||||
|
.\"
|
||||||
|
below.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="onigurumasubroutines"></a>
|
.\" HTML <a name="onigurumasubroutines"></a>
|
||||||
|
|
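Note on the naming rule stated in the diff above ("up to 32 alphanumeric characters and underscores, but must start with a non-digit"): a rough sketch of a check for that rule might look like the following. The helper name `is_valid_group_name` is hypothetical, and the sketch assumes ASCII letters and digits only. PCRE2's own validation happens at compile time and may differ in detail.

```python
import re

# Hypothetical helper, not part of PCRE2: ASCII letters, digits and
# underscores only, at most 32 characters, first character not a digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_group_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule (ASCII assumption)."""
    return _NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_group_name("DN")` accepts the name used in the weekday pattern, while a name beginning with a digit is rejected.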