Documentation update.

2018-08-03 16:56:54 +00:00 · 2018-08-03 16:56:54 +00:00 · c722bf2399
parent b196143523
commit c722bf2399
3 changed files with 889 additions and 809 deletions
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -249,7 +249,7 @@ is used.
 <P>
 The newline convention affects where the circumflex and dollar assertions are
 true. It also affects the interpretation of the dot metacharacter when
-PCRE2_DOTALL is not set, and the behaviour of \N when not followed by an 
+PCRE2_DOTALL is not set, and the behaviour of \N when not followed by an
 opening brace. However, it does not affect what the \R escape sequence
 matches. By default, this is any Unicode newline sequence, for Perl
 compatibility. However, this can be changed; see the next section and the
@@ -357,7 +357,7 @@ of the pattern.
 If you want to remove the special meaning from a sequence of characters, you
 can do so by putting them between \Q and \E. This is different from Perl in
 that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
-in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish 
+in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish
 backslash interpolation" on any backslashes between \Q and \E which, its
 documentation says, "may lead to confusing results". PCRE2 treats a backslash
 between \Q and \E just like any other character. Note the following examples:
@@ -400,7 +400,7 @@ these escapes are as follows:
  \o{ddd..}   character with octal code ddd..
  \xhh        character with hex code hh
  \x{hhh..}   character with hex code hhh.. (default mode)
-  \N{U+hhh..} character with Unicode code point hhh.. 
+  \N{U+hhh..} character with Unicode code point hhh..
  \uhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 </pre>
 Note that when \N is not followed by an opening brace (curly bracket) it has
@@ -590,7 +590,7 @@ Another use of backslash is for specifying generic character types:
  \D     any character that is not a decimal digit
  \h     any horizontal white space character
  \H     any character that is not a horizontal white space character
-  \N     any character that is not a newline 
+  \N     any character that is not a newline
  \s     any white space character
  \S     any character that is not a white space character
  \v     any vertical white space character
@@ -600,8 +600,8 @@ Another use of backslash is for specifying generic character types:
 </pre>
 The \N escape sequence has the same meaning as
 <a href="#fullstopdot">the "." metacharacter</a>
-when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the 
-meaning of \N. Note that when \N is followed by an opening brace it has a 
+when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the
+meaning of \N. Note that when \N is followed by an opening brace it has a
 different meaning. See the section entitled
 <a href="#digitsafterbackslash">"Non-printing characters"</a>
 above for details. Perl also uses \N{name} to specify characters by Unicode
@@ -1030,8 +1030,8 @@ grapheme cluster", and treats the sequence as an atomic group
 Unicode supports various kinds of composite character by giving each character
 a grapheme breaking property, and having rules that use these properties to
 define the boundaries of extended grapheme clusters. The rules are defined in
-Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0 
-abandoned the use of some previous properties that had been used for emojis. 
+Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0
+abandoned the use of some previous properties that had been used for emojis.
 Instead it introduced various emoji-specific properties. PCRE2 uses only the
 Extended Pictographic property.
 </P>
@@ -1316,7 +1316,7 @@ special meaning in a character class.
 <P>
 The escape sequence \N when not followed by an opening brace behaves like a
 dot, except that it is not affected by the PCRE2_DOTALL option. In other words,
-it matches any character except one that signifies the end of a line. 
+it matches any character except one that signifies the end of a line.
 </P>
 <P>
 When \N is followed by an opening brace it has a different meaning. See the
@@ -1642,7 +1642,7 @@ documentation. The option letters are:
  xx for PCRE2_EXTENDED_MORE
 </pre>
 For example, (?im) sets caseless, multiline matching. It is also possible to
-unset these options by preceding the relevant letters with a hyphen, for 
+unset these options by preceding the relevant letters with a hyphen, for
 example (?-im). The two "extended" options are not independent; unsetting either
 one cancels the effects of both of them.
 </P>
@@ -1654,9 +1654,9 @@ appears both before and after the hyphen, the option is unset. An empty options
 setting "(?)" is allowed. Needless to say, it has no effect.
 </P>
 <P>
-If the first character following (? is a circumflex, it causes all of the above 
-options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow 
-the circumflex to cause some options to be re-instated, but a hyphen may not 
+If the first character following (? is a circumflex, it causes all of the above
+options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
+the circumflex to cause some options to be re-instated, but a hyphen may not
 appear.
 </P>
 <P>
@@ -1813,41 +1813,68 @@ duplicate named subpatterns, as described in the next section.
 <br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
 <P>
 Identifying capturing parentheses by number is simple, but it can be very hard
-to keep track of the numbers in complicated regular expressions. Furthermore,
-if an expression is modified, the numbers may change. To help with this
-difficulty, PCRE2 supports the naming of subpatterns. This feature was not
-added to Perl until release 5.10. Python had the feature earlier, and PCRE1
+to keep track of the numbers in complicated patterns. Furthermore, if an
+expression is modified, the numbers may change. To help with this difficulty,
+PCRE2 supports the naming of capturing subpatterns. This feature was not added
+to Perl until release 5.10. Python had the feature earlier, and PCRE1
 introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
-Perl and the Python syntax. Perl allows identically numbered subpatterns to
-have different names, but PCRE2 does not.
+Perl and the Python syntax.
 </P>
 <P>
-In PCRE2, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
-(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. References to capturing
-parentheses from other parts of the pattern, such as
+In PCRE2, a capturing subpattern can be named in one of three ways:
+(?&#60;name&#62;...) or (?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names
+consist of up to 32 alphanumeric characters and underscores, but must start
+with a non-digit. References to capturing parentheses from other parts of the
+pattern, such as
 <a href="#backreferences">backreferences,</a>
 <a href="#recursion">recursion,</a>
 and
 <a href="#conditions">conditions,</a>
-can be made by name as well as by number.
+can all be made by name as well as by number.
 </P>
 <P>
-Names consist of up to 32 alphanumeric characters and underscores, but must
-start with a non-digit. Named capturing parentheses are still allocated numbers
-as well as names, exactly as if the names were not present. The PCRE2 API
-provides function calls for extracting the name-to-number translation table
-from a compiled pattern. There are also convenience functions for extracting a
-captured substring by name.
+Named capturing parentheses are allocated numbers as well as names, exactly as
+if the names were not present. In both PCRE2 and Perl, capturing subpatterns
+are primarily identified by numbers; any names are just aliases for these
+numbers. The PCRE2 API provides function calls for extracting the complete
+name-to-number translation table from a compiled pattern, as well as
+convenience functions for extracting captured substrings by name.
 </P>
 <P>
-By default, a name must be unique within a pattern, but it is possible to relax
-this constraint by setting the PCRE2_DUPNAMES option at compile time.
-(Duplicate names are also always permitted for subpatterns with the same
-number, set up as described in the previous section.) Duplicate names can be
-useful for patterns where only one instance of the named parentheses can match.
-Suppose you want to match the name of a weekday, either as a 3-letter
-abbreviation or as the full name, and in both cases you want to extract the
-abbreviation. This pattern (ignoring the line breaks) does the job:
+<b>Warning:</b> When more than one subpattern has the same number, as described
+in the previous section, a name given to one of them applies to all of them.
+Perl allows identically numbered subpatterns to have different names. Consider
+this pattern, where there are two capturing subpatterns, both numbered 1:
+<pre>
+  (?|(?&#60;AA&#62;aa)|(?&#60;BB&#62;bb))
+</pre>
+Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
+a successful match, both names yield the same value (either "aa" or "bb").
+</P>
+<P>
+In an attempt to reduce confusion, PCRE2 does not allow the same group number
+to be associated with more than one name. The example above provokes a
+compile-time error. However, there is still scope for confusion. Consider this
+pattern:
+<pre>
+  (?|(?&#60;AA&#62;aa)|(bb))
+</pre>
+Although the second subpattern numbered 1 is not explicitly named, the name AA is
+still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
+reference by name to group AA yields the matched string.
+</P>
+<P>
+By default, a name must be unique within a pattern, except that duplicate names
+are permitted for subpatterns with the same number, for example:
+<pre>
+  (?|(?&#60;AA&#62;aa)|(?&#60;AA&#62;bb))
+</pre>
+The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
+option at compile time, or by the use of (?J) within the pattern. Duplicate
+names can be useful for patterns where only one instance of the named
+parentheses can match. Suppose you want to match the name of a weekday, either
+as a 3-letter abbreviation or as the full name, and in both cases you want to
+extract the abbreviation. This pattern (ignoring the line breaks) does the job:
 <pre>
  (?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
  (?&#60;DN&#62;Tue)(?:sday)?|
@@ -1856,13 +1883,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
  (?&#60;DN&#62;Sat)(?:urday)?
 </pre>
 There are five capturing substrings, but only one is ever set after a match.
-(An alternative way of solving this problem is to use a "branch reset"
-subpattern, as described in the previous section.)
-</P>
-<P>
 The convenience functions for extracting the data by name return the substring
 for the first (and in this example, the only) subpattern of that name that
-matched. This saves searching to find which numbered subpattern it was.
+matched. This saves searching to find which numbered subpattern it was. (An
+alternative way of solving this problem is to use a "branch reset" subpattern,
+as described in the previous section.)
 </P>
 <P>
 If you make a backreference to a non-unique named subpattern from elsewhere in
@@ -1878,8 +1903,7 @@ for the reference. For example, this pattern matches both "foofoo" and
 <P>
 If you make a subroutine call to a non-unique named subpattern, the one that
 corresponds to the first occurrence of the name is used. In the absence of
-duplicate numbers (see the previous section) this is the one with the lowest
-number.
+duplicate numbers, this is the one with the lowest number.
 </P>
 <P>
 If you use a named reference in a condition
@@ -1893,14 +1917,6 @@ handling named subpatterns, see the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation.
 </P>
-<P>
-<b>Warning:</b> You cannot use different names to distinguish between two
-subpatterns with the same number because PCRE2 uses only the numbers when
-matching. For this reason, an error is given at compile time if different names
-are given to subpatterns with the same number. However, you can always give the
-same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
-set.
-</P>
 <br><a name="SEC17" href="#TOC1">REPETITION</a><br>
 <P>
 Repetition is specified by quantifiers, which can follow any of the following
@@ -2327,14 +2343,14 @@ the subject string is as it was before the assertion was processed.
 <P>
 Assertion subpatterns are not capturing subpatterns. If an assertion contains
 capturing subpatterns within it, these are counted for the purposes of
-numbering the capturing subpatterns in the whole pattern. Within each branch of 
+numbering the capturing subpatterns in the whole pattern. Within each branch of
 an assertion, locally captured substrings may be referenced in the usual way.
-For example, a sequence such as (.)\g{-1} can be used to check that two 
+For example, a sequence such as (.)\g{-1} can be used to check that two
 adjacent characters are the same.
 </P>
 <P>
 When a branch within an assertion fails to match, any substrings that were
-captured are discarded (as happens with any pattern branch that fails to 
+captured are discarded (as happens with any pattern branch that fails to
 match). A negative assertion succeeds only when all its branches fail to match;
 this means that no captured substrings are ever retained after a successful
 negative assertion. When an assertion contains a matching branch, what happens
@@ -2348,7 +2364,7 @@ assertion has failed. If the assertion is being used as a condition in a
 <a href="#conditions">conditional subpattern</a>
 (see below), captured substrings are retained, because matching continues with
 the "no" branch of the condition. For other failing negative assertions,
-control passes to the previous backtracking point, thus discarding any captured 
+control passes to the previous backtracking point, thus discarding any captured
 strings within the assertion.
 </P>
 <P>
@@ -2957,10 +2973,12 @@ later versions (I tried 5.024) it now works.
 <br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
 <P>
 If the syntax for a recursive subpattern call (either by number or by
-name) is used outside the parentheses to which it refers, it operates like a
-subroutine in a programming language. The called subpattern may be defined
-before or after the reference. A numbered reference can be absolute or
-relative, as in these examples:
+name) is used outside the parentheses to which it refers, it operates a bit
+like a subroutine in a programming language. More accurately, PCRE2 treats the
+referenced subpattern as an independent subpattern which it tries to match at
+the current matching position. The called subpattern may be defined before or
+after the reference. A numbered reference can be absolute or relative, as in
+these examples:
 <pre>
  (...(absolute)...)...(?2)...
  (...(relative)...)...(?-1)...
@@ -2993,6 +3011,13 @@ different calls. For example, consider this pattern:
 </pre>
 It matches "abcabc". It does not match "abcABC" because the change of
 processing option does not affect the called subpattern.
+</P>
+<P>
+The behaviour of
+<a href="#backtrackcontrol">backtracking control verbs</a>
+in subpatterns when called as subroutines is described in the section entitled
+<a href="#btsub">"Backtracking verbs in subroutines"</a>
+below.
 <a name="onigurumasubroutines"></a></P>
 <br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
 <P>
@@ -3111,7 +3136,7 @@ are faulted.
 </P>
 <P>
 A closing parenthesis can be included in a name either as \) or between \Q
-and \E. In addition to backslash processing, if the PCRE2_EXTENDED or 
+and \E. In addition to backslash processing, if the PCRE2_EXTENDED or
 PCRE2_EXTENDED_MORE option is also set, unescaped whitespace in verb names is
 skipped, and #-comments are recognized, exactly as in the rest of the pattern.
 PCRE2_EXTENDED and PCRE2_EXTENDED_MORE do not affect verb names unless
@@ -3157,7 +3182,7 @@ in the
 documentation.
 </P>
 <P>
-Experiments with Perl suggest that it too has similar optimizations, and like 
+Experiments with Perl suggest that it too has similar optimizations, and like
 PCRE2, turning them off can change the result of a match.
 </P>
 <br><b>
@@ -3185,7 +3210,7 @@ the outer parentheses.
 <pre>
  (*FAIL) or (*FAIL:NAME)
 </pre>
-This verb causes a matching failure, forcing backtracking to occur. It may be 
+This verb causes a matching failure, forcing backtracking to occur. It may be
 abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
 documentation notes that it is probably useful only when combined with (?{}) or
 (??{}). Those are, of course, Perl features that are not present in PCRE2. The
@@ -3197,7 +3222,7 @@ A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 </P>
 <P>
-(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as 
+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
 (*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
 </P>
 <br><b>
@@ -3220,7 +3245,7 @@ matching path is passed back to the caller as described in the section entitled
 in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation. This applies to all instances of (*MARK), including those inside
-assertions and atomic groups. (There are differences in those cases when 
+assertions and atomic groups. (There are differences in those cases when
 (*MARK) is used in conjunction with (*SKIP) as described below.)
 </P>
 <P>
@@ -3300,7 +3325,7 @@ the current starting point, or not at all. For example:
  a+(*COMMIT)b
 </pre>
 This matches "xxaab" but not "aacaab". It can be thought of as a kind of
-dynamic anchor, or "I've started, so I must finish." 
+dynamic anchor, or "I've started, so I must finish."
 </P>
 <P>
 The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
@@ -3524,7 +3549,7 @@ subpattern.
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
 without any further processing; captured strings and a (*MARK) name (if set)
 are retained. In a standalone negative assertion, (*ACCEPT) causes the
-assertion to fail without any further processing; captured substrings and any 
+assertion to fail without any further processing; captured substrings and any
 (*MARK) name are discarded.
 </P>
 <P>
@@ -3533,11 +3558,11 @@ a positive assertion and false for a negative one; captured substrings are
 retained in both cases.
 </P>
 <P>
-The remaining verbs act only when a later failure causes a backtrack to 
-reach them. This means that their effect is confined to the assertion, 
+The remaining verbs act only when a later failure causes a backtrack to
+reach them. This means that their effect is confined to the assertion,
 because lookaround assertions are atomic. A backtrack that occurs after an
-assertion is complete does not jump back into the assertion. Note in particular 
-that a (*MARK) name that is set in an assertion is not "seen" by an instance of 
+assertion is complete does not jump back into the assertion. Note in particular
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of
 (*SKIP:NAME) later in the pattern.
 </P>
 <P>
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -218,7 +218,7 @@ is used.
 .P
 The newline convention affects where the circumflex and dollar assertions are
 true. It also affects the interpretation of the dot metacharacter when
-PCRE2_DOTALL is not set, and the behaviour of \eN when not followed by an 
+PCRE2_DOTALL is not set, and the behaviour of \eN when not followed by an
 opening brace. However, it does not affect what the \eR escape sequence
 matches. By default, this is any Unicode newline sequence, for Perl
 compatibility. However, this can be changed; see the next section and the
@@ -331,7 +331,7 @@ of the pattern.
 If you want to remove the special meaning from a sequence of characters, you
 can do so by putting them between \eQ and \eE. This is different from Perl in
 that $ and @ are handled as literals in \eQ...\eE sequences in PCRE2, whereas
-in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish 
+in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish
 backslash interpolation" on any backslashes between \eQ and \eE which, its
 documentation says, "may lead to confusing results". PCRE2 treats a backslash
 between \eQ and \eE just like any other character. Note the following examples:
@@ -377,7 +377,7 @@ these escapes are as follows:
  \eo{ddd..}   character with octal code ddd..
  \exhh        character with hex code hh
  \ex{hhh..}   character with hex code hhh.. (default mode)
-  \eN{U+hhh..} character with Unicode code point hhh.. 
+  \eN{U+hhh..} character with Unicode code point hhh..
  \euhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 .sp
 Note that when \eN is not followed by an opening brace (curly bracket) it has
@@ -581,7 +581,7 @@ Another use of backslash is for specifying generic character types:
  \eD     any character that is not a decimal digit
  \eh     any horizontal white space character
  \eH     any character that is not a horizontal white space character
-  \eN     any character that is not a newline 
+  \eN     any character that is not a newline
  \es     any white space character
  \eS     any character that is not a white space character
  \ev     any vertical white space character
@@ -594,8 +594,8 @@ The \eN escape sequence has the same meaning as
 .\" </a>
 the "." metacharacter
 .\"
-when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the 
-meaning of \eN. Note that when \eN is followed by an opening brace it has a 
+when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the
+meaning of \eN. Note that when \eN is followed by an opening brace it has a
 different meaning. See the section entitled
 .\" HTML <a href="#digitsafterbackslash">
 .\" </a>
@@ -1029,8 +1029,8 @@ grapheme cluster", and treats the sequence as an atomic group
 Unicode supports various kinds of composite character by giving each character
 a grapheme breaking property, and having rules that use these properties to
 define the boundaries of extended grapheme clusters. The rules are defined in
-Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0 
-abandoned the use of some previous properties that had been used for emojis. 
+Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0
+abandoned the use of some previous properties that had been used for emojis.
 Instead it introduced various emoji-specific properties. PCRE2 uses only the
 Extended Pictographic property.
 .P
@@ -1310,7 +1310,7 @@ special meaning in a character class.
 .P
 The escape sequence \eN when not followed by an opening brace behaves like a
 dot, except that it is not affected by the PCRE2_DOTALL option. In other words,
-it matches any character except one that signifies the end of a line. 
+it matches any character except one that signifies the end of a line.
 .P
 When \eN is followed by an opening brace it has a different meaning. See the
 section entitled
@@ -1643,7 +1643,7 @@ documentation. The option letters are:
  xx for PCRE2_EXTENDED_MORE
 .sp
 For example, (?im) sets caseless, multiline matching. It is also possible to
-unset these options by preceding the relevant letters with a hyphen, for 
+unset these options by preceding the relevant letters with a hyphen, for
 example (?-im). The two "extended" options are not independent; unsetting either
 one cancels the effects of both of them.
 .P
@@ -1653,9 +1653,9 @@ permitted. Only one hyphen may appear in the options string. If a letter
 appears both before and after the hyphen, the option is unset. An empty options
 setting "(?)" is allowed. Needless to say, it has no effect.
 .P
-If the first character following (? is a circumflex, it causes all of the above 
-options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow 
-the circumflex to cause some options to be re-instated, but a hyphen may not 
+If the first character following (? is a circumflex, it causes all of the above
+options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow
+the circumflex to cause some options to be re-instated, but a hyphen may not
 appear.
 .P
 The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
@@ -1815,17 +1815,18 @@ duplicate named subpatterns, as described in the next section.
 .rs
 .sp
 Identifying capturing parentheses by number is simple, but it can be very hard
-to keep track of the numbers in complicated regular expressions. Furthermore,
-if an expression is modified, the numbers may change. To help with this
-difficulty, PCRE2 supports the naming of subpatterns. This feature was not
-added to Perl until release 5.10. Python had the feature earlier, and PCRE1
+to keep track of the numbers in complicated patterns. Furthermore, if an
+expression is modified, the numbers may change. To help with this difficulty,
+PCRE2 supports the naming of capturing subpatterns. This feature was not added
+to Perl until release 5.10. Python had the feature earlier, and PCRE1
 introduced it at release 4.0, using the Python syntax. PCRE2 supports both the
-Perl and the Python syntax. Perl allows identically numbered subpatterns to
-have different names, but PCRE2 does not.
+Perl and the Python syntax.
 .P
-In PCRE2, a subpattern can be named in one of three ways: (?<name>...) or
-(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing
-parentheses from other parts of the pattern, such as
+In PCRE2, a capturing subpattern can be named in one of three ways:
+(?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. Names
+consist of up to 32 alphanumeric characters and underscores, but must start
+with a non-digit. References to capturing parentheses from other parts of the
+pattern, such as
 .\" HTML <a href="#backreferences">
 .\" </a>
 backreferences,
@@ -1839,23 +1840,47 @@ and
 .\" </a>
 conditions,
 .\"
-can be made by name as well as by number.
+can all be made by name as well as by number.
 .P
-Names consist of up to 32 alphanumeric characters and underscores, but must
-start with a non-digit. Named capturing parentheses are still allocated numbers
-as well as names, exactly as if the names were not present. The PCRE2 API
-provides function calls for extracting the name-to-number translation table
-from a compiled pattern. There are also convenience functions for extracting a
-captured substring by name.
+Named capturing parentheses are allocated numbers as well as names, exactly as
+if the names were not present. In both PCRE2 and Perl, capturing subpatterns
+are primarily identified by numbers; any names are just aliases for these
+numbers. The PCRE2 API provides function calls for extracting the complete
+name-to-number translation table from a compiled pattern, as well as
+convenience functions for extracting captured substrings by name.
 .P
-By default, a name must be unique within a pattern, but it is possible to relax
-this constraint by setting the PCRE2_DUPNAMES option at compile time.
-(Duplicate names are also always permitted for subpatterns with the same
-number, set up as described in the previous section.) Duplicate names can be
-useful for patterns where only one instance of the named parentheses can match.
-Suppose you want to match the name of a weekday, either as a 3-letter
-abbreviation or as the full name, and in both cases you want to extract the
-abbreviation. This pattern (ignoring the line breaks) does the job:
+\fBWarning:\fP When more than one subpattern has the same number, as described
+in the previous section, a name given to one of them applies to all of them.
+Perl allows identically numbered subpatterns to have different names. Consider
+this pattern, where there are two capturing subpatterns, both numbered 1:
+.sp
+  (?|(?<AA>aa)|(?<BB>bb))
+.sp
+Perl allows this, with both names AA and BB as aliases of group 1. Thus, after
+a successful match, both names yield the same value (either "aa" or "bb").
+.P
+In an attempt to reduce confusion, PCRE2 does not allow the same group number
+to be associated with more than one name. The example above provokes a
+compile-time error. However, there is still scope for confusion. Consider this
+pattern:
+.sp
+  (?|(?<AA>aa)|(bb))
+.sp
+Although the second subpattern numbered 1 is not explicitly named, the name AA is
+still an alias for subpattern 1. Whether the pattern matches "aa" or "bb", a
+reference by name to group AA yields the matched string.
+.P
+By default, a name must be unique within a pattern, except that duplicate names
+are permitted for subpatterns with the same number, for example:
+.sp
+  (?|(?<AA>aa)|(?<AA>bb))
+.sp
+The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
+option at compile time, or by the use of (?J) within the pattern. Duplicate
+names can be useful for patterns where only one instance of the named
+parentheses can match. Suppose you want to match the name of a weekday, either
+as a 3-letter abbreviation or as the full name, and in both cases you want to
+extract the abbreviation. This pattern (ignoring the line breaks) does the job:
 .sp
  (?<DN>Mon|Fri|Sun)(?:day)?|
  (?<DN>Tue)(?:sday)?|
@@ -1864,12 +1889,11 @@ abbreviation. This pattern (ignoring the line breaks) does the job:
  (?<DN>Sat)(?:urday)?
 .sp
 There are five capturing substrings, but only one is ever set after a match.
-(An alternative way of solving this problem is to use a "branch reset"
-subpattern, as described in the previous section.)
-.P
 The convenience functions for extracting the data by name return the substring
 for the first (and in this example, the only) subpattern of that name that
-matched. This saves searching to find which numbered subpattern it was.
+matched. This saves searching to find which numbered subpattern it was. (An
+alternative way of solving this problem is to use a "branch reset" subpattern,
+as described in the previous section.)
 .P
 If you make a backreference to a non-unique named subpattern from elsewhere in
 the pattern, the subpatterns to which the name refers are checked in the order
@@ -1882,8 +1906,7 @@ for the reference. For example, this pattern matches both "foofoo" and
 .P
 If you make a subroutine call to a non-unique named subpattern, the one that
 corresponds to the first occurrence of the name is used. In the absence of
-duplicate numbers (see the previous section) this is the one with the lowest
-number.
+duplicate numbers, this is the one with the lowest number.
 .P
 If you use a named reference in a condition
 test (see the
@@ -1901,13 +1924,6 @@ handling named subpatterns, see the
 \fBpcre2api\fP
 .\"
 documentation.
-.P
-\fBWarning:\fP You cannot use different names to distinguish between two
-subpatterns with the same number because PCRE2 uses only the numbers when
-matching. For this reason, an error is given at compile time if different names
-are given to subpatterns with the same number. However, you can always give the
-same name to subpatterns with the same number, even when PCRE2_DUPNAMES is not
-set.
 .
 .
 .SH REPETITION
@@ -2336,13 +2352,13 @@ the subject string is as it was before the assertion was processed.
 .P
 Assertion subpatterns are not capturing subpatterns. If an assertion contains
 capturing subpatterns within it, these are counted for the purposes of
-numbering the capturing subpatterns in the whole pattern. Within each branch of 
+numbering the capturing subpatterns in the whole pattern. Within each branch of
 an assertion, locally captured substrings may be referenced in the usual way.
-For example, a sequence such as (.)\eg{-1} can be used to check that two 
+For example, a sequence such as (.)\eg{-1} can be used to check that two
 adjacent characters are the same.
 .P
 When a branch within an assertion fails to match, any substrings that were
-captured are discarded (as happens with any pattern branch that fails to 
+captured are discarded (as happens with any pattern branch that fails to
 match). A negative assertion succeeds only when all its branches fail to match;
 this means that no captured substrings are ever retained after a successful
 negative assertion. When an assertion contains a matching branch, what happens
@ -2358,7 +2374,7 @@ conditional subpattern
 .\"
 (see below), captured substrings are retained, because matching continues with
 the "no" branch of the condition. For other failing negative assertions,
-control passes to the previous backtracking point, thus discarding any captured 
+control passes to the previous backtracking point, thus discarding any captured
 strings within the assertion.
 .P
 For compatibility with Perl, most assertion subpatterns may be repeated; though
@ -2982,10 +2998,12 @@ later versions (I tried 5.024) it now works.
 .rs
 .sp
 If the syntax for a recursive subpattern call (either by number or by
-name) is used outside the parentheses to which it refers, it operates like a
-subroutine in a programming language. The called subpattern may be defined
-before or after the reference. A numbered reference can be absolute or
-relative, as in these examples:
+name) is used outside the parentheses to which it refers, it operates a bit
+like a subroutine in a programming language. More accurately, PCRE2 treats the
+referenced subpattern as an independent subpattern which it tries to match at
+the current matching position. The called subpattern may be defined before or
+after the reference. A numbered reference can be absolute or relative, as in
+these examples:
 .sp
  (...(absolute)...)...(?2)...
  (...(relative)...)...(?-1)...
@ -3016,6 +3034,18 @@ different calls. For example, consider this pattern:
 .sp
 It matches "abcabc". It does not match "abcABC" because the change of
 processing option does not affect the called subpattern.
+.P
+The behaviour of
+.\" HTML <a href="#backtrackcontrol">
+.\" </a>
+backtracking control verbs
+.\"
+in subpatterns when called as subroutines is described in the section entitled
+.\" HTML <a href="#btsub">
+.\" </a>
+"Backtracking verbs in subroutines"
+.\"
+below.
 .
 .
 .\" HTML <a name="onigurumasubroutines"></a>
@ -3137,7 +3167,7 @@ only backslash items that are permitted are \eQ, \eE, and sequences such as
 are faulted.
 .P
 A closing parenthesis can be included in a name either as \e) or between \eQ
-and \eE. In addition to backslash processing, if the PCRE2_EXTENDED or 
+and \eE. In addition to backslash processing, if the PCRE2_EXTENDED or
 PCRE2_EXTENDED_MORE option is also set, unescaped whitespace in verb names is
 skipped, and #-comments are recognized, exactly as in the rest of the pattern.
 PCRE2_EXTENDED and PCRE2_EXTENDED_MORE do not affect verb names unless
@ -3194,7 +3224,7 @@ in the
 .\"
 documentation.
 .P
-Experiments with Perl suggest that it too has similar optimizations, and like 
+Experiments with Perl suggest that it too has similar optimizations, and like
 PCRE2, turning them off can change the result of a match.
 .
 .
@ -3221,7 +3251,7 @@ the outer parentheses.
 .sp
  (*FAIL) or (*FAIL:NAME)
 .sp
-This verb causes a matching failure, forcing backtracking to occur. It may be 
+This verb causes a matching failure, forcing backtracking to occur. It may be
 abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl
 documentation notes that it is probably useful only when combined with (?{}) or
 (??{}). Those are, of course, Perl features that are not present in PCRE2. The
@ -3232,7 +3262,7 @@ nearest equivalent is the callout feature, as for example in this pattern:
 A match with the string "aaaa" always fails, but the callout is taken before
 each backtrack happens (in this example, 10 times).
 .P
-(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as 
+(*ACCEPT:NAME) and (*FAIL:NAME) behave exactly the same as
 (*MARK:NAME)(*ACCEPT) and (*MARK:NAME)(*FAIL), respectively.
 .
 .
@ -3259,7 +3289,7 @@ in the
 \fBpcre2api\fP
 .\"
 documentation. This applies to all instances of (*MARK), including those inside
-assertions and atomic groups. (There are differences in those cases when 
+assertions and atomic groups. (There are differences in those cases when
 (*MARK) is used in conjunction with (*SKIP) as described below.)
 .P
 As well as (*MARK), the (*COMMIT), (*PRUNE) and (*THEN) verbs may have
@ -3336,7 +3366,7 @@ the current starting point, or not at all. For example:
  a+(*COMMIT)b
 .sp
 This matches "xxaab" but not "aacaab". It can be thought of as a kind of
-dynamic anchor, or "I've started, so I must finish." 
+dynamic anchor, or "I've started, so I must finish."
 .P
 The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is
 like (*MARK:NAME) in that the name is remembered for passing back to the
@ -3424,7 +3454,7 @@ following \fBpcre2test\fP examples:
  data: abc
   0: b
   1: b
-.sp    
+.sp
 In the first example, the (*MARK) setting is in an atomic group, so it is not
 seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
 the second branch of the pattern to be tried at the first character position.
@ -3551,18 +3581,18 @@ subpattern.
 (*ACCEPT) in a standalone positive assertion causes the assertion to succeed
 without any further processing; captured strings and a (*MARK) name (if set)
 are retained. In a standalone negative assertion, (*ACCEPT) causes the
-assertion to fail without any further processing; captured substrings and any 
+assertion to fail without any further processing; captured substrings and any
 (*MARK) name are discarded.
 .P
 If the assertion is a condition, (*ACCEPT) causes the condition to be true for
 a positive assertion and false for a negative one; captured substrings are
 retained in both cases.
 .P
-The remaining verbs act only when a later failure causes a backtrack to 
-reach them. This means that their effect is confined to the assertion, 
+The remaining verbs act only when a later failure causes a backtrack to
+reach them. This means that their effect is confined to the assertion,
 because lookaround assertions are atomic. A backtrack that occurs after an
-assertion is complete does not jump back into the assertion. Note in particular 
-that a (*MARK) name that is set in an assertion is not "seen" by an instance of 
+assertion is complete does not jump back into the assertion. Note in particular
+that a (*MARK) name that is set in an assertion is not "seen" by an instance of
 (*SKIP:NAME) later in the pattern.
 .P
 The effect of (*THEN) is not allowed to escape beyond an assertion. If there