From 7be3fef0ea4e7795ace469579782c41cad693953 Mon Sep 17 00:00:00 2001 From: "Philip.Hazel" Date: Mon, 3 Apr 2017 18:00:37 +0000 Subject: [PATCH] Documentation update. --- doc/html/pcre2compat.html | 8 ++-- doc/html/pcre2pattern.html | 75 +++++++++++++++++++++++-------------- doc/pcre2compat.3 | 10 ++--- doc/pcre2pattern.3 | 76 ++++++++++++++++++++++++-------------- 4 files changed, 104 insertions(+), 65 deletions(-) diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html index b55ab82..02fab13 100644 --- a/doc/html/pcre2compat.html +++ b/doc/html/pcre2compat.html @@ -37,9 +37,9 @@ for example, \b* (but not \b{3}), but these do not seem to have any use.

3. Capturing subpatterns that occur inside negative lookaround assertions are -counted, but their entries in the offsets vector are set only if the assertion -is a condition. Perl has changed its behaviour in this regard from time to -time. +counted, but their entries in the offsets vector are set only when a negative +assertion is a condition that has a matching branch (that is, the condition is +false).

4. The following Perl escape sequences are not supported: \l, \u, \L, @@ -214,7 +214,7 @@ Cambridge, England. REVISION

-Last updated: 29 March 2017 +Last updated: 03 April 2017
Copyright © 1997-2017 University of Cambridge.
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html index 6c9ae2d..87258fb 100644 --- a/doc/html/pcre2pattern.html +++ b/doc/html/pcre2pattern.html @@ -2216,15 +2216,27 @@ coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described

More complicated assertions are coded as subpatterns. There are two kinds: those that look ahead of the current position in the subject string, and those -that look behind it. An assertion subpattern is matched in the normal way, -except that it does not cause the current matching position to be changed. +that look behind it, and in each case an assertion may be positive (must +succeed for matching to continue) or negative (must not succeed for matching to +continue). An assertion subpattern is matched in the normal way, except that, +when matching continues afterwards, the matching position in the subject string +is as it was at the start of the assertion.

-Assertion subpatterns are not capturing subpatterns. If such an assertion -contains capturing subpatterns within it, these are counted for the purposes of +Assertion subpatterns are not capturing subpatterns. If an assertion contains +capturing subpatterns within it, these are counted for the purposes of numbering the capturing subpatterns in the whole pattern. However, substring -capturing is normally carried out only for positive assertions (but see the -discussion of conditional subpatterns below). +capturing is carried out only for positive assertions that succeed, that is, +one of their branches matches, so matching continues after the assertion. If +all branches of a positive assertion fail to match, nothing is captured, and +control is passed to the previous backtracking point. +

+

+No capturing is done for a negative assertion unless it is being used as a +condition in a +conditional subpattern +(see the discussion below). Matching continues after a non-conditional negative +assertion only if all its branches fail to match.

For compatibility with Perl, most assertion subpatterns may be repeated; though @@ -2604,10 +2616,11 @@ against the second. This pattern matches strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.

-For Perl compatibility, if an assertion that is a condition contains capturing -subpatterns, any capturing that occurs is retained afterwards, for both -positive and negative assertions. (Compare non-conditional assertions, when -captures are retained only for positive assertions.) +When an assertion that is a condition contains capturing subpatterns, any +capturing that occurs in a matching branch is retained afterwards, for both +positive and negative assertions, because matching always continues after the +assertion, whether it succeeds or fails. (Compare non-conditional assertions, +when captures are retained only for positive assertions that succeed.)


COMMENTS

@@ -3351,28 +3364,34 @@ in the second repeat of the group acts. Backtracking verbs in assertions

-(*FAIL) in an assertion has its normal effect: it forces an immediate -backtrack. +(*FAIL) in any assertion has its normal effect: it forces an immediate +backtrack. The behaviour of the other backtracking verbs depends on whether +the assertion is standalone or acting as the condition in a conditional +subpattern.

-(*ACCEPT) in a positive assertion causes the assertion to succeed without any -further processing. In a negative assertion, (*ACCEPT) causes the assertion to -fail without any further processing. +(*ACCEPT) in a standalone positive assertion causes the assertion to succeed +without any further processing; captured substrings are retained. In a standalone +negative assertion, (*ACCEPT) causes the assertion to fail without any further +processing; captured substrings are discarded. +

+

+If the assertion is a condition, (*ACCEPT) causes the condition to be true for +a positive assertion and false for a negative one; captured substrings are +retained in both cases. +

+

+The effect of (*THEN) is not allowed to escape beyond an assertion. If there +are no more branches to try, (*THEN) causes a positive assertion to be false, +and a negative assertion to be true.

The other backtracking verbs are not treated specially if they appear in a -positive assertion. In particular, (*THEN) skips to the next alternative in the -innermost enclosing group that has alternations, whether or not this is within -the assertion. -

-

-Negative assertions are, however, different, in order to ensure that changing a -positive assertion into a negative assertion changes its result. Backtracking -into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, -without considering any further alternative branches in the assertion. -Backtracking into (*THEN) causes it to skip to the next enclosing alternative -within the assertion (the normal behaviour), but if the assertion does not have -such an alternative, (*THEN) behaves like (*PRUNE). +standalone positive assertion. In a conditional positive assertion, +backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be +false. However, for both standalone and conditional negative assertions, +backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be +true, without considering any further alternative branches.


Backtracking verbs in subroutines @@ -3415,7 +3434,7 @@ Cambridge, England.


REVISION

-Last updated: 18 March 2017 +Last updated: 03 April 2017
Copyright © 1997-2017 University of Cambridge.
diff --git a/doc/pcre2compat.3 b/doc/pcre2compat.3 index 807b268..da58c93 100644 --- a/doc/pcre2compat.3 +++ b/doc/pcre2compat.3 @@ -1,4 +1,4 @@ -.TH PCRE2COMPAT 3 "29 March 2017" "PCRE2 10.30" +.TH PCRE2COMPAT 3 "03 April 2017" "PCRE2 10.30" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "DIFFERENCES BETWEEN PCRE2 AND PERL" @@ -24,9 +24,9 @@ assertion just once). Perl allows some repeat quantifiers on other assertions, for example, \eb* (but not \eb{3}), but these do not seem to have any use. .P 3. Capturing subpatterns that occur inside negative lookaround assertions are -counted, but their entries in the offsets vector are set only if the assertion -is a condition. Perl has changed its behaviour in this regard from time to -time. +counted, but their entries in the offsets vector are set only when a negative +assertion is a condition that has a matching branch (that is, the condition is +false). .P 4. The following Perl escape sequences are not supported: \el, \eu, \eL, \eU, and \eN when followed by a character name or Unicode value. (\eN on its @@ -179,6 +179,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 29 March 2017 +Last updated: 03 April 2017 Copyright (c) 1997-2017 University of Cambridge. .fi diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3 index 2325e0c..a622cd2 100644 --- a/doc/pcre2pattern.3 +++ b/doc/pcre2pattern.3 @@ -1,4 +1,4 @@ -.TH PCRE2PATTERN 3 "18 March 2017" "PCRE2 10.30" +.TH PCRE2PATTERN 3 "03 April 2017" "PCRE2 10.30" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .SH "PCRE2 REGULAR EXPRESSION DETAILS" @@ -2225,14 +2225,28 @@ above. .P More complicated assertions are coded as subpatterns. There are two kinds: those that look ahead of the current position in the subject string, and those -that look behind it. An assertion subpattern is matched in the normal way, -except that it does not cause the current matching position to be changed. 
+that look behind it, and in each case an assertion may be positive (must +succeed for matching to continue) or negative (must not succeed for matching to +continue). An assertion subpattern is matched in the normal way, except that, +when matching continues afterwards, the matching position in the subject string +is as it was at the start of the assertion. .P -Assertion subpatterns are not capturing subpatterns. If such an assertion -contains capturing subpatterns within it, these are counted for the purposes of +Assertion subpatterns are not capturing subpatterns. If an assertion contains +capturing subpatterns within it, these are counted for the purposes of numbering the capturing subpatterns in the whole pattern. However, substring -capturing is normally carried out only for positive assertions (but see the -discussion of conditional subpatterns below). +capturing is carried out only for positive assertions that succeed, that is, +one of their branches matches, so matching continues after the assertion. If +all branches of a positive assertion fail to match, nothing is captured, and +control is passed to the previous backtracking point. +.P +No capturing is done for a negative assertion unless it is being used as a +condition in a +.\" HTML +.\" +conditional subpattern +.\" +(see the discussion below). Matching continues after a non-conditional negative +assertion only if all its branches fail to match. .P For compatibility with Perl, most assertion subpatterns may be repeated; though it makes no sense to assert the same thing several times, the side effect of @@ -2620,10 +2634,11 @@ subject is matched against the first alternative; otherwise it is matched against the second. This pattern matches strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. 
.P -For Perl compatibility, if an assertion that is a condition contains capturing -subpatterns, any capturing that occurs is retained afterwards, for both -positive and negative assertions. (Compare non-conditional assertions, when -captures are retained only for positive assertions.) +When an assertion that is a condition contains capturing subpatterns, any +capturing that occurs in a matching branch is retained afterwards, for both +positive and negative assertions, because matching always continues after the +assertion, whether it succeeds or fails. (Compare non-conditional assertions, +when captures are retained only for positive assertions that succeed.) . . .\" HTML @@ -3381,25 +3396,30 @@ in the second repeat of the group acts. .SS "Backtracking verbs in assertions" .rs .sp -(*FAIL) in an assertion has its normal effect: it forces an immediate -backtrack. +(*FAIL) in any assertion has its normal effect: it forces an immediate +backtrack. The behaviour of the other backtracking verbs depends on whether +the assertion is standalone or acting as the condition in a conditional +subpattern. .P -(*ACCEPT) in a positive assertion causes the assertion to succeed without any -further processing. In a negative assertion, (*ACCEPT) causes the assertion to -fail without any further processing. +(*ACCEPT) in a standalone positive assertion causes the assertion to succeed +without any further processing; captured substrings are retained. In a standalone +negative assertion, (*ACCEPT) causes the assertion to fail without any further +processing; captured substrings are discarded. +.P +If the assertion is a condition, (*ACCEPT) causes the condition to be true for +a positive assertion and false for a negative one; captured substrings are +retained in both cases. +.P +The effect of (*THEN) is not allowed to escape beyond an assertion. If there +are no more branches to try, (*THEN) causes a positive assertion to be false, +and a negative assertion to be true. 
.P The other backtracking verbs are not treated specially if they appear in a -positive assertion. In particular, (*THEN) skips to the next alternative in the -innermost enclosing group that has alternations, whether or not this is within -the assertion. -.P -Negative assertions are, however, different, in order to ensure that changing a -positive assertion into a negative assertion changes its result. Backtracking -into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, -without considering any further alternative branches in the assertion. -Backtracking into (*THEN) causes it to skip to the next enclosing alternative -within the assertion (the normal behaviour), but if the assertion does not have -such an alternative, (*THEN) behaves like (*PRUNE). +standalone positive assertion. In a conditional positive assertion, +backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be +false. However, for both standalone and conditional negative assertions, +backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be +true, without considering any further alternative branches. . . .\" HTML @@ -3445,6 +3465,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 18 March 2017 +Last updated: 03 April 2017 Copyright (c) 1997-2017 University of Cambridge. .fi