Allow real repetition of assertions.

Philip.Hazel 2020-01-01 12:07:02 +00:00
parent eaf4572ff8
commit 5ba5230b82
8 changed files with 114 additions and 81 deletions

@ -32,6 +32,13 @@ now correctly backtracked, so this unnecessary restriction has been removed.
regex engine. The Perl regex folks are aware of this usage and have made a note
about it.
9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
1, believing that repeating an assertion is pointless. However, if a positive
assertion contains capturing groups, repetition can be useful. In any case, an
assertion could always be wrapped in a repeated group. The only restriction
that is now imposed is that an unlimited maximum is changed to one more than
the minimum.
Version 10.34 21-November-2019
------------------------------

@ -1901,8 +1901,8 @@ are permitted for groups with the same number, for example:
(?|(?<AA>aa)|(?<AA>bb))
</pre>
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
option at compile time, or by the use of (?J) within the pattern, as described
in the section entitled
option at compile time, or by the use of (?J) within the pattern, as described
in the section entitled
<a href="#internaloptions">"Internal Option Setting"</a>
above.
</P>
@ -1968,7 +1968,7 @@ items:
an escape such as \d or \pL that matches a single character
a character class
a backreference
a parenthesized group (including most assertions)
a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
</pre>
The general repetition quantifier specifies a minimum and maximum number of
@ -2359,7 +2359,7 @@ of zero.
For versions of PCRE2 less than 10.25, backreferences of this type used to
cause the group that they reference to be treated as an
<a href="#atomicgroup">atomic group.</a>
This restriction no longer applies, and backtracking into such groups can occur
This restriction no longer applies, and backtracking into such groups can occur
as normal.
<a name="bigassertions"></a></P>
<br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
@ -2420,26 +2420,13 @@ control passes to the previous backtracking point, thus discarding any captured
strings within the assertion.
</P>
<P>
For compatibility with Perl, most assertion groups may be repeated; though it
makes no sense to assert the same thing several times, the side effect of
capturing may occasionally be useful. However, an assertion that forms the
condition for a conditional group may not be quantified. In practice, for
other assertions, there only three cases:
<br>
<br>
(1) If the quantifier is {0}, the assertion is never obeyed during matching.
However, it may contain internal capture groups that are called from elsewhere
via the
<a href="#groupsassubroutines">subroutine mechanism.</a>
<br>
<br>
(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
were {0,1}. At run time, the rest of the pattern match is tried with and
without the assertion, the order depending on the greediness of the quantifier.
<br>
<br>
(3) If the minimum repetition is greater than zero, the quantifier is ignored.
The assertion is obeyed just once when encountered during matching.
Most assertion groups may be repeated; though it makes no sense to assert the
same thing several times, the side effect of capturing in positive assertions
may occasionally be useful. However, an assertion that forms the condition for
a conditional group may not be quantified. PCRE2 used to restrict the
repetition of assertions, but from release 10.35 the only restriction is that
an unlimited maximum repetition is changed to be one more than the minimum. For
example, {3,} is treated as {3,4}.
</P>
<br><b>
Alphabetic assertion names
@ -3840,9 +3827,9 @@ Cambridge, England.
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
Last updated: 29 December 2019
Last updated: 01 January 2020
<br>
Copyright &copy; 1997-2019 University of Cambridge.
Copyright &copy; 1997-2020 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -7729,7 +7729,7 @@ REPETITION
an escape such as \d or \pL that matches a single character
a character class
a backreference
a parenthesized group (including most assertions)
a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
The general repetition quantifier specifies a minimum and maximum num-
@ -8162,24 +8162,14 @@ ASSERTIONS
passes to the previous backtracking point, thus discarding any captured
strings within the assertion.
For compatibility with Perl, most assertion groups may be repeated;
though it makes no sense to assert the same thing several times, the
side effect of capturing may occasionally be useful. However, an asser-
tion that forms the condition for a conditional group may not be quan-
tified. In practice, for other assertions, there only three cases:
(1) If the quantifier is {0}, the assertion is never obeyed during
matching. However, it may contain internal capture groups that are
called from elsewhere via the subroutine mechanism.
(2) If quantifier is {0,n} where n is greater than zero, it is treated
as if it were {0,1}. At run time, the rest of the pattern match is
tried with and without the assertion, the order depending on the greed-
iness of the quantifier.
(3) If the minimum repetition is greater than zero, the quantifier is
ignored. The assertion is obeyed just once when encountered during
matching.
Most assertion groups may be repeated; though it makes no sense to as-
sert the same thing several times, the side effect of capturing in pos-
itive assertions may occasionally be useful. However, an assertion that
forms the condition for a conditional group may not be quantified.
PCRE2 used to restrict the repetition of assertions, but from release
10.35 the only restriction is that an unlimited maximum repetition is
changed to be one more than the minimum. For example, {3,} is treated
as {3,4}.
Alphabetic assertion names
@ -9490,8 +9480,8 @@ AUTHOR
REVISION
Last updated: 29 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 01 January 2020
Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "29 December 2019" "PCRE2 10.35"
.TH PCRE2PATTERN 3 "01 January 2020" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -1902,8 +1902,8 @@ are permitted for groups with the same number, for example:
(?|(?<AA>aa)|(?<AA>bb))
.sp
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
option at compile time, or by the use of (?J) within the pattern, as described
in the section entitled
option at compile time, or by the use of (?J) within the pattern, as described
in the section entitled
.\" HTML <a href="#internaloptions">
.\" </a>
"Internal Option Setting"
@ -1975,7 +1975,7 @@ items:
an escape such as \ed or \epL that matches a single character
a character class
a backreference
a parenthesized group (including most assertions)
a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
.sp
The general repetition quantifier specifies a minimum and maximum number of
@ -2362,7 +2362,7 @@ cause the group that they reference to be treated as an
.\" </a>
atomic group.
.\"
This restriction no longer applies, and backtracking into such groups can occur
This restriction no longer applies, and backtracking into such groups can occur
as normal.
.
.
@ -2431,26 +2431,13 @@ the "no" branch of the condition. For other failing negative assertions,
control passes to the previous backtracking point, thus discarding any captured
strings within the assertion.
.P
For compatibility with Perl, most assertion groups may be repeated; though it
makes no sense to assert the same thing several times, the side effect of
capturing may occasionally be useful. However, an assertion that forms the
condition for a conditional group may not be quantified. In practice, for
other assertions, there only three cases:
.sp
(1) If the quantifier is {0}, the assertion is never obeyed during matching.
However, it may contain internal capture groups that are called from elsewhere
via the
.\" HTML <a href="#groupsassubroutines">
.\" </a>
subroutine mechanism.
.\"
.sp
(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
were {0,1}. At run time, the rest of the pattern match is tried with and
without the assertion, the order depending on the greediness of the quantifier.
.sp
(3) If the minimum repetition is greater than zero, the quantifier is ignored.
The assertion is obeyed just once when encountered during matching.
Most assertion groups may be repeated; though it makes no sense to assert the
same thing several times, the side effect of capturing in positive assertions
may occasionally be useful. However, an assertion that forms the condition for
a conditional group may not be quantified. PCRE2 used to restrict the
repetition of assertions, but from release 10.35 the only restriction is that
an unlimited maximum repetition is changed to be one more than the minimum. For
example, {3,} is treated as {3,4}.
.
.
.SS "Alphabetic assertion names"
@ -3884,6 +3871,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 29 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 01 January 2020
Copyright (c) 1997-2020 University of Cambridge.
.fi

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2019 University of Cambridge
New API code Copyright (c) 2016-2020 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -7074,15 +7074,18 @@ for (;; pptr++)
previous[GET(previous, 1)] != OP_ALT)
goto END_REPEAT;
/* There is no sense in actually repeating assertions. The only
potential use of repetition is in cases when the assertion is optional.
Therefore, if the minimum is greater than zero, just ignore the repeat.
If the maximum is not zero or one, set it to 1. */
/* Perl allows all assertions to be quantified, and when they contain
capturing parentheses and/or are optional there are potential uses for
this feature. PCRE2 used to force the maximum quantifier to 1 on the
invalid grounds that further repetition was never useful. This was
always a bit pointless, since an assertion could be wrapped with a
repeated group to achieve the effect. General repetition is now
permitted, but if the maximum is unlimited it is set to one more than
the minimum. */
if (op_previous < OP_ONCE) /* Assertion */
{
if (repeat_min > 0) goto END_REPEAT;
if (repeat_max > 1) repeat_max = 1;
if (repeat_max == REPEAT_UNLIMITED) repeat_max = repeat_min + 1;
}
/* The case of a zero minimum is special because of the need to stick
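
As a reading aid for the hunk above, the old and new handling of a quantified assertion can be restated as two small functions. This is only a sketch, not code from PCRE2: `sketch_range`, `old_rule`, `new_rule`, and `SKETCH_UNLIMITED` are hypothetical names, and `SKETCH_UNLIMITED` merely stands in for the internal `REPEAT_UNLIMITED` marker, whose real value is not shown in this diff. An ignored quantifier is modelled here as the range {1,1}.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for PCRE2's internal REPEAT_UNLIMITED marker (assumed value). */
#define SKETCH_UNLIMITED UINT_MAX

/* A quantifier {min,max} as seen when compiling a repeated assertion. */
typedef struct {
  unsigned int min;
  unsigned int max;
} sketch_range;

/* Before this commit: a non-zero minimum caused the quantifier to be
   ignored (modelled as {1,1}); otherwise the maximum was capped at 1. */
static sketch_range old_rule(sketch_range q)
{
  assert(q.min <= q.max);
  if (q.min > 0) {
    q.min = 1;
    q.max = 1;
  } else if (q.max > 1) {
    q.max = 1;
  }
  return q;
}

/* After this commit: the quantifier is kept as written, except that an
   unlimited maximum becomes one more than the minimum. */
static sketch_range new_rule(sketch_range q)
{
  assert(q.min <= q.max);
  if (q.max == SKETCH_UNLIMITED) q.max = q.min + 1;
  return q;
}
```

Under these assumptions, `{3,}` comes out of `new_rule` as `{3,4}`, matching the example in the documentation hunks, whereas `old_rule` would have dropped the quantifier. A bounded quantifier such as `{26}` passes through `new_rule` unchanged.
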

testdata/testinput1 (9 lines changed)

@ -6393,4 +6393,13 @@ ef) x/x,mark
/^((\1+)|\d)+133X$/
111133X
/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
The quick brown fox jumps over the lazy dog.
Jackdaws love my big sphinx of quartz.
Pack my box with five dozen liquor jugs.
\= Expect no match
The quick brown fox jumps over the lazy cat.
Hackdaws love my big sphinx of quartz.
Pack my fox with five dozen liquor jugs.
# End of testinput1

testdata/testoutput1 (21 lines changed)

@ -10126,4 +10126,25 @@ No match
1: 11
2: 11
/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
The quick brown fox jumps over the lazy dog.
0:
1: quick brown fox jumps over the lazy dog.
2: q
Jackdaws love my big sphinx of quartz.
0:
1: Jackdaws love my big sphinx of quartz.
2: J
Pack my box with five dozen liquor jugs.
0:
1: Pack my box with five dozen liquor jugs.
2: P
\= Expect no match
The quick brown fox jumps over the lazy cat.
No match
Hackdaws love my big sphinx of quartz.
No match
Pack my fox with five dozen liquor jugs.
No match
# End of testinput1
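
The new test appears to use the repeated assertion as a pangram check. Each of the 26 iterations has to find a letter that does not occur again later in the subject and that lies before the letter found by the previous iteration, so only subjects containing all 26 letters can match. Under that reading, the expected results above can be cross-checked without a regex. The sketch below is a plain C restatement of the property, not code from PCRE2, and `is_pangram` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every letter a-z occurs at least once in s,
   ignoring case. Non-letters are skipped. */
static bool is_pangram(const char *s)
{
  bool seen[26] = { false };
  int distinct = 0;

  assert(s != NULL);
  for (; *s != '\0'; s++) {
    unsigned char c = (unsigned char)*s;
    if (isalpha(c)) {
      int idx = tolower(c) - 'a';
      /* Guard against locale-specific letters outside a-z. */
      if (idx >= 0 && idx < 26 && !seen[idx]) {
        seen[idx] = true;
        distinct++;
      }
    }
  }
  return distinct == 26;
}
```

The three subjects listed before `\= Expect no match` are pangrams. The three after it each lack at least one letter: "cat" for "dog" drops d and g, "Hackdaws" drops j, and "fox" for "box" drops b.
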

testdata/testoutput2 (29 lines changed)

@ -10962,6 +10962,12 @@ Matched, but too many substrings
Assert
abc
Ket
Assert
abc
Ket
Assert
abc
Ket
abc
Ket
End
@ -10973,6 +10979,10 @@ Matched, but too many substrings
Assert
abc
Ket
Brazero
Assert
abc
Ket
abc
Ket
End
@ -10981,9 +10991,15 @@ Matched, but too many substrings
/(?=abc)++abc/B
------------------------------------------------------------------
Bra
Once
Assert
abc
Ket
Brazero
Assert
abc
Ket
Ket
abc
Ket
End
@ -16610,6 +16626,19 @@ No match
Assert
Any
Ket
Assert
Any
Ket
Assert
Any
Ket
Assert
Any
Ket
Brazero
Assert
Any
Ket
x
Ket
Ket