Allow real repetition of assertions.
parent eaf4572ff8
commit 5ba5230b82
@@ -32,6 +32,13 @@ now correctly backtracked, so this unnecessary restriction has been removed.
regex engine. The Perl regex folks are aware of this usage and have made a note
about it.

9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
1, believing that repeating an assertion is pointless. However, if a positive
assertion contains capturing groups, repetition can be useful. In any case, an
assertion could always be wrapped in a repeated group. The only restriction
that is now imposed is that an unlimited maximum is changed to one more than
the minimum.


Version 10.34 21-November-2019
------------------------------

@@ -1901,8 +1901,8 @@ are permitted for groups with the same number, for example:
(?|(?<AA>aa)|(?<AA>bb))
</pre>
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
option at compile time, or by the use of (?J) within the pattern, as described
in the section entitled
<a href="#internaloptions">"Internal Option Setting"</a>
above.
</P>

@@ -1968,7 +1968,7 @@ items:
an escape such as \d or \pL that matches a single character
a character class
a backreference
a parenthesized group (including most assertions)
a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
</pre>
The general repetition quantifier specifies a minimum and maximum number of

@@ -2359,7 +2359,7 @@ of zero.
For versions of PCRE2 less than 10.25, backreferences of this type used to
cause the group that they reference to be treated as an
<a href="#atomicgroup">atomic group.</a>
This restriction no longer applies, and backtracking into such groups can occur
as normal.
<a name="bigassertions"></a></P>
<br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>

@@ -2420,26 +2420,13 @@ control passes to the previous backtracking point, thus discarding any captured
strings within the assertion.
</P>
<P>
For compatibility with Perl, most assertion groups may be repeated; though it
makes no sense to assert the same thing several times, the side effect of
capturing may occasionally be useful. However, an assertion that forms the
condition for a conditional group may not be quantified. In practice, for
other assertions, there only three cases:
<br>
<br>
(1) If the quantifier is {0}, the assertion is never obeyed during matching.
However, it may contain internal capture groups that are called from elsewhere
via the
<a href="#groupsassubroutines">subroutine mechanism.</a>
<br>
<br>
(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
were {0,1}. At run time, the rest of the pattern match is tried with and
without the assertion, the order depending on the greediness of the quantifier.
<br>
<br>
(3) If the minimum repetition is greater than zero, the quantifier is ignored.
The assertion is obeyed just once when encountered during matching.
Most assertion groups may be repeated; though it makes no sense to assert the
same thing several times, the side effect of capturing in positive assertions
may occasionally be useful. However, an assertion that forms the condition for
a conditional group may not be quantified. PCRE2 used to restrict the
repetition of assertions, but from release 10.35 the only restriction is that
an unlimited maximum repetition is changed to be one more than the minimum. For
example, {3,} is treated as {3,4}.
</P>
<br><b>
Alphabetic assertion names

@@ -3840,9 +3827,9 @@ Cambridge, England.
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
Last updated: 29 December 2019
Last updated: 01 January 2020
<br>
Copyright © 1997-2019 University of Cambridge.
Copyright © 1997-2020 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@@ -7729,7 +7729,7 @@ REPETITION
an escape such as \d or \pL that matches a single character
a character class
a backreference
a parenthesized group (including most assertions)
a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)

The general repetition quantifier specifies a minimum and maximum num-

@@ -8162,24 +8162,14 @@ ASSERTIONS
passes to the previous backtracking point, thus discarding any captured
strings within the assertion.

For compatibility with Perl, most assertion groups may be repeated;
though it makes no sense to assert the same thing several times, the
side effect of capturing may occasionally be useful. However, an assertion
that forms the condition for a conditional group may not be quantified.
In practice, for other assertions, there only three cases:

(1) If the quantifier is {0}, the assertion is never obeyed during
matching. However, it may contain internal capture groups that are
called from elsewhere via the subroutine mechanism.

(2) If quantifier is {0,n} where n is greater than zero, it is treated
as if it were {0,1}. At run time, the rest of the pattern match is
tried with and without the assertion, the order depending on the
greediness of the quantifier.

(3) If the minimum repetition is greater than zero, the quantifier is
ignored. The assertion is obeyed just once when encountered during
matching.
Most assertion groups may be repeated; though it makes no sense to
assert the same thing several times, the side effect of capturing in
positive assertions may occasionally be useful. However, an assertion that
forms the condition for a conditional group may not be quantified.
PCRE2 used to restrict the repetition of assertions, but from release
10.35 the only restriction is that an unlimited maximum repetition is
changed to be one more than the minimum. For example, {3,} is treated
as {3,4}.

Alphabetic assertion names

@@ -9490,8 +9480,8 @@ AUTHOR

REVISION

Last updated: 29 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 01 January 2020
Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------

@@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "29 December 2019" "PCRE2 10.35"
.TH PCRE2PATTERN 3 "01 January 2020" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"

@@ -1902,8 +1902,8 @@ are permitted for groups with the same number, for example:
(?|(?<AA>aa)|(?<AA>bb))
.sp
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
option at compile time, or by the use of (?J) within the pattern, as described
in the section entitled
.\" HTML <a href="#internaloptions">
.\" </a>
"Internal Option Setting"

@@ -1975,7 +1975,7 @@ items:
an escape such as \ed or \epL that matches a single character
a character class
a backreference
a parenthesized group (including most assertions)
a parenthesized group (including lookaround assertions)
a subroutine call (recursive or otherwise)
.sp
The general repetition quantifier specifies a minimum and maximum number of

@@ -2362,7 +2362,7 @@ cause the group that they reference to be treated as an
.\" </a>
atomic group.
.\"
This restriction no longer applies, and backtracking into such groups can occur
as normal.
.
.

@@ -2431,26 +2431,13 @@ the "no" branch of the condition. For other failing negative assertions,
control passes to the previous backtracking point, thus discarding any captured
strings within the assertion.
.P
For compatibility with Perl, most assertion groups may be repeated; though it
makes no sense to assert the same thing several times, the side effect of
capturing may occasionally be useful. However, an assertion that forms the
condition for a conditional group may not be quantified. In practice, for
other assertions, there only three cases:
.sp
(1) If the quantifier is {0}, the assertion is never obeyed during matching.
However, it may contain internal capture groups that are called from elsewhere
via the
.\" HTML <a href="#groupsassubroutines">
.\" </a>
subroutine mechanism.
.\"
.sp
(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
were {0,1}. At run time, the rest of the pattern match is tried with and
without the assertion, the order depending on the greediness of the quantifier.
.sp
(3) If the minimum repetition is greater than zero, the quantifier is ignored.
The assertion is obeyed just once when encountered during matching.
Most assertion groups may be repeated; though it makes no sense to assert the
same thing several times, the side effect of capturing in positive assertions
may occasionally be useful. However, an assertion that forms the condition for
a conditional group may not be quantified. PCRE2 used to restrict the
repetition of assertions, but from release 10.35 the only restriction is that
an unlimited maximum repetition is changed to be one more than the minimum. For
example, {3,} is treated as {3,4}.
.
.
.SS "Alphabetic assertion names"

@@ -3884,6 +3871,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 29 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 01 January 2020
Copyright (c) 1997-2020 University of Cambridge.
.fi

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.

Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2019 University of Cambridge
New API code Copyright (c) 2016-2020 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without

@@ -7074,15 +7074,18 @@ for (;; pptr++)
previous[GET(previous, 1)] != OP_ALT)
goto END_REPEAT;

/* There is no sense in actually repeating assertions. The only
potential use of repetition is in cases when the assertion is optional.
Therefore, if the minimum is greater than zero, just ignore the repeat.
If the maximum is not zero or one, set it to 1. */
/* Perl allows all assertions to be quantified, and when they contain
capturing parentheses and/or are optional there are potential uses for
this feature. PCRE2 used to force the maximum quantifier to 1 on the
invalid grounds that further repetition was never useful. This was
always a bit pointless, since an assertion could be wrapped with a
repeated group to achieve the effect. General repetition is now
permitted, but if the maximum is unlimited it is set to one more than
the minimum. */

if (op_previous < OP_ONCE) /* Assertion */
{
if (repeat_min > 0) goto END_REPEAT;
if (repeat_max > 1) repeat_max = 1;
if (repeat_max == REPEAT_UNLIMITED) repeat_max = repeat_min + 1;
}

/* The case of a zero minimum is special because of the need to stick

@@ -6393,4 +6393,13 @@ ef) x/x,mark
/^((\1+)|\d)+133X$/
111133X

/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
The quick brown fox jumps over the lazy dog.
Jackdaws love my big sphinx of quartz.
Pack my box with five dozen liquor jugs.
\= Expect no match
The quick brown fox jumps over the lazy cat.
Hackdaws love my big sphinx of quartz.
Pack my fox with five dozen liquor jugs.

# End of testinput1

@@ -10126,4 +10126,25 @@ No match
1: 11
2: 11

/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
The quick brown fox jumps over the lazy dog.
0:
1: quick brown fox jumps over the lazy dog.
2: q
Jackdaws love my big sphinx of quartz.
0:
1: Jackdaws love my big sphinx of quartz.
2: J
Pack my box with five dozen liquor jugs.
0:
1: Pack my box with five dozen liquor jugs.
2: P
\= Expect no match
The quick brown fox jumps over the lazy cat.
No match
Hackdaws love my big sphinx of quartz.
No match
Pack my fox with five dozen liquor jugs.
No match

# End of testinput1

@@ -10962,6 +10962,12 @@ Matched, but too many substrings
Assert
abc
Ket
Assert
abc
Ket
Assert
abc
Ket
abc
Ket
End

@@ -10973,6 +10979,10 @@ Matched, but too many substrings
Assert
abc
Ket
Brazero
Assert
abc
Ket
abc
Ket
End

@@ -10981,9 +10991,15 @@ Matched, but too many substrings
/(?=abc)++abc/B
------------------------------------------------------------------
Bra
Once
Assert
abc
Ket
Brazero
Assert
abc
Ket
Ket
abc
Ket
End

@@ -16610,6 +16626,19 @@ No match
Assert
Any
Ket
Assert
Any
Ket
Assert
Any
Ket
Assert
Any
Ket
Brazero
Assert
Any
Ket
x
Ket
Ket
