Allow real repetition of assertions.
parent eaf4572ff8
commit 5ba5230b82
|
@ -32,6 +32,13 @@ now correctly backtracked, so this unnecessary restriction has been removed.
|
||||||
regex engine. The Perl regex folks are aware of this usage and have made a note
|
regex engine. The Perl regex folks are aware of this usage and have made a note
|
||||||
about it.
|
about it.
|
||||||
|
|
||||||
|
9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
|
||||||
|
1, believing that repeating an assertion is pointless. However, if a positive
|
||||||
|
assertion contains capturing groups, repetition can be useful. In any case, an
|
||||||
|
assertion could always be wrapped in a repeated group. The only restriction
|
||||||
|
that is now imposed is that an unlimited maximum is changed to one more than
|
||||||
|
the minimum.
|
||||||
|
|
||||||
|
|
||||||
Version 10.34 21-November-2019
|
Version 10.34 21-November-2019
|
||||||
------------------------------
|
------------------------------
|
||||||
|
|
|
@ -1901,8 +1901,8 @@ are permitted for groups with the same number, for example:
|
||||||
(?|(?<AA>aa)|(?<AA>bb))
|
(?|(?<AA>aa)|(?<AA>bb))
|
||||||
</pre>
|
</pre>
|
||||||
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
||||||
option at compile time, or by the use of (?J) within the pattern, as described
|
option at compile time, or by the use of (?J) within the pattern, as described
|
||||||
in the section entitled
|
in the section entitled
|
||||||
<a href="#internaloptions">"Internal Option Setting"</a>
|
<a href="#internaloptions">"Internal Option Setting"</a>
|
||||||
above.
|
above.
|
||||||
</P>
|
</P>
|
||||||
|
@ -1968,7 +1968,7 @@ items:
|
||||||
an escape such as \d or \pL that matches a single character
|
an escape such as \d or \pL that matches a single character
|
||||||
a character class
|
a character class
|
||||||
a backreference
|
a backreference
|
||||||
a parenthesized group (including most assertions)
|
a parenthesized group (including lookaround assertions)
|
||||||
a subroutine call (recursive or otherwise)
|
a subroutine call (recursive or otherwise)
|
||||||
</pre>
|
</pre>
|
||||||
The general repetition quantifier specifies a minimum and maximum number of
|
The general repetition quantifier specifies a minimum and maximum number of
|
||||||
|
@ -2359,7 +2359,7 @@ of zero.
|
||||||
For versions of PCRE2 less than 10.25, backreferences of this type used to
|
For versions of PCRE2 less than 10.25, backreferences of this type used to
|
||||||
cause the group that they reference to be treated as an
|
cause the group that they reference to be treated as an
|
||||||
<a href="#atomicgroup">atomic group.</a>
|
<a href="#atomicgroup">atomic group.</a>
|
||||||
This restriction no longer applies, and backtracking into such groups can occur
|
This restriction no longer applies, and backtracking into such groups can occur
|
||||||
as normal.
|
as normal.
|
||||||
<a name="bigassertions"></a></P>
|
<a name="bigassertions"></a></P>
|
||||||
<br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
|
<br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
|
||||||
|
@ -2420,26 +2420,13 @@ control passes to the previous backtracking point, thus discarding any captured
|
||||||
strings within the assertion.
|
strings within the assertion.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For compatibility with Perl, most assertion groups may be repeated; though it
|
Most assertion groups may be repeated; though it makes no sense to assert the
|
||||||
makes no sense to assert the same thing several times, the side effect of
|
same thing several times, the side effect of capturing in positive assertions
|
||||||
capturing may occasionally be useful. However, an assertion that forms the
|
may occasionally be useful. However, an assertion that forms the condition for
|
||||||
condition for a conditional group may not be quantified. In practice, for
|
a conditional group may not be quantified. PCRE2 used to restrict the
|
||||||
other assertions, there only three cases:
|
repetition of assertions, but from release 10.35 the only restriction is that
|
||||||
<br>
|
an unlimited maximum repetition is changed to be one more than the minimum. For
|
||||||
<br>
|
example, {3,} is treated as {3,4}.
|
||||||
(1) If the quantifier is {0}, the assertion is never obeyed during matching.
|
|
||||||
However, it may contain internal capture groups that are called from elsewhere
|
|
||||||
via the
|
|
||||||
<a href="#groupsassubroutines">subroutine mechanism.</a>
|
|
||||||
<br>
|
|
||||||
<br>
|
|
||||||
(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
|
|
||||||
were {0,1}. At run time, the rest of the pattern match is tried with and
|
|
||||||
without the assertion, the order depending on the greediness of the quantifier.
|
|
||||||
<br>
|
|
||||||
<br>
|
|
||||||
(3) If the minimum repetition is greater than zero, the quantifier is ignored.
|
|
||||||
The assertion is obeyed just once when encountered during matching.
|
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Alphabetic assertion names
|
Alphabetic assertion names
|
||||||
|
@ -3840,9 +3827,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 29 December 2019
|
Last updated: 01 January 2020
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2019 University of Cambridge.
|
Copyright © 1997-2020 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -7729,7 +7729,7 @@ REPETITION
|
||||||
an escape such as \d or \pL that matches a single character
|
an escape such as \d or \pL that matches a single character
|
||||||
a character class
|
a character class
|
||||||
a backreference
|
a backreference
|
||||||
a parenthesized group (including most assertions)
|
a parenthesized group (including lookaround assertions)
|
||||||
a subroutine call (recursive or otherwise)
|
a subroutine call (recursive or otherwise)
|
||||||
|
|
||||||
The general repetition quantifier specifies a minimum and maximum num-
|
The general repetition quantifier specifies a minimum and maximum num-
|
||||||
|
@ -8162,24 +8162,14 @@ ASSERTIONS
|
||||||
passes to the previous backtracking point, thus discarding any captured
|
passes to the previous backtracking point, thus discarding any captured
|
||||||
strings within the assertion.
|
strings within the assertion.
|
||||||
|
|
||||||
For compatibility with Perl, most assertion groups may be repeated;
|
Most assertion groups may be repeated; though it makes no sense to as-
|
||||||
though it makes no sense to assert the same thing several times, the
|
sert the same thing several times, the side effect of capturing in pos-
|
||||||
side effect of capturing may occasionally be useful. However, an asser-
|
itive assertions may occasionally be useful. However, an assertion that
|
||||||
tion that forms the condition for a conditional group may not be quan-
|
forms the condition for a conditional group may not be quantified.
|
||||||
tified. In practice, for other assertions, there only three cases:
|
PCRE2 used to restrict the repetition of assertions, but from release
|
||||||
|
10.35 the only restriction is that an unlimited maximum repetition is
|
||||||
(1) If the quantifier is {0}, the assertion is never obeyed during
|
changed to be one more than the minimum. For example, {3,} is treated
|
||||||
matching. However, it may contain internal capture groups that are
|
as {3,4}.
|
||||||
called from elsewhere via the subroutine mechanism.
|
|
||||||
|
|
||||||
(2) If quantifier is {0,n} where n is greater than zero, it is treated
|
|
||||||
as if it were {0,1}. At run time, the rest of the pattern match is
|
|
||||||
tried with and without the assertion, the order depending on the greed-
|
|
||||||
iness of the quantifier.
|
|
||||||
|
|
||||||
(3) If the minimum repetition is greater than zero, the quantifier is
|
|
||||||
ignored. The assertion is obeyed just once when encountered during
|
|
||||||
matching.
|
|
||||||
|
|
||||||
Alphabetic assertion names
|
Alphabetic assertion names
|
||||||
|
|
||||||
|
@ -9490,8 +9480,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 29 December 2019
|
Last updated: 01 January 2020
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2020 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "29 December 2019" "PCRE2 10.35"
|
.TH PCRE2PATTERN 3 "01 January 2020" "PCRE2 10.35"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -1902,8 +1902,8 @@ are permitted for groups with the same number, for example:
|
||||||
(?|(?<AA>aa)|(?<AA>bb))
|
(?|(?<AA>aa)|(?<AA>bb))
|
||||||
.sp
|
.sp
|
||||||
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES
|
||||||
option at compile time, or by the use of (?J) within the pattern, as described
|
option at compile time, or by the use of (?J) within the pattern, as described
|
||||||
in the section entitled
|
in the section entitled
|
||||||
.\" HTML <a href="#internaloptions">
|
.\" HTML <a href="#internaloptions">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
"Internal Option Setting"
|
"Internal Option Setting"
|
||||||
|
@ -1975,7 +1975,7 @@ items:
|
||||||
an escape such as \ed or \epL that matches a single character
|
an escape such as \ed or \epL that matches a single character
|
||||||
a character class
|
a character class
|
||||||
a backreference
|
a backreference
|
||||||
a parenthesized group (including most assertions)
|
a parenthesized group (including lookaround assertions)
|
||||||
a subroutine call (recursive or otherwise)
|
a subroutine call (recursive or otherwise)
|
||||||
.sp
|
.sp
|
||||||
The general repetition quantifier specifies a minimum and maximum number of
|
The general repetition quantifier specifies a minimum and maximum number of
|
||||||
|
@ -2362,7 +2362,7 @@ cause the group that they reference to be treated as an
|
||||||
.\" </a>
|
.\" </a>
|
||||||
atomic group.
|
atomic group.
|
||||||
.\"
|
.\"
|
||||||
This restriction no longer applies, and backtracking into such groups can occur
|
This restriction no longer applies, and backtracking into such groups can occur
|
||||||
as normal.
|
as normal.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -2431,26 +2431,13 @@ the "no" branch of the condition. For other failing negative assertions,
|
||||||
control passes to the previous backtracking point, thus discarding any captured
|
control passes to the previous backtracking point, thus discarding any captured
|
||||||
strings within the assertion.
|
strings within the assertion.
|
||||||
.P
|
.P
|
||||||
For compatibility with Perl, most assertion groups may be repeated; though it
|
Most assertion groups may be repeated; though it makes no sense to assert the
|
||||||
makes no sense to assert the same thing several times, the side effect of
|
same thing several times, the side effect of capturing in positive assertions
|
||||||
capturing may occasionally be useful. However, an assertion that forms the
|
may occasionally be useful. However, an assertion that forms the condition for
|
||||||
condition for a conditional group may not be quantified. In practice, for
|
a conditional group may not be quantified. PCRE2 used to restrict the
|
||||||
other assertions, there only three cases:
|
repetition of assertions, but from release 10.35 the only restriction is that
|
||||||
.sp
|
an unlimited maximum repetition is changed to be one more than the minimum. For
|
||||||
(1) If the quantifier is {0}, the assertion is never obeyed during matching.
|
example, {3,} is treated as {3,4}.
|
||||||
However, it may contain internal capture groups that are called from elsewhere
|
|
||||||
via the
|
|
||||||
.\" HTML <a href="#groupsassubroutines">
|
|
||||||
.\" </a>
|
|
||||||
subroutine mechanism.
|
|
||||||
.\"
|
|
||||||
.sp
|
|
||||||
(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
|
|
||||||
were {0,1}. At run time, the rest of the pattern match is tried with and
|
|
||||||
without the assertion, the order depending on the greediness of the quantifier.
|
|
||||||
.sp
|
|
||||||
(3) If the minimum repetition is greater than zero, the quantifier is ignored.
|
|
||||||
The assertion is obeyed just once when encountered during matching.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Alphabetic assertion names"
|
.SS "Alphabetic assertion names"
|
||||||
|
@ -3884,6 +3871,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 29 December 2019
|
Last updated: 01 January 2020
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2020 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2019 University of Cambridge
|
New API code Copyright (c) 2016-2020 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -7074,15 +7074,18 @@ for (;; pptr++)
|
||||||
previous[GET(previous, 1)] != OP_ALT)
|
previous[GET(previous, 1)] != OP_ALT)
|
||||||
goto END_REPEAT;
|
goto END_REPEAT;
|
||||||
|
|
||||||
/* There is no sense in actually repeating assertions. The only
|
/* Perl allows all assertions to be quantified, and when they contain
|
||||||
potential use of repetition is in cases when the assertion is optional.
|
capturing parentheses and/or are optional there are potential uses for
|
||||||
Therefore, if the minimum is greater than zero, just ignore the repeat.
|
this feature. PCRE2 used to force the maximum quantifier to 1 on the
|
||||||
If the maximum is not zero or one, set it to 1. */
|
invalid grounds that further repetition was never useful. This was
|
||||||
|
always a bit pointless, since an assertion could be wrapped with a
|
||||||
|
repeated group to achieve the effect. General repetition is now
|
||||||
|
permitted, but if the maximum is unlimited it is set to one more than
|
||||||
|
the minimum. */
|
||||||
|
|
||||||
if (op_previous < OP_ONCE) /* Assertion */
|
if (op_previous < OP_ONCE) /* Assertion */
|
||||||
{
|
{
|
||||||
if (repeat_min > 0) goto END_REPEAT;
|
if (repeat_max == REPEAT_UNLIMITED) repeat_max = repeat_min + 1;
|
||||||
if (repeat_max > 1) repeat_max = 1;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/* The case of a zero minimum is special because of the need to stick
|
/* The case of a zero minimum is special because of the need to stick
|
||||||
|
|
|
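The quantifier adjustment in the hunk above can be sketched outside the compiler as a pair of small functions, one for the old rule and one for the new. This is an illustration only, not PCRE2 source: the `repeat_range` type, both function names, and the `REPEAT_UNLIMITED` value used here are assumptions made for the sketch. Treating the old "ignore the quantifier" case as {1,1} follows the removed documentation text ("the assertion is obeyed just once").

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a stand-in for the compiler's "no upper limit" sentinel. */
#define REPEAT_UNLIMITED UINT32_MAX

/* Hypothetical helper type holding an effective {min,max} quantifier. */
typedef struct {
  uint32_t min;
  uint32_t max;
} repeat_range;

/* Old rule: a minimum above zero meant the quantifier was ignored (the
   assertion ran once); otherwise the maximum was capped at 1. */
static repeat_range old_assertion_repeat(uint32_t repeat_min, uint32_t repeat_max)
{
  repeat_range r;
  assert(repeat_max == REPEAT_UNLIMITED || repeat_min <= repeat_max);
  if (repeat_min > 0) {
    r.min = 1;
    r.max = 1;
  } else {
    r.min = 0;
    r.max = (repeat_max > 1) ? 1 : repeat_max;
  }
  return r;
}

/* New rule: only an unlimited maximum is changed, to one more than the
   minimum; every other quantifier is kept as written. */
static repeat_range new_assertion_repeat(uint32_t repeat_min, uint32_t repeat_max)
{
  repeat_range r;
  assert(repeat_max == REPEAT_UNLIMITED || repeat_min <= repeat_max);
  r.min = repeat_min;
  r.max = (repeat_max == REPEAT_UNLIMITED) ? repeat_min + 1 : repeat_max;
  return r;
}
```

Under these assumptions, `new_assertion_repeat(3, REPEAT_UNLIMITED)` yields {3,4}, matching the documentation example, while `{26}` stays {26,26} instead of collapsing to a single evaluation as it did under the old rule.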
@ -6393,4 +6393,13 @@ ef) x/x,mark
|
||||||
/^((\1+)|\d)+133X$/
|
/^((\1+)|\d)+133X$/
|
||||||
111133X
|
111133X
|
||||||
|
|
||||||
|
/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
|
||||||
|
The quick brown fox jumps over the lazy dog.
|
||||||
|
Jackdaws love my big sphinx of quartz.
|
||||||
|
Pack my box with five dozen liquor jugs.
|
||||||
|
\= Expect no match
|
||||||
|
The quick brown fox jumps over the lazy cat.
|
||||||
|
Hackdaws love my big sphinx of quartz.
|
||||||
|
Pack my fox with five dozen liquor jugs.
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
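The new test pattern appears to act as a pangram check. On one reading, each of the 26 repetitions of the outer assertion must capture a letter at the position of its last occurrence, strictly earlier in the subject than the previous capture, so 26 successful repetitions require 26 distinct letters. The plain C sketch below performs the equivalent check without a regex, so the expected results can be reasoned about by hand. The function names are hypothetical and it assumes ASCII input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Scan from the end of the subject and count the positions that hold the
   last occurrence of a letter (case-insensitively). Each such position
   corresponds to one successful repetition of the assertion. */
static int count_last_occurrences(const char *s)
{
  bool seen[26] = { false };
  int count = 0;

  assert(s != NULL);
  for (size_t i = strlen(s); i-- > 0; ) {
    unsigned char c = (unsigned char)s[i];
    int idx;
    if (!isalpha(c)) continue;
    idx = tolower(c) - 'a';
    if (idx < 0 || idx >= 26) continue;   /* skip anything outside a-z */
    if (!seen[idx]) {
      seen[idx] = true;                   /* first sighting from the end */
      count++;
    }
  }
  return count;
}

/* The subject matches only if all 26 letters are present. */
static bool is_pangram(const char *s)
{
  return count_last_occurrences(s) == 26;
}
```

Applied to the six subject lines in the test, this check accepts the first three and rejects the three listed after `\= Expect no match`, which is the outcome the test expects.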
@ -10126,4 +10126,25 @@ No match
|
||||||
1: 11
|
1: 11
|
||||||
2: 11
|
2: 11
|
||||||
|
|
||||||
|
/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i
|
||||||
|
The quick brown fox jumps over the lazy dog.
|
||||||
|
0:
|
||||||
|
1: quick brown fox jumps over the lazy dog.
|
||||||
|
2: q
|
||||||
|
Jackdaws love my big sphinx of quartz.
|
||||||
|
0:
|
||||||
|
1: Jackdaws love my big sphinx of quartz.
|
||||||
|
2: J
|
||||||
|
Pack my box with five dozen liquor jugs.
|
||||||
|
0:
|
||||||
|
1: Pack my box with five dozen liquor jugs.
|
||||||
|
2: P
|
||||||
|
\= Expect no match
|
||||||
|
The quick brown fox jumps over the lazy cat.
|
||||||
|
No match
|
||||||
|
Hackdaws love my big sphinx of quartz.
|
||||||
|
No match
|
||||||
|
Pack my fox with five dozen liquor jugs.
|
||||||
|
No match
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -10962,6 +10962,12 @@ Matched, but too many substrings
|
||||||
Assert
|
Assert
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
|
Assert
|
||||||
|
abc
|
||||||
|
Ket
|
||||||
|
Assert
|
||||||
|
abc
|
||||||
|
Ket
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
End
|
End
|
||||||
|
@ -10973,6 +10979,10 @@ Matched, but too many substrings
|
||||||
Assert
|
Assert
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
|
Brazero
|
||||||
|
Assert
|
||||||
|
abc
|
||||||
|
Ket
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
End
|
End
|
||||||
|
@ -10981,9 +10991,15 @@ Matched, but too many substrings
|
||||||
/(?=abc)++abc/B
|
/(?=abc)++abc/B
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
Bra
|
Bra
|
||||||
|
Once
|
||||||
Assert
|
Assert
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
|
Brazero
|
||||||
|
Assert
|
||||||
|
abc
|
||||||
|
Ket
|
||||||
|
Ket
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
End
|
End
|
||||||
|
@ -16610,6 +16626,19 @@ No match
|
||||||
Assert
|
Assert
|
||||||
Any
|
Any
|
||||||
Ket
|
Ket
|
||||||
|
Assert
|
||||||
|
Any
|
||||||
|
Ket
|
||||||
|
Assert
|
||||||
|
Any
|
||||||
|
Ket
|
||||||
|
Assert
|
||||||
|
Any
|
||||||
|
Ket
|
||||||
|
Brazero
|
||||||
|
Assert
|
||||||
|
Any
|
||||||
|
Ket
|
||||||
x
|
x
|
||||||
Ket
|
Ket
|
||||||
Ket
|
Ket
|
||||||
|
|