Fix partial matching bug in pcre2_dfa_match().

parent 434e3f7468
commit c0d0ee5365

ChangeLog
@@ -5,7 +5,7 @@ Change Log for PCRE2
 Version 10.34 22-April-2019
 ---------------------------

 1. The maximum number of capturing subpatterns is 65535 (documented), but no
 check on this was ever implemented. This omission has been rectified; it fixes
 ClusterFuzz 14376.

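Item 1 describes adding a limit check on capturing groups. The sketch below shows only the shape of such a check. `add_capture_group` and `MAX_CAPTURE_GROUPS` are hypothetical names, not PCRE2 internals, and the real compiler reports the failure through its own error codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Documented ceiling on the number of capturing groups (ChangeLog item 1). */
#define MAX_CAPTURE_GROUPS 65535u

/* Hypothetical helper, called once for each capturing group the parser meets.
   It refuses to go past the documented maximum and leaves the count
   unchanged, so that the caller can report a compile error. */
static bool add_capture_group(unsigned *count)
{
  if (*count >= MAX_CAPTURE_GROUPS)
    return false;          /* limit reached: caller raises an error */
  (*count)++;
  return true;
}
```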
@@ -25,40 +25,40 @@ PCRE2_MATCH_INVALID_UTF compile-time option.
 7. Adjust the limit for "must have" code unit searching, in particular,
 increase it substantially for non-anchored patterns.

 8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
 minimum is potentially useful.

 9. Some changes to the way the minimum subject length is handled:

    * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
      pcre2test now omits this item instead of showing a value of zero.

    * An incorrect minimum length could be calculated for a pattern that
      contained (*ACCEPT) inside a qualified group whose minimum repetition was
      zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
      of 2. The minimum length scan no longer happens for a pattern that
      contains (*ACCEPT).

    * When no minimum length is set by the normal scan, but a first and/or last
      code unit is recorded, set the minimum to 1 or 2 as appropriate.

    * When a pattern contains multiple groups with the same number, a back
      reference cannot know which one to scan for a minimum length. This used to
      cause the minimum length finder to give up with no result. Now it treats
      such references as not adding to the minimum length (which it should have
      done all along).

    * Furthermore, the above action now happens only if the back reference is to
      a group that exists more than once in a pattern instead of any back
      reference in a pattern with duplicate numbers.

 10. A (*MARK) value inside a successful condition was not being returned by the
 interpretive matcher (it was returned by JIT). This bug has been mended.

 11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
 if the pattern had more than 32 capturing parentheses. This is fixed. In
 addition (a) the default limit for groups requested by -o<n> has been raised to
 50, (b) the new --om-capture option changes the limit, (c) an error is raised
 if -o asks for a group that is above the limit.

 12. The quantifier {1} was always being ignored, but this is incorrect when it
@@ -66,19 +66,23 @@ is made possessive and applied to an item in parentheses, because a
 parenthesized item may contain multiple branches or other backtracking points,
 for example /(a|ab){1}+c/ or /(a+){1}+a/.

 13. Nested lookbehinds are now taken into account when computing the maximum
 lookbehind value. For example /(?<=a(?<=ba)c)/ previously set a maximum
 lookbehind of 2, because that is the largest individual lookbehind. Now it sets
 it to 3, because matching looks back 3 characters.

 14. For partial matches, pcre2test was always showing the maximum lookbehind
 characters, flagged with "<", which is misleading when the lookbehind didn't
 actually look behind the start (because it was later in the pattern). Showing
 all consulted preceding characters for partial matches is now controlled by the
 existing "allusedtext" modifier and, as for complete matches, this facility is
 available only for non-JIT matching, because JIT does not maintain the first
 and last consulted characters.

+15. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
+if the end of the subject was encountered in a lookahead (conditional or
+otherwise), an atomic group, or a recursion.
+

 Version 10.33 16-April-2019
 ---------------------------

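The third bullet of item 9 can be read as a floor on the computed minimum: one code unit for a recorded first code unit, and one more for a recorded last code unit. The following is a minimal sketch of that rule as the ChangeLog states it. The flag names `HAS_FIRST_CU` and `HAS_LAST_CU` and the function name are hypothetical stand-ins for whatever PCRE2 uses internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits: "a first code unit was recorded" and
   "a last code unit was recorded". */
#define HAS_FIRST_CU 0x1u
#define HAS_LAST_CU  0x2u

/* Count one for each recorded code unit, as the ChangeLog describes, and
   raise the scanned minimum to that value if the scan produced something
   smaller (typically 0, meaning no minimum was set). */
static uint32_t apply_minlength_floor(uint32_t scanned_min, uint32_t flags)
{
  uint32_t floor_len = 0;
  if ((flags & HAS_FIRST_CU) != 0) floor_len++;
  if ((flags & HAS_LAST_CU) != 0) floor_len++;
  return scanned_min < floor_len ? floor_len : scanned_min;
}
```

For example, a scan that yields 0 with both code units recorded gives a minimum of 2. A larger scanned value is left alone.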
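Item 12 hinges on the fact that `{1}+` applied to a group makes the group atomic. As a result, /(a|ab){1}+c/ cannot match "abc": once the group has matched "a" it may not backtrack to try "ab". By contrast, /(a|ab)c/ does match. The sketch below encodes that rule for deciding when `{1}` can be discarded. The `quantified_item` struct and `quantifier_is_redundant` are hypothetical illustrations, not PCRE2 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a quantifier and the item it applies to. */
typedef struct {
  unsigned min, max;   /* repeat bounds; {1} has min == max == 1 */
  bool possessive;     /* quantifier is followed by '+' */
  bool is_group;       /* quantifier applies to a parenthesized item */
} quantified_item;

/* Returns true when the quantifier can be dropped without changing what the
   pattern matches. {1} on its own is a no-op, but {1}+ on a group makes the
   group atomic, which forbids backtracking into its branches. */
static bool quantifier_is_redundant(const quantified_item *q)
{
  if (q->min != 1 || q->max != 1)
    return false;                      /* only {1} is considered here */
  return !(q->possessive && q->is_group);
}
```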
|
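The arithmetic in item 13 generalizes. A nested lookbehind sits some number of characters before the end of its parent, so its reach is that offset plus its own reach. The pattern's maximum is then the largest reach found. For /(?<=a(?<=ba)c)/, the outer lookbehind matches 2 characters, and the inner one sits 1 character (the "c") before the outer end and looks back 2 more, giving max(2, 1 + 2) = 3.

A recursive sketch of this calculation follows. It assumes a hypothetical tree of fixed-length lookbehinds rather than PCRE2's compiled form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree of fixed-length lookbehinds. */
typedef struct lookbehind {
  int length;                      /* characters this lookbehind consumes */
  int offset_from_end;             /* characters of the parent after this one */
  const struct lookbehind *nested; /* lookbehinds directly inside this one */
  size_t nested_count;
} lookbehind;

/* How far back from its own position a lookbehind can look, counting any
   nested lookbehinds. A nested one sits offset_from_end characters before
   the end of its parent, so its reach is shifted by that amount. */
static int max_lookbehind(const lookbehind *lb)
{
  int reach = lb->length;
  for (size_t i = 0; i < lb->nested_count; i++) {
    const lookbehind *child = &lb->nested[i];
    int child_reach = child->offset_from_end + max_lookbehind(child);
    if (child_reach > reach)
      reach = child_reach;
  }
  return reach;
}
```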
@@ -3152,8 +3152,8 @@ for (;;)

   /* We have finished the processing at the current subject character. If no
   new states have been set for the next character, we have found all the
-  matches that we are going to find. If we are at the top level and partial
-  matching has been requested, check for appropriate conditions.
+  matches that we are going to find. If partial matching has been requested,
+  check for appropriate conditions.

   The "forced_ fail" variable counts the number of (*F) encountered for the
   character. If it is equal to the original active_count (saved in
@@ -3165,8 +3165,7 @@ for (;;)

   if (new_count <= 0)
     {
-    if (rlevel == 1 &&                               /* Top level, and */
-        could_continue &&                            /* Some could go on, and */
+    if (could_continue &&                            /* Some could go on, and */
         forced_fail != workspace[1] &&               /* Not all forced fail & */
         (                                            /* either... */
         (mb->moptions & PCRE2_PARTIAL_HARD) != 0     /* Hard partial */
@@ -3175,8 +3174,8 @@ for (;;)
          match_count < 0)                            /* no matches */
         ) &&                                         /* And... */
         (
         partial_newline ||                           /* Either partial NL */
           (                                          /* or ... */
           ptr >= end_subject &&                      /* End of subject and */
           ptr > mb->start_used_ptr)                  /* Inspected non-empty string */
           )

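Dropping the `rlevel == 1` term means the partial-match test now also fires in nested invocations of the matcher. Judging by ChangeLog item 15, those nested invocations are where lookaheads, atomic groups and recursions are evaluated.

To make the before/after difference concrete, the sketch below flattens the condition into plain values. The `partial_inputs` struct and the two functions are hypothetical. The soft-partial branch that falls between the two hunks is assumed to be `PCRE2_PARTIAL_SOFT` together with `match_count < 0`, as the visible fragment suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs to the end-of-character check. The names
   follow the variables in the hunks above; the struct itself is illustrative. */
typedef struct {
  bool could_continue;     /* some state could go on with more input */
  int  forced_fail;        /* number of (*F) hit at this character */
  int  active_count;       /* original active count (workspace[1]) */
  bool partial_hard;       /* PCRE2_PARTIAL_HARD requested */
  bool partial_soft;       /* PCRE2_PARTIAL_SOFT requested (assumed branch) */
  int  match_count;        /* < 0 means no complete match yet */
  bool partial_newline;    /* partial newline at end of subject */
  bool at_end;             /* ptr >= end_subject */
  bool inspected_nonempty; /* ptr > start_used_ptr */
} partial_inputs;

/* Condition after the fix: independent of the recursion level. */
static bool partial_after_fix(const partial_inputs *in)
{
  return in->could_continue &&
         in->forced_fail != in->active_count &&
         (in->partial_hard || (in->partial_soft && in->match_count < 0)) &&
         (in->partial_newline || (in->at_end && in->inspected_nonempty));
}

/* Condition before the fix: the same test, but only at the top level. */
static bool partial_before_fix(const partial_inputs *in, int rlevel)
{
  return rlevel == 1 && partial_after_fix(in);
}
```

Consider soft partial matching with no complete match yet, and a subject exhausted after some text was inspected. Both versions agree at `rlevel == 1`. When the same state arises at `rlevel == 2`, however, only `partial_after_fix` reports a partial match.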
@@ -4972,4 +4972,26 @@
 \= Expect no match
     0

+/(?<=pqr)abc(?=xyz)/
+    123pqrabcxy\=ps,allusedtext
+    123pqrabcxyz\=ps,allusedtext
+
+/(?>a+b)/
+    aaaa\=ps
+    aaaab\=ps
+
+/(abc)(?1)/
+    abca\=ps
+    abcabc\=ps
+
+/(?(?=abc).*|Z)/
+    ab\=ps
+    abcxyz\=ps
+
+/(abc)++x/
+    abcab\=ps
+    abc\=ps
+    ab\=ps
+    abcx
+
 # End of testinput6

@@ -7809,4 +7809,40 @@ No match
     0
 No match

+/(?<=pqr)abc(?=xyz)/
+    123pqrabcxy\=ps,allusedtext
+Partial match: pqrabcxy
+               <<<
+    123pqrabcxyz\=ps,allusedtext
+ 0: pqrabcxyz
+    <<<   >>>
+
+/(?>a+b)/
+    aaaa\=ps
+Partial match: aaaa
+    aaaab\=ps
+ 0: aaaab
+
+/(abc)(?1)/
+    abca\=ps
+Partial match: abca
+    abcabc\=ps
+ 0: abcabc
+
+/(?(?=abc).*|Z)/
+    ab\=ps
+Partial match: ab
+    abcxyz\=ps
+ 0: abcxyz
+
+/(abc)++x/
+    abcab\=ps
+Partial match: abcab
+    abc\=ps
+Partial match: abc
+    ab\=ps
+Partial match: ab
+    abcx
+ 0: abcx
+
 # End of testinput6

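With the `allusedtext` modifier, the output for /(?<=pqr)abc(?=xyz)/ includes marker lines: `<` sits under consulted characters before the match, and `>` under consulted characters after it. The small sketch below builds such a line from offsets. The function and parameter names are hypothetical, and the trimming of trailing blanks is an assumption; pcre2test's own formatting code may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the marker line for the consulted text
   subject[start_used .. end_used). '<' marks characters consulted before the
   match, ' ' the match itself, '>' characters consulted after it. Trailing
   blanks are trimmed. `out` needs room for end_used - start_used + 1 bytes. */
static void mark_used_text(size_t start_used, size_t match_start,
                           size_t match_end, size_t end_used, char *out)
{
  size_t n = 0;
  for (size_t i = start_used; i < end_used; i++)
    out[n++] = (i < match_start) ? '<' : (i < match_end) ? ' ' : '>';
  while (n > 0 && out[n - 1] == ' ')
    n--;                                /* drop trailing blanks */
  out[n] = '\0';
}
```

For "123pqrabcxyz", the offsets 3, 6, 9 and 12 produce `<<<   >>>`. For the partial match on "123pqrabcxy", the match runs to the end of the subject (offsets 3, 6, 11, 11), so only `<<<` remains.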