Fix partial matching bug in pcre2_dfa_match().

Philip.Hazel 2019-06-26 16:13:28 +00:00
parent 434e3f7468
commit c0d0ee5365
4 changed files with 93 additions and 32 deletions


@ -5,7 +5,7 @@ Change Log for PCRE2
Version 10.34 22-April-2019
---------------------------

1. The maximum number of capturing subpatterns is 65535 (documented), but no
check on this was ever implemented. This omission has been rectified; it fixes
ClusterFuzz 14376.
@ -25,40 +25,40 @@ PCRE2_MATCH_INVALID_UTF compile-time option.
7. Adjust the limit for "must have" code unit searching; in particular,
increase it substantially for non-anchored patterns.

8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
minimum is potentially useful.

9. Some changes to the way the minimum subject length is handled:

   * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
     pcre2test now omits this item instead of showing a value of zero.

   * An incorrect minimum length could be calculated for a pattern that
     contained (*ACCEPT) inside a quantified group whose minimum repetition was
     zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
     of 2 (the correct value is 1, because (*ACCEPT) can end the match right
     after "A"). The minimum length scan no longer happens for a pattern that
     contains (*ACCEPT).

   * When no minimum length is set by the normal scan, but a first and/or last
     code unit is recorded, set the minimum to 1 or 2 as appropriate.

   * When a pattern contains multiple groups with the same number, a back
     reference cannot know which one to scan for a minimum length. This used to
     cause the minimum length finder to give up with no result. Now it treats
     such references as not adding to the minimum length (which it should have
     done all along).

   * Furthermore, the above action now happens only if the back reference is to
     a group that exists more than once in a pattern, rather than for any back
     reference in a pattern with duplicate numbers.

10. A (*MARK) value inside a successful condition was not being returned by the
interpretive matcher (it was returned by JIT). This bug has been mended.

11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
if the pattern had more than 32 capturing parentheses. This is fixed. In
addition, (a) the default limit for groups requested by -o<n> has been raised
to 50, (b) the new --om-capture option changes the limit, and (c) an error is
raised if -o asks for a group that is above the limit.

12. The quantifier {1} was always being ignored, but this is incorrect when it
@ -66,19 +66,23 @@ is made possessive and applied to an item in parentheses, because a
parenthesized item may contain multiple branches or other backtracking points,
for example /(a|ab){1}+c/ or /(a+){1}+a/.

13. Nested lookbehinds are now taken into account when computing the maximum
lookbehind value. For example, /(?<=a(?<=ba)c)/ previously set a maximum
lookbehind of 2, because that is the largest individual lookbehind. Now it sets
it to 3: the outer lookbehind starts 2 characters back, and the inner one,
tested after "a", checks "ba" and so reaches 1 character further.

14. For partial matches, pcre2test was always showing the maximum lookbehind
characters, flagged with "<", which is misleading when the lookbehind didn't
actually look behind the start (because it was later in the pattern). Showing
all consulted preceding characters for partial matches is now controlled by the
existing "allusedtext" modifier and, as for complete matches, this facility is
available only for non-JIT matching, because JIT does not maintain the first
and last consulted characters.

15. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
if the end of the subject was encountered in a lookahead (conditional or
otherwise), an atomic group, or a recursion.

Version 10.33 16-April-2019
---------------------------


@ -3152,8 +3152,8 @@ for (;;)
   /* We have finished the processing at the current subject character. If no
   new states have been set for the next character, we have found all the
-  matches that we are going to find. If we are at the top level and partial
-  matching has been requested, check for appropriate conditions.
+  matches that we are going to find. If partial matching has been requested,
+  check for appropriate conditions.

   The "forced_ fail" variable counts the number of (*F) encountered for the
   character. If it is equal to the original active_count (saved in
@ -3165,8 +3165,7 @@ for (;;)
   if (new_count <= 0)
     {
-    if (rlevel == 1 &&                               /* Top level, and */
-        could_continue &&                            /* Some could go on, and */
+    if (could_continue &&                            /* Some could go on, and */
         forced_fail != workspace[1] &&               /* Not all forced fail & */
         (                                            /* either... */
         (mb->moptions & PCRE2_PARTIAL_HARD) != 0     /* Hard partial */
@ -3175,8 +3174,8 @@ for (;;)
          match_count < 0)                            /* no matches */
         ) &&                                         /* And... */
         (
         partial_newline ||                           /* Either partial NL */
           (                                          /* or ... */
           ptr >= end_subject &&                      /* End of subject and */
           ptr > mb->start_used_ptr)                  /* Inspected non-empty string */
         )

testdata/testinput6

@ -4972,4 +4972,26 @@
\= Expect no match
0
/(?<=pqr)abc(?=xyz)/
123pqrabcxy\=ps,allusedtext
123pqrabcxyz\=ps,allusedtext
/(?>a+b)/
aaaa\=ps
aaaab\=ps
/(abc)(?1)/
abca\=ps
abcabc\=ps
/(?(?=abc).*|Z)/
ab\=ps
abcxyz\=ps
/(abc)++x/
abcab\=ps
abc\=ps
ab\=ps
abcx
# End of testinput6

testdata/testoutput6

@ -7809,4 +7809,40 @@ No match
0
No match
/(?<=pqr)abc(?=xyz)/
123pqrabcxy\=ps,allusedtext
Partial match: pqrabcxy
<<<
123pqrabcxyz\=ps,allusedtext
0: pqrabcxyz
<<< >>>
/(?>a+b)/
aaaa\=ps
Partial match: aaaa
aaaab\=ps
0: aaaab
/(abc)(?1)/
abca\=ps
Partial match: abca
abcabc\=ps
0: abcabc
/(?(?=abc).*|Z)/
ab\=ps
Partial match: ab
abcxyz\=ps
0: abcxyz
/(abc)++x/
abcab\=ps
Partial match: abcab
abc\=ps
Partial match: abc
ab\=ps
Partial match: ab
abcx
0: abcx
# End of testinput6