Fix incorrect first matching character when a backreference with zero minimum
repeat starts a pattern (possibly after assertions).
Philip.Hazel 2017-12-12 15:01:51 +00:00
parent 1a81b738fe
commit 59d85d7b55
4 changed files with 44 additions and 1 deletions


@@ -65,6 +65,11 @@ were all int variables, causing overflow when files with more than 2147483647
 lines were processed (assuming 32-bit ints). They have all been changed to
 unsigned long ints.
+17. If a backreference with a minimum repeat count of zero was first in a
+pattern, apart from assertions, an incorrect first matching character could be
+recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
+as the first character of a match.
 Version 10.30 14-August-2017
 ----------------------------
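To see why a wrong first character matters, here is a minimal C sketch that does not use PCRE2 at all. It assumes the matcher uses the recorded first character as a hint to skip start positions that cannot begin a match. The helpers `match_here` and `find_match` are hypothetical and hard-coded for the pattern /(?=(a))\1?b/ from the entry above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-written matcher for (?=(a))\1?b, anchored at subject[pos].
   Returns the match length, or -1 if there is no match at pos. */
static int match_here(const char *subject, size_t pos)
{
    assert(subject != NULL && pos <= strlen(subject));

    /* (?=(a)): the lookahead needs 'a' at pos and captures it as group 1. */
    if (subject[pos] != 'a') return -1;

    /* \1? is greedy: it consumes the 'a' at pos, then "b" must follow. */
    if (subject[pos + 1] == 'b') return 2;

    /* \1 taken zero times: "b" would have to sit at pos, which holds 'a'. */
    return -1;
}

/* Scan for the leftmost match. first_cu models the first-character hint:
   0 means "no hint"; any other value skips start positions that do not
   hold that character. Returns the match length or -1. */
static int find_match(const char *subject, char first_cu, size_t *start)
{
    size_t len = strlen(subject);
    for (size_t pos = 0; pos <= len; pos++) {
        if (first_cu != 0 && subject[pos] != first_cu) continue;
        int n = match_here(subject, pos);
        if (n >= 0) {
            *start = pos;
            return n;
        }
    }
    return -1;
}
```

In this sketch, `find_match("ab", 'b', &start)` skips position 0 and finds nothing, while `find_match("ab", 0, &start)` matches "ab" at position 0. Every match of this pattern has to start at an "a", so "b" can never be a valid hint.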


@@ -7135,7 +7135,7 @@ for (;; pptr++)
 later. */
 HANDLE_SINGLE_REFERENCE:
-if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE;
+if (firstcuflags == REQ_UNSET) zerofirstcuflags = firstcuflags = REQ_NONE;
 *code++ = ((options & PCRE2_CASELESS) != 0)? OP_REFI : OP_REF;
 PUT2INC(code, 0, meta_arg);
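The following toy model is one reading of why the one-line change helps; it is a sketch, not the PCRE2 source. It assumes that a quantifier with a zero minimum restores `firstcuflags` from `zerofirstcuflags`, so a stale `REQ_UNSET` in the latter lets the next literal claim the first-character slot. Only the names `firstcuflags`, `zerofirstcuflags`, `REQ_UNSET` and `REQ_NONE` come from the diff. The `toy_state` struct, `toy_compile`, `REQ_SET` and the enum values are hypothetical.

```c
#include <assert.h>

/* Arbitrary illustrative values; the real constants are defined differently. */
enum { REQ_UNSET = -2, REQ_NONE = -1, REQ_SET = 0 };

typedef struct {
    int firstcu;          /* candidate first character, valid if REQ_SET */
    int firstcuflags;     /* REQ_UNSET, REQ_NONE or REQ_SET */
    int zerofirstcuflags; /* state to restore if the last item repeats zero times */
} toy_state;

/* Model compiling the item sequence: back reference, '?', literal 'b'.
   fixed == 1 applies the change from the diff; fixed == 0 is the old code. */
static toy_state toy_compile(int fixed)
{
    toy_state s = { 0, REQ_UNSET, REQ_UNSET };
    assert(fixed == 0 || fixed == 1);

    /* Back reference: no first character can be known at this point. */
    if (s.firstcuflags == REQ_UNSET) {
        s.firstcuflags = REQ_NONE;
        if (fixed) s.zerofirstcuflags = REQ_NONE;
    }

    /* Quantifier '?' (minimum of zero): roll back to the saved state. */
    s.firstcuflags = s.zerofirstcuflags;

    /* Literal 'b': becomes the first character only if nothing is decided yet. */
    if (s.firstcuflags == REQ_UNSET) {
        s.firstcu = 'b';
        s.firstcuflags = REQ_SET;
    }
    return s;
}
```

In this model `toy_compile(0)` ends with "b" recorded as the first character, which matches the symptom described in the ChangeLog entry, while `toy_compile(1)` ends with `REQ_NONE`, meaning no first character is recorded.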

testdata/testinput2

@@ -5375,4 +5375,14 @@ a)"xI
 /[\d-[:print:]]/
+# Perl gets the second of these wrong, giving no match.
+"(?<=(a))\1?b"I
+ab
+aaab
+"(?=(a))\1?b"I
+ab
+aaab
 # End of testinput2

testdata/testoutput2

@@ -16340,6 +16340,34 @@ Failed: error 150 at offset 3: invalid range in character class
 /[\d-[:print:]]/
 Failed: error 150 at offset 3: invalid range in character class
+# Perl gets the second of these wrong, giving no match.
+"(?<=(a))\1?b"I
+Capturing subpattern count = 1
+Max back reference = 1
+Max lookbehind = 1
+Last code unit = 'b'
+Subject length lower bound = 1
+ab
+0: b
+1: a
+aaab
+0: ab
+1: a
+"(?=(a))\1?b"I
+Capturing subpattern count = 1
+Max back reference = 1
+Starting code units: a
+Last code unit = 'b'
+Subject length lower bound = 1
+ab
+0: ab
+1: a
+aaab
+0: ab
+1: a
 # End of testinput2
 Error -65: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data