Fix bug when a character > 0xffff appears in a lookbehind within a lookbehind.

Philip.Hazel 2016-12-24 16:25:11 +00:00
parent 6c48775955
commit a7a25ed91d
4 changed files with 42 additions and 20 deletions


@@ -48,12 +48,12 @@ parenthesis item, not the length of the whole group. A length of zero is now
 given only for a callout at the end of the pattern. Automatic callouts are no
 longer inserted before and after explicit callouts in the pattern.
 
-A number of bugs in the refactored code were subsequently fixed before release,
-but after the code was made available in the repository. Many of the bugs were
-discovered by fuzzing testing. Several of them were related to the change from
-assuming a zero-terminated pattern (which previously had required non-zero
-terminated strings to be copied). These bugs were never in released code, but
-are noted here for the record.
+A number of bugs in the refactored code were subsequently fixed during testing
+before release, but after the code was made available in the repository. Many
+of the bugs were discovered by fuzzing testing. Several of them were related to
+the change from assuming a zero-terminated pattern (which previously had
+required non-zero terminated strings to be copied). These bugs were never in
+fully released code, but are noted here for the record.
 
 (a) An overall recursion such as (?0) inside a lookbehind assertion was not
 being diagnosed as an error.
@@ -107,13 +107,17 @@ are noted here for the record.
 followed by '?' or '+', and there was at least one literal character
 between them, an internal error "unexpected repeat" occurred (example:
 /.+\QX\E+/).
 
 (p) A buffer overflow could occur while sorting the names in the group name
 list (depending on the order in which the names were seen).
 
 (q) A conditional group that started with a callout was not doing the right
 check for a following assertion, leading to compiling bad code. Example:
 /(?(C'XX))?!XX/
+
+(r) If a character whose code point was greater than 0xffff appeared within
+a lookbehind that was within another lookbehind, the calculation of the
+lookbehind length went wrong and could provoke an internal error.
 
 4. Back references are now permitted in lookbehind assertions when there are
 no duplicated group numbers (that is, (?| has not been used), and, if the
@@ -231,24 +235,24 @@ followed by a caseful back reference, could lose the caselessness of the first
 repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
 but didn't).
 
 35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
 matching length and just records zero. Typically this happens when there are
 too many nested or recursive back references. If the limit was reached in
 certain recursive cases it failed to be triggered and an internal error could
 be the result.
 
 36. The pcre2_dfa_match() function now takes note of the recursion limit for
 the internal recursive calls that are used for lookrounds and recursions within
 the pattern.
 
 37. More refactoring has got rid of the internal could_be_empty_branch()
 function (around 400 lines of code, including comments) by keeping track of
 could-be-emptiness as the pattern is compiled instead of scanning compiled
 groups. (This would have been much harder before the refactoring of #3 above.)
 This lifts a restriction on the number of branches in a group (more than about
 1100 would give "pattern is too complicated").
 
 38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
 auto_callout".


@@ -7924,6 +7924,7 @@ Arguments:
 
 Returns:     new value of pptr
              NULL if META_END is reached - should never occur
+               or for an unknown meta value - likewise
 */
 
 static uint32_t *
@@ -7934,9 +7935,11 @@ uint32_t nestlevel = 0;
 for (pptr += 1;; pptr++)
   {
   uint32_t meta = META_CODE(*pptr);
+
   switch(meta)
     {
     default:  /* Just skip over most items */
+    if (meta < META_END) continue;  /* Literal */
     break;
 
     /* This should never occur. */
@@ -8007,7 +8010,7 @@ for (pptr += 1;; pptr++)
 
   /* The extra data item length for each meta is in a table. */
 
-  meta = (meta & 0x0fff0000u) >> 16;
+  meta = (meta >> 16) & 0x7fff;
   if (meta >= sizeof(meta_extra_lengths)) return NULL;
   pptr += meta_extra_lengths[meta];
   }
@@ -8497,7 +8500,7 @@ cb->erroroffset = PCRE2_UNSET;
 for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
   {
   if (*pptr < META_END) continue;  /* Literal */
 
   switch (META_CODE(*pptr))
     {
     default:

testdata/testinput5 (vendored, 2 lines changed)

@@ -1755,4 +1755,6 @@
 /[\P{Yi}]/utf,locale=C
 \x{2f000}
 
+/^(?<!(?=􃡜))/B,utf
+
 # End of testinput5

testdata/testoutput5 (vendored, 13 lines changed)

@@ -4201,4 +4201,17 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
 \x{2f000}
 0: \x{2f000}
 
+/^(?<!(?=􃡜))/B,utf
+------------------------------------------------------------------
+Bra
+^
+AssertB not
+Assert
+\x{10385c}
+Ket
+Ket
+Ket
+End
+------------------------------------------------------------------
+
 # End of testinput5