Fix bug when a character > 0xffff appears in a lookbehind within a lookbehind.
parent 6c48775955
commit a7a25ed91d
40
ChangeLog
@@ -48,12 +48,12 @@ parenthesis item, not the length of the whole group. A length of zero is now
 given only for a callout at the end of the pattern. Automatic callouts are no
 longer inserted before and after explicit callouts in the pattern.
 
-A number of bugs in the refactored code were subsequently fixed before release,
-but after the code was made available in the repository. Many of the bugs were
-discovered by fuzzing testing. Several of them were related to the change from
-assuming a zero-terminated pattern (which previously had required non-zero
-terminated strings to be copied). These bugs were never in released code, but
-are noted here for the record.
+A number of bugs in the refactored code were subsequently fixed during testing
+before release, but after the code was made available in the repository. Many
+of the bugs were discovered by fuzzing testing. Several of them were related to
+the change from assuming a zero-terminated pattern (which previously had
+required non-zero terminated strings to be copied). These bugs were never in
+fully released code, but are noted here for the record.
 
 (a) An overall recursion such as (?0) inside a lookbehind assertion was not
 being diagnosed as an error.
@@ -107,13 +107,17 @@ are noted here for the record.
 followed by '?' or '+', and there was at least one literal character
 between them, an internal error "unexpected repeat" occurred (example:
 /.+\QX\E+/).
 
 (p) A buffer overflow could occur while sorting the names in the group name
 list (depending on the order in which the names were seen).
 
 (q) A conditional group that started with a callout was not doing the right
 check for a following assertion, leading to compiling bad code. Example:
 /(?(C'XX))?!XX/
 
+(r) If a character whose code point was greater than 0xffff appeared within
+a lookbehind that was within another lookbehind, the calculation of the
+lookbehind length went wrong and could provoke an internal error.
+
 4. Back references are now permitted in lookbehind assertions when there are
 no duplicated group numbers (that is, (?| has not been used), and, if the
@@ -231,24 +235,24 @@ followed by a caseful back reference, could lose the caselessness of the first
 repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
 but didn't).
 
 35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
 matching length and just records zero. Typically this happens when there are
 too many nested or recursive back references. If the limit was reached in
 certain recursive cases it failed to be triggered and an internal error could
 be the result.
 
 36. The pcre2_dfa_match() function now takes note of the recursion limit for
 the internal recursive calls that are used for lookrounds and recursions within
 the pattern.
 
 37. More refactoring has got rid of the internal could_be_empty_branch()
 function (around 400 lines of code, including comments) by keeping track of
 could-be-emptiness as the pattern is compiled instead of scanning compiled
 groups. (This would have been much harder before the refactoring of #3 above.)
 This lifts a restriction on the number of branches in a group (more than about
 1100 would give "pattern is too complicated").
 
 38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
 auto_callout".
 
 

@@ -7924,6 +7924,7 @@ Arguments:
 
 Returns: new value of pptr
 NULL if META_END is reached - should never occur
+or for an unknown meta value - likewise
 */
 
 static uint32_t *
@@ -7934,9 +7935,11 @@ uint32_t nestlevel = 0;
 for (pptr += 1;; pptr++)
 {
 uint32_t meta = META_CODE(*pptr);
+
 switch(meta)
 {
 default: /* Just skip over most items */
+if (meta < META_END) continue; /* Literal */
 break;
 
 /* This should never occur. */
@@ -8007,7 +8010,7 @@ for (pptr += 1;; pptr++)
 
 /* The extra data item length for each meta is in a table. */
 
-meta = (meta & 0x0fff0000u) >> 16;
+meta = (meta >> 16) & 0x7fff;
 if (meta >= sizeof(meta_extra_lengths)) return NULL;
 pptr += meta_extra_lengths[meta];
 }
@@ -8497,7 +8500,7 @@ cb->erroroffset = PCRE2_UNSET;
 for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
 {
 if (*pptr < META_END) continue; /* Literal */
 
 switch (META_CODE(*pptr))
 {
 default:

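The effect of the default-case change above can be illustrated with a small standalone sketch. It assumes, as a reading of the diff rather than anything the diff states, that META_END is 0x80000000 and that META_CODE() keeps only the top 16 bits of a parsed-pattern item. The helper names is_literal, old_table_index and new_table_index are hypothetical and exist only for this illustration. Under those assumptions, a literal code point above 0xffff has non-zero upper bits, so without the explicit literal check it would reach the table lookup with a non-zero index.

```c
#include <stdint.h>

/* Assumed encoding (not confirmed by the diff itself): meta items have the
   top bit set, and META_CODE() extracts the upper 16 bits of an item. */
#define META_END     0x80000000u
#define META_CODE(x) ((x) & 0xffff0000u)

/* Hypothetical helper: an item is a literal code point when its masked
   value lies below META_END, mirroring the check added in the default case. */
static int is_literal(uint32_t item)
{
return META_CODE(item) < META_END;
}

/* Hypothetical helper: table index as computed before the change. */
static uint32_t old_table_index(uint32_t meta)
{
return (meta & 0x0fff0000u) >> 16;
}

/* Hypothetical helper: table index as computed after the change. */
static uint32_t new_table_index(uint32_t meta)
{
return (meta >> 16) & 0x7fff;
}
```

Under these assumptions META_CODE(0x10385c) is 0x00100000, which is_literal() accepts as a literal, whereas old_table_index() would have turned the same value into index 0x10. For an item such as 0x80010000, both index computations give 1.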
@@ -1755,4 +1755,6 @@
 /[\P{Yi}]/utf,locale=C
 \x{2f000}
 
+/^(?<!(?=􃡜))/B,utf
+
 # End of testinput5

@@ -4201,4 +4201,17 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
 \x{2f000}
 0: \x{2f000}
 
+/^(?<!(?=􃡜))/B,utf
+------------------------------------------------------------------
+Bra
+^
+AssertB not
+Assert
+\x{10385c}
+Ket
+Ket
+Ket
+End
+------------------------------------------------------------------
+
 # End of testinput5