Fix bug when a character > 0xffff appears in a lookbehind within a lookbehind.

Philip.Hazel 2016-12-24 16:25:11 +00:00
parent 6c48775955
commit a7a25ed91d
4 changed files with 42 additions and 20 deletions


@@ -48,12 +48,12 @@ parenthesis item, not the length of the whole group. A length of zero is now
 given only for a callout at the end of the pattern. Automatic callouts are no
 longer inserted before and after explicit callouts in the pattern.
-A number of bugs in the refactored code were subsequently fixed before release,
-but after the code was made available in the repository. Many of the bugs were
-discovered by fuzzing testing. Several of them were related to the change from
-assuming a zero-terminated pattern (which previously had required non-zero
-terminated strings to be copied). These bugs were never in released code, but
-are noted here for the record.
+A number of bugs in the refactored code were subsequently fixed during testing
+before release, but after the code was made available in the repository. Many
+of the bugs were discovered by fuzzing testing. Several of them were related to
+the change from assuming a zero-terminated pattern (which previously had
+required non-zero terminated strings to be copied). These bugs were never in
+fully released code, but are noted here for the record.
 (a) An overall recursion such as (?0) inside a lookbehind assertion was not
 being diagnosed as an error.
@@ -115,6 +115,10 @@ are noted here for the record.
 check for a following assertion, leading to compiling bad code. Example:
 /(?(C'XX))?!XX/
+(r) If a character whose code point was greater than 0xffff appeared within
+a lookbehind that was within another lookbehind, the calculation of the
+lookbehind length went wrong and could provoke an internal error.
 4. Back references are now permitted in lookbehind assertions when there are
 no duplicated group numbers (that is, (?| has not been used), and, if the
 reference is by name, there is only one group of that name. The referenced


@@ -7924,6 +7924,7 @@ Arguments:
 Returns: new value of pptr
 NULL if META_END is reached - should never occur
+or for an unknown meta value - likewise
 */
 static uint32_t *
@@ -7934,9 +7935,11 @@ uint32_t nestlevel = 0;
 for (pptr += 1;; pptr++)
 {
 uint32_t meta = META_CODE(*pptr);
 switch(meta)
 {
 default: /* Just skip over most items */
+if (meta < META_END) continue; /* Literal */
 break;
 /* This should never occur. */
@@ -8007,7 +8010,7 @@ for (pptr += 1;; pptr++)
 /* The extra data item length for each meta is in a table. */
-meta = (meta & 0x0fff0000u) >> 16;
+meta = (meta >> 16) & 0x7fff;
 if (meta >= sizeof(meta_extra_lengths)) return NULL;
 pptr += meta_extra_lengths[meta];
 }

testdata/testinput5

@@ -1755,4 +1755,6 @@
 /[\P{Yi}]/utf,locale=C
 \x{2f000}
+/^(?<!(?=􃡜))/B,utf
 # End of testinput5

testdata/testoutput5

@@ -4201,4 +4201,17 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
 \x{2f000}
 0: \x{2f000}
+/^(?<!(?=􃡜))/B,utf
+------------------------------------------------------------------
+Bra
+^
+AssertB not
+Assert
+\x{10385c}
+Ket
+Ket
+Ket
+End
+------------------------------------------------------------------
 # End of testinput5