Fix buffer overflow for recursive byname back reference when duplicate names exist.
Philip.Hazel 2015-05-15 17:09:01 +00:00
parent 92739ef5d8
commit 56444e9978
4 changed files with 64 additions and 34 deletions


@@ -26,12 +26,12 @@ pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
 error about an unsupported item.
 
 8. For some types of pattern, for example /Z*(|d*){216}/, the auto-
 possessification code could take exponential time to complete. A recursion
 depth limit of 1000 has been imposed to limit the resources used by this
 optimization. This infelicity was discovered by the LLVM fuzzer.
 
 9. A pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class
 such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored
 because \S ensures they are all in the class. The code for doing this was
 interacting badly with the code for computing the amount of space needed to
 compile the pattern, leading to a buffer overflow. This bug was discovered by
@@ -45,21 +45,21 @@ discovered by the LLVM fuzzer.
 between a subroutine call and its quantifier was incorrectly compiled, leading
 to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer.
 
 12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
 assertion after (?(. The code was failing to check the character after (?(?<
 for the ! or = that would indicate a lookbehind assertion. This bug was
 discovered by the LLVM fuzzer.
 
 13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
 a fixed maximum following a group that contains a subroutine reference was
 incorrectly compiled and could trigger buffer overflow. This bug was discovered
 by the LLVM fuzzer.
 
 14. Negative relative recursive references such as (?-7) to non-existent
 subpatterns were not being diagnosed and could lead to unpredictable behaviour.
 This bug was discovered by the LLVM fuzzer.
 
 15. The bug fixed in 14 was due to an integer variable that was unsigned when
 it should have been signed. Some other "int" variables, having been checked,
 have either been changed to uint32_t or commented as "must be signed".
@@ -73,41 +73,41 @@ lookbehind assertion. This bug was discovered by the LLVM fuzzer.
 18. There was a similar problem to 17 in pcre2test for global matches, though
 the code there did catch the loop.
 
 19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
 and a subsequent item in the pattern caused a non-match, backtracking over the
 repeated \X did not stop, but carried on past the start of the subject, causing
 reference to random memory and/or a segfault. There were also some other cases
 where backtracking after \C could crash. This set of bugs was discovered by the
 LLVM fuzzer.
 
 20. The function for finding the minimum length of a matching string could take
 a very long time if mutual recursion was present many times in a pattern, for
 example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has
 been implemented. This infelicity was discovered by the LLVM fuzzer.
 
 21. Implemented PCRE2_NEVER_BACKSLASH_C.
 
 22. The feature for string replication in pcre2test could read from freed
 memory if the replication required a buffer to be extended, and it was not
 working properly in 16-bit and 32-bit modes. This issue was discovered by a
 fuzzer: see http://lcamtuf.coredump.cx/afl/.
 
 23. Added the PCRE2_ALT_CIRCUMFLEX option.
 
 24. Adjust the treatment of \8 and \9 to be the same as the current Perl
 behaviour.
 
 25. Static linking against the PCRE2 library using the pkg-config module was
 failing on missing pthread symbols.
 
 26. If a group that contained a recursive back reference also contained a
 forward reference subroutine call followed by a non-forward-reference
 subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
 compile correct code, leading to undefined behaviour or an internally detected
 error. This bug was discovered by the LLVM fuzzer.
 
 27. Quantification of certain items (e.g. atomic back references) could cause
 incorrect code to be compiled when recursive forward references were involved.
 For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was
 discovered by the LLVM fuzzer.
@@ -115,6 +115,10 @@ discovered by the LLVM fuzzer.
 a buffer overflow if there was more than one group with the given name. This
 bug was discovered by the LLVM fuzzer.
 
+29. A recursive back reference by name within a group that had the same name as
+another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/.
+This bug was discovered by the LLVM fuzzer.
+
 
 Version 10.10 06-March-2015
 ---------------------------


@@ -5946,18 +5946,34 @@ for (;; ptr++)
   }
 
 /* The name table does not exist in the first pass; instead we must
-scan the list of names encountered so far in order to get the
-number. If the name is not found, set the value to 0 for a forward
-reference. */
+scan the list of names encountered so far in order to get a number.
+If there are duplicates, there may be more than one number. For each
+one, if handling a back reference, we must check to see if it is
+recursive, that is, it is inside the group that it references. A flag
+is set so that the group can be made atomic. If the name is not
+found, set the value of recno to 0 for a forward reference. */
 
+recno = 0;
 ng = cb->named_groups;
 for (i = 0; i < cb->names_found; i++, ng++)
   {
   if (namelen == ng->length &&
       PRIV(strncmp)(name, ng->name, namelen) == 0)
-    break;
+    {
+    open_capitem *oc;
+    recno = ng->number;
+    if (is_recurse) break;
+    for (oc = cb->open_caps; oc != NULL; oc = oc->next)
+      {
+      if (oc->number == recno)
+        {
+        oc->flag = TRUE;
+        break;
+        }
+      }
+    }
   }
-recno = (i < cb->names_found)? ng->number : 0;
 
 /* If duplicate names are permitted, we have to allow for a named
 reference to a duplicated name (this cannot be determined until the
@@ -6002,8 +6018,8 @@ for (;; ptr++)
 if (is_recurse) goto HANDLE_RECURSION;
 
-/* In the second pass we must see if the name is duplicated. If so, we
-generate a different opcode. */
+/* For back references, in the second pass we must see if the name is
+duplicated. If so, we generate a different opcode. */
 
 if (lengthptr == NULL && cb->dupnames)
   {
@@ -6036,7 +6052,7 @@ for (;; ptr++)
 cb->backref_map |= (recno < 32)? (1 << recno) : 1;
 if ((uint32_t)recno > cb->top_backref) cb->top_backref = recno;
 
-/* Check to see if this back reference is recursive, that it, it
+/* Check to see if this back reference is recursive, that is, it
 is inside the group that it references. A flag is set so that the
 group can be made atomic. */
@@ -7138,7 +7154,7 @@ for (;;)
 Because we are moving code along, we must ensure that any pending recursive
 or forward subroutine references are updated. In any event, remove the
 block from the chain. */
 
 if (capnumber > 0)
   {
   if (cb->open_caps->flag)
@@ -8050,6 +8066,7 @@ at this stage. */
 #ifdef CALL_PRINTINT
 pcre2_printint(re, stderr, TRUE);
+fprintf(stderr, "Length=%lu Used=%lu\n", length, usedlength);
 #endif
 
 /* Fill in any forward references that are required. There may be repeated
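
The revised name scan in the hunk at line 5946 can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the library's code: `named_group`, `open_capitem`, and `scan_names` are hypothetical stand-ins that carry only the fields the loop touches, and the standard `strncmp` is used in place of `PRIV(strncmp)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down versions of the compiler's internal records. */
typedef struct named_group {
  const char *name;      /* group name, not necessarily NUL-terminated */
  size_t length;         /* length of the name */
  unsigned int number;   /* capture group number */
} named_group;

typedef struct open_capitem {
  struct open_capitem *next;  /* next enclosing open group, or NULL */
  unsigned int number;        /* capture group number */
  bool flag;                  /* set when a back reference is recursive */
} open_capitem;

/* Scan every name seen so far. For a back reference (is_recurse false),
   each group with a matching name that is still open gets its flag set,
   so duplicates are all covered. For a recursion, the first match wins.
   Returns the last matching group number, or 0 for a forward reference. */
static unsigned int scan_names(const named_group *groups, size_t names_found,
                               const char *name, size_t namelen,
                               bool is_recurse, open_capitem *open_caps)
{
  unsigned int recno = 0;
  assert(groups != NULL || names_found == 0);

  for (size_t i = 0; i < names_found; i++) {
    const named_group *ng = &groups[i];
    if (namelen != ng->length || strncmp(name, ng->name, namelen) != 0)
      continue;

    recno = ng->number;
    if (is_recurse) break;

    /* Is the reference inside the group it refers to? */
    for (open_capitem *oc = open_caps; oc != NULL; oc = oc->next) {
      if (oc->number == recno) {
        oc->flag = true;
        break;
      }
    }
  }
  return recno;
}
```

For /(?J)(?'d'(?'d'\g{d}))/, both groups named d are still open when \g{d} is reached, so with `is_recurse` false this sketch flags groups 1 and 2 and returns 2. The removed loop stopped at the first matching name, which appears to be why only one of the two enclosing groups was previously checked.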

testdata/testinput2

@@ -4306,4 +4306,8 @@ a random value. /Ix
 /$(&.+[\p{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(?(R)){0,6}?|){12\x8a\X*?\x8a\x0b\xd1^9\3*+(\xc1,\k'P'\xb4)\xcc(z\z(?JJ)(?''8};(\x0b\xd1^9\?'3*+(\xc1.]k+\x0b'Pm'\xb4\xcc4'\xd1'(?''))?-%--\x95$9*\4'|\xd1(''%\x95*$9)#(?'R')3\x07?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/
 
+"\xa\xf<(.\pZ*\P{Xwd}+^\xa8\3'3yq.::?(?J:()\xd1+!~:3'(8?:)':(?'d'(?'d'^u]!.+.+\\A\Ah(n+?9){7}+\K;(?''u'(?'c'(?'z'(?<y>\xb::\xf0'|\xd3(\xae?'w(z\x8?P>l)\x8?P>a)'\H\R\xd1+!!~:3'(?:h$N{26875}\W+?\\=D{2}\x89(?i:Uy0\N({2\xa(\v\x85*){y*\A(()\p{L}+?\P{^Xan}'+?\xff\+pS\?|).{;y*\A(()\p{L}+?\8}\d?1(|)(/1){7}.+[Lp{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(\xbf(R))\x8a\X*?\x8a\xb\xd1^9\3*+(\xc1,\k'R'\xb4)\xcc(z\z(?J)(?''\x1b(\xb\xd1^9\?'3*+P{^Xan}+?\xff\+(\xc1.]k+\xb'Pm'\xb4)\xcc4f\xa7'\xd1V(?i:U,{2,2})'(?''))?-%--\x95$9*\4'|\xd1(\x9c''%\x94$9)#(?'R')3\x7?('P\xed7'\xa8\xb1^u\xeaw\1\0\0\(|(?1){7}.+[\p{Me}].\s\xdcC*^\x14?(?(<y>))(?<!^)$C((;*?(R*?))+(?(R)\x8a\X*?\x8a\xb\xd1^9\3*+|(\xc1,\k'R'\xb4)\xcc! z)\z(?JJ)(?'';(\xb\xd1^9\?'3*+(\xc1.]k+\xb'Pm'\xb4))':(?'d')(?'RD'(d')|)|$)'|(?<x>\g{d});\g{x}\x11\g{d}\x81\|$((?''\'X'(?'W''\x92()'9'\x83*))\xba*\!?^ <){)':;\xcc4'\xd1'(?''28))?-%--\x95$9*\4'|\xd1((''e\x94*$9:)*#(?'R')3)\x7?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+0!~:(?)'d'E:yD!\s(?'R'\x1e;\x10:U))|'\x9g!\xb0*){)\\x16:;()\x1e\x10\x87*:(?<y>)\xd1+!~:(?)'}'\d'E:yD!\s(?'R'\x1e;\x10:U))|'))|)g!\xb0*R+9{29+)#(?'P'})*?pS\{3,}\x85,{0,}l{*UTF)(\xe{7}){3722,{9,}d{2,?|))|{)\(A?&d}}{\xa,}2}){3,}7,l{)22}(,}l:7{2,4}}29\x19+)#?'P'})*v?))\x5"
+
+"(?J)(?'d'(?'d'\g{d}))"
+
 # End of testinput2


@@ -14405,4 +14405,9 @@ Failed: error 115 at offset 26: reference to non-existent subpattern
 /$(&.+[\p{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(?(R)){0,6}?|){12\x8a\X*?\x8a\x0b\xd1^9\3*+(\xc1,\k'P'\xb4)\xcc(z\z(?JJ)(?''8};(\x0b\xd1^9\?'3*+(\xc1.]k+\x0b'Pm'\xb4\xcc4'\xd1'(?''))?-%--\x95$9*\4'|\xd1(''%\x95*$9)#(?'R')3\x07?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/
 
+"\xa\xf<(.\pZ*\P{Xwd}+^\xa8\3'3yq.::?(?J:()\xd1+!~:3'(8?:)':(?'d'(?'d'^u]!.+.+\\A\Ah(n+?9){7}+\K;(?''u'(?'c'(?'z'(?<y>\xb::\xf0'|\xd3(\xae?'w(z\x8?P>l)\x8?P>a)'\H\R\xd1+!!~:3'(?:h$N{26875}\W+?\\=D{2}\x89(?i:Uy0\N({2\xa(\v\x85*){y*\A(()\p{L}+?\P{^Xan}'+?\xff\+pS\?|).{;y*\A(()\p{L}+?\8}\d?1(|)(/1){7}.+[Lp{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(\xbf(R))\x8a\X*?\x8a\xb\xd1^9\3*+(\xc1,\k'R'\xb4)\xcc(z\z(?J)(?''\x1b(\xb\xd1^9\?'3*+P{^Xan}+?\xff\+(\xc1.]k+\xb'Pm'\xb4)\xcc4f\xa7'\xd1V(?i:U,{2,2})'(?''))?-%--\x95$9*\4'|\xd1(\x9c''%\x94$9)#(?'R')3\x7?('P\xed7'\xa8\xb1^u\xeaw\1\0\0\(|(?1){7}.+[\p{Me}].\s\xdcC*^\x14?(?(<y>))(?<!^)$C((;*?(R*?))+(?(R)\x8a\X*?\x8a\xb\xd1^9\3*+|(\xc1,\k'R'\xb4)\xcc! z)\z(?JJ)(?'';(\xb\xd1^9\?'3*+(\xc1.]k+\xb'Pm'\xb4))':(?'d')(?'RD'(d')|)|$)'|(?<x>\g{d});\g{x}\x11\g{d}\x81\|$((?''\'X'(?'W''\x92()'9'\x83*))\xba*\!?^ <){)':;\xcc4'\xd1'(?''28))?-%--\x95$9*\4'|\xd1((''e\x94*$9:)*#(?'R')3)\x7?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+0!~:(?)'d'E:yD!\s(?'R'\x1e;\x10:U))|'\x9g!\xb0*){)\\x16:;()\x1e\x10\x87*:(?<y>)\xd1+!~:(?)'}'\d'E:yD!\s(?'R'\x1e;\x10:U))|'))|)g!\xb0*R+9{29+)#(?'P'})*?pS\{3,}\x85,{0,}l{*UTF)(\xe{7}){3722,{9,}d{2,?|))|{)\(A?&d}}{\xa,}2}){3,}7,l{)22}(,}l:7{2,4}}29\x19+)#?'P'})*v?))\x5"
+Failed: error 122 at offset 1221: unmatched closing parenthesis
+
+"(?J)(?'d'(?'d'\g{d}))"
+
 # End of testinput2