Fix class bug when UCP but not UTF was set and all wide characters need to be included.
Philip.Hazel 2016-12-26 17:11:18 +00:00
parent a7a25ed91d
commit 8933d999d8
7 changed files with 39 additions and 1 deletions

@@ -255,6 +255,10 @@ This lifts a restriction on the number of branches in a group (more than about
 38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
 auto_callout".
+39. In a library with Unicode support, incorrect data was compiled for a
+pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide
+characters to match (for example, /[\s[:^ascii:]]/).
 Version 10.22 29-July-2016
 --------------------------

@@ -4927,9 +4927,13 @@ for (;; pptr++)
 automatically handled by the use of OP_CLASS or OP_NCLASS, but an
 explicit range is needed for OP_XCLASS. Setting a flag here
 causes the range to be generated later when it is known that
-OP_XCLASS is required. */
+OP_XCLASS is required. In the 8-bit library this is relevant only in
+utf mode, since no wide characters can exist otherwise. */
 default:
+#if PCRE2_CODE_UNIT_WIDTH == 8
+if (utf)
+#endif
 match_all_or_no_wide_chars |= local_negate;
 break;
 }
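
The guard added in this hunk can be modelled as a small pure function. This is only an illustrative sketch: `update_wide_flag` and its run-time `code_unit_width` parameter are hypothetical stand-ins for the compile-time `PCRE2_CODE_UNIT_WIDTH` test, and are not part of PCRE2.

```c
#include <stdbool.h>

/* Hypothetical helper, not a PCRE2 function. It models the guarded update
   of match_all_or_no_wide_chars from the hunk above. code_unit_width is
   assumed to be 8, 16 or 32; utf and local_negate mirror the variables of
   the same names in the patch. */
bool update_wide_flag(bool flag, int code_unit_width, bool utf,
                      bool local_negate)
{
    /* In the 8-bit library wide characters exist only in UTF mode, so the
       flag must be left untouched when utf is not set. */
    if (code_unit_width == 8 && !utf)
        return flag;

    /* Otherwise behave like "flag |= local_negate". */
    return flag || local_negate;
}
```

With this model, a negated POSIX class such as `[:^ascii:]` sets the flag in 16-bit and 32-bit builds and in 8-bit UTF mode, but leaves it alone in 8-bit non-UTF mode, which is the case the commit fixes.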
@@ -5217,6 +5221,8 @@ for (;; pptr++)
 all wide characters (depending on whether the whole class is or is not
 negated). This requirement is indicated by match_all_or_no_wide_chars being
 true. We do this by including an explicit range, which works in both cases.
+This applies only in UTF and 16-bit and 32-bit non-UTF modes, since there
+cannot be any wide characters in 8-bit non-UTF mode.
 When there *are* properties in a positive UTF-8 or any 16-bit or 32_bit
 class where \S etc is present without PCRE2_UCP, causing an extended class
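
As a rough illustration of the comment above, the explicit wide-character range can be described by a function of the code unit width and the UTF flag. The helper name `wide_char_range` is hypothetical. The non-UTF upper bounds (0xffff and 0xffffffff) are taken from the expected test output in this commit; the UTF upper bound of 0x10ffff is an assumption based on the largest Unicode code point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a PCRE2 function. It reports whether an explicit
   range of wide characters (code points above 0xff) can exist for the given
   mode and, if so, stores its bounds in *lo and *hi. */
bool wide_char_range(int code_unit_width, bool utf,
                     uint32_t *lo, uint32_t *hi)
{
    /* 8-bit non-UTF mode: every character fits in one byte, so there are
       no wide characters and no range should be emitted. */
    if (code_unit_width == 8 && !utf)
        return false;

    *lo = 0x100;
    if (utf)
        *hi = 0x10ffffu;       /* assumed: largest Unicode code point */
    else if (code_unit_width == 16)
        *hi = 0xffffu;         /* as in the 16-bit expected output */
    else
        *hi = 0xffffffffu;     /* as in the 32-bit expected output */
    return true;
}
```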

@@ -456,4 +456,6 @@
 /(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/utf
+/[\s[:^ascii:]]/B,ucp
 # End of testinput10

@@ -358,4 +358,6 @@
 \= Expect no match
 123
+/[\s[:^ascii:]]/B,ucp
 # End of testinput12

@@ -1567,4 +1567,12 @@ No match
 /(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/utf
 Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
+/[\s[:^ascii:]]/B,ucp
+------------------------------------------------------------------
+Bra
+[\x80-\xff\p{Xsp}]
+Ket
+End
+------------------------------------------------------------------
 # End of testinput10
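
To make the expected 8-bit class `[\x80-\xff\p{Xsp}]` concrete, here is a minimal model of which byte values it accepts. The function `class_matches_8bit` is hypothetical and does not call PCRE2. It assumes that the ASCII members of `\p{Xsp}` are space, tab, newline, vertical tab, form feed and carriage return; any non-ASCII white space is already covered by the `\x80-\xff` range.

```c
#include <stdbool.h>

/* Hypothetical model of [\x80-\xff\p{Xsp}] in 8-bit non-UTF mode. */
bool class_matches_8bit(unsigned char c)
{
    /* [:^ascii:] contributes every byte from 0x80 to 0xff. */
    if (c >= 0x80)
        return true;

    /* Assumed ASCII part of \p{Xsp}: space, and the control characters
       from tab (0x09) through carriage return (0x0d). */
    return c == ' ' || (c >= '\t' && c <= '\r');
}
```

No wide-character range appears in this model because an 8-bit non-UTF subject cannot contain a character above 0xff.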

@@ -1407,4 +1407,12 @@ Subject length lower bound = 2
 123
 No match
+/[\s[:^ascii:]]/B,ucp
+------------------------------------------------------------------
+Bra
+[\x80-\xff\p{Xsp}\x{100}-\x{ffff}]
+Ket
+End
+------------------------------------------------------------------
 # End of testinput12

@@ -1401,4 +1401,12 @@ Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defin
 123
 No match
+/[\s[:^ascii:]]/B,ucp
+------------------------------------------------------------------
+Bra
+[\x80-\xff\p{Xsp}\x{100}-\x{ffffffff}]
+Ket
+End
+------------------------------------------------------------------
 # End of testinput12