pcre2/testdata/testoutput10

1603 lines
42 KiB
Plaintext
Raw Blame History

# This set of tests is for UTF-8 support and Unicode property support, with
# relevance only for the 8-bit library.
/X(\C{3})/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
/X(\C{4})/utf
X\x{1234}YZ
0: X\x{1234}Y
1: \x{1234}Y
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
0: X\x{1234}\x{512}
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}
X\x{1234}\x{512}
0: X\x{1234}
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
/a\C\Cb/utf
a\x{100}b
0: a\x{100}b
/ab\Cde/utf
abXde
0: abXde
/a\C\Cb/utf
a\x{100}b
0: a\x{100}b
** Failers
No match
a\x{12257}b
No match
/[Ã]/utf
Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
/Ã/utf
Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end
/ÃÃÃxxx/utf
Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
/badutf/utf
\xdf
Failed: error -3: UTF-8 error: 1 byte missing at end
\xef
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xef\x80
Failed: error -3: UTF-8 error: 1 byte missing at end
\xf7
Failed: error -5: UTF-8 error: 3 bytes missing at end
\xf7\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xf7\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end
\xfb
Failed: error -6: UTF-8 error: 4 bytes missing at end
\xfb\x80
Failed: error -5: UTF-8 error: 3 bytes missing at end
\xfb\x80\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xfb\x80\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end
\xfd
Failed: error -7: UTF-8 error: 5 bytes missing at end
\xfd\x80
Failed: error -6: UTF-8 error: 4 bytes missing at end
\xfd\x80\x80
Failed: error -5: UTF-8 error: 3 bytes missing at end
\xfd\x80\x80\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xfd\x80\x80\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end
\xdf\x7f
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80
\xef\x7f\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80
\xef\x80\x7f
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80
\xf7\x7f\x80\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80
\xf7\x80\x7f\x80
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80
\xf7\x80\x80\x7f
Failed: error -10: UTF-8 error: byte 4 top bits not 0x80
\xfb\x7f\x80\x80\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80
\xfb\x80\x7f\x80\x80
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80
\xfb\x80\x80\x7f\x80
Failed: error -10: UTF-8 error: byte 4 top bits not 0x80
\xfb\x80\x80\x80\x7f
Failed: error -11: UTF-8 error: byte 5 top bits not 0x80
\xfd\x7f\x80\x80\x80\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80
\xfd\x80\x7f\x80\x80\x80
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80
\xfd\x80\x80\x7f\x80\x80
Failed: error -10: UTF-8 error: byte 4 top bits not 0x80
\xfd\x80\x80\x80\x7f\x80
Failed: error -11: UTF-8 error: byte 5 top bits not 0x80
\xfd\x80\x80\x80\x80\x7f
Failed: error -12: UTF-8 error: byte 6 top bits not 0x80
\xed\xa0\x80
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined
\xc0\x8f
Failed: error -17: UTF-8 error: overlong 2-byte sequence
\xe0\x80\x8f
Failed: error -18: UTF-8 error: overlong 3-byte sequence
\xf0\x80\x80\x8f
Failed: error -19: UTF-8 error: overlong 4-byte sequence
\xf8\x80\x80\x80\x8f
Failed: error -20: UTF-8 error: overlong 5-byte sequence
\xfc\x80\x80\x80\x80\x8f
Failed: error -21: UTF-8 error: overlong 6-byte sequence
\x80
Failed: error -22: UTF-8 error: isolated 0x80 byte
\xfe
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff)
\xff
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff)
/badutf/utf
\xfb\x80\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629)
\xfd\x80\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629)
\xf7\xbf\xbf\xbf
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined
/shortutf/utf
\xdf\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end
\xef\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xef\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end
\xf7\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end
\xf7\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xf7\x80\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end
\xfb\=ph
Failed: error -6: UTF-8 error: 4 bytes missing at end
\xfb\x80\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end
\xfb\x80\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xfb\x80\x80\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end
\xfd\=ph
Failed: error -7: UTF-8 error: 5 bytes missing at end
\xfd\x80\=ph
Failed: error -6: UTF-8 error: 4 bytes missing at end
\xfd\x80\x80\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end
\xfd\x80\x80\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end
\xfd\x80\x80\x80\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end
/anything/utf
\xc0\x80
Failed: error -17: UTF-8 error: overlong 2-byte sequence
\xc1\x8f
Failed: error -17: UTF-8 error: overlong 2-byte sequence
\xe0\x9f\x80
Failed: error -18: UTF-8 error: overlong 3-byte sequence
\xf0\x8f\x80\x80
Failed: error -19: UTF-8 error: overlong 4-byte sequence
\xf8\x87\x80\x80\x80
Failed: error -20: UTF-8 error: overlong 5-byte sequence
\xfc\x83\x80\x80\x80\x80
Failed: error -21: UTF-8 error: overlong 6-byte sequence
\xfe\x80\x80\x80\x80\x80
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff)
\xff\x80\x80\x80\x80\x80
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff)
\xc3\x8f
No match
\xe0\xaf\x80
No match
\xe1\x80\x80
No match
\xf0\x9f\x80\x80
No match
\xf1\x8f\x80\x80
No match
\xf8\x88\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629)
\xf9\x87\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629)
\xfc\x84\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629)
\xfd\x83\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629)
\xf8\x88\x80\x80\x80\=no_utf_check
No match
\xf9\x87\x80\x80\x80\=no_utf_check
No match
\xfc\x84\x80\x80\x80\x80\=no_utf_check
No match
\xfd\x83\x80\x80\x80\x80\=no_utf_check
No match
/\x{100}/IB,utf
------------------------------------------------------------------
Bra
\x{100}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1
/\x{1000}/IB,utf
------------------------------------------------------------------
Bra
\x{1000}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xe1
Last code unit = \x80
Subject length lower bound = 1
/\x{10000}/IB,utf
------------------------------------------------------------------
Bra
\x{10000}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xf0
Last code unit = \x80
Subject length lower bound = 1
/\x{100000}/IB,utf
------------------------------------------------------------------
Bra
\x{100000}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xf4
Last code unit = \x80
Subject length lower bound = 1
/\x{10ffff}/IB,utf
------------------------------------------------------------------
Bra
\x{10ffff}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xf4
Last code unit = \xbf
Subject length lower bound = 1
/[\x{ff}]/IB,utf
------------------------------------------------------------------
Bra
\x{ff}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
Subject length lower bound = 1
/[\x{100}]/IB,utf
------------------------------------------------------------------
Bra
\x{100}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1
/\x80/IB,utf
------------------------------------------------------------------
Bra
\x{80}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc2
Last code unit = \x80
Subject length lower bound = 1
/\xff/IB,utf
------------------------------------------------------------------
Bra
\x{ff}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
Subject length lower bound = 1
/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf
------------------------------------------------------------------
Bra
\x{d55c}\x{ad6d}\x{c5b4}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xed
Last code unit = \xb4
Subject length lower bound = 3
\x{D55c}\x{ad6d}\x{C5B4}
0: \x{d55c}\x{ad6d}\x{c5b4}
/\x{65e5}\x{672c}\x{8a9e}/IB,utf
------------------------------------------------------------------
Bra
\x{65e5}\x{672c}\x{8a9e}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xe6
Last code unit = \x9e
Subject length lower bound = 3
\x{65e5}\x{672c}\x{8a9e}
0: \x{65e5}\x{672c}\x{8a9e}
/\x{80}/IB,utf
------------------------------------------------------------------
Bra
\x{80}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc2
Last code unit = \x80
Subject length lower bound = 1
/\x{084}/IB,utf
------------------------------------------------------------------
Bra
\x{84}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc2
Last code unit = \x84
Subject length lower bound = 1
/\x{104}/IB,utf
------------------------------------------------------------------
Bra
\x{104}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = \x84
Subject length lower bound = 1
/\x{861}/IB,utf
------------------------------------------------------------------
Bra
\x{861}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xe0
Last code unit = \xa1
Subject length lower bound = 1
/\x{212ab}/IB,utf
------------------------------------------------------------------
Bra
\x{212ab}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xf0
Last code unit = \xab
Subject length lower bound = 1
# This one is here not because it's different to Perl, but because the way
# the captured single-byte is displayed. (In Perl it becomes a character, and you
# can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{e1}
2: \x{88}\x{b4}
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
*** Failers
No match
a\x{100}b
No match
/[^ab\xC0-\xF0]/IB,utf
------------------------------------------------------------------
Bra
[\x00-`c-\xbf\xf1-\xff] (neg)
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f
\xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
\xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
\xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
\xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
\xfe \xff
No last code unit
Subject length lower bound = 1
\x{f1}
0: \x{f1}
\x{bf}
0: \x{bf}
\x{100}
0: \x{100}
\x{1000}
0: \x{1000}
*** Failers
0: *
\x{c0}
No match
\x{f0}
No match
/Ä€{3,4}/IB,utf
------------------------------------------------------------------
Bra
\x{100}{3}
\x{100}?+
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 3
\x{100}\x{100}\x{100}\x{100\x{100}
0: \x{100}\x{100}\x{100}
/(\x{100}+|x)/IB,utf
------------------------------------------------------------------
Bra
CBra 1
\x{100}++
Alt
x
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
Starting code units: x \xc4
No last code unit
Subject length lower bound = 1
/(\x{100}*a|x)/IB,utf
------------------------------------------------------------------
Bra
CBra 1
\x{100}*+
a
Alt
x
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
Starting code units: a x \xc4
No last code unit
Subject length lower bound = 1
/(\x{100}{0,2}a|x)/IB,utf
------------------------------------------------------------------
Bra
CBra 1
\x{100}{0,2}+
a
Alt
x
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
Starting code units: a x \xc4
No last code unit
Subject length lower bound = 1
/(\x{100}{1,2}a|x)/IB,utf
------------------------------------------------------------------
Bra
CBra 1
\x{100}
\x{100}{0,1}+
a
Alt
x
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
Starting code units: x \xc4
No last code unit
Subject length lower bound = 1
/\x{100}/IB,utf
------------------------------------------------------------------
Bra
\x{100}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1
/a\x{100}\x{101}*/IB,utf
------------------------------------------------------------------
Bra
a\x{100}
\x{101}*+
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = 'a'
Last code unit = \x80
Subject length lower bound = 2
/a\x{100}\x{101}+/IB,utf
------------------------------------------------------------------
Bra
a\x{100}
\x{101}++
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = 'a'
Last code unit = \x81
Subject length lower bound = 3
/[^\x{c4}]/IB
------------------------------------------------------------------
Bra
[^\x{c4}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first code unit
No last code unit
Subject length lower bound = 1
/[\x{100}]/IB,utf
------------------------------------------------------------------
Bra
\x{100}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1
\x{100}
0: \x{100}
Z\x{100}
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[\xff]/IB,utf
------------------------------------------------------------------
Bra
\x{ff}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
Subject length lower bound = 1
>\x{ff}<
0: \x{ff}
/[^\xff]/IB,utf
------------------------------------------------------------------
Bra
[^\x{ff}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
No first code unit
No last code unit
Subject length lower bound = 1
/\x{100}abc(xyz(?1))/IB,utf
------------------------------------------------------------------
Bra
\x{100}abc
CBra 1
xyz
Recurse
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
First code unit = \xc4
Last code unit = 'z'
Subject length lower bound = 7
/a\x{1234}b/utf,posix
a\x{1234}b
0: a\x{1234}b
/\777/I,utf
Capturing subpattern count = 0
Options: utf
First code unit = \xc7
Last code unit = \xbf
Subject length lower bound = 1
\x{1ff}
0: \x{1ff}
\777
0: \x{1ff}
/\x{100}+\x{200}/IB,utf
------------------------------------------------------------------
Bra
\x{100}++
\x{200}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 2
/\x{100}+X/IB,utf
------------------------------------------------------------------
Bra
\x{100}++
X
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc4
Last code unit = 'X'
Subject length lower bound = 2
/^[\QÄ€\E-\QÅ<51>\E/B,utf
Failed: error 106 at offset 15: missing terminating ] for character class
# This tests the stricter UTF-8 check according to RFC 3629.
/X/utf
\x{d800}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined
\x{d800}\=no_utf_check
No match
\x{da00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined
\x{da00}\=no_utf_check
No match
\x{dfff}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined
\x{dfff}\=no_utf_check
No match
\x{110000}
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined
\x{110000}\=no_utf_check
No match
\x{2000000}
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629)
\x{2000000}\=no_utf_check
No match
\x{7fffffff}
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629)
\x{7fffffff}\=no_utf_check
No match
/(*UTF8)\x{1234}/
abcd\x{1234}pqr
0: \x{1234}
/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0
Compile options: <none>
Overall options: utf
Forced newline is CRLF
First code unit = 'a'
Last code unit = 'b'
Subject length lower bound = 3
/\h/I,utf
Capturing subpattern count = 0
Options: utf
Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3
No last code unit
Subject length lower bound = 1
ABC\x{09}
0: \x{09}
ABC\x{20}
0:
ABC\x{a0}
0: \x{a0}
ABC\x{1680}
0: \x{1680}
ABC\x{180e}
0: \x{180e}
ABC\x{2000}
0: \x{2000}
ABC\x{202f}
0: \x{202f}
ABC\x{205f}
0: \x{205f}
ABC\x{3000}
0: \x{3000}
/\v/I,utf
Capturing subpattern count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
No last code unit
Subject length lower bound = 1
ABC\x{0a}
0: \x{0a}
ABC\x{0b}
0: \x{0b}
ABC\x{0c}
0: \x{0c}
ABC\x{0d}
0: \x{0d}
ABC\x{85}
0: \x{85}
ABC\x{2028}
0: \x{2028}
/\h*A/I,utf
Capturing subpattern count = 0
Options: utf
Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
Last code unit = 'A'
Subject length lower bound = 1
CDBABC
0: A
/\v+A/I,utf
Capturing subpattern count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Last code unit = 'A'
Subject length lower bound = 2
/\s?xxx\s/I,utf
Capturing subpattern count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x'
Subject length lower bound = 4
/\sxxx\s/I,utf,tables=2
Capturing subpattern count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
Last code unit = 'x'
Subject length lower bound = 5
AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ
0: \x{a0}xxx\x{85}
/\S \S/I,utf,tables=2
Capturing subpattern count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
\xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
\xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
\xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
Last code unit = ' '
Subject length lower bound = 3
\x{a2} \x{84}
0: \x{a2} \x{84}
A Z
0: A Z
/a+/utf
a\x{123}aa\=offset=1
0: aa
a\x{123}aa\=offset=2
Error -36 (bad UTF-8 offset)
a\x{123}aa\=offset=3
0: aa
a\x{123}aa\=offset=4
0: a
a\x{123}aa\=offset=5
No match
a\x{123}aa\=offset=6
Failed: error -34: bad offset value
/\x{1234}+/Ii,utf
Capturing subpattern count = 0
Options: caseless utf
Starting code units: \xe1
No last code unit
Subject length lower bound = 1
/\x{1234}+?/Ii,utf
Capturing subpattern count = 0
Options: caseless utf
Starting code units: \xe1
No last code unit
Subject length lower bound = 1
/\x{1234}++/Ii,utf
Capturing subpattern count = 0
Options: caseless utf
Starting code units: \xe1
No last code unit
Subject length lower bound = 1
/\x{1234}{2}/Ii,utf
Capturing subpattern count = 0
Options: caseless utf
Starting code units: \xe1
No last code unit
Subject length lower bound = 2
/[^\x{c4}]/IB,utf
------------------------------------------------------------------
Bra
[^\x{c4}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
No first code unit
No last code unit
Subject length lower bound = 1
/X+\x{200}/IB,utf
------------------------------------------------------------------
Bra
X++
\x{200}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = 'X'
Last code unit = \x80
Subject length lower bound = 2
/\R/I,utf
Capturing subpattern count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
No last code unit
Subject length lower bound = 1
/\777/IB,utf
------------------------------------------------------------------
Bra
\x{1ff}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = \xc7
Last code unit = \xbf
Subject length lower bound = 1
/\w+\x{C4}/B,utf
------------------------------------------------------------------
Bra
\w++
\x{c4}
Ket
End
------------------------------------------------------------------
a\x{C4}\x{C4}
0: a\x{c4}
/\w+\x{C4}/B,utf,tables=2
------------------------------------------------------------------
Bra
\w+
\x{c4}
Ket
End
------------------------------------------------------------------
a\x{C4}\x{C4}
0: a\x{c4}\x{c4}
/\W+\x{C4}/B,utf
------------------------------------------------------------------
Bra
\W+
\x{c4}
Ket
End
------------------------------------------------------------------
!\x{C4}
0: !\x{c4}
/\W+\x{C4}/B,utf,tables=2
------------------------------------------------------------------
Bra
\W++
\x{c4}
Ket
End
------------------------------------------------------------------
!\x{C4}
0: !\x{c4}
/\W+\x{A1}/B,utf
------------------------------------------------------------------
Bra
\W+
\x{a1}
Ket
End
------------------------------------------------------------------
!\x{A1}
0: !\x{a1}
/\W+\x{A1}/B,utf,tables=2
------------------------------------------------------------------
Bra
\W+
\x{a1}
Ket
End
------------------------------------------------------------------
!\x{A1}
0: !\x{a1}
/X\s+\x{A0}/B,utf
------------------------------------------------------------------
Bra
X
\s++
\x{a0}
Ket
End
------------------------------------------------------------------
X\x20\x{A0}\x{A0}
0: X \x{a0}
/X\s+\x{A0}/B,utf,tables=2
------------------------------------------------------------------
Bra
X
\s+
\x{a0}
Ket
End
------------------------------------------------------------------
X\x20\x{A0}\x{A0}
0: X \x{a0}\x{a0}
/\S+\x{A0}/B,utf
------------------------------------------------------------------
Bra
\S+
\x{a0}
Ket
End
------------------------------------------------------------------
X\x{A0}\x{A0}
0: X\x{a0}\x{a0}
/\S+\x{A0}/B,utf,tables=2
------------------------------------------------------------------
Bra
\S++
\x{a0}
Ket
End
------------------------------------------------------------------
X\x{A0}\x{A0}
0: X\x{a0}
/\x{a0}+\s!/B,utf
------------------------------------------------------------------
Bra
\x{a0}++
\s
!
Ket
End
------------------------------------------------------------------
\x{a0}\x20!
0: \x{a0} !
/\x{a0}+\s!/B,utf,tables=2
------------------------------------------------------------------
Bra
\x{a0}+
\s
!
Ket
End
------------------------------------------------------------------
\x{a0}\x20!
0: \x{a0} !
/A/utf
\x{ff000041}
** Character \x{ff000041} is greater than 0x7fffffff and so cannot be converted to UTF-8
\x{7f000041}
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629)
/(*UTF8)abc/never_utf
Failed: error 174 at offset 7: using UTF is disabled by the application
/abc/utf,never_utf
Failed: error 174 at offset 0: using UTF is disabled by the application
/\w/posix
+++\x{c2}
No match: POSIX code 17: match failed
/\w/ucp,posix
+++\x{c2}
0: \xc2
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf
------------------------------------------------------------------
Bra
/i A\x{391}\x{10427}\x{ff3a}\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
First code unit = 'A' (caseless)
No last code unit
Subject length lower bound = 5
/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf
------------------------------------------------------------------
Bra
A\x{391}\x{10427}\x{ff3a}\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = 'A'
Last code unit = \xb0
Subject length lower bound = 5
/AB\x{1fb0}/IB,utf
------------------------------------------------------------------
Bra
AB\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
First code unit = 'A'
Last code unit = \xb0
Subject length lower bound = 3
/AB\x{1fb0}/IBi,utf
------------------------------------------------------------------
Bra
/i AB\x{1fb0}
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Last code unit = 'B' (caseless)
Subject length lower bound = 3
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
Capturing subpattern count = 0
Options: caseless utf
Starting code units: \xd0 \xd1
No last code unit
Subject length lower bound = 17
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
/[â±¥]/Bi,utf
------------------------------------------------------------------
Bra
/i \x{2c65}
Ket
End
------------------------------------------------------------------
/[^â±¥]/Bi,utf
------------------------------------------------------------------
Bra
/i [^\x{2c65}]
Ket
End
------------------------------------------------------------------
/\h/I
Capturing subpattern count = 0
No options
Starting code units: \x09 \x20 \xa0
No last code unit
Subject length lower bound = 1
/\v/I
Capturing subpattern count = 0
No options
Starting code units: \x0a \x0b \x0c \x0d \x85
No last code unit
Subject length lower bound = 1
/\R/I
Capturing subpattern count = 0
No options
Starting code units: \x0a \x0b \x0c \x0d \x85
No last code unit
Subject length lower bound = 1
/[[:blank:]]/B,ucp
------------------------------------------------------------------
Bra
[\x09 \xa0]
Ket
End
------------------------------------------------------------------
/\x{212a}+/Ii,utf
Capturing subpattern count = 0
Options: caseless utf
Starting code units: K k \xe2
No last code unit
Subject length lower bound = 1
KKkk\x{212a}
0: KKkk\x{212a}
/s+/Ii,utf
Capturing subpattern count = 0
Options: caseless utf
Starting code units: S s \xc5
No last code unit
Subject length lower bound = 1
SSss\x{17f}
0: SSss\x{17f}
/\x{100}*A/IB,utf
------------------------------------------------------------------
Bra
\x{100}*+
A
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: A \xc4
Last code unit = 'A'
Subject length lower bound = 1
A
0: A
/\x{100}*\d(?R)/IB,utf
------------------------------------------------------------------
Bra
\x{100}*+
\d
Recurse
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
No last code unit
Subject length lower bound = 1
/[Z\x{100}]/IB,utf
------------------------------------------------------------------
Bra
[Z\x{100}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: Z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
\xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb
\xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa
\xfb \xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
Z\x{100}
0: Z
\x{100}
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[z-\x{100}]/IB,utf
------------------------------------------------------------------
Bra
[z-\xff\x{100}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
\xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
\xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7
\xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6
\xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
/[z\Qa-d]Ä€\E]/IB,utf
------------------------------------------------------------------
Bra
[\-\]adz\x{100}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: - ] a d z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
\xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
\xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea
\xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9
\xfa \xfb \xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
\x{100}
0: \x{100}
Ä€
0: \x{100}
/[ab\x{100}]abc(xyz(?1))/IB,utf
------------------------------------------------------------------
Bra
[ab\x{100}]
abc
CBra 1
xyz
Recurse
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Options: utf
Starting code units: a b \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
\xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb
\xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa
\xfb \xfc \xfd \xfe \xff
Last code unit = 'z'
Subject length lower bound = 7
/\x{100}*\s/IB,utf
------------------------------------------------------------------
Bra
\x{100}*+
\s
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4
No last code unit
Subject length lower bound = 1
/\x{100}*\d/IB,utf
------------------------------------------------------------------
Bra
\x{100}*+
\d
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
No last code unit
Subject length lower bound = 1
/\x{100}*\w/IB,utf
------------------------------------------------------------------
Bra
\x{100}*+
\w
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
\xc4
No last code unit
Subject length lower bound = 1
/\x{100}*\D/IB,utf
------------------------------------------------------------------
Bra
\x{100}*
\D
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = >
? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c
d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2
\xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
\xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
\xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
\xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
\xff
No last code unit
Subject length lower bound = 1
/\x{100}*\S/IB,utf
------------------------------------------------------------------
Bra
\x{100}*
\S
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
\xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
\xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
\xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
/\x{100}*\W/IB,utf
------------------------------------------------------------------
Bra
\x{100}*
\W
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = >
? @ [ \ ] ^ ` { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
\xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
\xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7
\xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6
\xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
/[\x{105}-\x{109}]/IBi,utf
------------------------------------------------------------------
Bra
[\x{104}-\x{109}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
Starting code units: \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
\xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
\xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec
\xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb
\xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
\x{104}
0: \x{104}
\x{105}
0: \x{105}
\x{109}
0: \x{109}
** Failers
No match
\x{100}
No match
\x{10a}
No match
/[z-\x{100}]/IBi,utf
------------------------------------------------------------------
Bra
[Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
\xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6
\xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5
\xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
Z
0: Z
z
0: z
\x{39c}
0: \x{39c}
\x{178}
0: \x{178}
|
0: |
\x{80}
0: \x{80}
\x{ff}
0: \x{ff}
\x{100}
0: \x{100}
\x{101}
0: \x{101}
** Failers
No match
\x{102}
No match
Y
No match
y
No match
/[z-\x{100}]/IBi,utf
------------------------------------------------------------------
Bra
[Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
\xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6
\xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5
\xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
No last code unit
Subject length lower bound = 1
/\x{3a3}B/IBi,utf
------------------------------------------------------------------
Bra
clist 03a3 03c2 03c3
/i B
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: caseless utf
Starting code units: \xce \xcf
Last code unit = 'B' (caseless)
Subject length lower bound = 2
# End of testinput10