Allow non-ASCII in group names when UTF is set; revise group naming terminology in documentation to use "capture group", as Perl does.
Philip.Hazel 2019-02-06 18:11:36 +00:00
parent a657d4cff8
commit d7b10a57d1
60 changed files with 4236 additions and 4025 deletions

@ -121,6 +121,9 @@ the option applies only to unrecognized or malformed escape sequences.
tests such as (?(VERSION>=0)...) when the version test was true. Incorrect tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
processing or a crash could result. processing or a crash could result.
30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
names, as Perl does.
Version 10.32 10-September-2018 Version 10.32 10-September-2018
------------------------------- -------------------------------

@ -27,8 +27,8 @@ DESCRIPTION
</b><br> </b><br>
<P> <P>
This convenience function finds, for a compiled pattern, the first and last This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis entries for a given name in the table that translates capture group names into
names into numbers. numbers.
<pre> <pre>
<i>code</i> Compiled regular expression <i>code</i> Compiled regular expression
<i>name</i> Name whose entries required <i>name</i> Name whose entries required

@ -49,7 +49,7 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a> <li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
<li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a> <li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
<li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a> <li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
<li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a> <li><a name="TOC37" href="#SEC37">DUPLICATE CAPTURE GROUP NAMES</a>
<li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a> <li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
<li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a> <li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
<li><a name="TOC40" href="#SEC40">SEE ALSO</a> <li><a name="TOC40" href="#SEC40">SEE ALSO</a>
@ -1490,10 +1490,10 @@ independent of the setting of PCRE2_DOTALL.
<pre> <pre>
PCRE2_DUPNAMES PCRE2_DUPNAMES
</pre> </pre>
If this bit is set, names used to identify capturing subpatterns need not be If this bit is set, names used to identify capture groups need not be unique.
unique. This can be helpful for certain types of pattern when it is known that This can be helpful for certain types of pattern when it is known that only one
only one instance of the named subpattern can ever be matched. There are more instance of the named group can ever be matched. There are more details of
details of named subpatterns below; see also the named capture groups below; see also the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. documentation.
<pre> <pre>
@ -1526,11 +1526,11 @@ the end of the subject.
If this bit is set, most white space characters in the pattern are totally If this bit is set, most white space characters in the pattern are totally
ignored except when escaped or inside a character class. However, white space ignored except when escaped or inside a character class. However, white space
is not allowed within sequences such as (?&#62; that introduce various is not allowed within sequences such as (?&#62; that introduce various
parenthesized subpatterns, nor within numerical quantifiers such as {1,3}. parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable
Ignorable white space is permitted between an item and a following quantifier white space is permitted between an item and a following quantifier and between
and between a quantifier and a following + that indicates possessiveness. a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be changed within equivalent to Perl's /x option, and it can be changed within a pattern by a
a pattern by a (?x) option setting. (?x) option setting.
</P> </P>
<P> <P>
When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
@ -1606,7 +1606,7 @@ error.
<pre> <pre>
PCRE2_MATCH_UNSET_BACKREF PCRE2_MATCH_UNSET_BACKREF
</pre> </pre>
If this option is set, a backreference to an unset subpattern group matches an If this option is set, a backreference to an unset capture group matches an
empty string (by default this causes the current matching alternative to fail). empty string (by default this causes the current matching alternative to fail).
A pattern such as (\1)(a) succeeds when this option is set (assuming it can A pattern such as (\1)(a) succeeds when this option is set (assuming it can
find an "a" in the subject), whereas it fails by default, for Perl find an "a" in the subject), whereas it fails by default, for Perl
@ -1668,7 +1668,7 @@ If this option is set, it disables the use of numbered capturing parentheses in
the pattern. Any opening parenthesis that is not followed by ? behaves as if it the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). This is the same as Perl's /n option. they acquire numbers in the usual way). This is the same as Perl's /n option.
Note that, when this option is set, references to capturing groups Note that, when this option is set, references to capture groups
(backreferences or recursion/subroutine calls) may only refer to named groups, (backreferences or recursion/subroutine calls) may only refer to named groups,
though the reference can be by name or by number. though the reference can be by name or by number.
<pre> <pre>
@ -1687,7 +1687,7 @@ purposes.
If this option is set, it disables an optimization that is applied when .* is If this option is set, it disables an optimization that is applied when .* is
the first significant item in a top-level branch of a pattern, and all the the first significant item in a top-level branch of a pattern, and all the
other branches also start with .* or with \A or \G or ^. The optimization is other branches also start with .* or with \A or \G or ^. The optimization is
automatically disabled for .* if it is inside an atomic group or a capturing automatically disabled for .* if it is inside an atomic group or a capture
group that is the subject of a backreference, or if the pattern contains group that is the subject of a backreference, or if the pattern contains
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is (*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
automatically anchored if PCRE2_DOTALL is set for all the .* items and automatically anchored if PCRE2_DOTALL is set for all the .* items and
@ -2066,7 +2066,7 @@ When .* is the first significant item, anchoring is possible only when all the
following are true: following are true:
<pre> <pre>
.* is not in an atomic group .* is not in an atomic group
.* is not in a capturing group that is the subject of a backreference .* is not in a capture group that is the subject of a backreference
PCRE2_DOTALL is in force for .* PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set PCRE2_NO_DOTSTAR_ANCHOR is not set
@ -2077,12 +2077,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
PCRE2_INFO_BACKREFMAX PCRE2_INFO_BACKREFMAX
</pre> </pre>
Return the number of the highest backreference in the pattern. The third Return the number of the highest backreference in the pattern. The third
argument should point to an <b>uint32_t</b> variable. Named subpatterns acquire argument should point to an <b>uint32_t</b> variable. Named capture groups
numbers as well as names, and these count towards the highest backreference. acquire numbers as well as names, and these count towards the highest
Backreferences such as \4 or \g{12} match the captured characters of the backreference. Backreferences such as \4 or \g{12} match the captured
given group, but in addition, the check that a capturing group is set in a characters of the given group, but in addition, the check that a capture
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is group is set in a conditional group such as (?(3)a|b) is also a backreference.
returned if there are no backreferences. Zero is returned if there are no backreferences.
<pre> <pre>
PCRE2_INFO_BSR PCRE2_INFO_BSR
</pre> </pre>
@ -2093,9 +2093,9 @@ that \R matches only CR, LF, or CRLF.
<pre> <pre>
PCRE2_INFO_CAPTURECOUNT PCRE2_INFO_CAPTURECOUNT
</pre> </pre>
Return the highest capturing subpattern number in the pattern. In patterns Return the highest capture group number in the pattern. In patterns where (?|
where (?| is not used, this is also the total number of capturing subpatterns. is not used, this is also the total number of capture groups. The third
The third argument should point to an <b>uint32_t</b> variable. argument should point to an <b>uint32_t</b> variable.
<pre> <pre>
PCRE2_INFO_DEPTHLIMIT PCRE2_INFO_DEPTHLIMIT
</pre> </pre>
@ -2143,7 +2143,7 @@ Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by <b>pcre2_match()</b> backtracking positions when the pattern is processed by <b>pcre2_match()</b>
without the use of JIT. The third argument should point to a <b>size_t</b> without the use of JIT. The third argument should point to a <b>size_t</b>
variable. The frame size depends on the number of capturing parentheses in the variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables. pattern. Each additional capture group adds two PCRE2_SIZE variables.
<pre> <pre>
PCRE2_INFO_HASBACKSLASHC PCRE2_INFO_HASBACKSLASHC
</pre> </pre>
@ -2267,20 +2267,20 @@ the parenthesis number. The rest of the entry is the corresponding name, zero
terminated. terminated.
</P> </P>
<P> <P>
The names are in alphabetical order. If (?| is used to create multiple groups The names are in alphabetical order. If (?| is used to create multiple capture
with the same number, as described in the groups with the same number, as described in the
<a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a> <a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
in the in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page, the groups may be given the same name, but there is only one entry in the page, the groups may be given the same name, but there is only one entry in the
table. Different names for groups of the same number are not permitted. table. Different names for groups of the same number are not permitted.
</P> </P>
<P> <P>
Duplicate names for subpatterns with different numbers are permitted, but only Duplicate names for capture groups with different numbers are permitted, but
if PCRE2_DUPNAMES is set. They appear in the table in the order in which they only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
were found in the pattern. In the absence of (?| this is the order of they were found in the pattern. In the absence of (?| this is the order of
increasing number; when (?| is used this is not necessarily the case because increasing number; when (?| is used this is not necessarily the case because
later subpatterns may have lower numbers. later capture groups may have lower numbers.
</P> </P>
<P> <P>
As a simple example of the name/number table, consider the following pattern As a simple example of the name/number table, consider the following pattern
@ -2289,16 +2289,16 @@ space - including newlines - is ignored):
<pre> <pre>
(?&#60;date&#62; (?&#60;year&#62;(\d\d)?\d\d) - (?&#60;month&#62;\d\d) - (?&#60;day&#62;\d\d) ) (?&#60;date&#62; (?&#60;year&#62;(\d\d)?\d\d) - (?&#60;month&#62;\d\d) - (?&#60;day&#62;\d\d) )
</pre> </pre>
There are four named subpatterns, so the table has four entries, and each entry There are four named capture groups, so the table has four entries, and each
in the table is eight bytes long. The table is as follows, with non-printing entry in the table is eight bytes long. The table is as follows, with
bytes shown in hexadecimal, and undefined bytes shown as ??: non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:
<pre> <pre>
00 01 d a t e 00 ?? 00 01 d a t e 00 ??
00 05 d a y 00 ?? ?? 00 05 d a y 00 ?? ??
00 04 m o n t h 00 00 04 m o n t h 00
00 02 y e a r 00 ?? 00 02 y e a r 00 ??
</pre> </pre>
When writing code to extract data from named subpatterns using the When writing code to extract data from named capture groups using the
name-to-number map, remember that the length of the entries is likely to be name-to-number map, remember that the length of the entries is likely to be
different for each compiled pattern. different for each compiled pattern.
<pre> <pre>
@ -2741,12 +2741,12 @@ valid newline sequence and explicit \r or \n escapes appear in the pattern.
In general, a pattern matches a certain portion of the subject, and in In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by addition, further substrings from the subject may be picked out by
parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
book, this is called "capturing" in what follows, and the phrase "capturing book, this is called "capturing" in what follows, and the phrase "capture
subpattern" or "capturing group" is used for a fragment of a pattern that picks group" (Perl terminology) is used for a fragment of a pattern that picks out a
out a substring. PCRE2 supports several other kinds of parenthesized subpattern substring. PCRE2 supports several other kinds of parenthesized group that do
that do not cause substrings to be captured. The <b>pcre2_pattern_info()</b> not cause substrings to be captured. The <b>pcre2_pattern_info()</b> function
function can be used to find out how many capturing subpatterns there are in a can be used to find out how many capture groups there are in a compiled
compiled pattern. pattern.
</P> </P>
<P> <P>
You can use auxiliary functions for accessing captured substrings You can use auxiliary functions for accessing captured substrings
@ -2795,9 +2795,8 @@ For example, if the pattern (?=ab\K) is matched against "ab", the start and
end offset values for the match are 2 and 0. end offset values for the match are 2 and 0.
</P> </P>
<P> <P>
If a capturing subpattern group is matched repeatedly within a single match If a capture group is matched repeatedly within a single match operation, it is
operation, it is the last portion of the subject that it matched that is the last portion of the subject that it matched that is returned.
returned.
</P> </P>
<P> <P>
If the ovector is too small to hold all the captured substring offsets, as much If the ovector is too small to hold all the captured substring offsets, as much
@ -2806,21 +2805,20 @@ substrings are not of interest, <b>pcre2_match()</b> may be called with a match
data block whose ovector is of minimum length (that is, one pair). data block whose ovector is of minimum length (that is, one pair).
</P> </P>
<P> <P>
It is possible for capturing subpattern number <i>n+1</i> to match some part of It is possible for capture group number <i>n+1</i> to match some part of the
the subject when subpattern <i>n</i> has not been used at all. For example, if subject when group <i>n</i> has not been used at all. For example, if the string
the string "abc" is matched against the pattern (a|(z))(bc) the return from the "abc" is matched against the pattern (a|(z))(bc) the return from the function
function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
happens, both values in the offset pairs corresponding to unused subpatterns values in the offset pairs corresponding to unused groups are set to
are set to PCRE2_UNSET. PCRE2_UNSET.
</P> </P>
<P> <P>
Offset values that correspond to unused subpatterns at the end of the Offset values that correspond to unused groups at the end of the expression are
expression are also set to PCRE2_UNSET. For example, if the string "abc" is also set to PCRE2_UNSET. For example, if the string "abc" is matched against
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched. the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
The return from the function is 2, because the highest used capturing function is 2, because the highest used capture group number is 1. The offsets
subpattern number is 1. The offsets for the second and third capturing for the second and third capture groups (assuming the vector is large
subpatterns (assuming the vector is large enough, of course) are set to enough, of course) are set to PCRE2_UNSET.
PCRE2_UNSET.
</P> </P>
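The two examples above can be tried in any engine that numbers groups the same way. The sketch below uses Python's built-in `re` module purely as an analogy. It is a different engine from PCRE2, it reports an unset group as None with a span of (-1, -1) instead of PCRE2_UNSET, and it has no return value equivalent to the one from pcre2_match().

```python
import re

# (a|(z))(bc) against "abc": groups 1 and 3 take part, group 2 does not.
first = re.fullmatch(r"(a|(z))(bc)", "abc")
groups_first = first.groups()    # ('a', None, 'bc')
span_unused = first.span(2)      # (-1, -1) is Python's marker for "unset"

# (abc)(x(yz)?)? against "abc": the trailing groups 2 and 3 are unused.
second = re.fullmatch(r"(abc)(x(yz)?)?", "abc")
groups_second = second.groups()  # ('abc', None, None)
```

In both cases the unused groups still occupy their numbered slots, which is why code that reads an ovector must check each pair for the unset marker before using it.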
<P> <P>
Elements in the ovector that do not correspond to capturing parentheses in the Elements in the ovector that do not correspond to capturing parentheses in the
@ -2993,11 +2991,11 @@ as NULL.
</pre> </pre>
This error is returned when <b>pcre2_match()</b> detects a recursion loop within This error is returned when <b>pcre2_match()</b> detects a recursion loop within
the pattern. Specifically, it means that either the whole pattern or a the pattern. Specifically, it means that either the whole pattern or a
subpattern has been called recursively for the second time at the same position capture group has been called recursively for the second time at the same
in the subject string. Some simple patterns that might do this are detected and position in the subject string. Some simple patterns that might do this are
faulted at compile time, but more complicated cases, in particular mutual detected and faulted at compile time, but more complicated cases, in particular
recursions between two different subpatterns, cannot be detected until matching mutual recursions between two different groups, cannot be detected until
is attempted. matching is attempted.
<a name="geterrormessage"></a></P> <a name="geterrormessage"></a></P>
<br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br> <br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
<P> <P>
@ -3074,7 +3072,7 @@ The <b>pcre2_substring_copy_bynumber()</b> function copies a captured substring
into a supplied buffer, whereas <b>pcre2_substring_get_bynumber()</b> copies it into a supplied buffer, whereas <b>pcre2_substring_get_bynumber()</b> copies it
into new memory, obtained using the same memory allocation function that was into new memory, obtained using the same memory allocation function that was
used for the match data block. The first two arguments of these functions are a used for the match data block. The first two arguments of these functions are a
pointer to the match data block and a capturing group number. pointer to the match data block and a capture group number.
</P> </P>
<P> <P>
The final arguments of <b>pcre2_substring_copy_bynumber()</b> are a pointer to The final arguments of <b>pcre2_substring_copy_bynumber()</b> are a pointer to
@ -3150,9 +3148,9 @@ calling <b>pcre2_substring_list_free()</b>.
</P> </P>
<P> <P>
If this function encounters a substring that is unset, which can happen when If this function encounters a substring that is unset, which can happen when
capturing subpattern number <i>n+1</i> matches some part of the subject, but capture group number <i>n+1</i> matches some part of the subject, but group
subpattern <i>n</i> has not been used at all, it returns an empty string. This <i>n</i> has not been used at all, it returns an empty string. This can be
can be distinguished from a genuine zero-length substring by inspecting the distinguished from a genuine zero-length substring by inspecting the
appropriate offset in the ovector, which contains PCRE2_UNSET for unset appropriate offset in the ovector, which contains PCRE2_UNSET for unset
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>. substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
<a name="extractbyname"></a></P> <a name="extractbyname"></a></P>
@ -3182,21 +3180,21 @@ For example, for this pattern:
<pre> <pre>
(a+)b(?&#60;xxx&#62;\d+)... (a+)b(?&#60;xxx&#62;\d+)...
</pre> </pre>
the number of the subpattern called "xxx" is 2. If the name is known to be the number of the capture group called "xxx" is 2. If the name is known to be
unique (PCRE2_DUPNAMES was not set), you can find the number from the name by unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
calling <b>pcre2_substring_number_from_name()</b>. The first argument is the calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
that name. Given the number, you can extract the substring directly from the Given the number, you can extract the substring directly from the ovector, or
ovector, or use one of the "bynumber" functions described above. use one of the "bynumber" functions described above.
</P> </P>
<P> <P>
For convenience, there are also "byname" functions that correspond to the For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a "bynumber" functions, the only difference being that the second argument is a
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
names, these functions scan all the groups with the given name, and return the names, these functions scan all the groups with the given name, and return the
first named string that is set. captured substring from the first named group that is set.
</P> </P>
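The scan that the "byname" functions perform for duplicate names can be sketched as follows. This is an illustrative Python model, not the library's implementation. The names `substring_by_name`, `NAME_TABLE`, and `OVECTOR` are hypothetical, `UNSET` is a stand-in for PCRE2_UNSET, and the exceptions only mimic the PCRE2_ERROR_NOSUBSTRING and PCRE2_ERROR_UNSET returns; other error conditions are left out.

```python
UNSET = -1  # stand-in for PCRE2_UNSET (assumption, not the real constant)


def substring_by_name(subject, ovector, name_table, name):
    """Return the substring captured by the first set group called `name`.

    name_table is a list of (name, group_number) pairs in table order;
    ovector is a list of (start, end) offset pairs indexed by group number.
    """
    numbers = [number for entry_name, number in name_table if entry_name == name]
    if not numbers:
        # Models PCRE2_ERROR_NOSUBSTRING: no group has this name.
        raise KeyError(name)
    for number in numbers:
        start, end = ovector[number]
        if start != UNSET:
            return subject[start:end]
    # Models PCRE2_ERROR_UNSET: the name exists but no such group is set.
    raise ValueError(f"no group named {name!r} is set")


# Hypothetical data for (?<D>Mon)|(?<D>Tue) compiled with PCRE2_DUPNAMES
# and matched against "Tue": group 1 is unset, group 2 holds the match.
NAME_TABLE = [("D", 1), ("D", 2)]
OVECTOR = [(0, 3), (UNSET, UNSET), (0, 3)]
RESULT = substring_by_name("Tue", OVECTOR, NAME_TABLE, "D")
```

Because the scan stops at the first set group, the result is unambiguous only when the pattern guarantees that at most one group with a given name can take part in a match.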
<P> <P>
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
@ -3207,13 +3205,13 @@ set, PCRE2_ERROR_UNSET is returned.
</P> </P>
<P> <P>
<b>Warning:</b> If the pattern uses the (?| feature to set up multiple <b>Warning:</b> If the pattern uses the (?| feature to set up multiple
subpatterns with the same number, as described in the capture groups with the same number, as described in the
<a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a> <a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
in the in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page, you cannot use names to distinguish the different subpatterns, because page, you cannot use names to distinguish the different capture groups, because
names are not included in the compiled code. The matching process uses only names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for subpatterns of the numbers. For this reason, the use of different names for groups with the
same number causes an error at compile time. same number causes an error at compile time.
<a name="substitutions"></a></P> <a name="substitutions"></a></P>
<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br> <br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
@ -3276,7 +3274,7 @@ length is in code units, not bytes.
In the replacement string, which is interpreted as a UTF string in UTF mode, In the replacement string, which is interpreted as a UTF string in UTF mode,
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
dollar character is an escape character that can specify the insertion of dollar character is an escape character that can specify the insertion of
characters from capturing groups or names from (*MARK) or other control verbs characters from capture groups or names from (*MARK) or other control verbs
in the pattern. The following forms are always recognized: in the pattern. The following forms are always recognized:
<pre> <pre>
$$ insert a dollar character $$ insert a dollar character
@ -3345,13 +3343,13 @@ efficient to allocate a large buffer and free the excess afterwards, instead of
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH. using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
</P> </P>
<P> <P>
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
not appear in the pattern to be treated as unset groups. This option should be not appear in the pattern to be treated as unset groups. This option should be
used with care, because it means that a typo in a group name or number no used with care, because it means that a typo in a group name or number no
longer causes the PCRE2_ERROR_NOSUBSTRING error. longer causes the PCRE2_ERROR_NOSUBSTRING error.
</P> </P>
<P> <P>
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
strings when inserted as described above. If this option is not set, an attempt strings when inserted as described above. If this option is not set, an attempt
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
@ -3379,7 +3377,7 @@ terminating a \Q quoted sequence) reverts to no case forcing. The sequences
\u and \l force the next character (if it is a letter) to upper or lower \u and \l force the next character (if it is a letter) to upper or lower
case, respectively, and then the state automatically reverts to no case case, respectively, and then the state automatically reverts to no case
forcing. Case forcing applies to all inserted characters, including those from forcing. Case forcing applies to all inserted characters, including those from
captured groups and letters within \Q...\E quoted sequences. capture groups and letters within \Q...\E quoted sequences.
</P> </P>
<P> <P>
Note that case forcing sequences such as \U...\E do not nest. For example, Note that case forcing sequences such as \U...\E do not nest. For example,
@ -3388,7 +3386,8 @@ effect.
</P> </P>
<P> <P>
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to group substitution. The syntax is similar to that used by Bash: flexibility to capture group substitution. The syntax is similar to that used
by Bash:
<pre> <pre>
${&#60;n&#62;:-&#60;string&#62;} ${&#60;n&#62;:-&#60;string&#62;}
${&#60;n&#62;:+&#60;string1&#62;:&#60;string2&#62;} ${&#60;n&#62;:+&#60;string1&#62;:&#60;string2&#62;}
@ -3518,20 +3517,21 @@ PCRE2_SUBSTITUTE_GLOBAL is not set), the the rest of the input is copied to the
output and the call to <b>pcre2_substitute()</b> exits, returning the number of output and the call to <b>pcre2_substitute()</b> exits, returning the number of
matches so far. matches so far.
</P> </P>
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br> <br><a name="SEC37" href="#TOC1">DUPLICATE CAPTURE GROUP NAMES</a><br>
<P> <P>
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b> <b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b> <b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
</P> </P>
<P> <P>
When a pattern is compiled with the PCRE2_DUPNAMES option, names for When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
subpatterns are not required to be unique. Duplicate names are always allowed groups are not required to be unique. Duplicate names are always allowed for
for subpatterns with the same number, created by using the (?| feature. Indeed, groups with the same number, created by using the (?| feature. Indeed, if such
if such subpatterns are named, they are required to use the same names. groups are named, they are required to use the same names.
</P> </P>
<P> <P>
Normally, patterns with duplicate names are such that in any one match, only Normally, patterns that use duplicate names are such that in any one match,
one of the named subpatterns participates. An example is shown in the only one of each set of identically-named groups participates. An example is
shown in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. documentation.
</P> </P>
@ -3703,9 +3703,8 @@ the three matched strings are
On success, the yield of the function is a number greater than zero, which is On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted by number in the same way as for the ovector, and can be extracted by number in the same way as for
<b>pcre2_match()</b>, but the numbers bear no relation to any capturing groups <b>pcre2_match()</b>, but the numbers bear no relation to any capture groups
that may exist in the pattern, because DFA matching does not support group that may exist in the pattern, because DFA matching does not support capturing.
capture.
</P> </P>
<P> <P>
Calls to the convenience functions that extract substrings by name Calls to the convenience functions that extract substrings by name
@ -3747,7 +3746,7 @@ a backreference.
</pre> </pre>
This return is given if <b>pcre2_dfa_match()</b> encounters a condition item This return is given if <b>pcre2_dfa_match()</b> encounters a condition item
that uses a backreference for the condition, or a test for recursion in a that uses a backreference for the condition, or a test for recursion in a
specific group. These are not supported. specific capture group. These are not supported.
<pre> <pre>
PCRE2_ERROR_DFA_WSSIZE PCRE2_ERROR_DFA_WSSIZE
</pre> </pre>
@ -3756,9 +3755,9 @@ This return is given if <b>pcre2_dfa_match()</b> runs out of space in the
<pre> <pre>
PCRE2_ERROR_DFA_RECURSE PCRE2_ERROR_DFA_RECURSE
</pre> </pre>
When a recursive subpattern is processed, the matching function calls itself When a recursion or subroutine call is processed, the matching function calls
recursively, using private memory for the ovector and <i>workspace</i>. This itself recursively, using private memory for the ovector and <i>workspace</i>.
error is given if the internal ovector is not large enough. This should be This error is given if the internal ovector is not large enough. This should be
extremely rare, as a vector of size 1000 is used. extremely rare, as a vector of size 1000 is used.
<pre> <pre>
PCRE2_ERROR_DFA_BADRESTART PCRE2_ERROR_DFA_BADRESTART
@ -3785,7 +3784,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 04 January 2019 Last updated: 04 February 2019
<br> <br>
Copyright &copy; 1997-2019 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>


@ -151,7 +151,7 @@ branch, automatic anchoring occurs if all branches are anchorable.
</P> </P>
<P> <P>
This optimization is disabled, however, if .* is in an atomic group or if there This optimization is disabled, however, if .* is in an atomic group or if there
is a backreference to the capturing group in which it appears. It is also is a backreference to the capture group in which it appears. It is also
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
callouts does not affect it. callouts does not affect it.
</P> </P>
@ -354,8 +354,8 @@ callout before an assertion such as (?=ab) the length is 3. For an an
alternation bar or a closing parenthesis, the length is one, unless a closing alternation bar or a closing parenthesis, the length is one, unless a closing
parenthesis is followed by a quantifier, in which case its length is included. parenthesis is followed by a quantifier, in which case its length is included.
(This changed in release 10.23. In earlier releases, before an opening (This changed in release 10.23. In earlier releases, before an opening
parenthesis the length was that of the entire subpattern, and before an parenthesis the length was that of the entire group, and before an alternation
alternation bar or a closing parenthesis the length was zero.) bar or a closing parenthesis the length was zero.)
</P> </P>
<P> <P>
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
@ -471,9 +471,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br> <br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 17 September 2018 Last updated: 03 February 2019
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -36,10 +36,9 @@ assertion just once). Perl allows some repeat quantifiers on other assertions,
for example, \b* (but not \b{3}), but these do not seem to have any use. for example, \b* (but not \b{3}), but these do not seem to have any use.
</P> </P>
<P> <P>
3. Capturing subpatterns that occur inside negative lookaround assertions are 3. Capture groups that occur inside negative lookaround assertions are counted,
counted, but their entries in the offsets vector are set only when a negative but their entries in the offsets vector are set only when a negative assertion
assertion is a condition that has a matching branch (that is, the condition is is a condition that has a matching branch (that is, the condition is false).
false).
</P> </P>
<P> <P>
4. The following Perl escape sequences are not supported: \F, \l, \L, \u, 4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
@ -94,13 +93,13 @@ to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
into subroutine calls is now supported, as in Perl. into subroutine calls is now supported, as in Perl.
</P> </P>
<P> <P>
9. If any of the backtracking control verbs are used in a subpattern that is 9. If any of the backtracking control verbs are used in a group that is called
called as a subroutine (whether or not recursively), their effect is confined as a subroutine (whether or not recursively), their effect is confined to that
to that subpattern; it does not extend to the surrounding pattern. This is not group; it does not extend to the surrounding pattern. This is not always the
always the case in Perl. In particular, if (*THEN) is present in a group that case in Perl. In particular, if (*THEN) is present in a group that is called as
is called as a subroutine, its action is limited to that group, even if the a subroutine, its action is limited to that group, even if the group does not
group does not contain any | characters. Note that such subpatterns are contain any | characters. Note that such groups are processed as anchored
processed as anchored at the point where they are tested. at the point where they are tested.
</P> </P>
<P> <P>
10. If a pattern contains more than one backtracking control verb, the first 10. If a pattern contains more than one backtracking control verb, the first
@ -120,22 +119,21 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
"b". "b".
</P> </P>
<P> <P>
13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern 13. PCRE2's handling of duplicate capture group numbers and names is not as
names is not as general as Perl's. This is a consequence of the fact that PCRE2 general as Perl's. This is a consequence of the fact that PCRE2 works internally
works internally just with numbers, using an external table to translate just with numbers, using an external table to translate between numbers and
between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B)), names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B)), where the two
where the two capturing parentheses have the same number but different names, capture groups have the same number but different names, is not supported, and
is not supported, and causes an error at compile time. If it were allowed, it causes an error at compile time. If it were allowed, it would not be possible
would not be possible to distinguish which parentheses matched, because both to distinguish which group matched, because both names map to capture group
names map to capturing subpattern number 1. To avoid this confusing situation, number 1. To avoid this confusing situation, an error is given at compile time.
an error is given at compile time.
</P> </P>
<P> <P>
14. Perl used to recognize comments in some places that PCRE2 does not, for 14. Perl used to recognize comments in some places that PCRE2 does not, for
example, between the ( and ? at the start of a subpattern. If the /x modifier example, between the ( and ? at the start of a group. If the /x modifier is
is set, Perl allowed white space between ( and ? though the latest Perls give set, Perl allowed white space between ( and ? though the latest Perls give an
an error (for a while it was just deprecated). There may still be some cases error (for a while it was just deprecated). There may still be some cases where
where Perl behaves differently. Perl behaves differently.
</P> </P>
<P> <P>
15. Perl, when in warning mode, gives warnings for character classes such as 15. Perl, when in warning mode, gives warnings for character classes such as
@ -235,9 +233,9 @@ Cambridge, England.
REVISION REVISION
</b><br> </b><br>
<P> <P>
Last updated: 28 July 2018 Last updated: 03 February 2019
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -50,17 +50,17 @@ All values in repeating quantifiers must be less than 65536.
The maximum length of a lookbehind assertion is 65535 characters. The maximum length of a lookbehind assertion is 65535 characters.
</P> </P>
<P> <P>
There is no limit to the number of parenthesized subpatterns, but there can be There is no limit to the number of parenthesized groups, but there can be no
no more than 65535 capturing subpatterns. There is, however, a limit to the more than 65535 capture groups, and there is a limit to the depth of nesting of
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in parenthesized subpatterns of all kinds. This is imposed in order to limit the
order to limit the amount of system stack used at compile time. The default amount of system stack used at compile time. The default limit can be specified
limit can be specified when PCRE2 is built; if not, the default is set to 250. when PCRE2 is built; if not, the default is set to 250. An application can
An application can change this limit by calling pcre2_set_parens_nest_limit() change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
to set the limit in a compile context. a compile context.
</P> </P>
<P> <P>
The maximum length of name for a named subpattern is 32 code units, and the The maximum length of name for a named capture group is 32 code units, and the
maximum number of named subpatterns is 10000. maximum number of such groups is 10000.
</P> </P>
<P> <P>
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
@ -86,9 +86,9 @@ Cambridge, England.
REVISION REVISION
</b><br> </b><br>
<P> <P>
Last updated: 30 March 2017 Last updated: 02 February 2019
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

File diff suppressed because it is too large


@ -31,9 +31,9 @@ of them.
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code, Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
so that most simple patterns do not use much memory for storing the compiled so that most simple patterns do not use much memory for storing the compiled
version. However, there is one case where the memory usage of a compiled version. However, there is one case where the memory usage of a compiled
pattern can be unexpectedly large. If a parenthesized subpattern has a pattern can be unexpectedly large. If a parenthesized group has a quantifier
quantifier with a minimum greater than 1 and/or a limited maximum, the whole with a minimum greater than 1 and/or a limited maximum, the whole group is
subpattern is repeated in the compiled code. For example, the pattern repeated in the compiled code. For example, the pattern
<pre> <pre>
(abc|def){2,4} (abc|def){2,4}
</pre> </pre>
@ -252,9 +252,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br> <br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 25 April 2018 Last updated: 03 February 2019
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -424,20 +424,23 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br> <br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P> <P>
<pre> <pre>
(...) capturing group (...) capture group
(?&#60;name&#62;...) named capturing group (Perl) (?&#60;name&#62;...) named capture group (Perl)
(?'name'...) named capturing group (Perl) (?'name'...) named capture group (Perl)
(?P&#60;name&#62;...) named capturing group (Python) (?P&#60;name&#62;...) named capture group (Python)
(?:...) non-capturing group (?:...) non-capture group
(?|...) non-capturing group; reset group numbers for (?|...) non-capture group; reset group numbers for
capturing groups in each alternative capture groups in each alternative
</PRE> </pre>
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
</P> </P>
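The naming rule stated above can be sketched as a small check. This is an illustrative Python sketch, not PCRE2's own code. The function names are hypothetical, "letter" and "decimal digit" are approximated by the Unicode general categories L* and Nd, and the 32 code unit limit (taken from the pcre2limits page) is measured in UTF-8 bytes on the assumption that the 8-bit library is in use.

```python
import unicodedata

# Limit quoted in the pcre2limits documentation; measured here in
# UTF-8 bytes, which assumes the 8-bit library.
MAX_NAME_CODE_UNITS = 32


def _is_decimal_digit(ch: str, utf: bool) -> bool:
    """ASCII 0-9 in non-UTF mode; any Unicode decimal digit (Nd) in UTF mode."""
    if utf:
        return unicodedata.category(ch) == "Nd"
    return "0" <= ch <= "9"


def _is_letter(ch: str, utf: bool) -> bool:
    """ASCII letters in non-UTF mode; any Unicode letter (L*) in UTF mode."""
    if utf:
        return unicodedata.category(ch).startswith("L")
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_valid_group_name(name: str, utf: bool = False) -> bool:
    """Approximate the documented rule for capture group names."""
    if not name:
        return False
    # In both modes a name must not start with a digit.
    if _is_decimal_digit(name[0], utf):
        return False
    if len(name.encode("utf-8")) > MAX_NAME_CODE_UNITS:
        return False
    return all(
        ch == "_" or _is_letter(ch, utf) or _is_decimal_digit(ch, utf)
        for ch in name
    )
```

For instance, under these assumptions `is_valid_group_name("naïve")` is rejected in non-UTF mode but accepted with `utf=True`, while `"2nd"` is rejected in both modes.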
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br> <br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P> <P>
<pre> <pre>
(?&#62;...) atomic, non-capturing group (?&#62;...) atomic non-capture group
(*atomic:...) atomic, non-capturing group (*atomic:...) atomic non-capture group
</PRE> </PRE>
</P> </P>
<br><a name="SEC15" href="#TOC1">COMMENT</a><br> <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
@ -465,7 +468,7 @@ of the group.
Unsetting x or xx unsets both. Several options may be set at once, and a Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for (?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...). example (?i:...).
</P> </P>
<P> <P>
@ -565,19 +568,19 @@ Each top-level branch of a lookbehind must be of a fixed length.
<P> <P>
<pre> <pre>
(?R) recurse whole pattern (?R) recurse whole pattern
(?n) call subpattern by absolute number (?n) call subroutine by absolute number
(?+n) call subpattern by relative number (?+n) call subroutine by relative number
(?-n) call subpattern by relative number (?-n) call subroutine by relative number
(?&name) call subpattern by name (Perl) (?&name) call subroutine by name (Perl)
(?P&#62;name) call subpattern by name (Python) (?P&#62;name) call subroutine by name (Python)
\g&#60;name&#62; call subpattern by name (Oniguruma) \g&#60;name&#62; call subroutine by name (Oniguruma)
\g'name' call subpattern by name (Oniguruma) \g'name' call subroutine by name (Oniguruma)
\g&#60;n&#62; call subpattern by absolute number (Oniguruma) \g&#60;n&#62; call subroutine by absolute number (Oniguruma)
\g'n' call subpattern by absolute number (Oniguruma) \g'n' call subroutine by absolute number (Oniguruma)
\g&#60;+n&#62; call subpattern by relative number (PCRE2 extension) \g&#60;+n&#62; call subroutine by relative number (PCRE2 extension)
\g'+n' call subpattern by relative number (PCRE2 extension) \g'+n' call subroutine by relative number (PCRE2 extension)
\g&#60;-n&#62; call subpattern by relative number (PCRE2 extension) \g&#60;-n&#62; call subroutine by relative number (PCRE2 extension)
\g'-n' call subpattern by relative number (PCRE2 extension) \g'-n' call subroutine by relative number (PCRE2 extension)
</PRE> </PRE>
</P> </P>
<br><a name="SEC23" href="#TOC1">CONDITIONAL PATTERNS</a><br> <br><a name="SEC23" href="#TOC1">CONDITIONAL PATTERNS</a><br>
@ -595,7 +598,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
(?(R) overall recursion condition (?(R) overall recursion condition
(?(Rn) specific numbered group recursion condition (?(Rn) specific numbered group recursion condition
(?(R&name) specific named group recursion condition (?(R&name) specific named group recursion condition
(?(DEFINE) define subpattern for reference (?(DEFINE) define groups for reference
(?(VERSION[&#62;]=n.m) test PCRE2 version (?(VERSION[&#62;]=n.m) test PCRE2 version
(?(assert) assertion condition (?(assert) assertion condition
</pre> </pre>
@ -657,9 +660,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC28" href="#TOC1">REVISION</a><br> <br><a name="SEC28" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 10 October 2018 Last updated: 03 February 2019
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -716,14 +716,14 @@ information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
some typical examples: some typical examples:
<pre> <pre>
re&#62; /(?i)(^a|^b)/m,info re&#62; /(?i)(^a|^b)/m,info
Capturing subpattern count = 1 Capture group count = 1
Compile options: multiline Compile options: multiline
Overall options: caseless multiline Overall options: caseless multiline
First code unit at start or follows newline First code unit at start or follows newline
Subject length lower bound = 1 Subject length lower bound = 1
re&#62; /(?i)abc/info re&#62; /(?i)abc/info
Capturing subpattern count = 0 Capture group count = 0
Compile options: &#60;none&#62; Compile options: &#60;none&#62;
Overall options: caseless Overall options: caseless
First code unit = 'a' (caseless) First code unit = 'a' (caseless)
@ -1353,8 +1353,8 @@ Testing substring extraction functions
<P> <P>
The <b>copy</b> and <b>get</b> modifiers can be used to test the The <b>copy</b> and <b>get</b> modifiers can be used to test the
<b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions. <b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions.
They can be given more than once, and each can specify a group name or number, They can be given more than once, and each can specify a capture group name or
for example: number, for example:
<pre> <pre>
abcd\=copy=1,copy=3,get=G1 abcd\=copy=1,copy=3,get=G1
</pre> </pre>
@ -2075,9 +2075,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 12 November 2018 Last updated: 03 February 2019
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -38,10 +38,11 @@ UNICODE PROPERTY SUPPORT
</b><br> </b><br>
<P> <P>
When PCRE2 is built with Unicode support, the escape sequences \p{..}, When PCRE2 is built with Unicode support, the escape sequences \p{..},
\P{..}, and \X can be used. The Unicode properties that can be tested are \P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
limited to the general category properties such as Lu for an upper case letter The Unicode properties that can be tested are limited to the general category
or Nd for a decimal number, the Unicode script names such as Arabic or Han, and properties such as Lu for an upper case letter or Nd for a decimal number, the
the derived properties Any and L&. Full lists are given in the Unicode script names such as Arabic or Han, and the derived properties Any and
L&. Full lists are given in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
and and
<a href="pcre2syntax.html"><b>pcre2syntax</b></a> <a href="pcre2syntax.html"><b>pcre2syntax</b></a>
@ -73,11 +74,17 @@ In UTF modes, the dot metacharacter matches one UTF character instead of a
single code unit. single code unit.
</P> </P>
<P> <P>
In UTF modes, capture group names are not restricted to ASCII, and may contain
any Unicode letters and decimal digits, as well as underscore.
</P>
<P>
The escape sequence \C can be used to match a single code unit in a UTF mode, The escape sequence \C can be used to match a single code unit in a UTF mode,
but its use can lead to some strange effects because it breaks up multi-unit but its use can lead to some strange effects because it breaks up multi-unit
characters (see the description of \C in the characters (see the description of \C in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation). documentation). For this reason, there is a build-time option that disables
support for \C completely. There is also a less draconian compile-time option
for locking out the use of \C when a pattern is compiled.
</P> </P>
<P> <P>
The use of \C is not supported by the alternative matching function The use of \C is not supported by the alternative matching function
@ -410,9 +417,9 @@ Cambridge, England.
REVISION REVISION
</b><br> </b><br>
<P> <P>
Last updated: 12 October 2018 Last updated: 03 February 2019
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2019 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

File diff suppressed because it is too large


@ -1,4 +1,4 @@
.TH PCRE2_SUBSTRING_NAMETABLE_SCAN 3 "21 October 2014" "PCRE2 10.00" .TH PCRE2_SUBSTRING_NAMETABLE_SCAN 3 "03 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -15,8 +15,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.rs .rs
.sp .sp
This convenience function finds, for a compiled pattern, the first and last This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis entries for a given name in the table that translates capture group names into
names into numbers. numbers.
.sp .sp
\fIcode\fP Compiled regular expression \fIcode\fP Compiled regular expression
\fIname\fP Name whose entries required \fIname\fP Name whose entries required


@ -1,4 +1,4 @@
.TH PCRE2API 3 "04 January 2019" "PCRE2 10.33" .TH PCRE2API 3 "04 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -1429,10 +1429,10 @@ independent of the setting of PCRE2_DOTALL.
.sp .sp
PCRE2_DUPNAMES PCRE2_DUPNAMES
.sp .sp
If this bit is set, names used to identify capturing subpatterns need not be If this bit is set, names used to identify capture groups need not be unique.
unique. This can be helpful for certain types of pattern when it is known that This can be helpful for certain types of pattern when it is known that only one
only one instance of the named subpattern can ever be matched. There are more instance of the named group can ever be matched. There are more details of
details of named subpatterns below; see also the named capture groups below; see also the
.\" HREF .\" HREF
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
@ -1466,11 +1466,11 @@ the end of the subject.
If this bit is set, most white space characters in the pattern are totally If this bit is set, most white space characters in the pattern are totally
ignored except when escaped or inside a character class. However, white space ignored except when escaped or inside a character class. However, white space
is not allowed within sequences such as (?> that introduce various is not allowed within sequences such as (?> that introduce various
parenthesized subpatterns, nor within numerical quantifiers such as {1,3}. parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable
Ignorable white space is permitted between an item and a following quantifier white space is permitted between an item and a following quantifier and between
and between a quantifier and a following + that indicates possessiveness. a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be changed within equivalent to Perl's /x option, and it can be changed within a pattern by a
a pattern by a (?x) option setting. (?x) option setting.
.P .P
When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
white space only those characters with code points less than 256 that are white space only those characters with code points less than 256 that are
@ -1547,7 +1547,7 @@ error.
.sp .sp
PCRE2_MATCH_UNSET_BACKREF PCRE2_MATCH_UNSET_BACKREF
.sp .sp
If this option is set, a backreference to an unset subpattern group matches an If this option is set, a backreference to an unset capture group matches an
empty string (by default this causes the current matching alternative to fail). empty string (by default this causes the current matching alternative to fail).
A pattern such as (\e1)(a) succeeds when this option is set (assuming it can A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
find an "a" in the subject), whereas it fails by default, for Perl find an "a" in the subject), whereas it fails by default, for Perl
@ -1608,7 +1608,7 @@ If this option is set, it disables the use of numbered capturing parentheses in
the pattern. Any opening parenthesis that is not followed by ? behaves as if it the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). This is the same as Perl's /n option. they acquire numbers in the usual way). This is the same as Perl's /n option.
Note that, when this option is set, references to capturing groups Note that, when this option is set, references to capture groups
(backreferences or recursion/subroutine calls) may only refer to named groups, (backreferences or recursion/subroutine calls) may only refer to named groups,
though the reference can be by name or by number. though the reference can be by name or by number.
.sp .sp
@ -1627,7 +1627,7 @@ purposes.
If this option is set, it disables an optimization that is applied when .* is If this option is set, it disables an optimization that is applied when .* is
the first significant item in a top-level branch of a pattern, and all the the first significant item in a top-level branch of a pattern, and all the
other branches also start with .* or with \eA or \eG or ^. The optimization is other branches also start with .* or with \eA or \eG or ^. The optimization is
automatically disabled for .* if it is inside an atomic group or a capturing automatically disabled for .* if it is inside an atomic group or a capture
group that is the subject of a backreference, or if the pattern contains group that is the subject of a backreference, or if the pattern contains
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is (*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
automatically anchored if PCRE2_DOTALL is set for all the .* items and automatically anchored if PCRE2_DOTALL is set for all the .* items and
@ -2025,7 +2025,7 @@ following are true:
.sp .sp
.* is not in an atomic group .* is not in an atomic group
.\" JOIN .\" JOIN
.* is not in a capturing group that is the subject .* is not in a capture group that is the subject
of a backreference of a backreference
PCRE2_DOTALL is in force for .* PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern Neither (*PRUNE) nor (*SKIP) appears in the pattern
@ -2037,12 +2037,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
PCRE2_INFO_BACKREFMAX PCRE2_INFO_BACKREFMAX
.sp .sp
Return the number of the highest backreference in the pattern. The third Return the number of the highest backreference in the pattern. The third
argument should point to an \fBuint32_t\fP variable. Named subpatterns acquire argument should point to an \fBuint32_t\fP variable. Named capture groups
numbers as well as names, and these count towards the highest backreference. acquire numbers as well as names, and these count towards the highest
Backreferences such as \e4 or \eg{12} match the captured characters of the backreference. Backreferences such as \e4 or \eg{12} match the captured
given group, but in addition, the check that a capturing group is set in a characters of the given group, but in addition, the check that a capture
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is group is set in a conditional group such as (?(3)a|b) is also a backreference.
returned if there are no backreferences. Zero is returned if there are no backreferences.
.sp .sp
PCRE2_INFO_BSR PCRE2_INFO_BSR
.sp .sp
@ -2053,9 +2053,9 @@ that \eR matches only CR, LF, or CRLF.
.sp .sp
PCRE2_INFO_CAPTURECOUNT PCRE2_INFO_CAPTURECOUNT
.sp .sp
Return the highest capturing subpattern number in the pattern. In patterns Return the highest capture group number in the pattern. In patterns where (?|
where (?| is not used, this is also the total number of capturing subpatterns. is not used, this is also the total number of capture groups. The third
The third argument should point to an \fBuint32_t\fP variable. argument should point to an \fBuint32_t\fP variable.
.sp .sp
PCRE2_INFO_DEPTHLIMIT PCRE2_INFO_DEPTHLIMIT
.sp .sp
@ -2103,7 +2103,7 @@ Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by \fBpcre2_match()\fP backtracking positions when the pattern is processed by \fBpcre2_match()\fP
without the use of JIT. The third argument should point to a \fBsize_t\fP without the use of JIT. The third argument should point to a \fBsize_t\fP
variable. The frame size depends on the number of capturing parentheses in the variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables. pattern. Each additional capture group adds two PCRE2_SIZE variables.
.sp .sp
PCRE2_INFO_HASBACKSLASHC PCRE2_INFO_HASBACKSLASHC
.sp .sp
@ -2224,11 +2224,11 @@ library, the pointer points to 32-bit code units, the first of which contains
the parenthesis number. The rest of the entry is the corresponding name, zero the parenthesis number. The rest of the entry is the corresponding name, zero
terminated. terminated.
.P .P
The names are in alphabetical order. If (?| is used to create multiple groups The names are in alphabetical order. If (?| is used to create multiple capture
with the same number, as described in the groups with the same number, as described in the
.\" HTML <a href="pcre2pattern.html#dupsubpatternnumber"> .\" HTML <a href="pcre2pattern.html#dupgroupnumber">
.\" </a> .\" </a>
section on duplicate subpattern numbers section on duplicate group numbers
.\" .\"
in the in the
.\" HREF .\" HREF
@ -2237,11 +2237,11 @@ in the
page, the groups may be given the same name, but there is only one entry in the page, the groups may be given the same name, but there is only one entry in the
table. Different names for groups of the same number are not permitted. table. Different names for groups of the same number are not permitted.
.P .P
Duplicate names for subpatterns with different numbers are permitted, but only Duplicate names for capture groups with different numbers are permitted, but
if PCRE2_DUPNAMES is set. They appear in the table in the order in which they only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
were found in the pattern. In the absence of (?| this is the order of they were found in the pattern. In the absence of (?| this is the order of
increasing number; when (?| is used this is not necessarily the case because increasing number; when (?| is used this is not necessarily the case because
later subpatterns may have lower numbers. later capture groups may have lower numbers.
.P .P
As a simple example of the name/number table, consider the following pattern As a simple example of the name/number table, consider the following pattern
after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
@ -2251,16 +2251,16 @@ space - including newlines - is ignored):
(?<date> (?<year>(\ed\ed)?\ed\ed) - (?<date> (?<year>(\ed\ed)?\ed\ed) -
(?<month>\ed\ed) - (?<day>\ed\ed) ) (?<month>\ed\ed) - (?<day>\ed\ed) )
.sp .sp
There are four named subpatterns, so the table has four entries, and each entry There are four named capture groups, so the table has four entries, and each
in the table is eight bytes long. The table is as follows, with non-printing entry in the table is eight bytes long. The table is as follows, with
bytes shown in hexadecimal, and undefined bytes shown as ??: non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:
.sp .sp
00 01 d a t e 00 ?? 00 01 d a t e 00 ??
00 05 d a y 00 ?? ?? 00 05 d a y 00 ?? ??
00 04 m o n t h 00 00 04 m o n t h 00
00 02 y e a r 00 ?? 00 02 y e a r 00 ??
.sp .sp
When writing code to extract data from named subpatterns using the When writing code to extract data from named capture groups using the
name-to-number map, remember that the length of the entries is likely to be name-to-number map, remember that the length of the entries is likely to be
different for each compiled pattern. different for each compiled pattern.
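The table layout described above can be walked with a few lines of C. The following is a minimal sketch rather than PCRE2 code: group_number_for_name() and example_table are hypothetical names introduced here, the undefined ?? bytes are filled with zeros, and the sketch assumes the 8-bit library, where the group number occupies the first two bytes of each entry, most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The example table from the text: four entries of eight bytes each.
   Bytes shown as ?? in the text are undefined; zero is used here. */
static const unsigned char example_table[4 * 8] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00,
};

/* Hypothetical helper (not part of PCRE2): scan a name table of the
   8-bit library and return the group number for a name, or -1 if the
   name is not present. entry_size is the length of one entry in bytes. */
int group_number_for_name(const unsigned char *table, size_t count,
                          size_t entry_size, const char *name)
{
    assert(table != NULL && name != NULL && entry_size >= 3);
    for (size_t i = 0; i < count; i++) {
        const unsigned char *entry = table + i * entry_size;
        /* Two bytes of group number, most significant byte first. */
        int number = (entry[0] << 8) | entry[1];
        /* The zero-terminated name follows the number. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return -1;
}
```

In a real program the table pointer, the entry count and the entry size would be obtained from pcre2_pattern_info() with PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT and PCRE2_INFO_NAMEENTRYSIZE. The entry size is passed in as a parameter because, as noted above, it differs between compiled patterns.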
.sp .sp
@ -2740,12 +2740,12 @@ valid newline sequence and explicit \er or \en escapes appear in the pattern.
In general, a pattern matches a certain portion of the subject, and in In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by addition, further substrings from the subject may be picked out by
parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
book, this is called "capturing" in what follows, and the phrase "capturing book, this is called "capturing" in what follows, and the phrase "capture
subpattern" or "capturing group" is used for a fragment of a pattern that picks group" (Perl terminology) is used for a fragment of a pattern that picks out a
out a substring. PCRE2 supports several other kinds of parenthesized subpattern substring. PCRE2 supports several other kinds of parenthesized group that do
that do not cause substrings to be captured. The \fBpcre2_pattern_info()\fP not cause substrings to be captured. The \fBpcre2_pattern_info()\fP function
function can be used to find out how many capturing subpatterns there are in a can be used to find out how many capture groups there are in a compiled
compiled pattern. pattern.
.P .P
You can use auxiliary functions for accessing captured substrings You can use auxiliary functions for accessing captured substrings
.\" HTML <a href="#extractbynumber"> .\" HTML <a href="#extractbynumber">
@ -2798,30 +2798,28 @@ reported start of a successful match can be greater than the end of the match.
For example, if the pattern (?=ab\eK) is matched against "ab", the start and For example, if the pattern (?=ab\eK) is matched against "ab", the start and
end offset values for the match are 2 and 0. end offset values for the match are 2 and 0.
.P .P
If a capturing subpattern group is matched repeatedly within a single match If a capture group is matched repeatedly within a single match operation, it is
operation, it is the last portion of the subject that it matched that is the last portion of the subject that it matched that is returned.
returned.
.P .P
If the ovector is too small to hold all the captured substring offsets, as much If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If captured as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, \fBpcre2_match()\fP may be called with a match substrings are not of interest, \fBpcre2_match()\fP may be called with a match
data block whose ovector is of minimum length (that is, one pair). data block whose ovector is of minimum length (that is, one pair).
.P .P
It is possible for capturing subpattern number \fIn+1\fP to match some part of It is possible for capture group number \fIn+1\fP to match some part of the
the subject when subpattern \fIn\fP has not been used at all. For example, if subject when group \fIn\fP has not been used at all. For example, if the string
the string "abc" is matched against the pattern (a|(z))(bc) the return from the "abc" is matched against the pattern (a|(z))(bc) the return from the function
function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
happens, both values in the offset pairs corresponding to unused subpatterns values in the offset pairs corresponding to unused groups are set to
are set to PCRE2_UNSET.
.P
Offset values that correspond to unused subpatterns at the end of the
expression are also set to PCRE2_UNSET. For example, if the string "abc" is
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched.
The return from the function is 2, because the highest used capturing
subpattern number is 1. The offsets for for the second and third capturing
subpatterns (assuming the vector is large enough, of course) are set to
PCRE2_UNSET. PCRE2_UNSET.
.P .P
Offset values that correspond to unused groups at the end of the expression are
also set to PCRE2_UNSET. For example, if the string "abc" is matched against
the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
function is 2, because the highest used capture group number is 1. The offsets
for the second and third capture groups (assuming the vector is large
enough, of course) are set to PCRE2_UNSET.
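The difference between an unset group and a group that matched an empty string can be sketched without calling the library. The code below is an illustration under stated assumptions: UNSET_OFFSET stands in for PCRE2_UNSET (which pcre2.h defines as ~(PCRE2_SIZE)0), example_ovector holds the offsets described above for "abc" matched against (a|(z))(bc), and group_is_set() and group_length() are hypothetical helpers, not PCRE2 functions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PCRE2_UNSET, which pcre2.h defines as ~(PCRE2_SIZE)0. */
#define UNSET_OFFSET (~(size_t)0)

/* Offsets for "abc" matched against (a|(z))(bc):
   pair 0 = whole match, pair 1 = group 1, pair 2 = group 2 (unset),
   pair 3 = group 3. */
static const size_t example_ovector[8] = {
    0, 3,
    0, 1,
    UNSET_OFFSET, UNSET_OFFSET,
    1, 3,
};

/* Hypothetical helper: non-zero if the group took part in the match. */
int group_is_set(const size_t *ovector, size_t group)
{
    assert(ovector != NULL);
    return ovector[2 * group] != UNSET_OFFSET;
}

/* Hypothetical helper: length in code units of a group that is set. */
size_t group_length(const size_t *ovector, size_t group)
{
    assert(group_is_set(ovector, group));
    return ovector[2 * group + 1] - ovector[2 * group];
}
```

A caller that only looks at the length would see zero for both an empty match and, after clamping, an unset group; testing the start offset against the sentinel first keeps the two cases apart.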
.P
Elements in the ovector that do not correspond to capturing parentheses in the Elements in the ovector that do not correspond to capturing parentheses in the
pattern are never changed. That is, if a pattern contains \fIn\fP capturing pattern are never changed. That is, if a pattern contains \fIn\fP capturing
parentheses, no more than \fIovector[0]\fP to \fIovector[2n+1]\fP are set by parentheses, no more than \fIovector[0]\fP to \fIovector[2n+1]\fP are set by
@ -3006,11 +3004,11 @@ as NULL.
.sp .sp
This error is returned when \fBpcre2_match()\fP detects a recursion loop within This error is returned when \fBpcre2_match()\fP detects a recursion loop within
the pattern. Specifically, it means that either the whole pattern or a the pattern. Specifically, it means that either the whole pattern or a
subpattern has been called recursively for the second time at the same position capture group has been called recursively for the second time at the same
in the subject string. Some simple patterns that might do this are detected and position in the subject string. Some simple patterns that might do this are
faulted at compile time, but more complicated cases, in particular mutual detected and faulted at compile time, but more complicated cases, in particular
recursions between two different subpatterns, cannot be detected until matching mutual recursions between two different groups, cannot be detected until
is attempted. matching is attempted.
. .
. .
.\" HTML <a name="geterrormessage"></a> .\" HTML <a name="geterrormessage"></a>
@ -3090,7 +3088,7 @@ The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring
into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it
into new memory, obtained using the same memory allocation function that was into new memory, obtained using the same memory allocation function that was
used for the match data block. The first two arguments of these functions are a used for the match data block. The first two arguments of these functions are a
pointer to the match data block and a capturing group number. pointer to the match data block and a capture group number.
.P .P
The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
the buffer and a pointer to a variable that contains its length in code units. the buffer and a pointer to a variable that contains its length in code units.
@ -3162,9 +3160,9 @@ could not be obtained. When the list is no longer needed, it should be freed by
calling \fBpcre2_substring_list_free()\fP. calling \fBpcre2_substring_list_free()\fP.
.P .P
If this function encounters a substring that is unset, which can happen when If this function encounters a substring that is unset, which can happen when
capturing subpattern number \fIn+1\fP matches some part of the subject, but capture group number \fIn+1\fP matches some part of the subject, but group
subpattern \fIn\fP has not been used at all, it returns an empty string. This \fIn\fP has not been used at all, it returns an empty string. This can be
can be distinguished from a genuine zero-length substring by inspecting the distinguished from a genuine zero-length substring by inspecting the
appropriate offset in the ovector, which contains PCRE2_UNSET for unset appropriate offset in the ovector, which contains PCRE2_UNSET for unset
substrings, or by calling \fBpcre2_substring_length_bynumber()\fP. substrings, or by calling \fBpcre2_substring_length_bynumber()\fP.
. .
@ -3194,20 +3192,20 @@ For example, for this pattern:
.sp .sp
(a+)b(?<xxx>\ed+)... (a+)b(?<xxx>\ed+)...
.sp .sp
the number of the subpattern called "xxx" is 2. If the name is known to be the number of the capture group called "xxx" is 2. If the name is known to be
unique (PCRE2_DUPNAMES was not set), you can find the number from the name by unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
calling \fBpcre2_substring_number_from_name()\fP. The first argument is the calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
that name. Given the number, you can extract the substring directly from the Given the number, you can extract the substring directly from the ovector, or
ovector, or use one of the "bynumber" functions described above. use one of the "bynumber" functions described above.
.P .P
For convenience, there are also "byname" functions that correspond to the For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a "bynumber" functions, the only difference being that the second argument is a
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
names, these functions scan all the groups with the given name, and return the names, these functions scan all the groups with the given name, and return the
first named string that is set. captured substring from the first named group that is set.
.P .P
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
returned. If all groups with the name have numbers that are greater than the returned. If all groups with the name have numbers that are greater than the
@ -3216,18 +3214,18 @@ is at least one group with a slot in the ovector, but no group is found to be
set, PCRE2_ERROR_UNSET is returned. set, PCRE2_ERROR_UNSET is returned.
.P .P
\fBWarning:\fP If the pattern uses the (?| feature to set up multiple \fBWarning:\fP If the pattern uses the (?| feature to set up multiple
subpatterns with the same number, as described in the capture groups with the same number, as described in the
.\" HTML <a href="pcre2pattern.html#dupsubpatternnumber"> .\" HTML <a href="pcre2pattern.html#dupgroupnumber">
.\" </a> .\" </a>
section on duplicate subpattern numbers section on duplicate group numbers
.\" .\"
in the in the
.\" HREF .\" HREF
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
page, you cannot use names to distinguish the different subpatterns, because page, you cannot use names to distinguish the different capture groups, because
names are not included in the compiled code. The matching process uses only names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for subpatterns of the numbers. For this reason, the use of different names for groups with the
same number causes an error at compile time. same number causes an error at compile time.
. .
. .
@ -3288,7 +3286,7 @@ length is in code units, not bytes.
In the replacement string, which is interpreted as a UTF string in UTF mode, In the replacement string, which is interpreted as a UTF string in UTF mode,
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
dollar character is an escape character that can specify the insertion of dollar character is an escape character that can specify the insertion of
characters from capturing groups or names from (*MARK) or other control verbs characters from capture groups or names from (*MARK) or other control verbs
in the pattern. The following forms are always recognized: in the pattern. The following forms are always recognized:
.sp .sp
$$ insert a dollar character $$ insert a dollar character
@ -3351,12 +3349,12 @@ operation is carried out twice. Depending on the application, it may be more
efficient to allocate a large buffer and free the excess afterwards, instead of efficient to allocate a large buffer and free the excess afterwards, instead of
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH. using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
.P .P
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
not appear in the pattern to be treated as unset groups. This option should be not appear in the pattern to be treated as unset groups. This option should be
used with care, because it means that a typo in a group name or number no used with care, because it means that a typo in a group name or number no
longer causes the PCRE2_ERROR_NOSUBSTRING error. longer causes the PCRE2_ERROR_NOSUBSTRING error.
.P .P
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
strings when inserted as described above. If this option is not set, an attempt strings when inserted as described above. If this option is not set, an attempt
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
@ -3381,14 +3379,15 @@ terminating a \eQ quoted sequence) reverts to no case forcing. The sequences
\eu and \el force the next character (if it is a letter) to upper or lower \eu and \el force the next character (if it is a letter) to upper or lower
case, respectively, and then the state automatically reverts to no case case, respectively, and then the state automatically reverts to no case
forcing. Case forcing applies to all inserted characters, including those from forcing. Case forcing applies to all inserted characters, including those from
captured groups and letters within \eQ...\eE quoted sequences. capture groups and letters within \eQ...\eE quoted sequences.
.P .P
Note that case forcing sequences such as \eU...\eE do not nest. For example, Note that case forcing sequences such as \eU...\eE do not nest. For example,
the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
effect. effect.
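The non-nesting rule can be pictured as a single state variable that each sequence overwrites. The following is a rough sketch, not the PCRE2 implementation: apply_case_forcing() is a hypothetical helper that handles only ASCII input and only the \U, \L and \E sequences (written in the man page source as \eU, \eL and \eE), ignoring \u, \l, \Q and group insertion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-forcing state: none, upper (\U) or lower (\L). */
enum force { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/* Hypothetical helper (not part of PCRE2): apply \U, \L and \E to an
   ASCII string. out must have room for strlen(in) + 1 bytes.
   Returns the number of characters written, excluding the final zero. */
size_t apply_case_forcing(const char *in, char *out)
{
    enum force state = FORCE_NONE;
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '\\' && (in[1] == 'U' || in[1] == 'L' || in[1] == 'E')) {
            /* A new sequence replaces the current state; nothing is
               stacked, so a later \E has nothing to return to. */
            state = (in[1] == 'U') ? FORCE_UPPER
                  : (in[1] == 'L') ? FORCE_LOWER : FORCE_NONE;
            in += 2;
            continue;
        }
        unsigned char c = (unsigned char)*in++;
        if (state == FORCE_UPPER)
            c = (unsigned char)toupper(c);
        else if (state == FORCE_LOWER)
            c = (unsigned char)tolower(c);
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Because the state is a single value rather than a stack, the first \E in the example already returns to no case forcing, which is why the final \E changes nothing.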
.P .P
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to group substitution. The syntax is similar to that used by Bash: flexibility to capture group substitution. The syntax is similar to that used
by Bash:
.sp .sp
${<n>:-<string>} ${<n>:-<string>}
${<n>:+<string1>:<string2>} ${<n>:+<string1>:<string2>}
@ -3510,7 +3509,7 @@ output and the call to \fBpcre2_substitute()\fP exits, returning the number of
matches so far. matches so far.
. .
. .
.SH "DUPLICATE SUBPATTERN NAMES" .SH "DUPLICATE CAPTURE GROUP NAMES"
.rs .rs
.sp .sp
.nf .nf
@ -3518,13 +3517,14 @@ matches so far.
.B " PCRE2_SPTR \fIname\fP, PCRE2_SPTR *\fIfirst\fP, PCRE2_SPTR *\fIlast\fP);" .B " PCRE2_SPTR \fIname\fP, PCRE2_SPTR *\fIfirst\fP, PCRE2_SPTR *\fIlast\fP);"
.fi .fi
.P .P
When a pattern is compiled with the PCRE2_DUPNAMES option, names for When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
subpatterns are not required to be unique. Duplicate names are always allowed groups are not required to be unique. Duplicate names are always allowed for
for subpatterns with the same number, created by using the (?| feature. Indeed, groups with the same number, created by using the (?| feature. Indeed, if such
if such subpatterns are named, they are required to use the same names. groups are named, they are required to use the same names.
.P .P
Normally, patterns with duplicate names are such that in any one match, only Normally, patterns that use duplicate names are such that in any one match,
one of the named subpatterns participates. An example is shown in the only one of each set of identically-named groups participates. An example is
shown in the
.\" HREF .\" HREF
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
@ -3705,9 +3705,8 @@ the three matched strings are
On success, the yield of the function is a number greater than zero, which is On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted by number in the same way as for the ovector, and can be extracted by number in the same way as for
\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups \fBpcre2_match()\fP, but the numbers bear no relation to any capture groups
that may exist in the pattern, because DFA matching does not support group that may exist in the pattern, because DFA matching does not support capturing.
capture.
.P .P
Calls to the convenience functions that extract substrings by name Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
@ -3749,7 +3748,7 @@ a backreference.
.sp .sp
This return is given if \fBpcre2_dfa_match()\fP encounters a condition item This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
that uses a backreference for the condition, or a test for recursion in a that uses a backreference for the condition, or a test for recursion in a
specific group. These are not supported. specific capture group. These are not supported.
.sp .sp
PCRE2_ERROR_DFA_WSSIZE PCRE2_ERROR_DFA_WSSIZE
.sp .sp
@ -3758,9 +3757,9 @@ This return is given if \fBpcre2_dfa_match()\fP runs out of space in the
.sp .sp
PCRE2_ERROR_DFA_RECURSE PCRE2_ERROR_DFA_RECURSE
.sp .sp
When a recursive subpattern is processed, the matching function calls itself When a recursion or subroutine call is processed, the matching function calls
recursively, using private memory for the ovector and \fIworkspace\fP. This itself recursively, using private memory for the ovector and \fIworkspace\fP.
error is given if the internal ovector is not large enough. This should be This error is given if the internal ovector is not large enough. This should be
extremely rare, as a vector of size 1000 is used. extremely rare, as a vector of size 1000 is used.
.sp .sp
PCRE2_ERROR_DFA_BADRESTART PCRE2_ERROR_DFA_BADRESTART
@ -3793,6 +3792,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 04 January 2019 Last updated: 04 February 2019
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "17 September 2018" "PCRE2 10.33" .TH PCRE2CALLOUT 3 "03 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -137,7 +137,7 @@ start only after an internal newline or at the beginning of the subject, and
branch, automatic anchoring occurs if all branches are anchorable. branch, automatic anchoring occurs if all branches are anchorable.
.P .P
This optimization is disabled, however, if .* is in an atomic group or if there This optimization is disabled, however, if .* is in an atomic group or if there
is a backreference to the capturing group in which it appears. It is also is a backreference to the capture group in which it appears. It is also
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
callouts does not affect it. callouts does not affect it.
.P .P
@ -331,8 +331,8 @@ callout before an assertion such as (?=ab) the length is 3. For an an
alternation bar or a closing parenthesis, the length is one, unless a closing alternation bar or a closing parenthesis, the length is one, unless a closing
parenthesis is followed by a quantifier, in which case its length is included. parenthesis is followed by a quantifier, in which case its length is included.
(This changed in release 10.23. In earlier releases, before an opening (This changed in release 10.23. In earlier releases, before an opening
parenthesis the length was that of the entire subpattern, and before an parenthesis the length was that of the entire group, and before an alternation
alternation bar or a closing parenthesis the length was zero.) bar or a closing parenthesis the length was zero.)
.P .P
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
help in distinguishing between different automatic callouts, which all have the help in distinguishing between different automatic callouts, which all have the
@ -452,6 +452,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 17 September 2018 Last updated: 03 February 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2COMPAT 3 "28 July 2018" "PCRE2 10.32" .TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL" .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -23,10 +23,9 @@ character is not "a" three times (in principle; PCRE2 optimizes this to run the
assertion just once). Perl allows some repeat quantifiers on other assertions, assertion just once). Perl allows some repeat quantifiers on other assertions,
for example, \eb* (but not \eb{3}), but these do not seem to have any use. for example, \eb* (but not \eb{3}), but these do not seem to have any use.
.P .P
3. Capturing subpatterns that occur inside negative lookaround assertions are 3. Capture groups that occur inside negative lookaround assertions are counted,
counted, but their entries in the offsets vector are set only when a negative but their entries in the offsets vector are set only when a negative assertion
assertion is a condition that has a matching branch (that is, the condition is is a condition that has a matching branch (that is, the condition is false).
false).
.P .P
4. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu, 4. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu,
\eU, and \eN when followed by a character name. \eN on its own, matching a \eU, and \eN when followed by a character name. \eN on its own, matching a
@ -79,13 +78,13 @@ documentation for details.
to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
into subroutine calls is now supported, as in Perl. into subroutine calls is now supported, as in Perl.
.P .P
9. If any of the backtracking control verbs are used in a subpattern that is 9. If any of the backtracking control verbs are used in a group that is called
called as a subroutine (whether or not recursively), their effect is confined as a subroutine (whether or not recursively), their effect is confined to that
to that subpattern; it does not extend to the surrounding pattern. This is not group; it does not extend to the surrounding pattern. This is not always the
always the case in Perl. In particular, if (*THEN) is present in a group that case in Perl. In particular, if (*THEN) is present in a group that is called as
is called as a subroutine, its action is limited to that group, even if the a subroutine, its action is limited to that group, even if the group does not
group does not contain any | characters. Note that such subpatterns are contain any | characters. Note that such groups are processed as anchored
processed as anchored at the point where they are tested. at the point where they are tested.
.P .P
10. If a pattern contains more than one backtracking control verb, the first 10. If a pattern contains more than one backtracking control verb, the first
one that is backtracked onto acts. For example, in the pattern one that is backtracked onto acts. For example, in the pattern
@ -101,21 +100,20 @@ strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
"b". "b".
.P .P
13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern 13. PCRE2's handling of duplicate capture group numbers and names is not as
names is not as general as Perl's. This is a consequence of the fact that PCRE2 general as Perl's. This is a consequence of the fact that PCRE2 works internally
works internally just with numbers, using an external table to translate just with numbers, using an external table to translate between numbers and
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two
where the two capturing parentheses have the same number but different names, capture groups have the same number but different names, is not supported, and
is not supported, and causes an error at compile time. If it were allowed, it causes an error at compile time. If it were allowed, it would not be possible
would not be possible to distinguish which parentheses matched, because both to distinguish which group matched, because both names map to capture group
names map to capturing subpattern number 1. To avoid this confusing situation, number 1. To avoid this confusing situation, an error is given at compile time.
an error is given at compile time.
.P .P
14. Perl used to recognize comments in some places that PCRE2 does not, for 14. Perl used to recognize comments in some places that PCRE2 does not, for
example, between the ( and ? at the start of a subpattern. If the /x modifier example, between the ( and ? at the start of a group. If the /x modifier is
is set, Perl allowed white space between ( and ? though the latest Perls give set, Perl allowed white space between ( and ? though the latest Perls give an
an error (for a while it was just deprecated). There may still be some cases error (for a while it was just deprecated). There may still be some cases where
where Perl behaves differently. Perl behaves differently.
.P .P
15. Perl, when in warning mode, gives warnings for character classes such as 15. Perl, when in warning mode, gives warnings for character classes such as
[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no [A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
@ -200,6 +198,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 28 July 2018 Last updated: 03 February 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2LIMITS 3 "30 March 2017" "PCRE2 10.30" .TH PCRE2LIMITS 3 "03 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "SIZE AND OTHER LIMITATIONS" .SH "SIZE AND OTHER LIMITATIONS"
@ -34,16 +34,16 @@ All values in repeating quantifiers must be less than 65536.
.P .P
The maximum length of a lookbehind assertion is 65535 characters. The maximum length of a lookbehind assertion is 65535 characters.
.P .P
There is no limit to the number of parenthesized subpatterns, but there can be There is no limit to the number of parenthesized groups, but there can be no
no more than 65535 capturing subpatterns. There is, however, a limit to the more than 65535 capture groups, and there is a limit to the depth of nesting of
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in parenthesized subpatterns of all kinds. This is imposed in order to limit the
order to limit the amount of system stack used at compile time. The default amount of system stack used at compile time. The default limit can be specified
limit can be specified when PCRE2 is built; if not, the default is set to 250. when PCRE2 is built; if not, the default is set to 250. An application can
An application can change this limit by calling pcre2_set_parens_nest_limit() change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
to set the limit in a compile context. a compile context.
.P .P
The maximum length of name for a named subpattern is 32 code units, and the The maximum length of name for a named capture group is 32 code units, and the
maximum number of named subpatterns is 10000. maximum number of such groups is 10000.
.P .P
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
is 255 code units for the 8-bit library and 65535 code units for the 16-bit and is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
@ -67,6 +67,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 30 March 2017 Last updated: 02 February 2019
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi

File diff suppressed because it is too large


@ -1,4 +1,4 @@
.TH PCRE2PERFORM 3 "25 April 2018" "PCRE2 10.32" .TH PCRE2PERFORM 3 "03 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 PERFORMANCE" .SH "PCRE2 PERFORMANCE"
@ -14,9 +14,9 @@ of them.
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code, Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
so that most simple patterns do not use much memory for storing the compiled so that most simple patterns do not use much memory for storing the compiled
version. However, there is one case where the memory usage of a compiled version. However, there is one case where the memory usage of a compiled
pattern can be unexpectedly large. If a parenthesized subpattern has a pattern can be unexpectedly large. If a parenthesized group has a quantifier
quantifier with a minimum greater than 1 and/or a limited maximum, the whole with a minimum greater than 1 and/or a limited maximum, the whole group is
subpattern is repeated in the compiled code. For example, the pattern repeated in the compiled code. For example, the pattern
.sp .sp
(abc|def){2,4} (abc|def){2,4}
.sp .sp
@ -239,6 +239,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 25 April 2018 Last updated: 03 February 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "10 October 2018" "PCRE2 10.33" .TH PCRE2SYNTAX 3 "03 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -398,20 +398,24 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
.SH "CAPTURING" .SH "CAPTURING"
.rs .rs
.sp .sp
(...) capturing group (...) capture group
(?<name>...) named capturing group (Perl) (?<name>...) named capture group (Perl)
(?'name'...) named capturing group (Perl) (?'name'...) named capture group (Perl)
(?P<name>...) named capturing group (Python) (?P<name>...) named capture group (Python)
(?:...) non-capturing group (?:...) non-capture group
(?|...) non-capturing group; reset group numbers for (?|...) non-capture group; reset group numbers for
capturing groups in each alternative capture groups in each alternative
.sp
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
. .
. .
.SH "ATOMIC GROUPS" .SH "ATOMIC GROUPS"
.rs .rs
.sp .sp
(?>...) atomic, non-capturing group (?>...) atomic non-capture group
(*atomic:...) atomic, non-capturing group (*atomic:...) atomic non-capture group
. .
. .
.SH "COMMENT" .SH "COMMENT"
@ -439,7 +443,7 @@ of the group.
Unsetting x or xx unsets both. Several options may be set at once, and a Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for (?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...). example (?i:...).
.P .P
The following are recognized only at the very start of a pattern or after one The following are recognized only at the very start of a pattern or after one
@ -542,19 +546,19 @@ Each top-level branch of a lookbehind must be of a fixed length.
.rs .rs
.sp .sp
(?R) recurse whole pattern (?R) recurse whole pattern
(?n) call subpattern by absolute number (?n) call subroutine by absolute number
(?+n) call subpattern by relative number (?+n) call subroutine by relative number
(?-n) call subpattern by relative number (?-n) call subroutine by relative number
(?&name) call subpattern by name (Perl) (?&name) call subroutine by name (Perl)
(?P>name) call subpattern by name (Python) (?P>name) call subroutine by name (Python)
\eg<name> call subpattern by name (Oniguruma) \eg<name> call subroutine by name (Oniguruma)
\eg'name' call subpattern by name (Oniguruma) \eg'name' call subroutine by name (Oniguruma)
\eg<n> call subpattern by absolute number (Oniguruma) \eg<n> call subroutine by absolute number (Oniguruma)
\eg'n' call subpattern by absolute number (Oniguruma) \eg'n' call subroutine by absolute number (Oniguruma)
\eg<+n> call subpattern by relative number (PCRE2 extension) \eg<+n> call subroutine by relative number (PCRE2 extension)
\eg'+n' call subpattern by relative number (PCRE2 extension) \eg'+n' call subroutine by relative number (PCRE2 extension)
\eg<-n> call subpattern by relative number (PCRE2 extension) \eg<-n> call subroutine by relative number (PCRE2 extension)
\eg'-n' call subpattern by relative number (PCRE2 extension) \eg'-n' call subroutine by relative number (PCRE2 extension)
. .
. .
.SH "CONDITIONAL PATTERNS" .SH "CONDITIONAL PATTERNS"
@ -572,7 +576,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
(?(R) overall recursion condition (?(R) overall recursion condition
(?(Rn) specific numbered group recursion condition (?(Rn) specific numbered group recursion condition
(?(R&name) specific named group recursion condition (?(R&name) specific named group recursion condition
(?(DEFINE) define subpattern for reference (?(DEFINE) define groups for reference
(?(VERSION[>]=n.m) test PCRE2 version (?(VERSION[>]=n.m) test PCRE2 version
(?(assert) assertion condition (?(assert) assertion condition
.sp .sp
@ -643,6 +647,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 10 October 2018 Last updated: 03 February 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi
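
Reviewer's sketch for the naming rule added to the CAPTURING section above, combined with the 32-code-unit limit from the limits page. It covers only the non-UTF case (ASCII letters, digits, underscore, no leading digit). The function name `sketch_valid_group_name` and the macro `SKETCH_MAX_NAME_SIZE` are hypothetical and are not part of PCRE2; the real `read_name()` consults the `ctypes` table rather than comparing against ASCII ranges, so treat this as an approximation under an ASCII-host assumption.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NAME_SIZE 32  /* documented limit, in code units */

/* Hypothetical helper, not part of PCRE2. Returns 1 if name[0..len) looks
   like an acceptable capture group name in a non-UTF mode, otherwise 0.
   Assumes an ASCII execution character set. */
static int sketch_valid_group_name(const char *name, size_t len)
{
  size_t i;

  assert(name != NULL);

  /* Names must be non-empty and no longer than the limit. */
  if (len == 0 || len > SKETCH_MAX_NAME_SIZE) return 0;

  /* A name must not start with a digit. */
  if (name[0] >= '0' && name[0] <= '9') return 0;

  /* Every code unit must be an ASCII letter, digit, or underscore. */
  for (i = 0; i < len; i++)
  {
    unsigned char c = (unsigned char)name[i];
    int is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                  (c >= '0' && c <= '9') || c == '_';
    if (!is_word) return 0;
  }
  return 1;
}
```

Under these assumptions, `sketch_valid_group_name("abc_1", 5)` is accepted, while `"1abc"` (leading digit) and `"a-b"` (hyphen) are rejected.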


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "12 November 2018" "PCRE 10.33" .TH PCRE2TEST 1 "03 February 2019" "PCRE 10.33"
.SH NAME .SH NAME
pcre2test - a program for testing Perl-compatible regular expressions. pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
@ -672,14 +672,14 @@ information is obtained from the \fBpcre2_pattern_info()\fP function. Here are
some typical examples: some typical examples:
.sp .sp
re> /(?i)(^a|^b)/m,info re> /(?i)(^a|^b)/m,info
Capturing subpattern count = 1 Capture group count = 1
Compile options: multiline Compile options: multiline
Overall options: caseless multiline Overall options: caseless multiline
First code unit at start or follows newline First code unit at start or follows newline
Subject length lower bound = 1 Subject length lower bound = 1
.sp .sp
re> /(?i)abc/info re> /(?i)abc/info
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: caseless Overall options: caseless
First code unit = 'a' (caseless) First code unit = 'a' (caseless)
@ -1325,8 +1325,8 @@ current character is CR followed by LF, an advance of two characters occurs.
.sp .sp
The \fBcopy\fP and \fBget\fP modifiers can be used to test the The \fBcopy\fP and \fBget\fP modifiers can be used to test the
\fBpcre2_substring_copy_xxx()\fP and \fBpcre2_substring_get_xxx()\fP functions. \fBpcre2_substring_copy_xxx()\fP and \fBpcre2_substring_get_xxx()\fP functions.
They can be given more than once, and each can specify a group name or number, They can be given more than once, and each can specify a capture group name or
for example: number, for example:
.sp .sp
abcd\e=copy=1,copy=3,get=G1 abcd\e=copy=1,copy=3,get=G1
.sp .sp
@ -2056,6 +2056,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 12 November 2018 Last updated: 03 February 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -646,14 +646,14 @@ PATTERN MODIFIERS
are some typical examples: are some typical examples:
re> /(?i)(^a|^b)/m,info re> /(?i)(^a|^b)/m,info
Capturing subpattern count = 1 Capture group count = 1
Compile options: multiline Compile options: multiline
Overall options: caseless multiline Overall options: caseless multiline
First code unit at start or follows newline First code unit at start or follows newline
Subject length lower bound = 1 Subject length lower bound = 1
re> /(?i)abc/info re> /(?i)abc/info
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: caseless Overall options: caseless
First code unit = 'a' (caseless) First code unit = 'a' (caseless)
@ -1214,8 +1214,8 @@ SUBJECT MODIFIERS
The copy and get modifiers can be used to test the pcre2_sub- The copy and get modifiers can be used to test the pcre2_sub-
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
given more than once, and each can specify a group name or number, for given more than once, and each can specify a capture group name or num-
example: ber, for example:
abcd\=copy=1,copy=3,get=G1 abcd\=copy=1,copy=3,get=G1
@ -1887,5 +1887,5 @@ AUTHOR
REVISION REVISION
Last updated: 12 November 2018 Last updated: 03 February 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.


@ -1,4 +1,4 @@
.TH PCRE2UNICODE 3 "12 October 2018" "PCRE2 10.33" .TH PCRE2UNICODE 3 "03 February 2019" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE - Perl-compatible regular expressions (revised API) PCRE - Perl-compatible regular expressions (revised API)
.SH "UNICODE AND UTF SUPPORT" .SH "UNICODE AND UTF SUPPORT"
@ -27,10 +27,11 @@ case the library will be smaller.
.rs .rs
.sp .sp
When PCRE2 is built with Unicode support, the escape sequences \ep{..}, When PCRE2 is built with Unicode support, the escape sequences \ep{..},
\eP{..}, and \eX can be used. The Unicode properties that can be tested are \eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting.
limited to the general category properties such as Lu for an upper case letter The Unicode properties that can be tested are limited to the general category
or Nd for a decimal number, the Unicode script names such as Arabic or Han, and properties such as Lu for an upper case letter or Nd for a decimal number, the
the derived properties Any and L&. Full lists are given in the Unicode script names such as Arabic or Han, and the derived properties Any and
L&. Full lists are given in the
.\" HREF .\" HREF
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
@ -62,13 +63,18 @@ individual code units.
In UTF modes, the dot metacharacter matches one UTF character instead of a In UTF modes, the dot metacharacter matches one UTF character instead of a
single code unit. single code unit.
.P .P
In UTF modes, capture group names are not restricted to ASCII, and may contain
any Unicode letters and decimal digits, as well as underscore.
.P
The escape sequence \eC can be used to match a single code unit in a UTF mode, The escape sequence \eC can be used to match a single code unit in a UTF mode,
but its use can lead to some strange effects because it breaks up multi-unit but its use can lead to some strange effects because it breaks up multi-unit
characters (see the description of \eC in the characters (see the description of \eC in the
.\" HREF .\" HREF
\fBpcre2pattern\fP \fBpcre2pattern\fP
.\" .\"
documentation). documentation). For this reason, there is a build-time option that disables
support for \eC completely. There is also a less draconian compile-time option
for locking out the use of \eC when a pattern is compiled.
.P .P
The use of \eC is not supported by the alternative matching function The use of \eC is not supported by the alternative matching function
\fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character \fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character
@ -387,6 +393,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 12 October 2018 Last updated: 03 February 2019
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2019 University of Cambridge.
.fi .fi


@ -2194,6 +2194,7 @@ so it is simplest just to return both.
Arguments: Arguments:
ptrptr points to the character pointer variable ptrptr points to the character pointer variable
ptrend points to the end of the input string ptrend points to the end of the input string
utf true if the input is UTF-encoded
terminator the terminator of a subpattern name must be this terminator the terminator of a subpattern name must be this
offsetptr where to put the offset from the start of the pattern offsetptr where to put the offset from the start of the pattern
nameptr where to put a pointer to the name in the input nameptr where to put a pointer to the name in the input
@ -2206,13 +2207,12 @@ Returns: TRUE if a name was read
*/ */
static BOOL static BOOL
read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t terminator, read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, BOOL utf, uint32_t terminator,
PCRE2_SIZE *offsetptr, PCRE2_SPTR *nameptr, uint32_t *namelenptr, PCRE2_SIZE *offsetptr, PCRE2_SPTR *nameptr, uint32_t *namelenptr,
int *errorcodeptr, compile_block *cb) int *errorcodeptr, compile_block *cb)
{ {
PCRE2_SPTR ptr = *ptrptr; PCRE2_SPTR ptr = *ptrptr;
BOOL is_group = (*ptr != CHAR_ASTERISK); BOOL is_group = (*ptr != CHAR_ASTERISK);
uint32_t namelen = 0;
if (++ptr >= ptrend) /* No characters in name */ if (++ptr >= ptrend) /* No characters in name */
{ {
@ -2221,35 +2221,74 @@ if (++ptr >= ptrend) /* No characters in name */
goto FAILED; goto FAILED;
} }
/* A group name must not start with a digit. If either of the others start with
a digit it just won't be recognized. */
if (is_group && IS_DIGIT(*ptr))
{
*errorcodeptr = ERR44;
goto FAILED;
}
*nameptr = ptr; *nameptr = ptr;
*offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern); *offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern);
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0) /* In UTF mode, a group name may contain letters and decimal digits as defined
by Unicode properties, and underscores, but must not start with a digit. */
#ifdef SUPPORT_UNICODE
if (utf && is_group)
{ {
ptr++; uint32_t c, type;
namelen++;
if (namelen > MAX_NAME_SIZE) GETCHAR(c, ptr);
type = UCD_CHARTYPE(c);
if (type == ucp_Nd)
{ {
*errorcodeptr = ERR48; *errorcodeptr = ERR44;
goto FAILED; goto FAILED;
} }
for(;;)
{
if (type != ucp_Nd && PRIV(ucp_gentype)[type] != ucp_L &&
c != CHAR_UNDERSCORE) break;
ptr++;
FORWARDCHAR(ptr);
if (ptr >= ptrend) break;
GETCHAR(c, ptr);
type = UCD_CHARTYPE(c);
}
} }
else
#else
(void)utf; /* Avoid compiler warning */
#endif /* SUPPORT_UNICODE */
/* Handle non-group names and group names in non-UTF modes. A group name must
not start with a digit. If either of the others start with a digit it just
won't be recognized. */
{
if (is_group && IS_DIGIT(*ptr))
{
*errorcodeptr = ERR44;
goto FAILED;
}
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
{
ptr++;
}
}
/* Check name length */
if (ptr > *nameptr + MAX_NAME_SIZE)
{
*errorcodeptr = ERR48;
goto FAILED;
}
*namelenptr = ptr - *nameptr;
/* Subpattern names must not be empty, and their terminator is checked here. /* Subpattern names must not be empty, and their terminator is checked here.
(What follows a verb or alpha assertion name is checked separately.) */ (What follows a verb or alpha assertion name is checked separately.) */
if (is_group) if (is_group)
{ {
if (namelen == 0) if (ptr == *nameptr)
{ {
*errorcodeptr = ERR62; /* Subpattern name expected */ *errorcodeptr = ERR62; /* Subpattern name expected */
goto FAILED; goto FAILED;
@ -2262,7 +2301,6 @@ if (is_group)
ptr++; ptr++;
} }
*namelenptr = namelen;
*ptrptr = ptr; *ptrptr = ptr;
return TRUE; return TRUE;
@ -2981,7 +3019,7 @@ while (ptr < ptrend)
/* Not a numerical recursion */ /* Not a numerical recursion */
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen, if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
&errorcode, cb)) goto ESCAPE_FAILED; &errorcode, cb)) goto ESCAPE_FAILED;
/* \k and \g when used with braces are back references, whereas \g used /* \k and \g when used with braces are back references, whereas \g used
@ -3554,8 +3592,8 @@ while (ptr < ptrend)
uint32_t meta; uint32_t meta;
vn = alasnames; vn = alasnames;
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode, if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen,
cb)) goto FAILED; &errorcode, cb)) goto FAILED;
if (ptr >= ptrend || *ptr != CHAR_COLON) if (ptr >= ptrend || *ptr != CHAR_COLON)
{ {
errorcode = ERR95; /* Malformed */ errorcode = ERR95; /* Malformed */
@ -3651,8 +3689,8 @@ while (ptr < ptrend)
else else
{ {
vn = verbnames; vn = verbnames;
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode, if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen,
cb)) goto FAILED; &errorcode, cb)) goto FAILED;
if (ptr >= ptrend || (*ptr != CHAR_COLON && if (ptr >= ptrend || (*ptr != CHAR_COLON &&
*ptr != CHAR_RIGHT_PARENTHESIS)) *ptr != CHAR_RIGHT_PARENTHESIS))
{ {
@ -3907,7 +3945,7 @@ while (ptr < ptrend)
errorcode = ERR41; errorcode = ERR41;
goto FAILED; goto FAILED;
} }
if (!read_name(&ptr, ptrend, CHAR_RIGHT_PARENTHESIS, &offset, &name, if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name,
&namelen, &errorcode, cb)) goto FAILED; &namelen, &errorcode, cb)) goto FAILED;
*parsed_pattern++ = META_BACKREF_BYNAME; *parsed_pattern++ = META_BACKREF_BYNAME;
*parsed_pattern++ = namelen; *parsed_pattern++ = namelen;
@ -3967,7 +4005,7 @@ while (ptr < ptrend)
case CHAR_AMPERSAND: case CHAR_AMPERSAND:
RECURSE_BY_NAME: RECURSE_BY_NAME:
if (!read_name(&ptr, ptrend, CHAR_RIGHT_PARENTHESIS, &offset, &name, if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name,
&namelen, &errorcode, cb)) goto FAILED; &namelen, &errorcode, cb)) goto FAILED;
*parsed_pattern++ = META_RECURSE_BYNAME; *parsed_pattern++ = META_RECURSE_BYNAME;
*parsed_pattern++ = namelen; *parsed_pattern++ = namelen;
@ -4215,7 +4253,7 @@ while (ptr < ptrend)
terminator = CHAR_RIGHT_PARENTHESIS; terminator = CHAR_RIGHT_PARENTHESIS;
ptr--; /* Point to char before name */ ptr--; /* Point to char before name */
} }
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen, if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
&errorcode, cb)) goto FAILED; &errorcode, cb)) goto FAILED;
/* Handle (?(R&name) */ /* Handle (?(R&name) */
@ -4349,7 +4387,7 @@ while (ptr < ptrend)
terminator = CHAR_APOSTROPHE; /* Terminator */ terminator = CHAR_APOSTROPHE; /* Terminator */
DEFINE_NAME: DEFINE_NAME:
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen, if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
&errorcode, cb)) goto FAILED; &errorcode, cb)) goto FAILED;
/* We have a name for this capturing group. It is also assigned a number, /* We have a name for this capturing group. It is also assigned a number,


@ -95,7 +95,7 @@ static const unsigned char compile_error_texts[] =
/* 25 */ /* 25 */
"lookbehind assertion is not fixed length\0" "lookbehind assertion is not fixed length\0"
"a relative value of zero is not allowed\0" "a relative value of zero is not allowed\0"
"conditional group contains more than two branches\0" "conditional subpattern contains more than two branches\0"
"assertion expected after (?( or (?(?C)\0" "assertion expected after (?( or (?(?C)\0"
"digit expected after (?+ or (?-\0" "digit expected after (?+ or (?-\0"
/* 30 */ /* 30 */
@ -113,21 +113,21 @@ static const unsigned char compile_error_texts[] =
/* 40 */ /* 40 */
"invalid escape sequence in (*VERB) name\0" "invalid escape sequence in (*VERB) name\0"
"unrecognized character after (?P\0" "unrecognized character after (?P\0"
"syntax error in subpattern name (missing terminator)\0" "syntax error in subpattern name (missing terminator?)\0"
"two named subpatterns have the same name (PCRE2_DUPNAMES not set)\0" "two named subpatterns have the same name (PCRE2_DUPNAMES not set)\0"
"group name must start with a non-digit\0" "subpattern name must start with a non-digit\0"
/* 45 */ /* 45 */
"this version of PCRE2 does not have support for \\P, \\p, or \\X\0" "this version of PCRE2 does not have support for \\P, \\p, or \\X\0"
"malformed \\P or \\p sequence\0" "malformed \\P or \\p sequence\0"
"unknown property name after \\P or \\p\0" "unknown property name after \\P or \\p\0"
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)\0" "subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " code units)\0"
"too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0" "too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0"
/* 50 */ /* 50 */
"invalid range in character class\0" "invalid range in character class\0"
"octal value is greater than \\377 in 8-bit non-UTF-8 mode\0" "octal value is greater than \\377 in 8-bit non-UTF-8 mode\0"
"internal error: overran compiling workspace\0" "internal error: overran compiling workspace\0"
"internal error: previously-checked referenced subpattern not found\0" "internal error: previously-checked referenced subpattern not found\0"
"DEFINE group contains more than one branch\0" "DEFINE subpattern contains more than one branch\0"
/* 55 */ /* 55 */
"missing opening brace after \\o\0" "missing opening brace after \\o\0"
"internal error: unknown newline setting\0" "internal error: unknown newline setting\0"
@ -137,7 +137,7 @@ static const unsigned char compile_error_texts[] =
"obsolete error (should not occur)\0" /* Was the above */ "obsolete error (should not occur)\0" /* Was the above */
/* 60 */ /* 60 */
"(*VERB) not recognized or malformed\0" "(*VERB) not recognized or malformed\0"
"group number is too big\0" "subpattern number is too big\0"
"subpattern name expected\0" "subpattern name expected\0"
"internal error: parsed pattern overflow\0" "internal error: parsed pattern overflow\0"
"non-octal character in \\o{} (closing brace missing?)\0" "non-octal character in \\o{} (closing brace missing?)\0"


@ -169,7 +169,7 @@ commented out the original, but kept it around just in case. */
/* void vms_setsymbol( char *, char *, int ); Original code from [1]. */ /* void vms_setsymbol( char *, char *, int ); Original code from [1]. */
#endif #endif
/* VC and older compilers don't support %td or %zu, and even some that claim to /* VC and older compilers don't support %td or %zu, and even some that claim to
be C99 don't support it (hence DISABLE_PERCENT_ZT). */ be C99 don't support it (hence DISABLE_PERCENT_ZT). */
#if defined(_MSC_VER) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L || defined(DISABLE_PERCENT_ZT) #if defined(_MSC_VER) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L || defined(DISABLE_PERCENT_ZT)
@ -539,7 +539,7 @@ typedef struct patctl { /* Structure for pattern modifiers. */
uint32_t jitstack; /* Must be in same position as datctl */ uint32_t jitstack; /* Must be in same position as datctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */ uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as datctl */ uint32_t substitute_skip; /* Must be in same position as datctl */
uint32_t substitute_stop; /* Must be in same position as datctl */ uint32_t substitute_stop; /* Must be in same position as datctl */
uint32_t jit; uint32_t jit;
uint32_t stackguard_test; uint32_t stackguard_test;
uint32_t tables_id; uint32_t tables_id;
@ -561,7 +561,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
uint32_t jitstack; /* Must be in same position as patctl */ uint32_t jitstack; /* Must be in same position as patctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */ uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as patctl */ uint32_t substitute_skip; /* Must be in same position as patctl */
uint32_t substitute_stop; /* Must be in same position as patctl */ uint32_t substitute_stop; /* Must be in same position as patctl */
uint32_t startend[2]; uint32_t startend[2];
uint32_t cerror[2]; uint32_t cerror[2];
uint32_t cfail[2]; uint32_t cfail[2];
@ -3049,13 +3049,14 @@ return yield;
#ifdef SUPPORT_PCRE2_8
/************************************************* /*************************************************
* Convert character value to UTF-8 * * Convert character value to UTF-8 *
*************************************************/ *************************************************/
/* This function takes an integer value in the range 0 - 0x7fffffff /* This function takes an integer value in the range 0 - 0x7fffffff
and encodes it as a UTF-8 character in 0 to 6 bytes. and encodes it as a UTF-8 character in 0 to 6 bytes. It is needed even when the
8-bit library is not supported, to generate UTF-8 output for non-ASCII
characters.
Arguments: Arguments:
cvalue the character value cvalue the character value
@ -3081,7 +3082,6 @@ for (j = i; j > 0; j--)
*utf8bytes = utf8_table2[i] | cvalue; *utf8bytes = utf8_table2[i] | cvalue;
return i + 1; return i + 1;
} }
#endif /* SUPPORT_PCRE2_8 */
@ -4374,6 +4374,7 @@ static int
show_pattern_info(void) show_pattern_info(void)
{ {
uint32_t compile_options, overall_options, extra_options; uint32_t compile_options, overall_options, extra_options;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
if ((pat_patctl.control & (CTL_BINCODE|CTL_FULLBINCODE)) != 0) if ((pat_patctl.control & (CTL_BINCODE|CTL_FULLBINCODE)) != 0)
{ {
@ -4463,7 +4464,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
!= 0) != 0)
return PR_ABEND; return PR_ABEND;
fprintf(outfile, "Capturing subpattern count = %d\n", capture_count); fprintf(outfile, "Capture group count = %d\n", capture_count);
if (backrefmax > 0) if (backrefmax > 0)
fprintf(outfile, "Max back reference = %d\n", backrefmax); fprintf(outfile, "Max back reference = %d\n", backrefmax);
@ -4482,14 +4483,60 @@ if ((pat_patctl.control & CTL_INFO) != 0)
if (namecount > 0) if (namecount > 0)
{ {
fprintf(outfile, "Named capturing subpatterns:\n"); fprintf(outfile, "Named capture groups:\n");
for (; namecount > 0; namecount--) for (; namecount > 0; namecount--)
{ {
int imm2_size = test_mode == PCRE8_MODE ? 2 : 1; int imm2_size = test_mode == PCRE8_MODE ? 2 : 1;
uint32_t length = (uint32_t)STRLEN(nametable + imm2_size); uint32_t length = (uint32_t)STRLEN(nametable + imm2_size);
fprintf(outfile, " "); fprintf(outfile, " ");
PCHARSV(nametable, imm2_size, length, FALSE, outfile);
/* In UTF mode the name may be a UTF string containing non-ASCII
letters and digits. We must output it as a UTF-8 string. In non-UTF mode,
use the normal string printing functions, which use escapes for all
non-ASCII characters. */
if (utf)
{
#ifdef SUPPORT_PCRE2_32
if (test_mode == PCRE32_MODE)
{
PCRE2_SPTR32 nameptr = (PCRE2_SPTR32)nametable + imm2_size;
while (*nameptr != 0)
{
uint8_t u8buff[6];
int len = ord2utf8(*nameptr++, u8buff);
fprintf(outfile, "%.*s", len, u8buff);
}
}
#endif
#ifdef SUPPORT_PCRE2_16
if (test_mode == PCRE16_MODE)
{
PCRE2_SPTR16 nameptr = (PCRE2_SPTR16)nametable + imm2_size;
while (*nameptr != 0)
{
int len;
uint8_t u8buff[6];
uint32_t c = *nameptr++ & 0xffff;
if (c >= 0xD800 && c < 0xDC00)
c = ((c & 0x3ff) << 10) + (*nameptr++ & 0x3ff) + 0x10000;
len = ord2utf8(c, u8buff);
fprintf(outfile, "%.*s", len, u8buff);
}
}
#endif
#ifdef SUPPORT_PCRE2_8
if (test_mode == PCRE8_MODE)
fprintf(outfile, "%s", (PCRE2_SPTR8)nametable + imm2_size);
#endif
}
else /* Not UTF mode */
{
PCHARSV(nametable, imm2_size, length, FALSE, outfile);
}
while (length++ < nameentrysize - imm2_size) putc(' ', outfile); while (length++ < nameentrysize - imm2_size) putc(' ', outfile);
#ifdef SUPPORT_PCRE2_32 #ifdef SUPPORT_PCRE2_32
if (test_mode == PCRE32_MODE) if (test_mode == PCRE32_MODE)
fprintf(outfile, "%3d\n", (int)(((PCRE2_SPTR32)nametable)[0])); fprintf(outfile, "%3d\n", (int)(((PCRE2_SPTR32)nametable)[0]));
@ -4503,6 +4550,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
fprintf(outfile, "%3d\n", (int)( fprintf(outfile, "%3d\n", (int)(
((((PCRE2_SPTR8)nametable)[0]) << 8) | ((PCRE2_SPTR8)nametable)[1])); ((((PCRE2_SPTR8)nametable)[0]) << 8) | ((PCRE2_SPTR8)nametable)[1]));
#endif #endif
nametable = (void*)((PCRE2_SPTR8)nametable + nameentrysize * code_unit_size); nametable = (void*)((PCRE2_SPTR8)nametable + nameentrysize * code_unit_size);
} }
} }
@ -5971,30 +6019,30 @@ BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
(void)data_ptr; /* Not used */ (void)data_ptr; /* Not used */
fprintf(outfile, "%2d(%d) Old %" SIZ_FORM " %" SIZ_FORM " \"", fprintf(outfile, "%2d(%d) Old %" SIZ_FORM " %" SIZ_FORM " \"",
scb->subscount, scb->oveccount, scb->subscount, scb->oveccount,
SIZ_CAST scb->ovector[0], SIZ_CAST scb->ovector[1]); SIZ_CAST scb->ovector[0], SIZ_CAST scb->ovector[1]);
PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0], PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0],
utf, outfile); utf, outfile);
fprintf(outfile, "\" New %" SIZ_FORM " %" SIZ_FORM " \"", fprintf(outfile, "\" New %" SIZ_FORM " %" SIZ_FORM " \"",
SIZ_CAST scb->output_offsets[0], SIZ_CAST scb->output_offsets[1]); SIZ_CAST scb->output_offsets[0], SIZ_CAST scb->output_offsets[1]);
PCHARSV(scb->output, scb->output_offsets[0], PCHARSV(scb->output, scb->output_offsets[0],
scb->output_offsets[1] - scb->output_offsets[0], utf, outfile); scb->output_offsets[1] - scb->output_offsets[0], utf, outfile);
if (scb->subscount == dat_datctl.substitute_stop) if (scb->subscount == dat_datctl.substitute_stop)
{ {
yield = -1; yield = -1;
fprintf(outfile, " STOPPED"); fprintf(outfile, " STOPPED");
} }
else if (scb->subscount == dat_datctl.substitute_skip) else if (scb->subscount == dat_datctl.substitute_skip)
{ {
yield = +1; yield = +1;
fprintf(outfile, " SKIPPED"); fprintf(outfile, " SKIPPED");
} }
fprintf(outfile, "\"\n"); fprintf(outfile, "\"\n");
return yield; return yield;
} }
@ -6867,11 +6915,11 @@ arg_ulen = ulen; /* Value to use in match arg */
if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl)) if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl))
return PR_OK; return PR_OK;
/* Setting substitute_{skip,stop} implies a substitute callout. */ /* Setting substitute_{skip,stop} implies a substitute callout. */
if (dat_datctl.substitute_skip != 0 || dat_datctl.substitute_stop != 0) if (dat_datctl.substitute_skip != 0 || dat_datctl.substitute_stop != 0)
dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT; dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT;
/* Check for mutually exclusive modifiers. At present, these are all in the /* Check for mutually exclusive modifiers. At present, these are all in the
first control word. */ first control word. */
@ -8129,7 +8177,7 @@ if (arg != NULL && arg[0] != CHAR_MINUS)
break; break;
} }
/* For VMS, return the value by setting a symbol, for certain values only. This /* For VMS, return the value by setting a symbol, for certain values only. This
is contributed code which the PCRE2 developers have no means of testing. */ is contributed code which the PCRE2 developers have no means of testing. */
#ifdef __VMS #ifdef __VMS
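
Reviewer's sketch for the UTF-16 branch added to show_pattern_info() earlier in this file's diff: a high surrogate is combined with the following low surrogate into one code point, which is then written out as UTF-8. The two function names below are hypothetical. `sketch_encode_utf8` is a standard 1-to-4-byte encoder written for illustration; it is not pcre2test's `ord2utf8()`, which accepts values up to 0x7fffffff and can emit up to 6 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into a code point,
   using the same arithmetic as the diff above. */
static uint32_t sketch_combine_surrogates(uint16_t high, uint16_t low)
{
  assert(high >= 0xD800 && high < 0xDC00);  /* high (leading) surrogate */
  assert(low >= 0xDC00 && low < 0xE000);    /* low (trailing) surrogate */
  return (((uint32_t)high & 0x3ff) << 10) + ((uint32_t)low & 0x3ff) + 0x10000;
}

/* Hypothetical helper: encode a code point up to U+10FFFF as UTF-8.
   Returns the number of bytes written to out (1 to 4). */
static int sketch_encode_utf8(uint32_t c, uint8_t *out)
{
  assert(out != NULL && c <= 0x10FFFF);
  if (c < 0x80)
  {
    out[0] = (uint8_t)c;
    return 1;
  }
  if (c < 0x800)
  {
    out[0] = (uint8_t)(0xC0 | (c >> 6));
    out[1] = (uint8_t)(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000)
  {
    out[0] = (uint8_t)(0xE0 | (c >> 12));
    out[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (c & 0x3F));
    return 3;
  }
  out[0] = (uint8_t)(0xF0 | (c >> 18));
  out[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
  out[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
  out[3] = (uint8_t)(0x80 | (c & 0x3F));
  return 4;
}
```

For example, the pair 0xD802, 0xDE10 combines to U+10A10, which this encoder writes as the four bytes F0 90 A8 90.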


@ -480,5 +480,13 @@
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr 123abcáyzabcdef789abcሴqr
# Check name length with non-ASCII characters
/(?'ABáC678901234567890123456789012'...)/utf
/(?'ABáC6789012345678901234567890123'...)/utf
/(?'ABZC6789012345678901234567890123'...)/utf
# End of testinput10 # End of testinput10
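
Reviewer's sketch for the three name-length patterns added above. The limit is counted in code units, not characters, so a name with a non-ASCII letter reaches it sooner. The sketch assumes UTF-8 input and the 8-bit library, where one code unit is one byte; the identifiers are hypothetical and the literal 32 mirrors the documented limit rather than reading `MAX_NAME_SIZE` from PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names from the first two test patterns, with U+00E1 spelled out as its two
   UTF-8 bytes. The literal is split so that the 'C' is not read as part of
   the hex escape. */
static const char sketch_name_ok[]   = "AB\xc3\xa1" "C678901234567890123456789012";
static const char sketch_name_long[] = "AB\xc3\xa1" "C6789012345678901234567890123";

/* Hypothetical helper: count UTF-8 characters by counting the bytes that are
   not continuation bytes (10xxxxxx). Assumes well-formed UTF-8. */
static size_t sketch_utf8_char_count(const char *s)
{
  size_t n = 0;
  assert(s != NULL);
  for (; *s != '\0'; s++)
    if (((unsigned char)*s & 0xC0) != 0x80) n++;
  return n;
}

/* Hypothetical helper: does the name fit in 32 8-bit code units? */
static int sketch_fits_name_limit(const char *s)
{
  assert(s != NULL);
  return strlen(s) <= 32;
}
```

Here `sketch_name_ok` has 31 characters but 32 code units and fits the limit, while `sketch_name_long` has 32 characters but 33 code units and does not. The all-ASCII name in the third pattern has 32 characters and 32 code units, so it fits.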

23
testdata/testinput4 vendored

@ -2457,4 +2457,27 @@
# -------
# Test group names containing non-ASCII letters and digits
/(?'ABáC'...)\g{ABáC}/utf
abcabcdefg
/(?'XʰABC'...)/utf
xyzpq
/(?'XאABC'...)/utf
12345
/(?'XᾈABC'...)/utf
%^&*(...
/(?'𐨐ABC'...)/utf
abcde
/^(?'אABC'...)(?&אABC)(?P=אABC)/utf
123123123456
/^(?'אABC'...)(?&אABC)/utf
123123123456
# End of testinput4

testdata/testinput5 vendored

@ -2149,4 +2149,19 @@
# -------
# Test reference and errors in non-ASCII characters in group names
/(?'𑠅ABC'...)/I,utf
abcde\=copy=𑠅ABC
# Bad ones
/(?'AB၌C'...)\g{AB၌C}/utf
/(?'٠ABC'...)/utf
/(?'²ABC'...)/utf
/(?'X²ABC'...)/utf
# End of testinput5

testdata/testoutput10 vendored

@ -248,7 +248,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = \x80 Last code unit = \x80
@ -261,7 +261,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xe1 First code unit = \xe1
Last code unit = \x80 Last code unit = \x80
@ -274,7 +274,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xf0 First code unit = \xf0
Last code unit = \x80 Last code unit = \x80
@ -287,7 +287,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xf4 First code unit = \xf4
Last code unit = \x80 Last code unit = \x80
@ -300,7 +300,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xf4 First code unit = \xf4
Last code unit = \xbf Last code unit = \xbf
@ -313,7 +313,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc3 First code unit = \xc3
Last code unit = \xbf Last code unit = \xbf
@ -326,7 +326,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = \x80 Last code unit = \x80
@ -339,7 +339,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc2 First code unit = \xc2
Last code unit = \x80 Last code unit = \x80
@ -352,7 +352,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc3 First code unit = \xc3
Last code unit = \xbf Last code unit = \xbf
@ -365,7 +365,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xed First code unit = \xed
Last code unit = \xb4 Last code unit = \xb4
@ -380,7 +380,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xe6 First code unit = \xe6
Last code unit = \x9e Last code unit = \x9e
@ -395,7 +395,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc2 First code unit = \xc2
Last code unit = \x80 Last code unit = \x80
@ -408,7 +408,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc2 First code unit = \xc2
Last code unit = \x84 Last code unit = \x84
@ -421,7 +421,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = \x84 Last code unit = \x84
@ -434,7 +434,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xe0 First code unit = \xe0
Last code unit = \xa1 Last code unit = \xa1
@ -447,7 +447,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xf0 First code unit = \xf0
Last code unit = \xab Last code unit = \xab
@ -460,7 +460,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -495,7 +495,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = \x80 Last code unit = \x80
@ -514,7 +514,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: x \xc4 Starting code units: x \xc4
Subject length lower bound = 1 Subject length lower bound = 1
@ -531,7 +531,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a x \xc4 Starting code units: a x \xc4
Subject length lower bound = 1 Subject length lower bound = 1
@ -548,7 +548,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a x \xc4 Starting code units: a x \xc4
Subject length lower bound = 1 Subject length lower bound = 1
@ -566,7 +566,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: x \xc4 Starting code units: x \xc4
Subject length lower bound = 1 Subject length lower bound = 1
@ -578,7 +578,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = \x80 Last code unit = \x80
@ -592,7 +592,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Last code unit = \x80 Last code unit = \x80
@ -606,7 +606,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Last code unit = \x81 Last code unit = \x81
@ -619,7 +619,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[\x{100}]/IB,utf /[\x{100}]/IB,utf
@ -629,7 +629,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = \x80 Last code unit = \x80
@ -648,7 +648,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc3 First code unit = \xc3
Last code unit = \xbf Last code unit = \xbf
@ -663,7 +663,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
@ -678,14 +678,14 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = 'z' Last code unit = 'z'
Subject length lower bound = 7 Subject length lower bound = 7
/\777/I,utf /\777/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc7 First code unit = \xc7
Last code unit = \xbf Last code unit = \xbf
@ -703,7 +703,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = \x80 Last code unit = \x80
@ -717,7 +717,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc4 First code unit = \xc4
Last code unit = 'X' Last code unit = 'X'
@ -761,7 +761,7 @@ No match
0: \x{1234} 0: \x{1234}
/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I /(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: utf Overall options: utf
\R matches any Unicode newline \R matches any Unicode newline
@ -771,7 +771,7 @@ Last code unit = 'b'
Subject length lower bound = 3 Subject length lower bound = 3
/\h/I,utf /\h/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3 Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3
Subject length lower bound = 1 Subject length lower bound = 1
@ -795,7 +795,7 @@ Subject length lower bound = 1
0: \x{3000} 0: \x{3000}
/\v/I,utf /\v/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Subject length lower bound = 1 Subject length lower bound = 1
@ -813,7 +813,7 @@ Subject length lower bound = 1
0: \x{2028} 0: \x{2028}
/\h*A/I,utf /\h*A/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
Last code unit = 'A' Last code unit = 'A'
@ -822,21 +822,21 @@ Subject length lower bound = 1
0: A 0: A
/\v+A/I,utf /\v+A/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Last code unit = 'A' Last code unit = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
/\s?xxx\s/I,utf /\s?xxx\s/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x' Last code unit = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
/\sxxx\s/I,utf,tables=2 /\sxxx\s/I,utf,tables=2
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2 Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
Last code unit = 'x' Last code unit = 'x'
@ -847,7 +847,7 @@ Subject length lower bound = 5
0: \x{a0}xxx\x{85} 0: \x{a0}xxx\x{85}
/\S \S/I,utf,tables=2 /\S \S/I,utf,tables=2
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -883,25 +883,25 @@ Error -36 (bad UTF-8 offset)
No match No match
/\x{1234}+/Ii,utf /\x{1234}+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xe1 Starting code units: \xe1
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}+?/Ii,utf /\x{1234}+?/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xe1 Starting code units: \xe1
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}++/Ii,utf /\x{1234}++/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xe1 Starting code units: \xe1
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}{2}/Ii,utf /\x{1234}{2}/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xe1 Starting code units: \xe1
Subject length lower bound = 2 Subject length lower bound = 2
@ -913,7 +913,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
@ -925,14 +925,14 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'X' First code unit = 'X'
Last code unit = \x80 Last code unit = \x80
Subject length lower bound = 2 Subject length lower bound = 2
/\R/I,utf /\R/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Subject length lower bound = 1 Subject length lower bound = 1
@ -944,7 +944,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xc7 First code unit = \xc7
Last code unit = \xbf Last code unit = \xbf
@ -1105,7 +1105,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = 'A' (caseless) First code unit = 'A' (caseless)
Subject length lower bound = 5 Subject length lower bound = 5
@ -1117,7 +1117,7 @@ Subject length lower bound = 5
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = \xb0 Last code unit = \xb0
@ -1130,7 +1130,7 @@ Subject length lower bound = 5
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = \xb0 Last code unit = \xb0
@ -1143,14 +1143,14 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = 'A' (caseless) First code unit = 'A' (caseless)
Last code unit = 'B' (caseless) Last code unit = 'B' (caseless)
Subject length lower bound = 3 Subject length lower bound = 3
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf /\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xd0 \xd1 Starting code units: \xd0 \xd1
Subject length lower bound = 17 Subject length lower bound = 17
@ -1176,17 +1176,17 @@ Subject length lower bound = 17
------------------------------------------------------------------ ------------------------------------------------------------------
/\h/I /\h/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x09 \x20 \xa0 Starting code units: \x09 \x20 \xa0
Subject length lower bound = 1 Subject length lower bound = 1
/\v/I /\v/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1 Subject length lower bound = 1
/\R/I /\R/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1 Subject length lower bound = 1
@ -1199,7 +1199,7 @@ Subject length lower bound = 1
------------------------------------------------------------------ ------------------------------------------------------------------
/\x{212a}+/Ii,utf /\x{212a}+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: K k \xe2 Starting code units: K k \xe2
Subject length lower bound = 1 Subject length lower bound = 1
@ -1207,7 +1207,7 @@ Subject length lower bound = 1
0: KKkk\x{212a} 0: KKkk\x{212a}
/s+/Ii,utf /s+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: S s \xc5 Starting code units: S s \xc5
Subject length lower bound = 1 Subject length lower bound = 1
@ -1222,7 +1222,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: A \xc4 Starting code units: A \xc4
Last code unit = 'A' Last code unit = 'A'
@ -1239,7 +1239,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
Subject length lower bound = 1 Subject length lower bound = 1
@ -1251,7 +1251,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: Z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd Starting code units: Z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
@ -1273,7 +1273,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
\xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
@ -1289,7 +1289,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: - ] a d z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc Starting code units: - ] a d z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
\xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
@ -1314,7 +1314,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a b \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd Starting code units: a b \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
@ -1332,7 +1332,7 @@ Subject length lower bound = 7
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4 Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4
Subject length lower bound = 1 Subject length lower bound = 1
@ -1345,7 +1345,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
Subject length lower bound = 1 Subject length lower bound = 1
@ -1358,7 +1358,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -1373,7 +1373,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1395,7 +1395,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -1416,7 +1416,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1435,7 +1435,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce Starting code units: \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
\xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
@ -1462,7 +1462,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
@ -1503,7 +1503,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
@ -1520,7 +1520,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xce \xcf Starting code units: \xce \xcf
Last code unit = 'B' (caseless) Last code unit = 'B' (caseless)
@ -1531,7 +1531,7 @@ Subject length lower bound = 2
Failed: error -3: UTF-8 error: 1 byte missing at end Failed: error -3: UTF-8 error: 1 byte missing at end
/(?<=(a)(?-1))x/I,utf /(?<=(a)(?-1))x/I,utf
Capturing subpattern count = 1 Capture group count = 1
Max lookbehind = 2 Max lookbehind = 2
Options: utf Options: utf
First code unit = 'x' First code unit = 'x'
@ -1579,7 +1579,7 @@ Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP),
# but subjects containing them must not be UTF-checked. # but subjects containing them must not be UTF-checked.
/\x{d800}/I,utf,allow_surrogate_escapes /\x{d800}/I,utf,allow_surrogate_escapes
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Extra options: allow_surrogate_escapes Extra options: allow_surrogate_escapes
First code unit = \xed First code unit = \xed
@ -1602,7 +1602,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Compile options: utf Compile options: utf
Overall options: anchored utf Overall options: anchored utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
@ -1635,5 +1635,14 @@ No match
3(2) Old 13 16 "def" New 17 22 "<def>" 3(2) Old 13 16 "def" New 17 22 "<def>"
4(2) Old 22 22 "" New 28 30 "<>" 4(2) Old 22 22 "" New 28 30 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# Check name length with non-ASCII characters
/(?'ABáC678901234567890123456789012'...)/utf
/(?'ABáC6789012345678901234567890123'...)/utf
Failed: error 148 at offset 36: subpattern name is too long (maximum 32 code units)
/(?'ABZC6789012345678901234567890123'...)/utf
# End of testinput10 # End of testinput10
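
Note on the name-length test above: the limit in error 148 is stated in code units, not characters, so in the 8-bit library a non-ASCII letter such as "á" uses two of the 32 available units. The sketch below is illustrative only. It assumes the names are encoded as UTF-8 (8-bit), UTF-16 or UTF-32 according to the library width, and `code_units` is a hypothetical helper, not a PCRE2 function.

```python
# Hypothetical helper (not part of PCRE2): count the code units a group
# name occupies for a given code unit width.
MAX_NAME_CODE_UNITS = 32  # limit quoted in the error 148 message


def code_units(name: str, width: int = 8) -> int:
    """Return the number of code units in `name` for an 8/16/32-bit library."""
    if width == 8:
        return len(name.encode("utf-8"))
    if width == 16:
        return len(name.encode("utf-16-le")) // 2
    if width == 32:
        return len(name)  # one code unit per code point
    raise ValueError("width must be 8, 16 or 32")


# The three names used in the test, with the non-ASCII letter written as an escape.
accepted_utf = "AB\u00e1C678901234567890123456789012"    # 31 characters
rejected_utf = "AB\u00e1C6789012345678901234567890123"   # 32 characters
accepted_ascii = "ABZC6789012345678901234567890123"      # 32 characters

for name in (accepted_utf, rejected_utf, accepted_ascii):
    units = code_units(name)
    verdict = "ok" if units <= MAX_NAME_CODE_UNITS else "too long"
    print(len(name), units, verdict)
```

Under these assumptions the second name is the only one over the limit in the 8-bit library, which agrees with the single "Failed: error 148" line in the test output.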


@ -13,11 +13,11 @@
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/\x{100}/I /\x{100}/I
Capturing subpattern count = 0 Capture group count = 0
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -215,7 +215,7 @@ Subject length lower bound = 1
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment \) )* # optional trailing comment
/Ix /Ix
Capturing subpattern count = 0 Capture group count = 0
Contains explicit CR or LF match Contains explicit CR or LF match
Options: extended Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
@ -260,7 +260,7 @@ Subject length lower bound = 3
------------------------------------------------------------------ ------------------------------------------------------------------
/\h+/I /\h+/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -275,7 +275,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -284,7 +284,7 @@ Subject length lower bound = 1
0: \x{200a}\xa0\x{2000} 0: \x{200a}\xa0\x{2000}
/\H+/I /\H+/I
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f} 0: \x{167f}\x{1681}\x{180d}\x{180f}
@ -306,7 +306,7 @@ Subject length lower bound = 1
0: \x9f\xa1\x{2fff}\x{3001} 0: \x9f\xa1\x{2fff}\x{3001}
/\v+/I /\v+/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
@ -321,7 +321,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
@ -330,7 +330,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d 0: \x85\x0a\x0b\x0c\x0d
/\V+/I /\V+/I
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
\x{2028}\x{2029}\x{2027}\x{2030} \x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030} 0: \x{2027}\x{2030}
@ -344,7 +344,7 @@ Subject length lower bound = 1
0: \x09\x0e\x84\x86 0: \x09\x0e\x84\x86
/\R+/I,bsr=unicode /\R+/I,bsr=unicode
Capturing subpattern count = 0 Capture group count = 0
\R matches any Unicode newline \R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -354,7 +354,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d 0: \x85\x0a\x0b\x0c\x0d
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I /\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
Capturing subpattern count = 0 Capture group count = 0
First code unit = \x{d800} First code unit = \x{d800}
Last code unit = \x{dd00} Last code unit = \x{dd00}
Subject length lower bound = 6 Subject length lower bound = 6
@ -600,7 +600,7 @@ Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
@ -624,7 +624,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >


@ -13,11 +13,11 @@
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/\x{100}/I /\x{100}/I
Capturing subpattern count = 0 Capture group count = 0
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -215,7 +215,7 @@ Subject length lower bound = 1
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment \) )* # optional trailing comment
/Ix /Ix
Capturing subpattern count = 0 Capture group count = 0
Contains explicit CR or LF match Contains explicit CR or LF match
Options: extended Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
@ -260,7 +260,7 @@ Subject length lower bound = 3
------------------------------------------------------------------ ------------------------------------------------------------------
/\h+/I /\h+/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -275,7 +275,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -284,7 +284,7 @@ Subject length lower bound = 1
0: \x{200a}\xa0\x{2000} 0: \x{200a}\xa0\x{2000}
/\H+/I /\H+/I
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f} 0: \x{167f}\x{1681}\x{180d}\x{180f}
@ -306,7 +306,7 @@ Subject length lower bound = 1
0: \x9f\xa1\x{2fff}\x{3001} 0: \x9f\xa1\x{2fff}\x{3001}
/\v+/I /\v+/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
@ -321,7 +321,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029} \x{2027}\x{2030}\x{2028}\x{2029}
@ -330,7 +330,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d 0: \x85\x0a\x0b\x0c\x0d
/\V+/I /\V+/I
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
\x{2028}\x{2029}\x{2027}\x{2030} \x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030} 0: \x{2027}\x{2030}
@ -344,7 +344,7 @@ Subject length lower bound = 1
0: \x09\x0e\x84\x86 0: \x09\x0e\x84\x86
/\R+/I,bsr=unicode /\R+/I,bsr=unicode
Capturing subpattern count = 0 Capture group count = 0
\R matches any Unicode newline \R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -354,7 +354,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d 0: \x85\x0a\x0b\x0c\x0d
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I /\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
Capturing subpattern count = 0 Capture group count = 0
First code unit = \x{d800} First code unit = \x{d800}
Last code unit = \x{dd00} Last code unit = \x{dd00}
Subject length lower bound = 6 Subject length lower bound = 6
@ -558,19 +558,19 @@ Failed: error 134 at offset 12: character code point value in \x{} or \o{} is to
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/\x{7fffffff}\x{7fffffff}/I /\x{7fffffff}\x{7fffffff}/I
Capturing subpattern count = 0 Capture group count = 0
First code unit = \x{7fffffff} First code unit = \x{7fffffff}
Last code unit = \x{7fffffff} Last code unit = \x{7fffffff}
Subject length lower bound = 2 Subject length lower bound = 2
/\x{80000000}\x{80000000}/I /\x{80000000}\x{80000000}/I
Capturing subpattern count = 0 Capture group count = 0
First code unit = \x{80000000} First code unit = \x{80000000}
Last code unit = \x{80000000} Last code unit = \x{80000000}
Subject length lower bound = 2 Subject length lower bound = 2
/\x{ffffffff}\x{ffffffff}/I /\x{ffffffff}\x{ffffffff}/I
Capturing subpattern count = 0 Capture group count = 0
First code unit = \x{ffffffff} First code unit = \x{ffffffff}
Last code unit = \x{ffffffff} Last code unit = \x{ffffffff}
Subject length lower bound = 2 Subject length lower bound = 2
@ -588,7 +588,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless Options: caseless
First code unit = \x{400000} First code unit = \x{400000}
Last code unit = \x{800000} Last code unit = \x{800000}
@ -603,7 +603,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
@ -627,7 +627,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >


@ -18,7 +18,7 @@
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{ffff} First code unit = \x{ffff}
Subject length lower bound = 1 Subject length lower bound = 1
@ -30,7 +30,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d800} First code unit = \x{d800}
Last code unit = \x{dc00} Last code unit = \x{dc00}
@ -43,7 +43,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -55,7 +55,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{1000} First code unit = \x{1000}
Subject length lower bound = 1 Subject length lower bound = 1
@ -67,7 +67,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d800} First code unit = \x{d800}
Last code unit = \x{dc00} Last code unit = \x{dc00}
@ -80,7 +80,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{dbc0} First code unit = \x{dbc0}
Last code unit = \x{dc00} Last code unit = \x{dc00}
@ -93,7 +93,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{dbff} First code unit = \x{dbff}
Last code unit = \x{dfff} Last code unit = \x{dfff}
@ -106,7 +106,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xff First code unit = \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -118,7 +118,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -130,7 +130,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x80 First code unit = \x80
Subject length lower bound = 1 Subject length lower bound = 1
@ -142,7 +142,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xff First code unit = \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -154,7 +154,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -169,7 +169,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -184,7 +184,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x80 First code unit = \x80
Subject length lower bound = 1 Subject length lower bound = 1
@ -196,7 +196,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x84 First code unit = \x84
Subject length lower bound = 1 Subject length lower bound = 1
@ -208,7 +208,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{104} First code unit = \x{104}
Subject length lower bound = 1 Subject length lower bound = 1
@ -220,7 +220,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{861} First code unit = \x{861}
Subject length lower bound = 1 Subject length lower bound = 1
@ -232,7 +232,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d844} First code unit = \x{d844}
Last code unit = \x{deab} Last code unit = \x{deab}
@ -245,7 +245,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -281,7 +281,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = \x{100} Last code unit = \x{100}
@ -300,7 +300,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: x \xff Starting code units: x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -317,7 +317,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a x \xff Starting code units: a x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -334,7 +334,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a x \xff Starting code units: a x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -352,7 +352,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: x \xff Starting code units: x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -364,7 +364,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -377,7 +377,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Last code unit = \x{100} Last code unit = \x{100}
@ -391,7 +391,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Last code unit = \x{101} Last code unit = \x{101}
@ -404,7 +404,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[\x{100}]/IB,utf /[\x{100}]/IB,utf
@ -414,7 +414,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -432,7 +432,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xff First code unit = \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -446,7 +446,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
@ -461,14 +461,14 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = 'z' Last code unit = 'z'
Subject length lower bound = 7 Subject length lower bound = 7
/\777/I,utf /\777/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{1ff} First code unit = \x{1ff}
Subject length lower bound = 1 Subject length lower bound = 1
@ -485,7 +485,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = \x{200} Last code unit = \x{200}
@ -499,7 +499,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = 'X' Last code unit = 'X'
@ -547,7 +547,7 @@ Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
0: \x{11234} 0: \x{11234}
/(*UTF)\x{11234}/I /(*UTF)\x{11234}/I
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: utf Overall options: utf
First code unit = \x{d804} First code unit = \x{d804}
@ -565,7 +565,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
abcd\x{11234}pqr abcd\x{11234}pqr
/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I /(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: utf Overall options: utf
\R matches any Unicode newline \R matches any Unicode newline
@ -578,7 +578,7 @@ Subject length lower bound = 3
Failed: error 160 at offset 14: (*VERB) not recognized or malformed Failed: error 160 at offset 14: (*VERB) not recognized or malformed
/\h/I,utf /\h/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x20 \xa0 \xff Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -602,7 +602,7 @@ Subject length lower bound = 1
0: \x{3000} 0: \x{3000}
/\v/I,utf /\v/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -620,7 +620,7 @@ Subject length lower bound = 1
0: \x{2028} 0: \x{2028}
/\h*A/I,utf /\h*A/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x20 A \xa0 \xff Starting code units: \x09 \x20 A \xa0 \xff
Last code unit = 'A' Last code unit = 'A'
@ -631,7 +631,7 @@ Subject length lower bound = 1
0: \x{2000}A 0: \x{2000}A
/\R*A/I,bsr=unicode,utf /\R*A/I,bsr=unicode,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches any Unicode newline \R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff
@ -643,21 +643,21 @@ Subject length lower bound = 1
0: \x{2028}A 0: \x{2028}A
/\v+A/I,utf /\v+A/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Last code unit = 'A' Last code unit = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
/\s?xxx\s/I,utf /\s?xxx\s/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x' Last code unit = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
/\sxxx\s/I,utf,tables=2 /\sxxx\s/I,utf,tables=2
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0 Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
Last code unit = 'x' Last code unit = 'x'
@ -668,7 +668,7 @@ Subject length lower bound = 5
0: \x{a0}xxx\x{85} 0: \x{a0}xxx\x{85}
/\S \S/I,utf,tables=2 /\S \S/I,utf,tables=2
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -708,25 +708,25 @@ Failed: error -33: bad offset value
Failed: error -33: bad offset value Failed: error -33: bad offset value
/\x{1234}+/Ii,utf /\x{1234}+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}+?/Ii,utf /\x{1234}+?/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}++/Ii,utf /\x{1234}++/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}{2}/Ii,utf /\x{1234}{2}/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Last code unit = \x{1234} Last code unit = \x{1234}
@ -739,7 +739,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
@ -751,14 +751,14 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'X' First code unit = 'X'
Last code unit = \x{200} Last code unit = \x{200}
Subject length lower bound = 2 Subject length lower bound = 2
/\R/I,utf /\R/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -936,7 +936,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = 'A' (caseless) First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless) Last code unit = \x{1fb0} (caseless)
@ -949,7 +949,7 @@ Subject length lower bound = 5
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = \x{1fb0} Last code unit = \x{1fb0}
@ -962,7 +962,7 @@ Subject length lower bound = 5
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = \x{1fb0} Last code unit = \x{1fb0}
@ -975,14 +975,14 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = 'A' (caseless) First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless) Last code unit = \x{1fb0} (caseless)
Subject length lower bound = 3 Subject length lower bound = 3
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf /\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{401} (caseless) First code unit = \x{401} (caseless)
Last code unit = \x{42f} (caseless) Last code unit = \x{42f} (caseless)
@ -1017,7 +1017,7 @@ Subject length lower bound = 17
------------------------------------------------------------------ ------------------------------------------------------------------
/\x{212a}+/Ii,utf /\x{212a}+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: K k \xff Starting code units: K k \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1025,7 +1025,7 @@ Subject length lower bound = 1
0: KKkk\x{212a} 0: KKkk\x{212a}
/s+/Ii,utf /s+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: S s \xff Starting code units: S s \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1048,7 +1048,7 @@ Failed: error 134 at offset 10: character code point value in \x{} or \o{} is to
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: A \xff Starting code units: A \xff
Last code unit = 'A' Last code unit = 'A'
@ -1065,7 +1065,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1077,7 +1077,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: Z \xff Starting code units: Z \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1095,7 +1095,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87
\x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96
@ -1115,7 +1115,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: - ] a d z \xff Starting code units: - ] a d z \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1136,7 +1136,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a b \xff Starting code units: a b \xff
Last code unit = 'z' Last code unit = 'z'
@ -1150,7 +1150,7 @@ Subject length lower bound = 7
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1163,7 +1163,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1176,7 +1176,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -1191,7 +1191,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1217,7 +1217,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -1243,7 +1243,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1266,7 +1266,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xff Starting code units: \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1289,7 +1289,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1335,7 +1335,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1357,7 +1357,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xff Starting code units: \xff
Last code unit = 'B' (caseless) Last code unit = 'B' (caseless)
@ -1443,7 +1443,7 @@ Failed: error 191 at offset 0: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowe
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Compile options: utf Compile options: utf
Overall options: anchored utf Overall options: anchored utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a


@ -18,7 +18,7 @@
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{ffff} First code unit = \x{ffff}
Subject length lower bound = 1 Subject length lower bound = 1
@ -30,7 +30,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{10000} First code unit = \x{10000}
Subject length lower bound = 1 Subject length lower bound = 1
@ -42,7 +42,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -54,7 +54,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{1000} First code unit = \x{1000}
Subject length lower bound = 1 Subject length lower bound = 1
@ -66,7 +66,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{10000} First code unit = \x{10000}
Subject length lower bound = 1 Subject length lower bound = 1
@ -78,7 +78,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100000} First code unit = \x{100000}
Subject length lower bound = 1 Subject length lower bound = 1
@ -90,7 +90,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{10ffff} First code unit = \x{10ffff}
Subject length lower bound = 1 Subject length lower bound = 1
@ -102,7 +102,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xff First code unit = \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -114,7 +114,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -126,7 +126,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x80 First code unit = \x80
Subject length lower bound = 1 Subject length lower bound = 1
@ -138,7 +138,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xff First code unit = \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -150,7 +150,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -165,7 +165,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -180,7 +180,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x80 First code unit = \x80
Subject length lower bound = 1 Subject length lower bound = 1
@ -192,7 +192,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x84 First code unit = \x84
Subject length lower bound = 1 Subject length lower bound = 1
@ -204,7 +204,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{104} First code unit = \x{104}
Subject length lower bound = 1 Subject length lower bound = 1
@ -216,7 +216,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{861} First code unit = \x{861}
Subject length lower bound = 1 Subject length lower bound = 1
@ -228,7 +228,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{212ab} First code unit = \x{212ab}
Subject length lower bound = 1 Subject length lower bound = 1
@ -240,7 +240,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -276,7 +276,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = \x{100} Last code unit = \x{100}
@ -295,7 +295,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: x \xff Starting code units: x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -312,7 +312,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a x \xff Starting code units: a x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -329,7 +329,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a x \xff Starting code units: a x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -347,7 +347,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: x \xff Starting code units: x \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -359,7 +359,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -372,7 +372,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Last code unit = \x{100} Last code unit = \x{100}
@ -386,7 +386,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Last code unit = \x{101} Last code unit = \x{101}
@ -399,7 +399,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[\x{100}]/IB,utf /[\x{100}]/IB,utf
@ -409,7 +409,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Subject length lower bound = 1 Subject length lower bound = 1
@ -427,7 +427,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xff First code unit = \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -441,7 +441,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
@ -456,14 +456,14 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = 'z' Last code unit = 'z'
Subject length lower bound = 7 Subject length lower bound = 7
/\777/I,utf /\777/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{1ff} First code unit = \x{1ff}
Subject length lower bound = 1 Subject length lower bound = 1
@ -480,7 +480,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = \x{200} Last code unit = \x{200}
@ -494,7 +494,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{100} First code unit = \x{100}
Last code unit = 'X' Last code unit = 'X'
@ -542,7 +542,7 @@ Failed: error 160 at offset 7: (*VERB) not recognized or malformed
abcd\x{11234}pqr abcd\x{11234}pqr
/(*UTF)\x{11234}/I /(*UTF)\x{11234}/I
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: utf Overall options: utf
First code unit = \x{11234} First code unit = \x{11234}
@ -562,7 +562,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
Failed: error 160 at offset 14: (*VERB) not recognized or malformed Failed: error 160 at offset 14: (*VERB) not recognized or malformed
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I /(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: utf Overall options: utf
\R matches any Unicode newline \R matches any Unicode newline
@ -572,7 +572,7 @@ Last code unit = 'b'
Subject length lower bound = 3 Subject length lower bound = 3
/\h/I,utf /\h/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x20 \xa0 \xff Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -596,7 +596,7 @@ Subject length lower bound = 1
0: \x{3000} 0: \x{3000}
/\v/I,utf /\v/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -614,7 +614,7 @@ Subject length lower bound = 1
0: \x{2028} 0: \x{2028}
/\h*A/I,utf /\h*A/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x20 A \xa0 \xff Starting code units: \x09 \x20 A \xa0 \xff
Last code unit = 'A' Last code unit = 'A'
@ -625,7 +625,7 @@ Subject length lower bound = 1
0: \x{2000}A 0: \x{2000}A
/\R*A/I,bsr=unicode,utf /\R*A/I,bsr=unicode,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches any Unicode newline \R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff
@ -637,21 +637,21 @@ Subject length lower bound = 1
0: \x{2028}A 0: \x{2028}A
/\v+A/I,utf /\v+A/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Last code unit = 'A' Last code unit = 'A'
Subject length lower bound = 2 Subject length lower bound = 2
/\s?xxx\s/I,utf /\s?xxx\s/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x' Last code unit = 'x'
Subject length lower bound = 4 Subject length lower bound = 4
/\sxxx\s/I,utf,tables=2 /\sxxx\s/I,utf,tables=2
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0 Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
Last code unit = 'x' Last code unit = 'x'
@ -662,7 +662,7 @@ Subject length lower bound = 5
0: \x{a0}xxx\x{85} 0: \x{a0}xxx\x{85}
/\S \S/I,utf,tables=2 /\S \S/I,utf,tables=2
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -702,25 +702,25 @@ Failed: error -33: bad offset value
Failed: error -33: bad offset value Failed: error -33: bad offset value
/\x{1234}+/Ii,utf /\x{1234}+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}+?/Ii,utf /\x{1234}+?/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}++/Ii,utf /\x{1234}++/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Subject length lower bound = 1 Subject length lower bound = 1
/\x{1234}{2}/Ii,utf /\x{1234}{2}/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{1234} First code unit = \x{1234}
Last code unit = \x{1234} Last code unit = \x{1234}
@ -733,7 +733,7 @@ Subject length lower bound = 2
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
@ -745,14 +745,14 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'X' First code unit = 'X'
Last code unit = \x{200} Last code unit = \x{200}
Subject length lower bound = 2 Subject length lower bound = 2
/\R/I,utf /\R/I,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -930,7 +930,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = 'A' (caseless) First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless) Last code unit = \x{1fb0} (caseless)
@ -943,7 +943,7 @@ Subject length lower bound = 5
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = \x{1fb0} Last code unit = \x{1fb0}
@ -956,7 +956,7 @@ Subject length lower bound = 5
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = \x{1fb0} Last code unit = \x{1fb0}
@ -969,14 +969,14 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = 'A' (caseless) First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless) Last code unit = \x{1fb0} (caseless)
Subject length lower bound = 3 Subject length lower bound = 3
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf /\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = \x{401} (caseless) First code unit = \x{401} (caseless)
Last code unit = \x{42f} (caseless) Last code unit = \x{42f} (caseless)
@ -1011,7 +1011,7 @@ Subject length lower bound = 17
------------------------------------------------------------------ ------------------------------------------------------------------
/\x{212a}+/Ii,utf /\x{212a}+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: K k \xff Starting code units: K k \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1019,7 +1019,7 @@ Subject length lower bound = 1
0: KKkk\x{212a} 0: KKkk\x{212a}
/s+/Ii,utf /s+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: S s \xff Starting code units: S s \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1042,7 +1042,7 @@ Failed: error 134 at offset 10: character code point value in \x{} or \o{} is to
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: A \xff Starting code units: A \xff
Last code unit = 'A' Last code unit = 'A'
@ -1059,7 +1059,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1071,7 +1071,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: Z \xff Starting code units: Z \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1089,7 +1089,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87
\x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96
@ -1109,7 +1109,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: - ] a d z \xff Starting code units: - ] a d z \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1130,7 +1130,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Starting code units: a b \xff Starting code units: a b \xff
Last code unit = 'z' Last code unit = 'z'
@ -1144,7 +1144,7 @@ Subject length lower bound = 7
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1157,7 +1157,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1170,7 +1170,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -1185,7 +1185,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1211,7 +1211,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -1237,7 +1237,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1260,7 +1260,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xff Starting code units: \xff
Subject length lower bound = 1 Subject length lower bound = 1
@ -1283,7 +1283,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1329,7 +1329,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1351,7 +1351,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Starting code units: \xff Starting code units: \xff
Last code unit = 'B' (caseless) Last code unit = 'B' (caseless)
@ -1418,7 +1418,7 @@ No match
# errors in 16-bit mode. # errors in 16-bit mode.
/\x{d800}/I,utf,allow_surrogate_escapes /\x{d800}/I,utf,allow_surrogate_escapes
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Extra options: allow_surrogate_escapes Extra options: allow_surrogate_escapes
First code unit = \x{d800} First code unit = \x{d800}
@ -1440,7 +1440,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Compile options: utf Compile options: utf
Overall options: anchored utf Overall options: anchored utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a

34
testdata/testoutput15

@ -7,7 +7,7 @@
# (2) Other tests that must not be run with JIT. # (2) Other tests that must not be run with JIT.
/(a+)*zz/I /(a+)*zz/I
Capturing subpattern count = 1 Capture group count = 1
Starting code units: a z Starting code units: a z
Last code unit = 'z' Last code unit = 'z'
Subject length lower bound = 2 Subject length lower bound = 2
@ -24,7 +24,7 @@ Minimum depth limit = 30
No match No match
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I !((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
Capturing subpattern count = 1 Capture group count = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
/* this is a C style comment */\=find_limits /* this is a C style comment */\=find_limits
@ -117,7 +117,7 @@ Failed: error 160 at offset 17: (*VERB) not recognized or malformed
Failed: error 160 at offset 24: (*VERB) not recognized or malformed Failed: error 160 at offset 24: (*VERB) not recognized or malformed
/(*LIMIT_DEPTH=4294967280)abc/I /(*LIMIT_DEPTH=4294967280)abc/I
Capturing subpattern count = 0 Capture group count = 0
Depth limit = 4294967280 Depth limit = 4294967280
First code unit = 'a' First code unit = 'a'
Last code unit = 'c' Last code unit = 'c'
@ -137,7 +137,7 @@ Failed: error -47: match limit exceeded
Failed: error -53: matching depth limit exceeded Failed: error -53: matching depth limit exceeded
/(*LIMIT_MATCH=3000)(a+)*zz/I /(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1 Capture group count = 1
Match limit = 3000 Match limit = 3000
Starting code units: a z Starting code units: a z
Last code unit = 'z' Last code unit = 'z'
@ -150,7 +150,7 @@ Failed: error -47: match limit exceeded
Failed: error -47: match limit exceeded Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1 Capture group count = 1
Match limit = 3000 Match limit = 3000
Starting code units: a z Starting code units: a z
Last code unit = 'z' Last code unit = 'z'
@ -160,7 +160,7 @@ Subject length lower bound = 2
Failed: error -47: match limit exceeded Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(a+)*zz/I /(*LIMIT_MATCH=60000)(a+)*zz/I
Capturing subpattern count = 1 Capture group count = 1
Match limit = 60000 Match limit = 60000
Starting code units: a z Starting code units: a z
Last code unit = 'z' Last code unit = 'z'
@ -173,7 +173,7 @@ No match
Failed: error -47: match limit exceeded Failed: error -47: match limit exceeded
/(*LIMIT_DEPTH=10)(a+)*zz/I /(*LIMIT_DEPTH=10)(a+)*zz/I
Capturing subpattern count = 1 Capture group count = 1
Depth limit = 10 Depth limit = 10
Starting code units: a z Starting code units: a z
Last code unit = 'z' Last code unit = 'z'
@ -186,7 +186,7 @@ Failed: error -53: matching depth limit exceeded
Failed: error -53: matching depth limit exceeded Failed: error -53: matching depth limit exceeded
/(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I /(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I
Capturing subpattern count = 1 Capture group count = 1
Depth limit = 1000 Depth limit = 1000
Starting code units: a z Starting code units: a z
Last code unit = 'z' Last code unit = 'z'
@ -196,7 +196,7 @@ Subject length lower bound = 2
No match No match
/(*LIMIT_DEPTH=1000)(a+)*zz/I /(*LIMIT_DEPTH=1000)(a+)*zz/I
Capturing subpattern count = 1 Capture group count = 1
Depth limit = 1000 Depth limit = 1000
Starting code units: a z Starting code units: a z
Last code unit = 'z' Last code unit = 'z'
@ -269,14 +269,14 @@ Failed: error -52: nested recursion at the same subject position
# when JIT is used. # when JIT is used.
/(?R)/I /(?R)/I
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
abcd abcd
Failed: error -52: nested recursion at the same subject position Failed: error -52: nested recursion at the same subject position
/(a|(?R))/I /(a|(?R))/I
Capturing subpattern count = 1 Capture group count = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
abcd abcd
@ -286,7 +286,7 @@ Subject length lower bound = 0
Failed: error -52: nested recursion at the same subject position Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?R))))/I /(ab|(bc|(de|(?R))))/I
Capturing subpattern count = 3 Capture group count = 3
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
abcd abcd
@ -296,7 +296,7 @@ Subject length lower bound = 0
Failed: error -52: nested recursion at the same subject position Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?1))))/I /(ab|(bc|(de|(?1))))/I
Capturing subpattern count = 3 Capture group count = 3
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
abcd abcd
@ -306,7 +306,7 @@ Subject length lower bound = 0
Failed: error -52: nested recursion at the same subject position Failed: error -52: nested recursion at the same subject position
/x(ab|(bc|(de|(?1)x)x)x)/I /x(ab|(bc|(de|(?1)x)x)x)/I
Capturing subpattern count = 3 Capture group count = 3
First code unit = 'x' First code unit = 'x'
Subject length lower bound = 3 Subject length lower bound = 3
xab123 xab123
@ -352,7 +352,7 @@ Failed: error -52: nested recursion at the same subject position
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Subject length lower bound = 1 Subject length lower bound = 1
abcd abcd
Failed: error -52: nested recursion at the same subject position Failed: error -52: nested recursion at the same subject position
@ -367,7 +367,7 @@ Failed: error -52: nested recursion at the same subject position
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: no_auto_possess Options: no_auto_possess
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -390,7 +390,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Compile options: <none> Compile options: <none>
Overall options: no_auto_possess Overall options: no_auto_possess
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P


@ -3,14 +3,14 @@
# are different without JIT. # are different without JIT.
/abc/I,jit,jitverify /abc/I,jit,jitverify
Capturing subpattern count = 0 Capture group count = 0
First code unit = 'a' First code unit = 'a'
Last code unit = 'c' Last code unit = 'c'
Subject length lower bound = 3 Subject length lower bound = 3
JIT support is not available in this version of PCRE2 JIT support is not available in this version of PCRE2
/a*/I /a*/I
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0

36
testdata/testoutput17

File diff suppressed because one or more lines are too long

1336
testdata/testoutput2

File diff suppressed because it is too large


@ -32,9 +32,9 @@
#load testsaved2 #load testsaved2
#pop info #pop info
Capturing subpattern count = 2 Capture group count = 2
Max back reference = 2 Max back reference = 2
Named capturing subpatterns: Named capture groups:
n 1 n 1
n 2 n 2
Options: dupnames Options: dupnames
@ -66,8 +66,8 @@ No match, mark = A
4: A 4: A
#pop info #pop info
Capturing subpattern count = 4 Capture group count = 4
Named capturing subpatterns: Named capture groups:
ADDR 2 ADDR 2
ADDRESS_PAT 4 ADDRESS_PAT 4
NAME 1 NAME 1


@ -79,7 +79,7 @@
Failed: error 183 at offset 4: using \C is disabled by the application Failed: error 183 at offset 4: using \C is disabled by the application
/ab\Cde/info /ab\Cde/info
Capturing subpattern count = 0 Capture group count = 0
Contains \C Contains \C
First code unit = 'a' First code unit = 'a'
Last code unit = 'e' Last code unit = 'e'


@ -4,7 +4,7 @@
# in some widths and not in others. # in some widths and not in others.
/ab\Cde/utf,info /ab\Cde/utf,info
Capturing subpattern count = 0 Capture group count = 0
Contains \C Contains \C
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'


@ -4,7 +4,7 @@
# in some widths and not in others. # in some widths and not in others.
/ab\Cde/utf,info /ab\Cde/utf,info
Capturing subpattern count = 0 Capture group count = 0
Contains \C Contains \C
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'


@ -4,7 +4,7 @@
# in some widths and not in others. # in some widths and not in others.
/ab\Cde/utf,info /ab\Cde/utf,info
Capturing subpattern count = 0 Capture group count = 0
Contains \C Contains \C
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'


@ -78,13 +78,13 @@ No match
0: école 0: école
/\w/I /\w/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1 Subject length lower bound = 1
/\w/I,locale=fr_FR /\w/I,locale=fr_FR
Capturing subpattern count = 0 Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
@ -153,7 +153,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í


@ -78,13 +78,13 @@ No match
0: école 0: école
/\w/I /\w/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1 Subject length lower bound = 1
/\w/I,locale=fr_FR /\w/I,locale=fr_FR
Capturing subpattern count = 0 Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
@ -153,7 +153,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í


@ -78,13 +78,13 @@ No match
0: école 0: école
/\w/I /\w/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1 Subject length lower bound = 1
/\w/I,locale=fr_FR /\w/I,locale=fr_FR
Capturing subpattern count = 0 Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
@ -153,7 +153,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í

37
testdata/testoutput4

@ -3975,4 +3975,41 @@ No match
# ------- # -------
# Test group names containing non-ASCII letters and digits
/(?'ABáC'...)\g{ABáC}/utf
abcabcdefg
0: abcabc
1: abc
/(?'XʰABC'...)/utf
xyzpq
0: xyz
1: xyz
/(?'XאABC'...)/utf
12345
0: 123
1: 123
/(?'XᾈABC'...)/utf
%^&*(...
0: %^&
1: %^&
/(?'𐨐ABC'...)/utf
abcde
0: abc
1: abc
/^(?'אABC'...)(?&אABC)(?P=אABC)/utf
123123123456
0: 123123123
1: 123
/^(?'אABC'...)(?&אABC)/utf
123123123456
0: 123123
1: 123
# End of testinput4 # End of testinput4
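
Note: the tests added above check that, with the utf option set, a capture group name may contain non-ASCII letters and digits and can still be used for back references. The sketch below is an analogy only: it uses Python's standard `re` module rather than PCRE2, so the syntax differs (`(?P<name>...)` in place of `(?'name'...)`), and the helper `run_cases` and the `cases` list are hypothetical names introduced for illustration. It reuses the patterns, subjects and group names from three of the tests.

```python
import re

# Analogy only: this is Python's `re` module, not PCRE2 or pcre2test.
# Each case is (pattern, subject, group name), taken from the tests above
# and rewritten in Python's (?P<name>...) syntax.
cases = [
    (r"(?P<ABáC>...)(?P=ABáC)", "abcabcdefg", "ABáC"),
    (r"(?P<XʰABC>...)", "xyzpq", "XʰABC"),
    (r"(?P<XאABC>...)", "12345", "XאABC"),
]


def run_cases(cases):
    """Return (whole match, named group) for every case."""
    results = []
    for pattern, subject, name in cases:
        m = re.search(pattern, subject)
        results.append((m.group(0), m.group(name)))
    return results


for whole, named in run_cases(cases):
    print(f"0: {whole}  named group: {named}")
```

The first value printed for each case corresponds to the `0:` line in the test output above and the second to the `1:` line, since each pattern has a single capture group.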

93
testdata/testoutput5

@ -147,7 +147,7 @@ Failed: error 173 at offset 9: disallowed Unicode code point (>= 0xd800 && <= 0x
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -164,7 +164,7 @@ Subject length lower bound = 4
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Last code unit = 'X' Last code unit = 'X'
Subject length lower bound = 4 Subject length lower bound = 4
@ -179,7 +179,7 @@ Subject length lower bound = 4
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 3 Subject length lower bound = 3
\x{212ab}\x{212ab}\x{212ab}\x{861} \x{212ab}\x{212ab}\x{212ab}\x{861}
@ -193,7 +193,7 @@ Subject length lower bound = 3
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Compile options: utf Compile options: utf
Overall options: anchored utf Overall options: anchored utf
Starting code units: a b Starting code units: a b
@ -238,7 +238,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: utf Options: utf
Subject length lower bound = 0 Subject length lower bound = 0
@ -251,7 +251,7 @@ Subject length lower bound = 0
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -264,7 +264,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'a' First code unit = 'a'
Last code unit = 'b' Last code unit = 'b'
@ -291,7 +291,7 @@ No match
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
First code unit = \xff First code unit = \xff
Subject length lower bound = 1 Subject length lower bound = 1
>\xff< >\xff<
@ -304,7 +304,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[Ä-Ü]/utf /[Ä-Ü]/utf
@ -343,7 +343,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Options: utf Options: utf
Last code unit = 'z' Last code unit = 'z'
Subject length lower bound = 7 Subject length lower bound = 7
@ -363,7 +363,7 @@ Subject length lower bound = 7
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 2 Capture group count = 2
May match empty string May match empty string
Options: utf Options: utf
Subject length lower bound = 0 Subject length lower bound = 0
@ -394,7 +394,7 @@ Subject length lower bound = 0
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 2 Capture group count = 2
May match empty string May match empty string
Options: utf Options: utf
Subject length lower bound = 0 Subject length lower bound = 0
@ -414,7 +414,7 @@ Subject length lower bound = 0
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 2 Capture group count = 2
May match empty string May match empty string
Options: utf Options: utf
Subject length lower bound = 0 Subject length lower bound = 0
@ -445,7 +445,7 @@ Subject length lower bound = 0
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 2 Capture group count = 2
May match empty string May match empty string
Options: utf Options: utf
Subject length lower bound = 0 Subject length lower bound = 0
@ -471,7 +471,7 @@ Subject length lower bound = 0
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Compile options: no_start_optimize utf Compile options: no_start_optimize utf
Overall options: anchored no_start_optimize utf Overall options: anchored no_start_optimize utf
Subject length lower bound = 0 Subject length lower bound = 0
@ -713,7 +713,7 @@ No match
0: \x{1ec5} 0: \x{1ec5}
/a\Rb/I,bsr=anycrlf,utf /a\Rb/I,bsr=anycrlf,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches CR, LF, or CRLF \R matches CR, LF, or CRLF
First code unit = 'a' First code unit = 'a'
@ -732,7 +732,7 @@ No match
No match No match
/a\Rb/I,bsr=unicode,utf /a\Rb/I,bsr=unicode,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches any Unicode newline \R matches any Unicode newline
First code unit = 'a' First code unit = 'a'
@ -750,7 +750,7 @@ Subject length lower bound = 3
0: a\x{0b}b 0: a\x{0b}b
/a\R?b/I,bsr=anycrlf,utf /a\R?b/I,bsr=anycrlf,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches CR, LF, or CRLF \R matches CR, LF, or CRLF
First code unit = 'a' First code unit = 'a'
@ -769,7 +769,7 @@ No match
No match No match
/a\R?b/I,bsr=unicode,utf /a\R?b/I,bsr=unicode,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches any Unicode newline \R matches any Unicode newline
First code unit = 'a' First code unit = 'a'
@ -1408,22 +1408,22 @@ Failed: error 168 at offset 3: \c must be followed by a printable ASCII characte
2: \x{0d} 2: \x{0d}
/[^\x{1234}]+/Ii,utf /[^\x{1234}]+/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Subject length lower bound = 1 Subject length lower bound = 1
/[^\x{1234}]+?/Ii,utf /[^\x{1234}]+?/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Subject length lower bound = 1 Subject length lower bound = 1
/[^\x{1234}]++/Ii,utf /[^\x{1234}]++/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Subject length lower bound = 1 Subject length lower bound = 1
/[^\x{1234}]{2}/Ii,utf /[^\x{1234}]{2}/Ii,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
Subject length lower bound = 2 Subject length lower bound = 2
@ -1703,7 +1703,7 @@ Partial match: \x{0d}\x{0d}
------------------------------------------------------------------ ------------------------------------------------------------------
/(?<=\x{1234}\x{1234})\bxy/I,utf /(?<=\x{1234}\x{1234})\bxy/I,utf
Capturing subpattern count = 0 Capture group count = 0
Max lookbehind = 2 Max lookbehind = 2
Options: utf Options: utf
First code unit = 'x' First code unit = 'x'
@ -1768,7 +1768,7 @@ Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0x
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[\p{^L}]/IB /[\p{^L}]/IB
@ -1778,7 +1778,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[\P{L}]/IB /[\P{L}]/IB
@ -1788,7 +1788,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[\P{^L}]/IB /[\P{^L}]/IB
@ -1798,7 +1798,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/[abc\p{L}\x{0660}]/IB,utf /[abc\p{L}\x{0660}]/IB,utf
@ -1808,7 +1808,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
@ -1819,7 +1819,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
1234 1234
@ -1832,7 +1832,7 @@ Subject length lower bound = 1
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
Subject length lower bound = 1 Subject length lower bound = 1
1234 1234
@ -2998,7 +2998,7 @@ Partial match: AA
Ket Ket
End End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: caseless utf Options: caseless utf
First code unit = 'A' (caseless) First code unit = 'A' (caseless)
Last code unit = 'B' (caseless) Last code unit = 'B' (caseless)
@ -3914,7 +3914,7 @@ No match
------------------------------------------------------------------ ------------------------------------------------------------------
/^s?c/Iim,utf /^s?c/Iim,utf
Capturing subpattern count = 0 Capture group count = 0
Options: caseless multiline utf Options: caseless multiline utf
First code unit at start or follows newline First code unit at start or follows newline
Last code unit = 'c' (caseless) Last code unit = 'c' (caseless)
@ -4889,4 +4889,31 @@ MK: ABC
# ------- # -------
# Test reference and errors in non-ASCII characters in group names
/(?'𑠅ABC'...)/I,utf
Capture group count = 1
Named capture groups:
𑠅ABC 1
Options: utf
Subject length lower bound = 3
abcde\=copy=𑠅ABC
0: abc
1: abc
C abc (3) 𑠅ABC (group 1)
# Bad ones
/(?'AB၌C'...)\g{AB၌C}/utf
Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?)
/(?'٠ABC'...)/utf
Failed: error 144 at offset 3: subpattern name must start with a non-digit
/(?'²ABC'...)/utf
Failed: error 162 at offset 3: subpattern name expected
/(?'X²ABC'...)/utf
Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?)
# End of testinput5 # End of testinput5

testdata/testoutput6 vendored

@ -5978,7 +5978,7 @@ Partial match: 123
0: Content-Type:xxxyyyz 0: Content-Type:xxxyyyz
/^abc/Im,newline=lf /^abc/Im,newline=lf
Capturing subpattern count = 0 Capture group count = 0
Options: multiline Options: multiline
Forced newline is LF Forced newline is LF
First code unit at start or follows newline First code unit at start or follows newline
@ -6001,7 +6001,7 @@ No match
No match No match
/^abc/Im,newline=crlf /^abc/Im,newline=crlf
Capturing subpattern count = 0 Capture group count = 0
Options: multiline Options: multiline
Forced newline is CRLF Forced newline is CRLF
First code unit at start or follows newline First code unit at start or follows newline
@ -6016,7 +6016,7 @@ No match
No match No match
/^abc/Im,newline=cr /^abc/Im,newline=cr
Capturing subpattern count = 0 Capture group count = 0
Options: multiline Options: multiline
Forced newline is CR Forced newline is CR
First code unit at start or follows newline First code unit at start or follows newline
@ -6031,7 +6031,7 @@ No match
No match No match
/.*/I,newline=lf /.*/I,newline=lf
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Forced newline is LF Forced newline is LF
First code unit at start or follows newline First code unit at start or follows newline
@ -6044,7 +6044,7 @@ Subject length lower bound = 0
0: abc\x0d 0: abc\x0d
/.*/I,newline=cr /.*/I,newline=cr
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Forced newline is CR Forced newline is CR
First code unit at start or follows newline First code unit at start or follows newline
@ -6057,7 +6057,7 @@ Subject length lower bound = 0
0: abc 0: abc
/.*/I,newline=crlf /.*/I,newline=crlf
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Forced newline is CRLF Forced newline is CRLF
First code unit at start or follows newline First code unit at start or follows newline
@ -6070,7 +6070,7 @@ Subject length lower bound = 0
0: abc 0: abc
/\w+(.)(.)?def/Is /\w+(.)(.)?def/Is
Capturing subpattern count = 2 Capture group count = 2
Options: dotall Options: dotall
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -6447,7 +6447,7 @@ No match
0: \x0aA 0: \x0aA
/a\Rb/I,bsr=anycrlf /a\Rb/I,bsr=anycrlf
Capturing subpattern count = 0 Capture group count = 0
\R matches CR, LF, or CRLF \R matches CR, LF, or CRLF
First code unit = 'a' First code unit = 'a'
Last code unit = 'b' Last code unit = 'b'
@ -6465,7 +6465,7 @@ No match
No match No match
/a\Rb/I,bsr=unicode /a\Rb/I,bsr=unicode
Capturing subpattern count = 0 Capture group count = 0
\R matches any Unicode newline \R matches any Unicode newline
First code unit = 'a' First code unit = 'a'
Last code unit = 'b' Last code unit = 'b'
@ -6482,7 +6482,7 @@ Subject length lower bound = 3
0: a\x0bb 0: a\x0bb
/a\R?b/I,bsr=anycrlf /a\R?b/I,bsr=anycrlf
Capturing subpattern count = 0 Capture group count = 0
\R matches CR, LF, or CRLF \R matches CR, LF, or CRLF
First code unit = 'a' First code unit = 'a'
Last code unit = 'b' Last code unit = 'b'
@ -6500,7 +6500,7 @@ No match
No match No match
/a\R?b/I,bsr=unicode /a\R?b/I,bsr=unicode
Capturing subpattern count = 0 Capture group count = 0
\R matches any Unicode newline \R matches any Unicode newline
First code unit = 'a' First code unit = 'a'
Last code unit = 'b' Last code unit = 'b'
@ -6517,7 +6517,7 @@ Subject length lower bound = 2
0: a\x0bb 0: a\x0bb
/a\R{2,4}b/I,bsr=anycrlf /a\R{2,4}b/I,bsr=anycrlf
Capturing subpattern count = 0 Capture group count = 0
\R matches CR, LF, or CRLF \R matches CR, LF, or CRLF
First code unit = 'a' First code unit = 'a'
Last code unit = 'b' Last code unit = 'b'
@ -6535,7 +6535,7 @@ No match
No match No match
/a\R{2,4}b/I,bsr=unicode /a\R{2,4}b/I,bsr=unicode
Capturing subpattern count = 0 Capture group count = 0
\R matches any Unicode newline \R matches any Unicode newline
First code unit = 'a' First code unit = 'a'
Last code unit = 'b' Last code unit = 'b'
@ -6831,7 +6831,7 @@ Partial match: +ab
0+ CBA 0+ CBA
/(abc|def|xyz)/I /(abc|def|xyz)/I
Capturing subpattern count = 1 Capture group count = 1
Starting code units: a d x Starting code units: a d x
Subject length lower bound = 3 Subject length lower bound = 3
terhjk;abcdaadsfe terhjk;abcdaadsfe
@ -6843,7 +6843,7 @@ Subject length lower bound = 3
No match No match
/(abc|def|xyz)/I,no_start_optimize /(abc|def|xyz)/I,no_start_optimize
Capturing subpattern count = 1 Capture group count = 1
Options: no_start_optimize Options: no_start_optimize
Subject length lower bound = 0 Subject length lower bound = 0
terhjk;abcdaadsfe terhjk;abcdaadsfe

@ -1030,7 +1030,7 @@ No match
No match No match
/a\Rb/I,bsr=anycrlf,utf /a\Rb/I,bsr=anycrlf,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches CR, LF, or CRLF \R matches CR, LF, or CRLF
First code unit = 'a' First code unit = 'a'
@ -1049,7 +1049,7 @@ No match
No match No match
/a\Rb/I,bsr=unicode,utf /a\Rb/I,bsr=unicode,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches any Unicode newline \R matches any Unicode newline
First code unit = 'a' First code unit = 'a'
@ -1067,7 +1067,7 @@ Subject length lower bound = 3
0: a\x{0b}b 0: a\x{0b}b
/a\R?b/I,bsr=anycrlf,utf /a\R?b/I,bsr=anycrlf,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches CR, LF, or CRLF \R matches CR, LF, or CRLF
First code unit = 'a' First code unit = 'a'
@ -1086,7 +1086,7 @@ No match
No match No match
/a\R?b/I,bsr=unicode,utf /a\R?b/I,bsr=unicode,utf
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
\R matches any Unicode newline \R matches any Unicode newline
First code unit = 'a' First code unit = 'a'

@ -67,7 +67,7 @@ Memory allocation (code space): 10
2 2 Ket 2 2 Ket
4 End 4 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 14
4 4 Ket 4 4 Ket
6 End 6 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 26
10 10 Ket 10 10 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 22
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 22
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -904,7 +904,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket 79 79 Ket
81 End 81 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -938,7 +938,7 @@ Subject length lower bound = 0
43 43 Ket 43 43 Ket
45 End 45 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1011,7 +1011,7 @@ No match
133 133 Ket 133 133 Ket
135 End 135 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 14
3 3 Ket 3 3 Ket
6 End 6 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 18
5 5 Ket 5 5 Ket
8 End 8 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 30
11 11 Ket 11 11 Ket
14 End 14 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 26
9 9 Ket 9 9 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 26
9 9 Ket 9 9 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
110 110 Ket 110 110 Ket
113 End 113 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
58 58 Ket 58 58 Ket
61 End 61 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
194 194 Ket 194 194 Ket
197 End 197 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 14
3 3 Ket 3 3 Ket
6 End 6 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 18
5 5 Ket 5 5 Ket
8 End 8 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 30
11 11 Ket 11 11 Ket
14 End 14 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 26
9 9 Ket 9 9 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 26
9 9 Ket 9 9 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
110 110 Ket 110 110 Ket
113 End 113 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
58 58 Ket 58 58 Ket
61 End 61 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
194 194 Ket 194 194 Ket
197 End 197 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 20
2 2 Ket 2 2 Ket
4 End 4 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 28
4 4 Ket 4 4 Ket
6 End 6 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 52
10 10 Ket 10 10 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 44
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 44
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket 79 79 Ket
81 End 81 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
43 43 Ket 43 43 Ket
45 End 45 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
133 133 Ket 133 133 Ket
135 End 135 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 20
2 2 Ket 2 2 Ket
4 End 4 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 28
4 4 Ket 4 4 Ket
6 End 6 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 52
10 10 Ket 10 10 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 44
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 44
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket 79 79 Ket
81 End 81 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
43 43 Ket 43 43 Ket
45 End 45 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
133 133 Ket 133 133 Ket
135 End 135 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 20
2 2 Ket 2 2 Ket
4 End 4 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 28
4 4 Ket 4 4 Ket
6 End 6 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 52
10 10 Ket 10 10 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 44
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{d55c} First code unit = \x{d55c}
Last code unit = \x{c5b4} Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 44
8 8 Ket 8 8 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \x{65e5} First code unit = \x{65e5}
Last code unit = \x{8a9e} Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket 79 79 Ket
81 End 81 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
43 43 Ket 43 43 Ket
45 End 45 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
133 133 Ket 133 133 Ket
135 End 135 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0


@ -67,7 +67,7 @@ Memory allocation (code space): 7
3 3 Ket 3 3 Ket
6 End 6 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 9
5 5 Ket 5 5 Ket
8 End 8 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 18
14 14 Ket 14 14 Ket
17 End 17 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 19
15 15 Ket 15 15 Ket
18 End 18 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xed First code unit = \xed
Last code unit = \xb4 Last code unit = \xb4
@ -404,7 +404,7 @@ Memory allocation (code space): 19
15 15 Ket 15 15 Ket
18 End 18 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xe6 First code unit = \xe6
Last code unit = \x9e Last code unit = \x9e
@ -904,7 +904,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
119 119 Ket 119 119 Ket
122 End 122 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -938,7 +938,7 @@ Subject length lower bound = 0
61 61 Ket 61 61 Ket
64 End 64 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1011,7 +1011,7 @@ No match
205 205 Ket 205 205 Ket
208 End 208 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0


@ -67,7 +67,7 @@ Memory allocation (code space): 9
4 4 Ket 4 4 Ket
8 End 8 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 11
6 6 Ket 6 6 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 20
15 15 Ket 15 15 Ket
19 End 19 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 21
16 16 Ket 16 16 Ket
20 End 20 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xed First code unit = \xed
Last code unit = \xb4 Last code unit = \xb4
@ -404,7 +404,7 @@ Memory allocation (code space): 21
16 16 Ket 16 16 Ket
20 End 20 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xe6 First code unit = \xe6
Last code unit = \x9e Last code unit = \x9e
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
150 150 Ket 150 150 Ket
154 End 154 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
76 76 Ket 76 76 Ket
80 End 80 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
266 266 Ket 266 266 Ket
270 End 270 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0


@ -67,7 +67,7 @@ Memory allocation (code space): 11
5 5 Ket 5 5 Ket
10 End 10 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
May match empty string May match empty string
Options: extended Options: extended
Subject length lower bound = 0 Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 13
7 7 Ket 7 7 Ket
12 End 12 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: extended Options: extended
First code unit = 'a' First code unit = 'a'
Subject length lower bound = 1 Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 22
16 16 Ket 16 16 Ket
21 End 21 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = 'A' First code unit = 'A'
Last code unit = '.' Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 23
17 17 Ket 17 17 Ket
22 End 22 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xed First code unit = \xed
Last code unit = \xb4 Last code unit = \xb4
@ -404,7 +404,7 @@ Memory allocation (code space): 23
17 17 Ket 17 17 Ket
22 End 22 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 0 Capture group count = 0
Options: utf Options: utf
First code unit = \xe6 First code unit = \xe6
Last code unit = \x9e Last code unit = \x9e
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
181 181 Ket 181 181 Ket
186 End 186 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
91 91 Ket 91 91 Ket
96 End 96 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 1 Capture group count = 1
Max back reference = 1 Max back reference = 1
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
327 327 Ket 327 327 Ket
332 End 332 End
------------------------------------------------------------------ ------------------------------------------------------------------
Capturing subpattern count = 10 Capture group count = 10
May match empty string May match empty string
Subject length lower bound = 0 Subject length lower bound = 0

testdata/testoutput9

@ -215,7 +215,7 @@ Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment \) )* # optional trailing comment
/Ix /Ix
Capturing subpattern count = 0 Capture group count = 0
Contains explicit CR or LF match Contains explicit CR or LF match
Options: extended Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
@ -224,25 +224,25 @@ Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
Subject length lower bound = 3 Subject length lower bound = 3
/\h/I /\h/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x09 \x20 \xa0 Starting code units: \x09 \x20 \xa0
Subject length lower bound = 1 Subject length lower bound = 1
/\H/I /\H/I
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/\v/I /\v/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1 Subject length lower bound = 1
/\V/I /\V/I
Capturing subpattern count = 0 Capture group count = 0
Subject length lower bound = 1 Subject length lower bound = 1
/\R/I /\R/I
Capturing subpattern count = 0 Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1 Subject length lower bound = 1