Allow non-ASCII in group names when UTF is set; revise group naming terminology in documentation to use "capture group", as Perl does.
Philip.Hazel 2019-02-06 18:11:36 +00:00
parent a657d4cff8
commit d7b10a57d1
60 changed files with 4236 additions and 4025 deletions


@ -121,6 +121,9 @@ the option applies only to unrecognized or malformed escape sequences.
tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
processing or a crash could result.
30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
names, as Perl does.
Version 10.32 10-September-2018
-------------------------------


@ -27,8 +27,8 @@ DESCRIPTION
</b><br>
<P>
This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis
names into numbers.
entries for a given name in the table that translates capture group names into
numbers.
<pre>
<i>code</i> Compiled regular expression
<i>name</i> Name whose entries required


@ -49,7 +49,7 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
<li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
<li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
<li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a>
<li><a name="TOC37" href="#SEC37">DUPLICATE CAPTURE GROUP NAMES</a>
<li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
<li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
<li><a name="TOC40" href="#SEC40">SEE ALSO</a>
@ -1490,10 +1490,10 @@ independent of the setting of PCRE2_DOTALL.
<pre>
PCRE2_DUPNAMES
</pre>
If this bit is set, names used to identify capturing subpatterns need not be
unique. This can be helpful for certain types of pattern when it is known that
only one instance of the named subpattern can ever be matched. There are more
details of named subpatterns below; see also the
If this bit is set, names used to identify capture groups need not be unique.
This can be helpful for certain types of pattern when it is known that only one
instance of the named group can ever be matched. There are more details of
named capture groups below; see also the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation.
<pre>
@ -1526,11 +1526,11 @@ the end of the subject.
If this bit is set, most white space characters in the pattern are totally
ignored except when escaped or inside a character class. However, white space
is not allowed within sequences such as (?&#62; that introduce various
parenthesized subpatterns, nor within numerical quantifiers such as {1,3}.
Ignorable white space is permitted between an item and a following quantifier
and between a quantifier and a following + that indicates possessiveness.
PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be changed within
a pattern by a (?x) option setting.
parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable
white space is permitted between an item and a following quantifier and between
a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
equivalent to Perl's /x option, and it can be changed within a pattern by a
(?x) option setting.
</P>
<P>
When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
@ -1606,7 +1606,7 @@ error.
<pre>
PCRE2_MATCH_UNSET_BACKREF
</pre>
If this option is set, a backreference to an unset subpattern group matches an
If this option is set, a backreference to an unset capture group matches an
empty string (by default this causes the current matching alternative to fail).
A pattern such as (\1)(a) succeeds when this option is set (assuming it can
find an "a" in the subject), whereas it fails by default, for Perl
@ -1668,7 +1668,7 @@ If this option is set, it disables the use of numbered capturing parentheses in
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). This is the same as Perl's /n option.
Note that, when this option is set, references to capturing groups
Note that, when this option is set, references to capture groups
(backreferences or recursion/subroutine calls) may only refer to named groups,
though the reference can be by name or by number.
<pre>
@ -1687,7 +1687,7 @@ purposes.
If this option is set, it disables an optimization that is applied when .* is
the first significant item in a top-level branch of a pattern, and all the
other branches also start with .* or with \A or \G or ^. The optimization is
automatically disabled for .* if it is inside an atomic group or a capturing
automatically disabled for .* if it is inside an atomic group or a capture
group that is the subject of a backreference, or if the pattern contains
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
automatically anchored if PCRE2_DOTALL is set for all the .* items and
@ -2066,7 +2066,7 @@ When .* is the first significant item, anchoring is possible only when all the
following are true:
<pre>
.* is not in an atomic group
.* is not in a capturing group that is the subject of a backreference
.* is not in a capture group that is the subject of a backreference
PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set
@ -2077,12 +2077,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
PCRE2_INFO_BACKREFMAX
</pre>
Return the number of the highest backreference in the pattern. The third
argument should point to an <b>uint32_t</b> variable. Named subpatterns acquire
numbers as well as names, and these count towards the highest backreference.
Backreferences such as \4 or \g{12} match the captured characters of the
given group, but in addition, the check that a capturing group is set in a
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is
returned if there are no backreferences.
argument should point to an <b>uint32_t</b> variable. Named capture groups
acquire numbers as well as names, and these count towards the highest
backreference. Backreferences such as \4 or \g{12} match the captured
characters of the given group, but in addition, the check that a capture
group is set in a conditional group such as (?(3)a|b) is also a backreference.
Zero is returned if there are no backreferences.
<pre>
PCRE2_INFO_BSR
</pre>
@ -2093,9 +2093,9 @@ that \R matches only CR, LF, or CRLF.
<pre>
PCRE2_INFO_CAPTURECOUNT
</pre>
Return the highest capturing subpattern number in the pattern. In patterns
where (?| is not used, this is also the total number of capturing subpatterns.
The third argument should point to an <b>uint32_t</b> variable.
Return the highest capture group number in the pattern. In patterns where (?|
is not used, this is also the total number of capture groups. The third
argument should point to an <b>uint32_t</b> variable.
<pre>
PCRE2_INFO_DEPTHLIMIT
</pre>
@ -2143,7 +2143,7 @@ Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
without the use of JIT. The third argument should point to a <b>size_t</b>
variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
pattern. Each additional capture group adds two PCRE2_SIZE variables.
<pre>
PCRE2_INFO_HASBACKSLASHC
</pre>
@ -2267,20 +2267,20 @@ the parenthesis number. The rest of the entry is the corresponding name, zero
terminated.
</P>
<P>
The names are in alphabetical order. If (?| is used to create multiple groups
with the same number, as described in the
<a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
The names are in alphabetical order. If (?| is used to create multiple capture
groups with the same number, as described in the
<a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page, the groups may be given the same name, but there is only one entry in the
table. Different names for groups of the same number are not permitted.
</P>
<P>
Duplicate names for subpatterns with different numbers are permitted, but only
if PCRE2_DUPNAMES is set. They appear in the table in the order in which they
were found in the pattern. In the absence of (?| this is the order of
Duplicate names for capture groups with different numbers are permitted, but
only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
they were found in the pattern. In the absence of (?| this is the order of
increasing number; when (?| is used this is not necessarily the case because
later subpatterns may have lower numbers.
later capture groups may have lower numbers.
</P>
<P>
As a simple example of the name/number table, consider the following pattern
@ -2289,16 +2289,16 @@ space - including newlines - is ignored):
<pre>
(?&#60;date&#62; (?&#60;year&#62;(\d\d)?\d\d) - (?&#60;month&#62;\d\d) - (?&#60;day&#62;\d\d) )
</pre>
There are four named subpatterns, so the table has four entries, and each entry
in the table is eight bytes long. The table is as follows, with non-printing
bytes shown in hexadecimal, and undefined bytes shown as ??:
There are four named capture groups, so the table has four entries, and each
entry in the table is eight bytes long. The table is as follows, with
non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:
<pre>
00 01 d a t e 00 ??
00 05 d a y 00 ?? ??
00 04 m o n t h 00
00 02 y e a r 00 ??
</pre>
When writing code to extract data from named subpatterns using the
When writing code to extract data from named capture groups using the
name-to-number map, remember that the length of the entries is likely to be
different for each compiled pattern.
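The lookup described above can be sketched in plain C without linking PCRE2. This is an illustration only: `example_table` is a hand-written copy of the four-entry table shown above (with the undefined ?? bytes filled with 0xFF), `lookup_group_number` is a hypothetical helper, and the layout assumed is that of the 8-bit library, where each entry starts with the two-byte group number, most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written copy of the example table for the 8-bit library.
   The undefined "??" bytes are filled with 0xFF for illustration. */
const unsigned char example_table[4][8] = {
  { 0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0xFF },
  { 0x00, 0x05, 'd', 'a', 'y', 0x00, 0xFF, 0xFF },
  { 0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00 },
  { 0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0xFF },
};

/* Hypothetical helper: return the group number for name, or -1 if the
   name is not in the table. entry_size must be the value reported for
   the compiled pattern; it must not be hard-coded. */
int lookup_group_number(const unsigned char *table, size_t count,
                        size_t entry_size, const char *name)
{
  assert(table != NULL && name != NULL && entry_size > 2);
  for (size_t i = 0; i < count; i++) {
    const unsigned char *entry = table + i * entry_size;
    /* First two bytes: group number, most significant byte first. */
    int number = (entry[0] << 8) | entry[1];
    /* Rest of the entry: the zero-terminated name. */
    if (strcmp((const char *)(entry + 2), name) == 0) return number;
  }
  return -1;
}
```

In a real program the table pointer, entry count, and entry size would come from pcre2_pattern_info() (PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT, and PCRE2_INFO_NAMEENTRYSIZE) rather than from a literal array.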
<pre>
@ -2741,12 +2741,12 @@ valid newline sequence and explicit \r or \n escapes appear in the pattern.
In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by
parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
book, this is called "capturing" in what follows, and the phrase "capturing
subpattern" or "capturing group" is used for a fragment of a pattern that picks
out a substring. PCRE2 supports several other kinds of parenthesized subpattern
that do not cause substrings to be captured. The <b>pcre2_pattern_info()</b>
function can be used to find out how many capturing subpatterns there are in a
compiled pattern.
book, this is called "capturing" in what follows, and the phrase "capture
group" (Perl terminology) is used for a fragment of a pattern that picks out a
substring. PCRE2 supports several other kinds of parenthesized group that do
not cause substrings to be captured. The <b>pcre2_pattern_info()</b> function
can be used to find out how many capture groups there are in a compiled
pattern.
</P>
<P>
You can use auxiliary functions for accessing captured substrings
@ -2795,9 +2795,8 @@ For example, if the pattern (?=ab\K) is matched against "ab", the start and
end offset values for the match are 2 and 0.
</P>
<P>
If a capturing subpattern group is matched repeatedly within a single match
operation, it is the last portion of the subject that it matched that is
returned.
If a capture group is matched repeatedly within a single match operation, it is
the last portion of the subject that it matched that is returned.
</P>
<P>
If the ovector is too small to hold all the captured substring offsets, as much
@ -2806,21 +2805,20 @@ substrings are not of interest, <b>pcre2_match()</b> may be called with a match
data block whose ovector is of minimum length (that is, one pair).
</P>
<P>
It is possible for capturing subpattern number <i>n+1</i> to match some part of
the subject when subpattern <i>n</i> has not been used at all. For example, if
the string "abc" is matched against the pattern (a|(z))(bc) the return from the
function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this
happens, both values in the offset pairs corresponding to unused subpatterns
are set to PCRE2_UNSET.
It is possible for capture group number <i>n+1</i> to match some part of the
subject when group <i>n</i> has not been used at all. For example, if the string
"abc" is matched against the pattern (a|(z))(bc) the return from the function
is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
values in the offset pairs corresponding to unused groups are set to
PCRE2_UNSET.
</P>
<P>
Offset values that correspond to unused subpatterns at the end of the
expression are also set to PCRE2_UNSET. For example, if the string "abc" is
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched.
The return from the function is 2, because the highest used capturing
subpattern number is 1. The offsets for the second and third capturing
subpatterns (assuming the vector is large enough, of course) are set to
PCRE2_UNSET.
Offset values that correspond to unused groups at the end of the expression are
also set to PCRE2_UNSET. For example, if the string "abc" is matched against
the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
function is 2, because the highest used capture group number is 1. The offsets
for the second and third capture groups (assuming the vector is large
enough, of course) are set to PCRE2_UNSET.
</P>
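The unset-group rules above can be checked with a few lines of C. This is a sketch under stated assumptions: `OFFSET_UNSET` is a local stand-in for PCRE2_UNSET (which PCRE2 defines as ~(PCRE2_SIZE)0), `group_is_set` is a hypothetical helper, and `example_ovector` is written by hand to mirror the (a|(z))(bc) example rather than produced by a real match.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for PCRE2_UNSET, so that this sketch needs no PCRE2
   headers. */
#define OFFSET_UNSET (~(size_t)0)

/* Hypothetical helper: nonzero if group n captured a substring.
   rc is the positive value returned by the matching function, that is,
   one more than the highest used capture group number. */
int group_is_set(const size_t *ovector, int rc, int n)
{
  assert(ovector != NULL && rc > 0 && n >= 0);
  if (n >= rc) return 0;  /* beyond the highest used group */
  return ovector[2 * n] != OFFSET_UNSET;
}

/* Hand-written offsets for "abc" matched against (a|(z))(bc),
   for which the return value is 4. */
const size_t example_ovector[8] = {
  0, 3,                        /* pair 0: the whole match "abc" */
  0, 1,                        /* group 1: "a" */
  OFFSET_UNSET, OFFSET_UNSET,  /* group 2: not used */
  1, 3                         /* group 3: "bc" */
};
```

Testing the start offset alone is enough here because both values of an unused pair are set to the same unset marker.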
<P>
Elements in the ovector that do not correspond to capturing parentheses in the
@ -2993,11 +2991,11 @@ as NULL.
</pre>
This error is returned when <b>pcre2_match()</b> detects a recursion loop within
the pattern. Specifically, it means that either the whole pattern or a
subpattern has been called recursively for the second time at the same position
in the subject string. Some simple patterns that might do this are detected and
faulted at compile time, but more complicated cases, in particular mutual
recursions between two different subpatterns, cannot be detected until matching
is attempted.
capture group has been called recursively for the second time at the same
position in the subject string. Some simple patterns that might do this are
detected and faulted at compile time, but more complicated cases, in particular
mutual recursions between two different groups, cannot be detected until
matching is attempted.
<a name="geterrormessage"></a></P>
<br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
<P>
@ -3074,7 +3072,7 @@ The <b>pcre2_substring_copy_bynumber()</b> function copies a captured substring
into a supplied buffer, whereas <b>pcre2_substring_get_bynumber()</b> copies it
into new memory, obtained using the same memory allocation function that was
used for the match data block. The first two arguments of these functions are a
pointer to the match data block and a capturing group number.
pointer to the match data block and a capture group number.
</P>
<P>
The final arguments of <b>pcre2_substring_copy_bynumber()</b> are a pointer to
@ -3150,9 +3148,9 @@ calling <b>pcre2_substring_list_free()</b>.
</P>
<P>
If this function encounters a substring that is unset, which can happen when
capturing subpattern number <i>n+1</i> matches some part of the subject, but
subpattern <i>n</i> has not been used at all, it returns an empty string. This
can be distinguished from a genuine zero-length substring by inspecting the
capture group number <i>n+1</i> matches some part of the subject, but group
<i>n</i> has not been used at all, it returns an empty string. This can be
distinguished from a genuine zero-length substring by inspecting the
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
<a name="extractbyname"></a></P>
@ -3182,21 +3180,21 @@ For example, for this pattern:
<pre>
(a+)b(?&#60;xxx&#62;\d+)...
</pre>
the number of the subpattern called "xxx" is 2. If the name is known to be
the number of the capture group called "xxx" is 2. If the name is known to be
unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
that name. Given the number, you can extract the substring directly from the
ovector, or use one of the "bynumber" functions described above.
group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
Given the number, you can extract the substring directly from the ovector, or
use one of the "bynumber" functions described above.
</P>
<P>
For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
names, these functions scan all the groups with the given name, and return the
first named string that is set.
captured substring from the first named group that is set.
</P>
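The scan that the "byname" functions are described as performing can be pictured roughly as follows. This is not PCRE2's implementation. It is a self-contained sketch that assumes the 8-bit name table layout (two-byte group number, most significant byte first, then the zero-terminated name) and an ovector large enough for every group. `first_set_group_by_name`, the return codes -1 and -2, and the sample data are all hypothetical. The sample data stands for a pattern such as (?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|(?&#60;DN&#62;Tue)(?:sday)? compiled with PCRE2_DUPNAMES and matched against "Tuesday".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OFFSET_UNSET (~(size_t)0)  /* local stand-in for PCRE2_UNSET */

/* Hypothetical helper: return the number of the first group called name
   that is set, -1 if no group has that name (where PCRE2 would report
   PCRE2_ERROR_NOSUBSTRING), or -2 if such groups exist but none is set
   (where PCRE2 would report PCRE2_ERROR_UNSET). */
int first_set_group_by_name(const unsigned char *table, size_t count,
                            size_t entry_size, const size_t *ovector,
                            const char *name)
{
  int found = 0;
  assert(table != NULL && ovector != NULL && name != NULL);
  for (size_t i = 0; i < count; i++) {
    const unsigned char *entry = table + i * entry_size;
    if (strcmp((const char *)(entry + 2), name) != 0) continue;
    found = 1;
    int number = (entry[0] << 8) | entry[1];
    if (ovector[2 * number] != OFFSET_UNSET) return number;
  }
  return found ? -2 : -1;
}

/* Illustrative data: two groups named "DN", numbered 1 and 2. */
const unsigned char dn_table[2][5] = {
  { 0x00, 0x01, 'D', 'N', 0x00 },
  { 0x00, 0x02, 'D', 'N', 0x00 },
};

/* Hand-written offsets for "Tuesday": group 1 unset, group 2 is "Tue". */
const size_t tuesday_ovector[6] = {
  0, 7, OFFSET_UNSET, OFFSET_UNSET, 0, 3
};
```

With this data the helper skips group 1, which did not participate in the match, and reports group 2.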
<P>
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
@ -3207,13 +3205,13 @@ set, PCRE2_ERROR_UNSET is returned.
</P>
<P>
<b>Warning:</b> If the pattern uses the (?| feature to set up multiple
subpatterns with the same number, as described in the
<a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
capture groups with the same number, as described in the
<a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page, you cannot use names to distinguish the different subpatterns, because
page, you cannot use names to distinguish the different capture groups, because
names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for subpatterns of the
numbers. For this reason, the use of different names for groups with the
same number causes an error at compile time.
<a name="substitutions"></a></P>
<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
@ -3276,7 +3274,7 @@ length is in code units, not bytes.
In the replacement string, which is interpreted as a UTF string in UTF mode,
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
dollar character is an escape character that can specify the insertion of
characters from capturing groups or names from (*MARK) or other control verbs
characters from capture groups or names from (*MARK) or other control verbs
in the pattern. The following forms are always recognized:
<pre>
$$ insert a dollar character
@ -3345,13 +3343,13 @@ efficient to allocate a large buffer and free the excess afterwards, instead of
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
</P>
<P>
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
not appear in the pattern to be treated as unset groups. This option should be
used with care, because it means that a typo in a group name or number no
longer causes the PCRE2_ERROR_NOSUBSTRING error.
</P>
<P>
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
strings when inserted as described above. If this option is not set, an attempt
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
@ -3379,7 +3377,7 @@ terminating a \Q quoted sequence) reverts to no case forcing. The sequences
\u and \l force the next character (if it is a letter) to upper or lower
case, respectively, and then the state automatically reverts to no case
forcing. Case forcing applies to all inserted characters, including those from
captured groups and letters within \Q...\E quoted sequences.
capture groups and letters within \Q...\E quoted sequences.
</P>
<P>
Note that case forcing sequences such as \U...\E do not nest. For example,
@ -3388,7 +3386,8 @@ effect.
</P>
<P>
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to group substitution. The syntax is similar to that used by Bash:
flexibility to capture group substitution. The syntax is similar to that used
by Bash:
<pre>
${&#60;n&#62;:-&#60;string&#62;}
${&#60;n&#62;:+&#60;string1&#62;:&#60;string2&#62;}
@ -3518,20 +3517,21 @@ PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
output and the call to <b>pcre2_substitute()</b> exits, returning the number of
matches so far.
</P>
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
<br><a name="SEC37" href="#TOC1">DUPLICATE CAPTURE GROUP NAMES</a><br>
<P>
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
</P>
<P>
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
subpatterns are not required to be unique. Duplicate names are always allowed
for subpatterns with the same number, created by using the (?| feature. Indeed,
if such subpatterns are named, they are required to use the same names.
When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
groups are not required to be unique. Duplicate names are always allowed for
groups with the same number, created by using the (?| feature. Indeed, if such
groups are named, they are required to use the same names.
</P>
<P>
Normally, patterns with duplicate names are such that in any one match, only
one of the named subpatterns participates. An example is shown in the
Normally, patterns that use duplicate names are such that in any one match,
only one of each set of identically-named groups participates. An example is
shown in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation.
</P>
@ -3703,9 +3703,8 @@ the three matched strings are
On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted by number in the same way as for
<b>pcre2_match()</b>, but the numbers bear no relation to any capturing groups
that may exist in the pattern, because DFA matching does not support group
capture.
<b>pcre2_match()</b>, but the numbers bear no relation to any capture groups
that may exist in the pattern, because DFA matching does not support capturing.
</P>
<P>
Calls to the convenience functions that extract substrings by name
@ -3747,7 +3746,7 @@ a backreference.
</pre>
This return is given if <b>pcre2_dfa_match()</b> encounters a condition item
that uses a backreference for the condition, or a test for recursion in a
specific group. These are not supported.
specific capture group. These are not supported.
<pre>
PCRE2_ERROR_DFA_WSSIZE
</pre>
@ -3756,9 +3755,9 @@ This return is given if <b>pcre2_dfa_match()</b> runs out of space in the
<pre>
PCRE2_ERROR_DFA_RECURSE
</pre>
When a recursive subpattern is processed, the matching function calls itself
recursively, using private memory for the ovector and <i>workspace</i>. This
error is given if the internal ovector is not large enough. This should be
When a recursion or subroutine call is processed, the matching function calls
itself recursively, using private memory for the ovector and <i>workspace</i>.
This error is given if the internal ovector is not large enough. This should be
extremely rare, as a vector of size 1000 is used.
<pre>
PCRE2_ERROR_DFA_BADRESTART
@ -3785,7 +3784,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 04 January 2019
Last updated: 04 February 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>


@ -151,7 +151,7 @@ branch, automatic anchoring occurs if all branches are anchorable.
</P>
<P>
This optimization is disabled, however, if .* is in an atomic group or if there
is a backreference to the capturing group in which it appears. It is also
is a backreference to the capture group in which it appears. It is also
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
callouts does not affect it.
</P>
@ -354,8 +354,8 @@ callout before an assertion such as (?=ab) the length is 3. For an
alternation bar or a closing parenthesis, the length is one, unless a closing
parenthesis is followed by a quantifier, in which case its length is included.
(This changed in release 10.23. In earlier releases, before an opening
parenthesis the length was that of the entire subpattern, and before an
alternation bar or a closing parenthesis the length was zero.)
parenthesis the length was that of the entire group, and before an alternation
bar or a closing parenthesis the length was zero.)
</P>
<P>
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
@ -471,9 +471,9 @@ Cambridge, England.
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 17 September 2018
Last updated: 03 February 2019
<br>
Copyright &copy; 1997-2018 University of Cambridge.
Copyright &copy; 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.


@ -36,10 +36,9 @@ assertion just once). Perl allows some repeat quantifiers on other assertions,
for example, \b* (but not \b{3}), but these do not seem to have any use.
</P>
<P>
3. Capturing subpatterns that occur inside negative lookaround assertions are
counted, but their entries in the offsets vector are set only when a negative
assertion is a condition that has a matching branch (that is, the condition is
false).
3. Capture groups that occur inside negative lookaround assertions are counted,
but their entries in the offsets vector are set only when a negative assertion
is a condition that has a matching branch (that is, the condition is false).
</P>
<P>
4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
@ -94,13 +93,13 @@ to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
into subroutine calls is now supported, as in Perl.
</P>
<P>
9. If any of the backtracking control verbs are used in a subpattern that is
called as a subroutine (whether or not recursively), their effect is confined
to that subpattern; it does not extend to the surrounding pattern. This is not
always the case in Perl. In particular, if (*THEN) is present in a group that
is called as a subroutine, its action is limited to that group, even if the
group does not contain any | characters. Note that such subpatterns are
processed as anchored at the point where they are tested.
9. If any of the backtracking control verbs are used in a group that is called
as a subroutine (whether or not recursively), their effect is confined to that
group; it does not extend to the surrounding pattern. This is not always the
case in Perl. In particular, if (*THEN) is present in a group that is called as
a subroutine, its action is limited to that group, even if the group does not
contain any | characters. Note that such groups are processed as anchored
at the point where they are tested.
</P>
<P>
10. If a pattern contains more than one backtracking control verb, the first
@ -120,22 +119,21 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
"b".
</P>
<P>
13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
names is not as general as Perl's. This is a consequence of the fact that PCRE2
works internally just with numbers, using an external table to translate
between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B),
where the two capturing parentheses have the same number but different names,
is not supported, and causes an error at compile time. If it were allowed, it
would not be possible to distinguish which parentheses matched, because both
names map to capturing subpattern number 1. To avoid this confusing situation,
an error is given at compile time.
13. PCRE2's handling of duplicate capture group numbers and names is not as
general as Perl's. This is a consequence of the fact that PCRE2 works internally
just with numbers, using an external table to translate between numbers and
names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B), where the two
capture groups have the same number but different names, is not supported, and
causes an error at compile time. If it were allowed, it would not be possible
to distinguish which group matched, because both names map to capture group
number 1. To avoid this confusing situation, an error is given at compile time.
</P>
<P>
14. Perl used to recognize comments in some places that PCRE2 does not, for
example, between the ( and ? at the start of a subpattern. If the /x modifier
is set, Perl allowed white space between ( and ? though the latest Perls give
an error (for a while it was just deprecated). There may still be some cases
where Perl behaves differently.
example, between the ( and ? at the start of a group. If the /x modifier is
set, Perl allowed white space between ( and ? though the latest Perls give an
error (for a while it was just deprecated). There may still be some cases where
Perl behaves differently.
</P>
<P>
15. Perl, when in warning mode, gives warnings for character classes such as
@ -235,9 +233,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 28 July 2018
Last updated: 03 February 2019
<br>
Copyright &copy; 1997-2018 University of Cambridge.
Copyright &copy; 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.


@ -50,17 +50,17 @@ All values in repeating quantifiers must be less than 65536.
The maximum length of a lookbehind assertion is 65535 characters.
</P>
<P>
There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; if not, the default is set to 250.
An application can change this limit by calling pcre2_set_parens_nest_limit()
to set the limit in a compile context.
There is no limit to the number of parenthesized groups, but there can be no
more than 65535 capture groups, and there is a limit to the depth of nesting of
parenthesized subpatterns of all kinds. This is imposed in order to limit the
amount of system stack used at compile time. The default limit can be specified
when PCRE2 is built; if not, the default is set to 250. An application can
change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
a compile context.
</P>
<P>
The maximum length of name for a named subpattern is 32 code units, and the
maximum number of named subpatterns is 10000.
The maximum length of name for a named capture group is 32 code units, and the
maximum number of such groups is 10000.
</P>
<P>
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
@ -86,9 +86,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 30 March 2017
Last updated: 02 February 2019
<br>
Copyright &copy; 1997-2017 University of Cambridge.
Copyright &copy; 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

File diff suppressed because it is too large

@ -31,9 +31,9 @@ of them.
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
so that most simple patterns do not use much memory for storing the compiled
version. However, there is one case where the memory usage of a compiled
pattern can be unexpectedly large. If a parenthesized subpattern has a
quantifier with a minimum greater than 1 and/or a limited maximum, the whole
subpattern is repeated in the compiled code. For example, the pattern
pattern can be unexpectedly large. If a parenthesized group has a quantifier
with a minimum greater than 1 and/or a limited maximum, the whole group is
repeated in the compiled code. For example, the pattern
<pre>
(abc|def){2,4}
</pre>
@ -252,9 +252,9 @@ Cambridge, England.
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 25 April 2018
Last updated: 03 February 2019
<br>
Copyright &copy; 1997-2018 University of Cambridge.
Copyright &copy; 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -424,20 +424,23 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
(...) capturing group
(?&#60;name&#62;...) named capturing group (Perl)
(?'name'...) named capturing group (Perl)
(?P&#60;name&#62;...) named capturing group (Python)
(?:...) non-capturing group
(?|...) non-capturing group; reset group numbers for
capturing groups in each alternative
</PRE>
(...) capture group
(?&#60;name&#62;...) named capture group (Perl)
(?'name'...) named capture group (Perl)
(?P&#60;name&#62;...) named capture group (Python)
(?:...) non-capture group
(?|...) non-capture group; reset group numbers for
capture groups in each alternative
</pre>
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
</P>
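The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not PCRE2's own check: the function name `is_valid_group_name` and its `utf` parameter are hypothetical, and the separate 32-code-unit length limit is not enforced here.

```python
import string
import unicodedata


def is_valid_group_name(name: str, utf: bool) -> bool:
    """Check a capture group name against the rule stated in the text.

    Hypothetical helper for illustration; it does not call PCRE2.
    """
    if not name:
        return False

    def is_digit(ch: str) -> bool:
        if utf:
            # Unicode decimal digits have general category Nd.
            return unicodedata.category(ch) == "Nd"
        return ch in string.digits

    def is_letter(ch: str) -> bool:
        if utf:
            # Any Unicode letter: general categories Lu, Ll, Lt, Lm, Lo.
            return unicodedata.category(ch).startswith("L")
        return ch in string.ascii_letters

    # In both modes a name must not start with a digit.
    if is_digit(name[0]):
        return False
    return all(ch == "_" or is_letter(ch) or is_digit(ch) for ch in name)
```

With this sketch, a name such as "année" is accepted only when `utf` is true, and a name that starts with any decimal digit is rejected in both modes.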
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
(?&#62;...) atomic, non-capturing group
(*atomic:...) atomic, non-capturing group
(?&#62;...) atomic non-capture group
(*atomic:...) atomic non-capture group
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
@ -465,7 +468,7 @@ of the group.
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for
(?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...).
</P>
<P>
@ -565,19 +568,19 @@ Each top-level branch of a lookbehind must be of a fixed length.
<P>
<pre>
(?R) recurse whole pattern
(?n) call subpattern by absolute number
(?+n) call subpattern by relative number
(?-n) call subpattern by relative number
(?&name) call subpattern by name (Perl)
(?P&#62;name) call subpattern by name (Python)
\g&#60;name&#62; call subpattern by name (Oniguruma)
\g'name' call subpattern by name (Oniguruma)
\g&#60;n&#62; call subpattern by absolute number (Oniguruma)
\g'n' call subpattern by absolute number (Oniguruma)
\g&#60;+n&#62; call subpattern by relative number (PCRE2 extension)
\g'+n' call subpattern by relative number (PCRE2 extension)
\g&#60;-n&#62; call subpattern by relative number (PCRE2 extension)
\g'-n' call subpattern by relative number (PCRE2 extension)
(?n) call subroutine by absolute number
(?+n) call subroutine by relative number
(?-n) call subroutine by relative number
(?&name) call subroutine by name (Perl)
(?P&#62;name) call subroutine by name (Python)
\g&#60;name&#62; call subroutine by name (Oniguruma)
\g'name' call subroutine by name (Oniguruma)
\g&#60;n&#62; call subroutine by absolute number (Oniguruma)
\g'n' call subroutine by absolute number (Oniguruma)
\g&#60;+n&#62; call subroutine by relative number (PCRE2 extension)
\g'+n' call subroutine by relative number (PCRE2 extension)
\g&#60;-n&#62; call subroutine by relative number (PCRE2 extension)
\g'-n' call subroutine by relative number (PCRE2 extension)
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">CONDITIONAL PATTERNS</a><br>
@ -595,7 +598,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
(?(R) overall recursion condition
(?(Rn) specific numbered group recursion condition
(?(R&name) specific named group recursion condition
(?(DEFINE) define subpattern for reference
(?(DEFINE) define groups for reference
(?(VERSION[&#62;]=n.m) test PCRE2 version
(?(assert) assertion condition
</pre>
@ -657,9 +660,9 @@ Cambridge, England.
</P>
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
<P>
Last updated: 10 October 2018
Last updated: 03 February 2019
<br>
Copyright &copy; 1997-2018 University of Cambridge.
Copyright &copy; 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -716,14 +716,14 @@ information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
some typical examples:
<pre>
re&#62; /(?i)(^a|^b)/m,info
Capturing subpattern count = 1
Capture group count = 1
Compile options: multiline
Overall options: caseless multiline
First code unit at start or follows newline
Subject length lower bound = 1
re&#62; /(?i)abc/info
Capturing subpattern count = 0
Capture group count = 0
Compile options: &#60;none&#62;
Overall options: caseless
First code unit = 'a' (caseless)
@ -1353,8 +1353,8 @@ Testing substring extraction functions
<P>
The <b>copy</b> and <b>get</b> modifiers can be used to test the
<b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions.
They can be given more than once, and each can specify a group name or number,
for example:
They can be given more than once, and each can specify a capture group name or
number, for example:
<pre>
abcd\=copy=1,copy=3,get=G1
</pre>
@ -2075,9 +2075,9 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 12 November 2018
Last updated: 03 February 2019
<br>
Copyright &copy; 1997-2018 University of Cambridge.
Copyright &copy; 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -38,10 +38,11 @@ UNICODE PROPERTY SUPPORT
</b><br>
<P>
When PCRE2 is built with Unicode support, the escape sequences \p{..},
\P{..}, and \X can be used. The Unicode properties that can be tested are
limited to the general category properties such as Lu for an upper case letter
or Nd for a decimal number, the Unicode script names such as Arabic or Han, and
the derived properties Any and L&. Full lists are given in the
\P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
The Unicode properties that can be tested are limited to the general category
properties such as Lu for an upper case letter or Nd for a decimal number, the
Unicode script names such as Arabic or Han, and the derived properties Any and
L&. Full lists are given in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
and
<a href="pcre2syntax.html"><b>pcre2syntax</b></a>
@ -73,11 +74,17 @@ In UTF modes, the dot metacharacter matches one UTF character instead of a
single code unit.
</P>
<P>
In UTF modes, capture group names are not restricted to ASCII, and may contain
any Unicode letters and decimal digits, as well as underscore.
</P>
<P>
The escape sequence \C can be used to match a single code unit in a UTF mode,
but its use can lead to some strange effects because it breaks up multi-unit
characters (see the description of \C in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation).
documentation). For this reason, there is a build-time option that disables
support for \C completely. There is also a less draconian compile-time option
for locking out the use of \C when a pattern is compiled.
</P>
<P>
The use of \C is not supported by the alternative matching function
@ -410,9 +417,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 12 October 2018
Last updated: 03 February 2019
<br>
Copyright &copy; 1997-2018 University of Cambridge.
Copyright &copy; 1997-2019 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

File diff suppressed because it is too large

@ -1,4 +1,4 @@
.TH PCRE2_SUBSTRING_NAMETABLE_SCAN 3 "21 October 2014" "PCRE2 10.00"
.TH PCRE2_SUBSTRING_NAMETABLE_SCAN 3 "03 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -15,8 +15,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.rs
.sp
This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis
names into numbers.
entries for a given name in the table that translates capture group names into
numbers.
.sp
\fIcode\fP Compiled regular expression
\fIname\fP Name whose entries required

@ -1,4 +1,4 @@
.TH PCRE2API 3 "04 January 2019" "PCRE2 10.33"
.TH PCRE2API 3 "04 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -1429,10 +1429,10 @@ independent of the setting of PCRE2_DOTALL.
.sp
PCRE2_DUPNAMES
.sp
If this bit is set, names used to identify capturing subpatterns need not be
unique. This can be helpful for certain types of pattern when it is known that
only one instance of the named subpattern can ever be matched. There are more
details of named subpatterns below; see also the
If this bit is set, names used to identify capture groups need not be unique.
This can be helpful for certain types of pattern when it is known that only one
instance of the named group can ever be matched. There are more details of
named capture groups below; see also the
.\" HREF
\fBpcre2pattern\fP
.\"
@ -1466,11 +1466,11 @@ the end of the subject.
If this bit is set, most white space characters in the pattern are totally
ignored except when escaped or inside a character class. However, white space
is not allowed within sequences such as (?> that introduce various
parenthesized subpatterns, nor within numerical quantifiers such as {1,3}.
Ignorable white space is permitted between an item and a following quantifier
and between a quantifier and a following + that indicates possessiveness.
PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be changed within
a pattern by a (?x) option setting.
parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable
white space is permitted between an item and a following quantifier and between
a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
equivalent to Perl's /x option, and it can be changed within a pattern by a
(?x) option setting.
.P
When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
white space only those characters with code points less than 256 that are
@ -1547,7 +1547,7 @@ error.
.sp
PCRE2_MATCH_UNSET_BACKREF
.sp
If this option is set, a backreference to an unset subpattern group matches an
If this option is set, a backreference to an unset capture group matches an
empty string (by default this causes the current matching alternative to fail).
A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
find an "a" in the subject), whereas it fails by default, for Perl
@ -1608,7 +1608,7 @@ If this option is set, it disables the use of numbered capturing parentheses in
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). This is the same as Perl's /n option.
Note that, when this option is set, references to capturing groups
Note that, when this option is set, references to capture groups
(backreferences or recursion/subroutine calls) may only refer to named groups,
though the reference can be by name or by number.
.sp
@ -1627,7 +1627,7 @@ purposes.
If this option is set, it disables an optimization that is applied when .* is
the first significant item in a top-level branch of a pattern, and all the
other branches also start with .* or with \eA or \eG or ^. The optimization is
automatically disabled for .* if it is inside an atomic group or a capturing
automatically disabled for .* if it is inside an atomic group or a capture
group that is the subject of a backreference, or if the pattern contains
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
automatically anchored if PCRE2_DOTALL is set for all the .* items and
@ -2025,7 +2025,7 @@ following are true:
.sp
.* is not in an atomic group
.\" JOIN
.* is not in a capturing group that is the subject
.* is not in a capture group that is the subject
of a backreference
PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern
@ -2037,12 +2037,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
PCRE2_INFO_BACKREFMAX
.sp
Return the number of the highest backreference in the pattern. The third
argument should point to an \fBuint32_t\fP variable. Named subpatterns acquire
numbers as well as names, and these count towards the highest backreference.
Backreferences such as \e4 or \eg{12} match the captured characters of the
given group, but in addition, the check that a capturing group is set in a
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is
returned if there are no backreferences.
argument should point to an \fBuint32_t\fP variable. Named capture groups
acquire numbers as well as names, and these count towards the highest
backreference. Backreferences such as \e4 or \eg{12} match the captured
characters of the given group, but in addition, the check that a capture
group is set in a conditional group such as (?(3)a|b) is also a backreference.
Zero is returned if there are no backreferences.
.sp
PCRE2_INFO_BSR
.sp
@ -2053,9 +2053,9 @@ that \eR matches only CR, LF, or CRLF.
.sp
PCRE2_INFO_CAPTURECOUNT
.sp
Return the highest capturing subpattern number in the pattern. In patterns
where (?| is not used, this is also the total number of capturing subpatterns.
The third argument should point to an \fBuint32_t\fP variable.
Return the highest capture group number in the pattern. In patterns where (?|
is not used, this is also the total number of capture groups. The third
argument should point to an \fBuint32_t\fP variable.
.sp
PCRE2_INFO_DEPTHLIMIT
.sp
@ -2103,7 +2103,7 @@ Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
without the use of JIT. The third argument should point to a \fBsize_t\fP
variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
pattern. Each additional capture group adds two PCRE2_SIZE variables.
.sp
PCRE2_INFO_HASBACKSLASHC
.sp
@ -2224,11 +2224,11 @@ library, the pointer points to 32-bit code units, the first of which contains
the parenthesis number. The rest of the entry is the corresponding name, zero
terminated.
.P
The names are in alphabetical order. If (?| is used to create multiple groups
with the same number, as described in the
.\" HTML <a href="pcre2pattern.html#dupsubpatternnumber">
The names are in alphabetical order. If (?| is used to create multiple capture
groups with the same number, as described in the
.\" HTML <a href="pcre2pattern.html#dupgroupnumber">
.\" </a>
section on duplicate subpattern numbers
section on duplicate group numbers
.\"
in the
.\" HREF
@ -2237,11 +2237,11 @@ in the
page, the groups may be given the same name, but there is only one entry in the
table. Different names for groups of the same number are not permitted.
.P
Duplicate names for subpatterns with different numbers are permitted, but only
if PCRE2_DUPNAMES is set. They appear in the table in the order in which they
were found in the pattern. In the absence of (?| this is the order of
Duplicate names for capture groups with different numbers are permitted, but
only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
they were found in the pattern. In the absence of (?| this is the order of
increasing number; when (?| is used this is not necessarily the case because
later subpatterns may have lower numbers.
later capture groups may have lower numbers.
.P
As a simple example of the name/number table, consider the following pattern
after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
@ -2251,16 +2251,16 @@ space - including newlines - is ignored):
(?<date> (?<year>(\ed\ed)?\ed\ed) -
(?<month>\ed\ed) - (?<day>\ed\ed) )
.sp
There are four named subpatterns, so the table has four entries, and each entry
in the table is eight bytes long. The table is as follows, with non-printing
bytes shows in hexadecimal, and undefined bytes shown as ??:
There are four named capture groups, so the table has four entries, and each
entry in the table is eight bytes long. The table is as follows, with
non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:
.sp
00 01 d a t e 00 ??
00 05 d a y 00 ?? ??
00 04 m o n t h 00
00 02 y e a r 00 ??
.sp
When writing code to extract data from named subpatterns using the
When writing code to extract data from named capture groups using the
name-to-number map, remember that the length of the entries is likely to be
different for each compiled pattern.
.sp
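As a rough illustration of walking such a table, the following Python sketch decodes the four example entries shown above. It assumes the 8-bit library layout, in which each entry starts with a two-byte group number (most significant byte first) followed by the zero-terminated name. The function name `parse_name_table` is hypothetical, and the "?" bytes stand in for the undefined padding. In a real program the table pointer, entry count, and entry size would come from \fBpcre2_pattern_info()\fP rather than being written out by hand.

```python
def parse_name_table(table: bytes, name_count: int, entry_size: int):
    """Return (name, group_number) pairs from an 8-bit-library name table.

    Hypothetical helper: mirrors the layout described in the text.
    """
    entries = []
    for i in range(name_count):
        entry = table[i * entry_size:(i + 1) * entry_size]
        # First two bytes: group number, most significant byte first.
        number = (entry[0] << 8) | entry[1]
        # Rest of the entry: the name, terminated by a zero byte.
        name = entry[2:].split(b"\x00", 1)[0].decode("ascii")
        entries.append((name, number))
    return entries


# The example table from the text; "?" marks undefined padding bytes.
EXAMPLE_TABLE = (
    b"\x00\x01date\x00?"
    b"\x00\x05day\x00??"
    b"\x00\x04month\x00"
    b"\x00\x02year\x00?"
)

EXAMPLE_ENTRIES = parse_name_table(EXAMPLE_TABLE, name_count=4, entry_size=8)
```

Because the entry size is passed in rather than hard-coded, the same loop works for patterns whose longest name differs from this example.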
@ -2740,12 +2740,12 @@ valid newline sequence and explicit \er or \en escapes appear in the pattern.
In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by
parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
book, this is called "capturing" in what follows, and the phrase "capturing
subpattern" or "capturing group" is used for a fragment of a pattern that picks
out a substring. PCRE2 supports several other kinds of parenthesized subpattern
that do not cause substrings to be captured. The \fBpcre2_pattern_info()\fP
function can be used to find out how many capturing subpatterns there are in a
compiled pattern.
book, this is called "capturing" in what follows, and the phrase "capture
group" (Perl terminology) is used for a fragment of a pattern that picks out a
substring. PCRE2 supports several other kinds of parenthesized group that do
not cause substrings to be captured. The \fBpcre2_pattern_info()\fP function
can be used to find out how many capture groups there are in a compiled
pattern.
.P
You can use auxiliary functions for accessing captured substrings
.\" HTML <a href="#extractbynumber">
@ -2798,30 +2798,28 @@ reported start of a successful match can be greater than the end of the match.
For example, if the pattern (?=ab\eK) is matched against "ab", the start and
end offset values for the match are 2 and 0.
.P
If a capturing subpattern group is matched repeatedly within a single match
operation, it is the last portion of the subject that it matched that is
returned.
If a capture group is matched repeatedly within a single match operation, it is
the last portion of the subject that it matched that is returned.
.P
If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, \fBpcre2_match()\fP may be called with a match
data block whose ovector is of minimum length (that is, one pair).
.P
It is possible for capturing subpattern number \fIn+1\fP to match some part of
the subject when subpattern \fIn\fP has not been used at all. For example, if
the string "abc" is matched against the pattern (a|(z))(bc) the return from the
function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this
happens, both values in the offset pairs corresponding to unused subpatterns
are set to PCRE2_UNSET.
.P
Offset values that correspond to unused subpatterns at the end of the
expression are also set to PCRE2_UNSET. For example, if the string "abc" is
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched.
The return from the function is 2, because the highest used capturing
subpattern number is 1. The offsets for for the second and third capturing
subpatterns (assuming the vector is large enough, of course) are set to
It is possible for capture group number \fIn+1\fP to match some part of the
subject when group \fIn\fP has not been used at all. For example, if the string
"abc" is matched against the pattern (a|(z))(bc) the return from the function
is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
values in the offset pairs corresponding to unused groups are set to
PCRE2_UNSET.
.P
Offset values that correspond to unused groups at the end of the expression are
also set to PCRE2_UNSET. For example, if the string "abc" is matched against
the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
function is 2, because the highest used capture group number is 1. The offsets
for the second and third capture groups (assuming the vector is large
enough, of course) are set to PCRE2_UNSET.
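The two examples above can be reproduced approximately with Python's \fBre\fP module, which is not PCRE2 but reports unset groups in a comparable way: an unset group has the span (-1, -1), where PCRE2 stores PCRE2_UNSET in both offsets. The helper name `offset_pairs` is hypothetical, and the first value it returns only imitates the \fBpcre2_match()\fP return described in the text (highest set group number plus one).

```python
import re

UNSET = (-1, -1)  # Python's marker for an unset group; PCRE2 uses PCRE2_UNSET


def offset_pairs(pattern: str, subject: str):
    """Return (highest set group + 1, list of (start, end) pairs).

    Hypothetical helper using Python's re module as a stand-in for PCRE2.
    """
    m = re.fullmatch(pattern, subject)
    if m is None:
        return None
    # Pair 0 is the whole match; pairs 1..n are the capture groups.
    pairs = [m.span(i) for i in range(m.re.groups + 1)]
    highest_set = max(i for i, pair in enumerate(pairs) if pair != UNSET)
    return highest_set + 1, pairs


MIDDLE_UNSET = offset_pairs(r"(a|(z))(bc)", "abc")
TRAILING_UNSET = offset_pairs(r"(abc)(x(yz)?)?", "abc")
```

In the first case group 2 is unset between two set groups; in the second, groups 2 and 3 are unset at the end, so the count stops at group 1.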
.P
Elements in the ovector that do not correspond to capturing parentheses in the
pattern are never changed. That is, if a pattern contains \fIn\fP capturing
parentheses, no more than \fIovector[0]\fP to \fIovector[2n+1]\fP are set by
@ -3006,11 +3004,11 @@ as NULL.
.sp
This error is returned when \fBpcre2_match()\fP detects a recursion loop within
the pattern. Specifically, it means that either the whole pattern or a
subpattern has been called recursively for the second time at the same position
in the subject string. Some simple patterns that might do this are detected and
faulted at compile time, but more complicated cases, in particular mutual
recursions between two different subpatterns, cannot be detected until matching
is attempted.
capture group has been called recursively for the second time at the same
position in the subject string. Some simple patterns that might do this are
detected and faulted at compile time, but more complicated cases, in particular
mutual recursions between two different groups, cannot be detected until
matching is attempted.
.
.
.\" HTML <a name="geterrormessage"></a>
@ -3090,7 +3088,7 @@ The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring
into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it
into new memory, obtained using the same memory allocation function that was
used for the match data block. The first two arguments of these functions are a
pointer to the match data block and a capturing group number.
pointer to the match data block and a capture group number.
.P
The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
the buffer and a pointer to a variable that contains its length in code units.
@ -3162,9 +3160,9 @@ could not be obtained. When the list is no longer needed, it should be freed by
calling \fBpcre2_substring_list_free()\fP.
.P
If this function encounters a substring that is unset, which can happen when
capturing subpattern number \fIn+1\fP matches some part of the subject, but
subpattern \fIn\fP has not been used at all, it returns an empty string. This
can be distinguished from a genuine zero-length substring by inspecting the
capture group number \fIn+1\fP matches some part of the subject, but group
\fIn\fP has not been used at all, it returns an empty string. This can be
distinguished from a genuine zero-length substring by inspecting the
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
substrings, or by calling \fBpcre2_substring_length_bynumber()\fP.
.
@ -3194,20 +3192,20 @@ For example, for this pattern:
.sp
(a+)b(?<xxx>\ed+)...
.sp
the number of the subpattern called "xxx" is 2. If the name is known to be
the number of the capture group called "xxx" is 2. If the name is known to be
unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
that name. Given the number, you can extract the substring directly from the
ovector, or use one of the "bynumber" functions described above.
group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
Given the number, you can extract the substring directly from the ovector, or
use one of the "bynumber" functions described above.
.P
For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
names, these functions scan all the groups with the given name, and return the
first named string that is set.
captured substring from the first named group that is set.
.P
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
returned. If all groups with the name have numbers that are greater than the
@ -3216,18 +3214,18 @@ is at least one group with a slot in the ovector, but no group is found to be
set, PCRE2_ERROR_UNSET is returned.
.P
\fBWarning:\fP If the pattern uses the (?| feature to set up multiple
subpatterns with the same number, as described in the
.\" HTML <a href="pcre2pattern.html#dupsubpatternnumber">
capture groups with the same number, as described in the
.\" HTML <a href="pcre2pattern.html#dupgroupnumber">
.\" </a>
section on duplicate subpattern numbers
section on duplicate group numbers
.\"
in the
.\" HREF
\fBpcre2pattern\fP
.\"
page, you cannot use names to distinguish the different subpatterns, because
page, you cannot use names to distinguish the different capture groups, because
names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for subpatterns of the
numbers. For this reason, the use of different names for groups with the
same number causes an error at compile time.
.
.
@ -3288,7 +3286,7 @@ length is in code units, not bytes.
In the replacement string, which is interpreted as a UTF string in UTF mode,
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
dollar character is an escape character that can specify the insertion of
characters from capturing groups or names from (*MARK) or other control verbs
characters from capture groups or names from (*MARK) or other control verbs
in the pattern. The following forms are always recognized:
.sp
$$ insert a dollar character
@ -3351,12 +3349,12 @@ operation is carried out twice. Depending on the application, it may be more
efficient to allocate a large buffer and free the excess afterwards, instead of
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
.P
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
not appear in the pattern to be treated as unset groups. This option should be
used with care, because it means that a typo in a group name or number no
longer causes the PCRE2_ERROR_NOSUBSTRING error.
.P
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
strings when inserted as described above. If this option is not set, an attempt
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
@ -3381,14 +3379,15 @@ terminating a \eQ quoted sequence) reverts to no case forcing. The sequences
\eu and \el force the next character (if it is a letter) to upper or lower
case, respectively, and then the state automatically reverts to no case
forcing. Case forcing applies to all inserted characters, including those from
captured groups and letters within \eQ...\eE quoted sequences.
capture groups and letters within \eQ...\eE quoted sequences.
.P
Note that case forcing sequences such as \eU...\eE do not nest. For example,
the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
effect.
.P
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
flexibility to group substitution. The syntax is similar to that used by Bash:
flexibility to capture group substitution. The syntax is similar to that used
by Bash:
.sp
${<n>:-<string>}
${<n>:+<string1>:<string2>}
@ -3510,7 +3509,7 @@ output and the call to \fBpcre2_substitute()\fP exits, returning the number of
matches so far.
.
.
.SH "DUPLICATE SUBPATTERN NAMES"
.SH "DUPLICATE CAPTURE GROUP NAMES"
.rs
.sp
.nf
@ -3518,13 +3517,14 @@ matches so far.
.B " PCRE2_SPTR \fIname\fP, PCRE2_SPTR *\fIfirst\fP, PCRE2_SPTR *\fIlast\fP);"
.fi
.P
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
subpatterns are not required to be unique. Duplicate names are always allowed
for subpatterns with the same number, created by using the (?| feature. Indeed,
if such subpatterns are named, they are required to use the same names.
When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
groups are not required to be unique. Duplicate names are always allowed for
groups with the same number, created by using the (?| feature. Indeed, if such
groups are named, they are required to use the same names.
.P
Normally, patterns with duplicate names are such that in any one match, only
one of the named subpatterns participates. An example is shown in the
Normally, patterns that use duplicate names are such that in any one match,
only one of each set of identically-named groups participates. An example is
shown in the
.\" HREF
\fBpcre2pattern\fP
.\"
@ -3705,9 +3705,8 @@ the three matched strings are
On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted by number in the same way as for
\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups
that may exist in the pattern, because DFA matching does not support group
capture.
\fBpcre2_match()\fP, but the numbers bear no relation to any capture groups
that may exist in the pattern, because DFA matching does not support capturing.
.P
Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
@ -3749,7 +3748,7 @@ a backreference.
.sp
This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
that uses a backreference for the condition, or a test for recursion in a
specific group. These are not supported.
specific capture group. These are not supported.
.sp
PCRE2_ERROR_DFA_WSSIZE
.sp
@ -3758,9 +3757,9 @@ This return is given if \fBpcre2_dfa_match()\fP runs out of space in the
.sp
PCRE2_ERROR_DFA_RECURSE
.sp
When a recursive subpattern is processed, the matching function calls itself
recursively, using private memory for the ovector and \fIworkspace\fP. This
error is given if the internal ovector is not large enough. This should be
When a recursion or subroutine call is processed, the matching function calls
itself recursively, using private memory for the ovector and \fIworkspace\fP.
This error is given if the internal ovector is not large enough. This should be
extremely rare, as a vector of size 1000 is used.
.sp
PCRE2_ERROR_DFA_BADRESTART
@ -3793,6 +3792,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 04 January 2019
Last updated: 04 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "17 September 2018" "PCRE2 10.33"
.TH PCRE2CALLOUT 3 "03 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -137,7 +137,7 @@ start only after an internal newline or at the beginning of the subject, and
branch, automatic anchoring occurs if all branches are anchorable.
.P
This optimization is disabled, however, if .* is in an atomic group or if there
is a backreference to the capturing group in which it appears. It is also
is a backreference to the capture group in which it appears. It is also
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
callouts does not affect it.
.P
@ -331,8 +331,8 @@ callout before an assertion such as (?=ab) the length is 3. For an an
alternation bar or a closing parenthesis, the length is one, unless a closing
parenthesis is followed by a quantifier, in which case its length is included.
(This changed in release 10.23. In earlier releases, before an opening
parenthesis the length was that of the entire subpattern, and before an
alternation bar or a closing parenthesis the length was zero.)
parenthesis the length was that of the entire group, and before an alternation
bar or a closing parenthesis the length was zero.)
.P
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
help in distinguishing between different automatic callouts, which all have the
@ -452,6 +452,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge.
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2COMPAT 3 "28 July 2018" "PCRE2 10.32"
.TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -23,10 +23,9 @@ character is not "a" three times (in principle; PCRE2 optimizes this to run the
assertion just once). Perl allows some repeat quantifiers on other assertions,
for example, \eb* (but not \eb{3}), but these do not seem to have any use.
.P
3. Capturing subpatterns that occur inside negative lookaround assertions are
counted, but their entries in the offsets vector are set only when a negative
assertion is a condition that has a matching branch (that is, the condition is
false).
3. Capture groups that occur inside negative lookaround assertions are counted,
but their entries in the offsets vector are set only when a negative assertion
is a condition that has a matching branch (that is, the condition is false).
.P
4. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu,
\eU, and \eN when followed by a character name. \eN on its own, matching a
@ -79,13 +78,13 @@ documentation for details.
to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
into subroutine calls is now supported, as in Perl.
.P
9. If any of the backtracking control verbs are used in a subpattern that is
called as a subroutine (whether or not recursively), their effect is confined
to that subpattern; it does not extend to the surrounding pattern. This is not
always the case in Perl. In particular, if (*THEN) is present in a group that
is called as a subroutine, its action is limited to that group, even if the
group does not contain any | characters. Note that such subpatterns are
processed as anchored at the point where they are tested.
9. If any of the backtracking control verbs are used in a group that is called
as a subroutine (whether or not recursively), their effect is confined to that
group; it does not extend to the surrounding pattern. This is not always the
case in Perl. In particular, if (*THEN) is present in a group that is called as
a subroutine, its action is limited to that group, even if the group does not
contain any | characters. Note that such groups are processed as anchored
at the point where they are tested.
.P
10. If a pattern contains more than one backtracking control verb, the first
one that is backtracked onto acts. For example, in the pattern
@ -101,21 +100,20 @@ strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
"b".
.P
13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
names is not as general as Perl's. This is a consequence of the fact the PCRE2
works internally just with numbers, using an external table to translate
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B),
where the two capturing parentheses have the same number but different names,
is not supported, and causes an error at compile time. If it were allowed, it
would not be possible to distinguish which parentheses matched, because both
names map to capturing subpattern number 1. To avoid this confusing situation,
an error is given at compile time.
13. PCRE2's handling of duplicate capture group numbers and names is not as
general as Perl's. This is a consequence of the fact that PCRE2 works
internally just with numbers, using an external table to translate between
numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where
the two capture groups have the same number but different names, is not
supported, and causes an error at compile time. If it were allowed, it would
not be possible to distinguish which group matched, because both names map to
capture group number 1. To avoid this confusing situation, an error is given at
compile time.
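
The name-to-number translation described in item 13 can be sketched in isolation. The following is a minimal illustration, not PCRE2's own code: it assumes the 8-bit table layout that the pcre2test changes in this commit read back (a two-byte group number, most significant byte first, followed by a zero-terminated name, in fixed-size entries). The function name, the table contents, and the use of 0 as a "not found" value are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Look up a name in an 8-bit-style name table. Each entry occupies
   entry_size bytes: two bytes of group number (most significant first),
   then the zero-terminated name. Returns the group number, or 0 if the
   name is absent (0 is used here only as a sketch-level "not found"). */
unsigned int lookup_group_number(const unsigned char *table, size_t count,
                                 size_t entry_size, const char *name)
{
  for (size_t i = 0; i < count; i++)
    {
    const unsigned char *entry = table + i * entry_size;
    if (strcmp((const char *)(entry + 2), name) == 0)
      return ((unsigned int)entry[0] << 8) | entry[1];
    }
  return 0;
}

/* Hand-built table with four 8-byte entries, for illustration only. */
const unsigned char sketch_table[] = {
  0, 1, 'd', 'a', 't', 'e', 0, 0,
  0, 5, 'd', 'a', 'y', 0, 0, 0,
  0, 4, 'm', 'o', 'n', 't', 'h', 0,
  0, 2, 'y', 'e', 'a', 'r', 0, 0
};
```

Because a lookup yields only a number, two different names attached to the same number would be indistinguishable after a match, which is the ambiguity the compile-time error avoids.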
.P
14. Perl used to recognize comments in some places that PCRE2 does not, for
example, between the ( and ? at the start of a subpattern. If the /x modifier
is set, Perl allowed white space between ( and ? though the latest Perls give
an error (for a while it was just deprecated). There may still be some cases
where Perl behaves differently.
example, between the ( and ? at the start of a group. If the /x modifier is
set, Perl allowed white space between ( and ? though the latest Perls give an
error (for a while it was just deprecated). There may still be some cases where
Perl behaves differently.
.P
15. Perl, when in warning mode, gives warnings for character classes such as
[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
@ -200,6 +198,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 28 July 2018
Copyright (c) 1997-2018 University of Cambridge.
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2LIMITS 3 "30 March 2017" "PCRE2 10.30"
.TH PCRE2LIMITS 3 "03 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "SIZE AND OTHER LIMITATIONS"
@ -34,16 +34,16 @@ All values in repeating quantifiers must be less than 65536.
.P
The maximum length of a lookbehind assertion is 65535 characters.
.P
There is no limit to the number of parenthesized subpatterns, but there can be
no more than 65535 capturing subpatterns. There is, however, a limit to the
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
order to limit the amount of system stack used at compile time. The default
limit can be specified when PCRE2 is built; if not, the default is set to 250.
An application can change this limit by calling pcre2_set_parens_nest_limit()
to set the limit in a compile context.
There is no limit to the number of parenthesized groups, but there can be no
more than 65535 capture groups, and there is a limit to the depth of nesting of
parenthesized subpatterns of all kinds. This is imposed in order to limit the
amount of system stack used at compile time. The default limit can be specified
when PCRE2 is built; if not, the default is set to 250. An application can
change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
a compile context.
.P
The maximum length of name for a named subpattern is 32 code units, and the
maximum number of named subpatterns is 10000.
The maximum length of name for a named capture group is 32 code units, and the
maximum number of such groups is 10000.
.P
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
@ -67,6 +67,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 30 March 2017
Copyright (c) 1997-2017 University of Cambridge.
Last updated: 02 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi

File diff suppressed because it is too large


@ -1,4 +1,4 @@
.TH PCRE2PERFORM 3 "25 April 2018" "PCRE2 10.32"
.TH PCRE2PERFORM 3 "03 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 PERFORMANCE"
@ -14,9 +14,9 @@ of them.
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
so that most simple patterns do not use much memory for storing the compiled
version. However, there is one case where the memory usage of a compiled
pattern can be unexpectedly large. If a parenthesized subpattern has a
quantifier with a minimum greater than 1 and/or a limited maximum, the whole
subpattern is repeated in the compiled code. For example, the pattern
pattern can be unexpectedly large. If a parenthesized group has a quantifier
with a minimum greater than 1 and/or a limited maximum, the whole group is
repeated in the compiled code. For example, the pattern
.sp
(abc|def){2,4}
.sp
@ -239,6 +239,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 25 April 2018
Copyright (c) 1997-2018 University of Cambridge.
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "10 October 2018" "PCRE2 10.33"
.TH PCRE2SYNTAX 3 "03 February 2019" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -398,20 +398,24 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
.SH "CAPTURING"
.rs
.sp
(...) capturing group
(?<name>...) named capturing group (Perl)
(?'name'...) named capturing group (Perl)
(?P<name>...) named capturing group (Python)
(?:...) non-capturing group
(?|...) non-capturing group; reset group numbers for
capturing groups in each alternative
(...) capture group
(?<name>...) named capture group (Perl)
(?'name'...) named capture group (Perl)
(?P<name>...) named capture group (Python)
(?:...) non-capture group
(?|...) non-capture group; reset group numbers for
capture groups in each alternative
.sp
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
(?>...) atomic, non-capturing group
(*atomic:...) atomic, non-capturing group
(?>...) atomic non-capture group
(*atomic:...) atomic non-capture group
.
.
.SH "COMMENT"
@ -439,7 +443,7 @@ of the group.
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
(?^in). An option setting may appear at the start of a non-capturing group, for
(?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...).
.P
The following are recognized only at the very start of a pattern or after one
@ -542,19 +546,19 @@ Each top-level branch of a lookbehind must be of a fixed length.
.rs
.sp
(?R) recurse whole pattern
(?n) call subpattern by absolute number
(?+n) call subpattern by relative number
(?-n) call subpattern by relative number
(?&name) call subpattern by name (Perl)
(?P>name) call subpattern by name (Python)
\eg<name> call subpattern by name (Oniguruma)
\eg'name' call subpattern by name (Oniguruma)
\eg<n> call subpattern by absolute number (Oniguruma)
\eg'n' call subpattern by absolute number (Oniguruma)
\eg<+n> call subpattern by relative number (PCRE2 extension)
\eg'+n' call subpattern by relative number (PCRE2 extension)
\eg<-n> call subpattern by relative number (PCRE2 extension)
\eg'-n' call subpattern by relative number (PCRE2 extension)
(?n) call subroutine by absolute number
(?+n) call subroutine by relative number
(?-n) call subroutine by relative number
(?&name) call subroutine by name (Perl)
(?P>name) call subroutine by name (Python)
\eg<name> call subroutine by name (Oniguruma)
\eg'name' call subroutine by name (Oniguruma)
\eg<n> call subroutine by absolute number (Oniguruma)
\eg'n' call subroutine by absolute number (Oniguruma)
\eg<+n> call subroutine by relative number (PCRE2 extension)
\eg'+n' call subroutine by relative number (PCRE2 extension)
\eg<-n> call subroutine by relative number (PCRE2 extension)
\eg'-n' call subroutine by relative number (PCRE2 extension)
.
.
.SH "CONDITIONAL PATTERNS"
@ -572,7 +576,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
(?(R) overall recursion condition
(?(Rn) specific numbered group recursion condition
(?(R&name) specific named group recursion condition
(?(DEFINE) define subpattern for reference
(?(DEFINE) define groups for reference
(?(VERSION[>]=n.m) test PCRE2 version
(?(assert) assertion condition
.sp
@ -643,6 +647,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 10 October 2018
Copyright (c) 1997-2018 University of Cambridge.
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "12 November 2018" "PCRE 10.33"
.TH PCRE2TEST 1 "03 February 2019" "PCRE 10.33"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -672,14 +672,14 @@ information is obtained from the \fBpcre2_pattern_info()\fP function. Here are
some typical examples:
.sp
re> /(?i)(^a|^b)/m,info
Capturing subpattern count = 1
Capture group count = 1
Compile options: multiline
Overall options: caseless multiline
First code unit at start or follows newline
Subject length lower bound = 1
.sp
re> /(?i)abc/info
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: caseless
First code unit = 'a' (caseless)
@ -1325,8 +1325,8 @@ current character is CR followed by LF, an advance of two characters occurs.
.sp
The \fBcopy\fP and \fBget\fP modifiers can be used to test the
\fBpcre2_substring_copy_xxx()\fP and \fBpcre2_substring_get_xxx()\fP functions.
They can be given more than once, and each can specify a group name or number,
for example:
They can be given more than once, and each can specify a capture group name or
number, for example:
.sp
abcd\e=copy=1,copy=3,get=G1
.sp
@ -2056,6 +2056,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge.
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -646,14 +646,14 @@ PATTERN MODIFIERS
are some typical examples:
re> /(?i)(^a|^b)/m,info
Capturing subpattern count = 1
Capture group count = 1
Compile options: multiline
Overall options: caseless multiline
First code unit at start or follows newline
Subject length lower bound = 1
re> /(?i)abc/info
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: caseless
First code unit = 'a' (caseless)
@ -1214,8 +1214,8 @@ SUBJECT MODIFIERS
The copy and get modifiers can be used to test the pcre2_sub-
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
given more than once, and each can specify a group name or number, for
example:
given more than once, and each can specify a capture group name or num-
ber, for example:
abcd\=copy=1,copy=3,get=G1
@ -1887,5 +1887,5 @@ AUTHOR
REVISION
Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge.
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.


@ -1,4 +1,4 @@
.TH PCRE2UNICODE 3 "12 October 2018" "PCRE2 10.33"
.TH PCRE2UNICODE 3 "03 February 2019" "PCRE2 10.33"
.SH NAME
PCRE - Perl-compatible regular expressions (revised API)
.SH "UNICODE AND UTF SUPPORT"
@ -27,10 +27,11 @@ case the library will be smaller.
.rs
.sp
When PCRE2 is built with Unicode support, the escape sequences \ep{..},
\eP{..}, and \eX can be used. The Unicode properties that can be tested are
limited to the general category properties such as Lu for an upper case letter
or Nd for a decimal number, the Unicode script names such as Arabic or Han, and
the derived properties Any and L&. Full lists are given in the
\eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting.
The Unicode properties that can be tested are limited to the general category
properties such as Lu for an upper case letter or Nd for a decimal number, the
Unicode script names such as Arabic or Han, and the derived properties Any and
L&. Full lists are given in the
.\" HREF
\fBpcre2pattern\fP
.\"
@ -62,13 +63,18 @@ individual code units.
In UTF modes, the dot metacharacter matches one UTF character instead of a
single code unit.
.P
In UTF modes, capture group names are not restricted to ASCII, and may contain
any Unicode letters and decimal digits, as well as underscore.
.P
The escape sequence \eC can be used to match a single code unit in a UTF mode,
but its use can lead to some strange effects because it breaks up multi-unit
characters (see the description of \eC in the
.\" HREF
\fBpcre2pattern\fP
.\"
documentation).
documentation). For this reason, there is a build-time option that disables
support for \eC completely. There is also a less draconian compile-time option
for locking out the use of \eC when a pattern is compiled.
.P
The use of \eC is not supported by the alternative matching function
\fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character
@ -387,6 +393,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 12 October 2018
Copyright (c) 1997-2018 University of Cambridge.
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -2194,6 +2194,7 @@ so it is simplest just to return both.
Arguments:
ptrptr points to the character pointer variable
ptrend points to the end of the input string
utf true if the input is UTF-encoded
terminator the terminator of a subpattern name must be this
offsetptr where to put the offset from the start of the pattern
nameptr where to put a pointer to the name in the input
@ -2206,13 +2207,12 @@ Returns: TRUE if a name was read
*/
static BOOL
read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t terminator,
read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, BOOL utf, uint32_t terminator,
PCRE2_SIZE *offsetptr, PCRE2_SPTR *nameptr, uint32_t *namelenptr,
int *errorcodeptr, compile_block *cb)
{
PCRE2_SPTR ptr = *ptrptr;
BOOL is_group = (*ptr != CHAR_ASTERISK);
uint32_t namelen = 0;
if (++ptr >= ptrend) /* No characters in name */
{
@ -2221,35 +2221,74 @@ if (++ptr >= ptrend) /* No characters in name */
goto FAILED;
}
/* A group name must not start with a digit. If either of the others start with
a digit it just won't be recognized. */
if (is_group && IS_DIGIT(*ptr))
{
*errorcodeptr = ERR44;
goto FAILED;
}
*nameptr = ptr;
*offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern);
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
/* In UTF mode, a group name may contain letters and decimal digits as defined
by Unicode properties, and underscores, but must not start with a digit. */
#ifdef SUPPORT_UNICODE
if (utf && is_group)
{
ptr++;
namelen++;
if (namelen > MAX_NAME_SIZE)
uint32_t c, type;
GETCHAR(c, ptr);
type = UCD_CHARTYPE(c);
if (type == ucp_Nd)
{
*errorcodeptr = ERR48;
*errorcodeptr = ERR44;
goto FAILED;
}
for(;;)
{
if (type != ucp_Nd && PRIV(ucp_gentype)[type] != ucp_L &&
c != CHAR_UNDERSCORE) break;
ptr++;
FORWARDCHAR(ptr);
if (ptr >= ptrend) break;
GETCHAR(c, ptr);
type = UCD_CHARTYPE(c);
}
}
else
#else
(void)utf; /* Avoid compiler warning */
#endif /* SUPPORT_UNICODE */
/* Handle non-group names and group names in non-UTF modes. A group name must
not start with a digit. If either of the others start with a digit it just
won't be recognized. */
{
if (is_group && IS_DIGIT(*ptr))
{
*errorcodeptr = ERR44;
goto FAILED;
}
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
{
ptr++;
}
}
/* Check name length */
if (ptr > *nameptr + MAX_NAME_SIZE)
{
*errorcodeptr = ERR48;
goto FAILED;
}
*namelenptr = ptr - *nameptr;
/* Subpattern names must not be empty, and their terminator is checked here.
(What follows a verb or alpha assertion name is checked separately.) */
if (is_group)
{
if (namelen == 0)
if (ptr == *nameptr)
{
*errorcodeptr = ERR62; /* Subpattern name expected */
goto FAILED;
@ -2262,7 +2301,6 @@ if (is_group)
ptr++;
}
*namelenptr = namelen;
*ptrptr = ptr;
return TRUE;
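
The non-UTF path of the revised read_name() can be approximated in a standalone sketch. This is an illustration under stated assumptions, not the library's code: it uses <ctype.h> restricted to ASCII in place of PCRE2's ctypes table, it does not check the terminator, and the status names and the SKETCH_MAX_NAME_SIZE constant are hypothetical stand-ins for ERR44, ERR48, ERR62 and MAX_NAME_SIZE. The order of the checks follows the diff: leading digit, scan, length, then empty name.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumed to mirror the documented limit of 32 code units. */
#define SKETCH_MAX_NAME_SIZE 32

/* Hypothetical status codes standing in for PCRE2's error numbers. */
enum name_status { NAME_OK, NAME_STARTS_WITH_DIGIT, NAME_TOO_LONG, NAME_EMPTY };

/* ASCII letters, ASCII digits, and underscore only. */
int is_ascii_word(unsigned char c)
{
  return c < 128 && (isalnum(c) || c == '_');
}

/* Scan a group name at the start of s. On success the name length is
   stored through lenptr; scanning stops at the first non-word character. */
enum name_status check_group_name(const char *s, size_t *lenptr)
{
  const char *p = s;
  if (isdigit((unsigned char)*p)) return NAME_STARTS_WITH_DIGIT;
  while (*p != '\0' && is_ascii_word((unsigned char)*p)) p++;
  if ((size_t)(p - s) > SKETCH_MAX_NAME_SIZE) return NAME_TOO_LONG;
  if (p == s) return NAME_EMPTY;
  *lenptr = (size_t)(p - s);
  return NAME_OK;
}
```

As in the patched function, the length is derived from pointer arithmetic after the scan rather than from a counter incremented inside the loop.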
@ -2981,7 +3019,7 @@ while (ptr < ptrend)
/* Not a numerical recursion */
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen,
if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
&errorcode, cb)) goto ESCAPE_FAILED;
/* \k and \g when used with braces are back references, whereas \g used
@ -3554,8 +3592,8 @@ while (ptr < ptrend)
uint32_t meta;
vn = alasnames;
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
cb)) goto FAILED;
if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen,
&errorcode, cb)) goto FAILED;
if (ptr >= ptrend || *ptr != CHAR_COLON)
{
errorcode = ERR95; /* Malformed */
@ -3651,8 +3689,8 @@ while (ptr < ptrend)
else
{
vn = verbnames;
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
cb)) goto FAILED;
if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen,
&errorcode, cb)) goto FAILED;
if (ptr >= ptrend || (*ptr != CHAR_COLON &&
*ptr != CHAR_RIGHT_PARENTHESIS))
{
@ -3907,7 +3945,7 @@ while (ptr < ptrend)
errorcode = ERR41;
goto FAILED;
}
if (!read_name(&ptr, ptrend, CHAR_RIGHT_PARENTHESIS, &offset, &name,
if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name,
&namelen, &errorcode, cb)) goto FAILED;
*parsed_pattern++ = META_BACKREF_BYNAME;
*parsed_pattern++ = namelen;
@ -3967,7 +4005,7 @@ while (ptr < ptrend)
case CHAR_AMPERSAND:
RECURSE_BY_NAME:
if (!read_name(&ptr, ptrend, CHAR_RIGHT_PARENTHESIS, &offset, &name,
if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name,
&namelen, &errorcode, cb)) goto FAILED;
*parsed_pattern++ = META_RECURSE_BYNAME;
*parsed_pattern++ = namelen;
@ -4215,7 +4253,7 @@ while (ptr < ptrend)
terminator = CHAR_RIGHT_PARENTHESIS;
ptr--; /* Point to char before name */
}
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen,
if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
&errorcode, cb)) goto FAILED;
/* Handle (?(R&name) */
@ -4349,7 +4387,7 @@ while (ptr < ptrend)
terminator = CHAR_APOSTROPHE; /* Terminator */
DEFINE_NAME:
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen,
if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
&errorcode, cb)) goto FAILED;
/* We have a name for this capturing group. It is also assigned a number,


@ -95,7 +95,7 @@ static const unsigned char compile_error_texts[] =
/* 25 */
"lookbehind assertion is not fixed length\0"
"a relative value of zero is not allowed\0"
"conditional group contains more than two branches\0"
"conditional subpattern contains more than two branches\0"
"assertion expected after (?( or (?(?C)\0"
"digit expected after (?+ or (?-\0"
/* 30 */
@ -113,21 +113,21 @@ static const unsigned char compile_error_texts[] =
/* 40 */
"invalid escape sequence in (*VERB) name\0"
"unrecognized character after (?P\0"
"syntax error in subpattern name (missing terminator)\0"
"syntax error in subpattern name (missing terminator?)\0"
"two named subpatterns have the same name (PCRE2_DUPNAMES not set)\0"
"group name must start with a non-digit\0"
"subpattern name must start with a non-digit\0"
/* 45 */
"this version of PCRE2 does not have support for \\P, \\p, or \\X\0"
"malformed \\P or \\p sequence\0"
"unknown property name after \\P or \\p\0"
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)\0"
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " code units)\0"
"too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0"
/* 50 */
"invalid range in character class\0"
"octal value is greater than \\377 in 8-bit non-UTF-8 mode\0"
"internal error: overran compiling workspace\0"
"internal error: previously-checked referenced subpattern not found\0"
"DEFINE group contains more than one branch\0"
"DEFINE subpattern contains more than one branch\0"
/* 55 */
"missing opening brace after \\o\0"
"internal error: unknown newline setting\0"
@ -137,7 +137,7 @@ static const unsigned char compile_error_texts[] =
"obsolete error (should not occur)\0" /* Was the above */
/* 60 */
"(*VERB) not recognized or malformed\0"
"group number is too big\0"
"subpattern number is too big\0"
"subpattern name expected\0"
"internal error: parsed pattern overflow\0"
"non-octal character in \\o{} (closing brace missing?)\0"


@ -169,7 +169,7 @@ commented out the original, but kept it around just in case. */
/* void vms_setsymbol( char *, char *, int ); Original code from [1]. */
#endif
/* VC and older compilers don't support %td or %zu, and even some that claim to
/* VC and older compilers don't support %td or %zu, and even some that claim to
be C99 don't support it (hence DISABLE_PERCENT_ZT). */
#if defined(_MSC_VER) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L || defined(DISABLE_PERCENT_ZT)
@ -539,7 +539,7 @@ typedef struct patctl { /* Structure for pattern modifiers. */
uint32_t jitstack; /* Must be in same position as datctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as datctl */
uint32_t substitute_stop; /* Must be in same position as datctl */
uint32_t substitute_stop; /* Must be in same position as datctl */
uint32_t jit;
uint32_t stackguard_test;
uint32_t tables_id;
@ -561,7 +561,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
uint32_t jitstack; /* Must be in same position as patctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as patctl */
uint32_t substitute_stop; /* Must be in same position as patctl */
uint32_t substitute_stop; /* Must be in same position as patctl */
uint32_t startend[2];
uint32_t cerror[2];
uint32_t cfail[2];
@ -3049,13 +3049,14 @@ return yield;
#ifdef SUPPORT_PCRE2_8
/*************************************************
* Convert character value to UTF-8 *
*************************************************/
/* This function takes an integer value in the range 0 - 0x7fffffff
and encodes it as a UTF-8 character in 1 to 6 bytes.
and encodes it as a UTF-8 character in 1 to 6 bytes. It is needed even when the
8-bit library is not supported, to generate UTF-8 output for non-ASCII
characters.
Arguments:
cvalue the character value
@ -3081,7 +3082,6 @@ for (j = i; j > 0; j--)
*utf8bytes = utf8_table2[i] | cvalue;
return i + 1;
}
#endif /* SUPPORT_PCRE2_8 */
@ -4374,6 +4374,7 @@ static int
show_pattern_info(void)
{
uint32_t compile_options, overall_options, extra_options;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
if ((pat_patctl.control & (CTL_BINCODE|CTL_FULLBINCODE)) != 0)
{
@ -4463,7 +4464,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
!= 0)
return PR_ABEND;
fprintf(outfile, "Capturing subpattern count = %d\n", capture_count);
fprintf(outfile, "Capture group count = %d\n", capture_count);
if (backrefmax > 0)
fprintf(outfile, "Max back reference = %d\n", backrefmax);
@ -4482,14 +4483,60 @@ if ((pat_patctl.control & CTL_INFO) != 0)
if (namecount > 0)
{
fprintf(outfile, "Named capturing subpatterns:\n");
fprintf(outfile, "Named capture groups:\n");
for (; namecount > 0; namecount--)
{
int imm2_size = test_mode == PCRE8_MODE ? 2 : 1;
uint32_t length = (uint32_t)STRLEN(nametable + imm2_size);
fprintf(outfile, " ");
PCHARSV(nametable, imm2_size, length, FALSE, outfile);
/* In UTF mode the name may be a UTF string containing non-ASCII
letters and digits. We must output it as a UTF-8 string. In non-UTF mode,
use the normal string printing functions, which use escapes for all
non-ASCII characters. */
if (utf)
{
#ifdef SUPPORT_PCRE2_32
if (test_mode == PCRE32_MODE)
{
PCRE2_SPTR32 nameptr = (PCRE2_SPTR32)nametable + imm2_size;
while (*nameptr != 0)
{
uint8_t u8buff[6];
int len = ord2utf8(*nameptr++, u8buff);
fprintf(outfile, "%.*s", len, u8buff);
}
}
#endif
#ifdef SUPPORT_PCRE2_16
if (test_mode == PCRE16_MODE)
{
PCRE2_SPTR16 nameptr = (PCRE2_SPTR16)nametable + imm2_size;
while (*nameptr != 0)
{
int len;
uint8_t u8buff[6];
uint32_t c = *nameptr++ & 0xffff;
if (c >= 0xD800 && c < 0xDC00)
c = ((c & 0x3ff) << 10) + (*nameptr++ & 0x3ff) + 0x10000;
len = ord2utf8(c, u8buff);
fprintf(outfile, "%.*s", len, u8buff);
}
}
#endif
#ifdef SUPPORT_PCRE2_8
if (test_mode == PCRE8_MODE)
fprintf(outfile, "%s", (PCRE2_SPTR8)nametable + imm2_size);
#endif
}
else /* Not UTF mode */
{
PCHARSV(nametable, imm2_size, length, FALSE, outfile);
}
while (length++ < nameentrysize - imm2_size) putc(' ', outfile);
#ifdef SUPPORT_PCRE2_32
if (test_mode == PCRE32_MODE)
fprintf(outfile, "%3d\n", (int)(((PCRE2_SPTR32)nametable)[0]));
@ -4503,6 +4550,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
fprintf(outfile, "%3d\n", (int)(
((((PCRE2_SPTR8)nametable)[0]) << 8) | ((PCRE2_SPTR8)nametable)[1]));
#endif
nametable = (void*)((PCRE2_SPTR8)nametable + nameentrysize * code_unit_size);
}
}
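
The 16-bit branch above combines a surrogate pair into one code point before printing it as UTF-8. A minimal standalone sketch of that step follows. The surrogate arithmetic is the same as in the diff; the function names are hypothetical, and encode_utf8() is a simplified stand-in for pcre2test's ord2utf8(), covering only code points up to U+10FFFF (at most 4 bytes) and assuming well-formed input.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a high and a low UTF-16 surrogate into one code point. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
  return (((uint32_t)high & 0x3ff) << 10) + ((uint32_t)low & 0x3ff) + 0x10000;
}

/* Read the next code point from a UTF-16 string and advance the pointer.
   A unit in the range 0xD800-0xDBFF is taken as the first half of a pair. */
uint32_t next_utf16_code_point(const uint16_t **pp)
{
  const uint16_t *p = *pp;
  uint32_t c = *p++;
  if (c >= 0xD800 && c < 0xDC00) c = combine_surrogates((uint16_t)c, *p++);
  *pp = p;
  return c;
}

/* Encode a code point (0 to 0x10FFFF) as UTF-8 into out, which must have
   room for 4 bytes. Returns the number of bytes written. */
size_t encode_utf8(uint32_t c, uint8_t *out)
{
  if (c < 0x80)
    {
    out[0] = (uint8_t)c;
    return 1;
    }
  if (c < 0x800)
    {
    out[0] = (uint8_t)(0xC0 | (c >> 6));
    out[1] = (uint8_t)(0x80 | (c & 0x3F));
    return 2;
    }
  if (c < 0x10000)
    {
    out[0] = (uint8_t)(0xE0 | (c >> 12));
    out[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (c & 0x3F));
    return 3;
    }
  out[0] = (uint8_t)(0xF0 | (c >> 18));
  out[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
  out[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
  out[3] = (uint8_t)(0x80 | (c & 0x3F));
  return 4;
}
```

For instance, the pair 0xD83D 0xDE00 combines to U+1F600, which encodes as the four bytes F0 9F 98 80.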
@ -5971,30 +6019,30 @@ BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
(void)data_ptr; /* Not used */
fprintf(outfile, "%2d(%d) Old %" SIZ_FORM " %" SIZ_FORM " \"",
scb->subscount, scb->oveccount,
scb->subscount, scb->oveccount,
SIZ_CAST scb->ovector[0], SIZ_CAST scb->ovector[1]);
PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0],
PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0],
utf, outfile);
fprintf(outfile, "\" New %" SIZ_FORM " %" SIZ_FORM " \"",
SIZ_CAST scb->output_offsets[0], SIZ_CAST scb->output_offsets[1]);
PCHARSV(scb->output, scb->output_offsets[0],
PCHARSV(scb->output, scb->output_offsets[0],
scb->output_offsets[1] - scb->output_offsets[0], utf, outfile);
if (scb->subscount == dat_datctl.substitute_stop)
if (scb->subscount == dat_datctl.substitute_stop)
{
yield = -1;
fprintf(outfile, " STOPPED");
}
else if (scb->subscount == dat_datctl.substitute_skip)
fprintf(outfile, " STOPPED");
}
else if (scb->subscount == dat_datctl.substitute_skip)
{
yield = +1;
fprintf(outfile, " SKIPPED");
}
fprintf(outfile, " SKIPPED");
}
fprintf(outfile, "\"\n");
fprintf(outfile, "\"\n");
return yield;
}
@ -6867,11 +6915,11 @@ arg_ulen = ulen; /* Value to use in match arg */
if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl))
return PR_OK;
/* Setting substitute_{skip,stop} implies a substitute callout. */
/* Setting substitute_{skip,stop} implies a substitute callout. */
if (dat_datctl.substitute_skip != 0 || dat_datctl.substitute_stop != 0)
dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT;
dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT;
/* Check for mutually exclusive modifiers. At present, these are all in the
first control word. */
@ -8129,7 +8177,7 @@ if (arg != NULL && arg[0] != CHAR_MINUS)
break;
}
/* For VMS, return the value by setting a symbol, for certain values only. This
/* For VMS, return the value by setting a symbol, for certain values only. This
is contributed code which the PCRE2 developers have no means of testing. */
#ifdef __VMS


@ -480,5 +480,13 @@
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
# Check name length with non-ASCII characters
/(?'ABáC678901234567890123456789012'...)/utf
/(?'ABáC6789012345678901234567890123'...)/utf
/(?'ABZC6789012345678901234567890123'...)/utf
# End of testinput10
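
The three name-length patterns above differ only in how many code units the group name occupies. A minimal sketch of that arithmetic for the 8-bit library follows, assuming the limit of 32 code units quoted in the expected error text and assuming that each UTF-8 byte counts as one code unit. `MAX_NAME_CODE_UNITS` and `name_fits` are hypothetical names used for illustration only; they are not PCRE2 identifiers, and the sketch does not reproduce the library's own check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit, taken from the error text in the expected output
   ("maximum 32 code units"). Hypothetical macro, not a PCRE2 name. */
#define MAX_NAME_CODE_UNITS 32

/* Hypothetical helper: in the 8-bit library one code unit is one byte,
   so the length of a UTF-8 encoded name in code units is its byte length. */
static int name_fits(const char *utf8_name)
{
  return strlen(utf8_name) <= (size_t)MAX_NAME_CODE_UNITS;
}

/* The three names from the test input. U+00E1 (a with acute) is written as
   its two UTF-8 bytes; the literal is split after the escape so that the
   following 'C' is not read as another hex digit. */
static const char name_31_chars[] =
  "AB\xc3\xa1" "C678901234567890123456789012";   /* 31 characters, 32 bytes */
static const char name_32_chars[] =
  "AB\xc3\xa1" "C6789012345678901234567890123";  /* 32 characters, 33 bytes */
static const char name_ascii[] =
  "ABZC6789012345678901234567890123";            /* 32 characters, 32 bytes */
```

Under these assumptions `name_fits` accepts the first and third names and rejects the second, which matches the expected output, where only the second pattern fails with error 148.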

23
testdata/testinput4 vendored

@ -2457,4 +2457,27 @@
# -------
# Test group names containing non-ASCII letters and digits
/(?'ABáC'...)\g{ABáC}/utf
abcabcdefg
/(?'XʰABC'...)/utf
xyzpq
/(?'XאABC'...)/utf
12345
/(?'XᾈABC'...)/utf
%^&*(...
/(?'𐨐ABC'...)/utf
abcde
/^(?'אABC'...)(?&אABC)(?P=אABC)/utf
123123123456
/^(?'אABC'...)(?&אABC)/utf
123123123456
# End of testinput4

15
testdata/testinput5 vendored

@ -2149,4 +2149,19 @@
# -------
# Test reference and errors in non-ASCII characters in group names
/(?'𑠅ABC'...)/I,utf
abcde\=copy=𑠅ABC
# Bad ones
/(?'AB၌C'...)\g{AB၌C}/utf
/(?'٠ABC'...)/utf
/(?'²ABC'...)/utf
/(?'X²ABC'...)/utf
# End of testinput5

165
testdata/testoutput10 vendored

@ -248,7 +248,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
@ -261,7 +261,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xe1
Last code unit = \x80
@ -274,7 +274,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xf0
Last code unit = \x80
@ -287,7 +287,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xf4
Last code unit = \x80
@ -300,7 +300,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xf4
Last code unit = \xbf
@ -313,7 +313,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
@ -326,7 +326,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
@ -339,7 +339,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc2
Last code unit = \x80
@ -352,7 +352,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
@ -365,7 +365,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xed
Last code unit = \xb4
@ -380,7 +380,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xe6
Last code unit = \x9e
@ -395,7 +395,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc2
Last code unit = \x80
@ -408,7 +408,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc2
Last code unit = \x84
@ -421,7 +421,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x84
@ -434,7 +434,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xe0
Last code unit = \xa1
@ -447,7 +447,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xf0
Last code unit = \xab
@ -460,7 +460,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -495,7 +495,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
@ -514,7 +514,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: x \xc4
Subject length lower bound = 1
@ -531,7 +531,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a x \xc4
Subject length lower bound = 1
@ -548,7 +548,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a x \xc4
Subject length lower bound = 1
@ -566,7 +566,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: x \xc4
Subject length lower bound = 1
@ -578,7 +578,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
@ -592,7 +592,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x80
@ -606,7 +606,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x81
@ -619,7 +619,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[\x{100}]/IB,utf
@ -629,7 +629,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
@ -648,7 +648,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
@ -663,7 +663,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
@ -678,14 +678,14 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
First code unit = \xc4
Last code unit = 'z'
Subject length lower bound = 7
/\777/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc7
Last code unit = \xbf
@ -703,7 +703,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
@ -717,7 +717,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = 'X'
@ -761,7 +761,7 @@ No match
0: \x{1234}
/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: utf
\R matches any Unicode newline
@ -771,7 +771,7 @@ Last code unit = 'b'
Subject length lower bound = 3
/\h/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3
Subject length lower bound = 1
@ -795,7 +795,7 @@ Subject length lower bound = 1
0: \x{3000}
/\v/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Subject length lower bound = 1
@ -813,7 +813,7 @@ Subject length lower bound = 1
0: \x{2028}
/\h*A/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
Last code unit = 'A'
@ -822,21 +822,21 @@ Subject length lower bound = 1
0: A
/\v+A/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Last code unit = 'A'
Subject length lower bound = 2
/\s?xxx\s/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x'
Subject length lower bound = 4
/\sxxx\s/I,utf,tables=2
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
Last code unit = 'x'
@ -847,7 +847,7 @@ Subject length lower bound = 5
0: \x{a0}xxx\x{85}
/\S \S/I,utf,tables=2
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -883,25 +883,25 @@ Error -36 (bad UTF-8 offset)
No match
/\x{1234}+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 1
/\x{1234}+?/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 1
/\x{1234}++/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 1
/\x{1234}{2}/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 2
@ -913,7 +913,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
@ -925,14 +925,14 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'X'
Last code unit = \x80
Subject length lower bound = 2
/\R/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Subject length lower bound = 1
@ -944,7 +944,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xc7
Last code unit = \xbf
@ -1105,7 +1105,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Subject length lower bound = 5
@ -1117,7 +1117,7 @@ Subject length lower bound = 5
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = \xb0
@ -1130,7 +1130,7 @@ Subject length lower bound = 5
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = \xb0
@ -1143,14 +1143,14 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Last code unit = 'B' (caseless)
Subject length lower bound = 3
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xd0 \xd1
Subject length lower bound = 17
@ -1176,17 +1176,17 @@ Subject length lower bound = 17
------------------------------------------------------------------
/\h/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x09 \x20 \xa0
Subject length lower bound = 1
/\v/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1
/\R/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1
@ -1199,7 +1199,7 @@ Subject length lower bound = 1
------------------------------------------------------------------
/\x{212a}+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: K k \xe2
Subject length lower bound = 1
@ -1207,7 +1207,7 @@ Subject length lower bound = 1
0: KKkk\x{212a}
/s+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: S s \xc5
Subject length lower bound = 1
@ -1222,7 +1222,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: A \xc4
Last code unit = 'A'
@ -1239,7 +1239,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
Subject length lower bound = 1
@ -1251,7 +1251,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: Z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
@ -1273,7 +1273,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
\xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
@ -1289,7 +1289,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: - ] a d z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
\xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
@ -1314,7 +1314,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a b \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
@ -1332,7 +1332,7 @@ Subject length lower bound = 7
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4
Subject length lower bound = 1
@ -1345,7 +1345,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
Subject length lower bound = 1
@ -1358,7 +1358,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -1373,7 +1373,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1395,7 +1395,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -1416,7 +1416,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1435,7 +1435,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
\xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
@ -1462,7 +1462,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
@ -1503,7 +1503,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
@ -1520,7 +1520,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xce \xcf
Last code unit = 'B' (caseless)
@ -1531,7 +1531,7 @@ Subject length lower bound = 2
Failed: error -3: UTF-8 error: 1 byte missing at end
/(?<=(a)(?-1))x/I,utf
Capturing subpattern count = 1
Capture group count = 1
Max lookbehind = 2
Options: utf
First code unit = 'x'
@ -1579,7 +1579,7 @@ Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP),
# but subjects containing them must not be UTF-checked.
/\x{d800}/I,utf,allow_surrogate_escapes
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Extra options: allow_surrogate_escapes
First code unit = \xed
@ -1602,7 +1602,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Compile options: utf
Overall options: anchored utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
@ -1635,5 +1635,14 @@ No match
3(2) Old 13 16 "def" New 17 22 "<def>"
4(2) Old 22 22 "" New 28 30 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# Check name length with non-ASCII characters
/(?'ABáC678901234567890123456789012'...)/utf
/(?'ABáC6789012345678901234567890123'...)/utf
Failed: error 148 at offset 36: subpattern name is too long (maximum 32 code units)
/(?'ABZC6789012345678901234567890123'...)/utf
# End of testinput10


@ -13,11 +13,11 @@
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/\x{100}/I
Capturing subpattern count = 0
Capture group count = 0
First code unit = \x{100}
Subject length lower bound = 1
@ -215,7 +215,7 @@ Subject length lower bound = 1
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/Ix
Capturing subpattern count = 0
Capture group count = 0
Contains explicit CR or LF match
Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
@ -260,7 +260,7 @@ Subject length lower bound = 3
------------------------------------------------------------------
/\h+/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -275,7 +275,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -284,7 +284,7 @@ Subject length lower bound = 1
0: \x{200a}\xa0\x{2000}
/\H+/I
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f}
@ -306,7 +306,7 @@ Subject length lower bound = 1
0: \x9f\xa1\x{2fff}\x{3001}
/\v+/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029}
@ -321,7 +321,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029}
@ -330,7 +330,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d
/\V+/I
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
\x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030}
@ -344,7 +344,7 @@ Subject length lower bound = 1
0: \x09\x0e\x84\x86
/\R+/I,bsr=unicode
Capturing subpattern count = 0
Capture group count = 0
\R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
@ -354,7 +354,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
Capturing subpattern count = 0
Capture group count = 0
First code unit = \x{d800}
Last code unit = \x{dd00}
Subject length lower bound = 6
@ -600,7 +600,7 @@ Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
@ -624,7 +624,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >


@ -13,11 +13,11 @@
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/\x{100}/I
Capturing subpattern count = 0
Capture group count = 0
First code unit = \x{100}
Subject length lower bound = 1
@ -215,7 +215,7 @@ Subject length lower bound = 1
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/Ix
Capturing subpattern count = 0
Capture group count = 0
Contains explicit CR or LF match
Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
@ -260,7 +260,7 @@ Subject length lower bound = 3
------------------------------------------------------------------
/\h+/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -275,7 +275,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
@ -284,7 +284,7 @@ Subject length lower bound = 1
0: \x{200a}\xa0\x{2000}
/\H+/I
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
0: \x{167f}\x{1681}\x{180d}\x{180f}
@ -306,7 +306,7 @@ Subject length lower bound = 1
0: \x9f\xa1\x{2fff}\x{3001}
/\v+/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029}
@ -321,7 +321,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
\x{2027}\x{2030}\x{2028}\x{2029}
@ -330,7 +330,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d
/\V+/I
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
\x{2028}\x{2029}\x{2027}\x{2030}
0: \x{2027}\x{2030}
@ -344,7 +344,7 @@ Subject length lower bound = 1
0: \x09\x0e\x84\x86
/\R+/I,bsr=unicode
Capturing subpattern count = 0
Capture group count = 0
\R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
@ -354,7 +354,7 @@ Subject length lower bound = 1
0: \x85\x0a\x0b\x0c\x0d
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
Capturing subpattern count = 0
Capture group count = 0
First code unit = \x{d800}
Last code unit = \x{dd00}
Subject length lower bound = 6
@ -558,19 +558,19 @@ Failed: error 134 at offset 12: character code point value in \x{} or \o{} is to
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/\x{7fffffff}\x{7fffffff}/I
Capturing subpattern count = 0
Capture group count = 0
First code unit = \x{7fffffff}
Last code unit = \x{7fffffff}
Subject length lower bound = 2
/\x{80000000}\x{80000000}/I
Capturing subpattern count = 0
Capture group count = 0
First code unit = \x{80000000}
Last code unit = \x{80000000}
Subject length lower bound = 2
/\x{ffffffff}\x{ffffffff}/I
Capturing subpattern count = 0
Capture group count = 0
First code unit = \x{ffffffff}
Last code unit = \x{ffffffff}
Subject length lower bound = 2
@ -588,7 +588,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless
First code unit = \x{400000}
Last code unit = \x{800000}
@ -603,7 +603,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
@ -627,7 +627,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >


@ -18,7 +18,7 @@
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{ffff}
Subject length lower bound = 1
@ -30,7 +30,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d800}
Last code unit = \x{dc00}
@ -43,7 +43,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -55,7 +55,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{1000}
Subject length lower bound = 1
@ -67,7 +67,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d800}
Last code unit = \x{dc00}
@ -80,7 +80,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{dbc0}
Last code unit = \x{dc00}
@ -93,7 +93,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{dbff}
Last code unit = \x{dfff}
@ -106,7 +106,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xff
Subject length lower bound = 1
@ -118,7 +118,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -130,7 +130,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x80
Subject length lower bound = 1
@ -142,7 +142,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xff
Subject length lower bound = 1
@ -154,7 +154,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -169,7 +169,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -184,7 +184,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x80
Subject length lower bound = 1
@ -196,7 +196,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x84
Subject length lower bound = 1
@ -208,7 +208,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{104}
Subject length lower bound = 1
@ -220,7 +220,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{861}
Subject length lower bound = 1
@ -232,7 +232,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d844}
Last code unit = \x{deab}
@ -245,7 +245,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -281,7 +281,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Last code unit = \x{100}
@ -300,7 +300,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: x \xff
Subject length lower bound = 1
@ -317,7 +317,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a x \xff
Subject length lower bound = 1
@ -334,7 +334,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a x \xff
Subject length lower bound = 1
@ -352,7 +352,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: x \xff
Subject length lower bound = 1
@ -364,7 +364,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -377,7 +377,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x{100}
@ -391,7 +391,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x{101}
@ -404,7 +404,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[\x{100}]/IB,utf
@ -414,7 +414,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -432,7 +432,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xff
Subject length lower bound = 1
@ -446,7 +446,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
@ -461,14 +461,14 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
First code unit = \x{100}
Last code unit = 'z'
Subject length lower bound = 7
/\777/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{1ff}
Subject length lower bound = 1
@ -485,7 +485,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Last code unit = \x{200}
@ -499,7 +499,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Last code unit = 'X'
@ -547,7 +547,7 @@ Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
0: \x{11234}
/(*UTF)\x{11234}/I
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: utf
First code unit = \x{d804}
@ -565,7 +565,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
abcd\x{11234}pqr
/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: utf
\R matches any Unicode newline
@ -578,7 +578,7 @@ Subject length lower bound = 3
Failed: error 160 at offset 14: (*VERB) not recognized or malformed
/\h/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
@ -602,7 +602,7 @@ Subject length lower bound = 1
0: \x{3000}
/\v/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
@ -620,7 +620,7 @@ Subject length lower bound = 1
0: \x{2028}
/\h*A/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 A \xa0 \xff
Last code unit = 'A'
@ -631,7 +631,7 @@ Subject length lower bound = 1
0: \x{2000}A
/\R*A/I,bsr=unicode,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff
@ -643,21 +643,21 @@ Subject length lower bound = 1
0: \x{2028}A
/\v+A/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Last code unit = 'A'
Subject length lower bound = 2
/\s?xxx\s/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x'
Subject length lower bound = 4
/\sxxx\s/I,utf,tables=2
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
Last code unit = 'x'
@ -668,7 +668,7 @@ Subject length lower bound = 5
0: \x{a0}xxx\x{85}
/\S \S/I,utf,tables=2
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -708,25 +708,25 @@ Failed: error -33: bad offset value
Failed: error -33: bad offset value
/\x{1234}+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Subject length lower bound = 1
/\x{1234}+?/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Subject length lower bound = 1
/\x{1234}++/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Subject length lower bound = 1
/\x{1234}{2}/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Last code unit = \x{1234}
@ -739,7 +739,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
@ -751,14 +751,14 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'X'
Last code unit = \x{200}
Subject length lower bound = 2
/\R/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
@ -936,7 +936,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless)
@ -949,7 +949,7 @@ Subject length lower bound = 5
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = \x{1fb0}
@ -962,7 +962,7 @@ Subject length lower bound = 5
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = \x{1fb0}
@ -975,14 +975,14 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless)
Subject length lower bound = 3
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{401} (caseless)
Last code unit = \x{42f} (caseless)
@ -1017,7 +1017,7 @@ Subject length lower bound = 17
------------------------------------------------------------------
/\x{212a}+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: K k \xff
Subject length lower bound = 1
@ -1025,7 +1025,7 @@ Subject length lower bound = 1
0: KKkk\x{212a}
/s+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: S s \xff
Subject length lower bound = 1
@ -1048,7 +1048,7 @@ Failed: error 134 at offset 10: character code point value in \x{} or \o{} is to
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: A \xff
Last code unit = 'A'
@ -1065,7 +1065,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1
@ -1077,7 +1077,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: Z \xff
Subject length lower bound = 1
@ -1095,7 +1095,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87
\x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96
@ -1115,7 +1115,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: - ] a d z \xff
Subject length lower bound = 1
@ -1136,7 +1136,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a b \xff
Last code unit = 'z'
@ -1150,7 +1150,7 @@ Subject length lower bound = 7
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff
Subject length lower bound = 1
@ -1163,7 +1163,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1
@ -1176,7 +1176,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -1191,7 +1191,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1217,7 +1217,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -1243,7 +1243,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1266,7 +1266,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xff
Subject length lower bound = 1
@ -1289,7 +1289,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1335,7 +1335,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1357,7 +1357,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xff
Last code unit = 'B' (caseless)
@ -1443,7 +1443,7 @@ Failed: error 191 at offset 0: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowe
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Compile options: utf
Overall options: anchored utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a

View File

@ -18,7 +18,7 @@
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{ffff}
Subject length lower bound = 1
@ -30,7 +30,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{10000}
Subject length lower bound = 1
@ -42,7 +42,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -54,7 +54,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{1000}
Subject length lower bound = 1
@ -66,7 +66,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{10000}
Subject length lower bound = 1
@ -78,7 +78,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100000}
Subject length lower bound = 1
@ -90,7 +90,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{10ffff}
Subject length lower bound = 1
@ -102,7 +102,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xff
Subject length lower bound = 1
@ -114,7 +114,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -126,7 +126,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x80
Subject length lower bound = 1
@ -138,7 +138,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xff
Subject length lower bound = 1
@ -150,7 +150,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -165,7 +165,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -180,7 +180,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x80
Subject length lower bound = 1
@ -192,7 +192,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x84
Subject length lower bound = 1
@ -204,7 +204,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{104}
Subject length lower bound = 1
@ -216,7 +216,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{861}
Subject length lower bound = 1
@ -228,7 +228,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{212ab}
Subject length lower bound = 1
@ -240,7 +240,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -276,7 +276,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Last code unit = \x{100}
@ -295,7 +295,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: x \xff
Subject length lower bound = 1
@ -312,7 +312,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a x \xff
Subject length lower bound = 1
@ -329,7 +329,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a x \xff
Subject length lower bound = 1
@ -347,7 +347,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: x \xff
Subject length lower bound = 1
@ -359,7 +359,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -372,7 +372,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x{100}
@ -386,7 +386,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x{101}
@ -399,7 +399,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[\x{100}]/IB,utf
@ -409,7 +409,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Subject length lower bound = 1
@ -427,7 +427,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xff
Subject length lower bound = 1
@ -441,7 +441,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
@ -456,14 +456,14 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
First code unit = \x{100}
Last code unit = 'z'
Subject length lower bound = 7
/\777/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{1ff}
Subject length lower bound = 1
@ -480,7 +480,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Last code unit = \x{200}
@ -494,7 +494,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{100}
Last code unit = 'X'
@ -542,7 +542,7 @@ Failed: error 160 at offset 7: (*VERB) not recognized or malformed
abcd\x{11234}pqr
/(*UTF)\x{11234}/I
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: utf
First code unit = \x{11234}
@ -562,7 +562,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
Failed: error 160 at offset 14: (*VERB) not recognized or malformed
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: utf
\R matches any Unicode newline
@ -572,7 +572,7 @@ Last code unit = 'b'
Subject length lower bound = 3
/\h/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 \xa0 \xff
Subject length lower bound = 1
@ -596,7 +596,7 @@ Subject length lower bound = 1
0: \x{3000}
/\v/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
@ -614,7 +614,7 @@ Subject length lower bound = 1
0: \x{2028}
/\h*A/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 A \xa0 \xff
Last code unit = 'A'
@ -625,7 +625,7 @@ Subject length lower bound = 1
0: \x{2000}A
/\R*A/I,bsr=unicode,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches any Unicode newline
Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff
@ -637,21 +637,21 @@ Subject length lower bound = 1
0: \x{2028}A
/\v+A/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Last code unit = 'A'
Subject length lower bound = 2
/\s?xxx\s/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x'
Subject length lower bound = 4
/\sxxx\s/I,utf,tables=2
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
Last code unit = 'x'
@ -662,7 +662,7 @@ Subject length lower bound = 5
0: \x{a0}xxx\x{85}
/\S \S/I,utf,tables=2
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -702,25 +702,25 @@ Failed: error -33: bad offset value
Failed: error -33: bad offset value
/\x{1234}+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Subject length lower bound = 1
/\x{1234}+?/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Subject length lower bound = 1
/\x{1234}++/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Subject length lower bound = 1
/\x{1234}{2}/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{1234}
Last code unit = \x{1234}
@ -733,7 +733,7 @@ Subject length lower bound = 2
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
@ -745,14 +745,14 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'X'
Last code unit = \x{200}
Subject length lower bound = 2
/\R/I,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
Subject length lower bound = 1
@ -930,7 +930,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless)
@ -943,7 +943,7 @@ Subject length lower bound = 5
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = \x{1fb0}
@ -956,7 +956,7 @@ Subject length lower bound = 5
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = \x{1fb0}
@ -969,14 +969,14 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Last code unit = \x{1fb0} (caseless)
Subject length lower bound = 3
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = \x{401} (caseless)
Last code unit = \x{42f} (caseless)
@ -1011,7 +1011,7 @@ Subject length lower bound = 17
------------------------------------------------------------------
/\x{212a}+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: K k \xff
Subject length lower bound = 1
@ -1019,7 +1019,7 @@ Subject length lower bound = 1
0: KKkk\x{212a}
/s+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: S s \xff
Subject length lower bound = 1
@ -1042,7 +1042,7 @@ Failed: error 134 at offset 10: character code point value in \x{} or \o{} is to
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: A \xff
Last code unit = 'A'
@ -1059,7 +1059,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1
@ -1071,7 +1071,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: Z \xff
Subject length lower bound = 1
@ -1089,7 +1089,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87
\x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96
@ -1109,7 +1109,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: - ] a d z \xff
Subject length lower bound = 1
@ -1130,7 +1130,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Starting code units: a b \xff
Last code unit = 'z'
@ -1144,7 +1144,7 @@ Subject length lower bound = 7
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff
Subject length lower bound = 1
@ -1157,7 +1157,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
Subject length lower bound = 1
@ -1170,7 +1170,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -1185,7 +1185,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1211,7 +1211,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
@ -1237,7 +1237,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
@ -1260,7 +1260,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xff
Subject length lower bound = 1
@ -1283,7 +1283,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1329,7 +1329,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
@ -1351,7 +1351,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Starting code units: \xff
Last code unit = 'B' (caseless)
@ -1418,7 +1418,7 @@ No match
# errors in 16-bit mode.
/\x{d800}/I,utf,allow_surrogate_escapes
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Extra options: allow_surrogate_escapes
First code unit = \x{d800}
@ -1440,7 +1440,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Compile options: utf
Overall options: anchored utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a

testdata/testoutput15 vendored

@ -7,7 +7,7 @@
# (2) Other tests that must not be run with JIT.
/(a+)*zz/I
Capturing subpattern count = 1
Capture group count = 1
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
@ -24,7 +24,7 @@ Minimum depth limit = 30
No match
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
Capturing subpattern count = 1
Capture group count = 1
May match empty string
Subject length lower bound = 0
/* this is a C style comment */\=find_limits
@ -117,7 +117,7 @@ Failed: error 160 at offset 17: (*VERB) not recognized or malformed
Failed: error 160 at offset 24: (*VERB) not recognized or malformed
/(*LIMIT_DEPTH=4294967280)abc/I
Capturing subpattern count = 0
Capture group count = 0
Depth limit = 4294967280
First code unit = 'a'
Last code unit = 'c'
@ -137,7 +137,7 @@ Failed: error -47: match limit exceeded
Failed: error -53: matching depth limit exceeded
/(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
Capture group count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
@ -150,7 +150,7 @@ Failed: error -47: match limit exceeded
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
Capture group count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
@ -160,7 +160,7 @@ Subject length lower bound = 2
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(a+)*zz/I
Capturing subpattern count = 1
Capture group count = 1
Match limit = 60000
Starting code units: a z
Last code unit = 'z'
@ -173,7 +173,7 @@ No match
Failed: error -47: match limit exceeded
/(*LIMIT_DEPTH=10)(a+)*zz/I
Capturing subpattern count = 1
Capture group count = 1
Depth limit = 10
Starting code units: a z
Last code unit = 'z'
@ -186,7 +186,7 @@ Failed: error -53: matching depth limit exceeded
Failed: error -53: matching depth limit exceeded
/(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I
Capturing subpattern count = 1
Capture group count = 1
Depth limit = 1000
Starting code units: a z
Last code unit = 'z'
@ -196,7 +196,7 @@ Subject length lower bound = 2
No match
/(*LIMIT_DEPTH=1000)(a+)*zz/I
Capturing subpattern count = 1
Capture group count = 1
Depth limit = 1000
Starting code units: a z
Last code unit = 'z'
@ -269,14 +269,14 @@ Failed: error -52: nested recursion at the same subject position
# when JIT is used.
/(?R)/I
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Subject length lower bound = 0
abcd
Failed: error -52: nested recursion at the same subject position
/(a|(?R))/I
Capturing subpattern count = 1
Capture group count = 1
May match empty string
Subject length lower bound = 0
abcd
@ -286,7 +286,7 @@ Subject length lower bound = 0
Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?R))))/I
Capturing subpattern count = 3
Capture group count = 3
May match empty string
Subject length lower bound = 0
abcd
@ -296,7 +296,7 @@ Subject length lower bound = 0
Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?1))))/I
Capturing subpattern count = 3
Capture group count = 3
May match empty string
Subject length lower bound = 0
abcd
@ -306,7 +306,7 @@ Subject length lower bound = 0
Failed: error -52: nested recursion at the same subject position
/x(ab|(bc|(de|(?1)x)x)x)/I
Capturing subpattern count = 3
Capture group count = 3
First code unit = 'x'
Subject length lower bound = 3
xab123
@ -352,7 +352,7 @@ Failed: error -52: nested recursion at the same subject position
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Subject length lower bound = 1
abcd
Failed: error -52: nested recursion at the same subject position
@ -367,7 +367,7 @@ Failed: error -52: nested recursion at the same subject position
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: no_auto_possess
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -390,7 +390,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Compile options: <none>
Overall options: no_auto_possess
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P


@ -3,14 +3,14 @@
# are different without JIT.
/abc/I,jit,jitverify
Capturing subpattern count = 0
Capture group count = 0
First code unit = 'a'
Last code unit = 'c'
Subject length lower bound = 3
JIT support is not available in this version of PCRE2
/a*/I
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Subject length lower bound = 0

testdata/testoutput17 vendored

File diff suppressed because one or more lines are too long

testdata/testoutput2 vendored

File diff suppressed because it is too large


@ -32,9 +32,9 @@
#load testsaved2
#pop info
Capturing subpattern count = 2
Capture group count = 2
Max back reference = 2
Named capturing subpatterns:
Named capture groups:
n 1
n 2
Options: dupnames
@ -66,8 +66,8 @@ No match, mark = A
4: A
#pop info
Capturing subpattern count = 4
Named capturing subpatterns:
Capture group count = 4
Named capture groups:
ADDR 2
ADDRESS_PAT 4
NAME 1


@ -79,7 +79,7 @@
Failed: error 183 at offset 4: using \C is disabled by the application
/ab\Cde/info
Capturing subpattern count = 0
Capture group count = 0
Contains \C
First code unit = 'a'
Last code unit = 'e'


@ -4,7 +4,7 @@
# in some widths and not in others.
/ab\Cde/utf,info
Capturing subpattern count = 0
Capture group count = 0
Contains \C
Options: utf
First code unit = 'a'


@ -4,7 +4,7 @@
# in some widths and not in others.
/ab\Cde/utf,info
Capturing subpattern count = 0
Capture group count = 0
Contains \C
Options: utf
First code unit = 'a'


@ -4,7 +4,7 @@
# in some widths and not in others.
/ab\Cde/utf,info
Capturing subpattern count = 0
Capture group count = 0
Contains \C
Options: utf
First code unit = 'a'


@ -78,13 +78,13 @@ No match
0: école
/\w/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
/\w/I,locale=fr_FR
Capturing subpattern count = 0
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
@ -153,7 +153,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í


@ -78,13 +78,13 @@ No match
0: école
/\w/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
/\w/I,locale=fr_FR
Capturing subpattern count = 0
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
@ -153,7 +153,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í


@ -78,13 +78,13 @@ No match
0: école
/\w/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
/\w/I,locale=fr_FR
Capturing subpattern count = 0
Capture group count = 0
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
@ -153,7 +153,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í

testdata/testoutput4 vendored

@ -3975,4 +3975,41 @@ No match
# -------
# Test group names containing non-ASCII letters and digits
/(?'ABáC'...)\g{ABáC}/utf
abcabcdefg
0: abcabc
1: abc
/(?'XʰABC'...)/utf
xyzpq
0: xyz
1: xyz
/(?'XאABC'...)/utf
12345
0: 123
1: 123
/(?'XᾈABC'...)/utf
%^&*(...
0: %^&
1: %^&
/(?'𐨐ABC'...)/utf
abcde
0: abc
1: abc
/^(?'אABC'...)(?&אABC)(?P=אABC)/utf
123123123456
0: 123123123
1: 123
/^(?'אABC'...)(?&אABC)/utf
123123123456
0: 123123
1: 123
# End of testinput4
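
As a rough cross-check of the name and back-reference tests above, the same idea can be tried in Python's `re` module. This is a different engine from PCRE2 and is used here only as an analogy. Python spells the group as `(?P<name>...)` and the back reference as `(?P=name)` (the latter form also appears in the PCRE2 tests above); it does not accept `(?'name'...)`, `\g{name}`, or `(?&name)` in patterns. The helper name `match_named` is hypothetical.

```python
import re


def match_named(pattern: str, subject: str, name: str):
    """Return (whole match, named group), or None if there is no match."""
    m = re.search(pattern, subject)
    if m is None:
        return None
    return m.group(0), m.group(name)


# A non-ASCII letter inside the group name, followed by a back reference
# to that group by name. Python's re (not PCRE2) is doing the matching here.
print(match_named(r"(?P<ABáC>...)(?P=ABáC)", "abcabcdefg", "ABáC"))
```

Python requires group names in `str` patterns to be valid identifiers, which is similar in spirit to, but not the same rule as, the one PCRE2 applies when PCRE2_UTF is set.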

testdata/testoutput5 vendored

@ -147,7 +147,7 @@ Failed: error 173 at offset 9: disallowed Unicode code point (>= 0xd800 && <= 0x
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -164,7 +164,7 @@ Subject length lower bound = 4
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Last code unit = 'X'
Subject length lower bound = 4
@ -179,7 +179,7 @@ Subject length lower bound = 4
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 3
\x{212ab}\x{212ab}\x{212ab}\x{861}
@ -193,7 +193,7 @@ Subject length lower bound = 3
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Compile options: utf
Overall options: anchored utf
Starting code units: a b
@ -238,7 +238,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: utf
Subject length lower bound = 0
@ -251,7 +251,7 @@ Subject length lower bound = 0
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Subject length lower bound = 1
@ -264,7 +264,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = 'b'
@ -291,7 +291,7 @@ No match
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
First code unit = \xff
Subject length lower bound = 1
>\xff<
@ -304,7 +304,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[Ä-Ü]/utf
@ -343,7 +343,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Options: utf
Last code unit = 'z'
Subject length lower bound = 7
@ -363,7 +363,7 @@ Subject length lower bound = 7
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 2
Capture group count = 2
May match empty string
Options: utf
Subject length lower bound = 0
@ -394,7 +394,7 @@ Subject length lower bound = 0
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 2
Capture group count = 2
May match empty string
Options: utf
Subject length lower bound = 0
@ -414,7 +414,7 @@ Subject length lower bound = 0
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 2
Capture group count = 2
May match empty string
Options: utf
Subject length lower bound = 0
@ -445,7 +445,7 @@ Subject length lower bound = 0
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 2
Capture group count = 2
May match empty string
Options: utf
Subject length lower bound = 0
@ -471,7 +471,7 @@ Subject length lower bound = 0
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Compile options: no_start_optimize utf
Overall options: anchored no_start_optimize utf
Subject length lower bound = 0
@ -713,7 +713,7 @@ No match
0: \x{1ec5}
/a\Rb/I,bsr=anycrlf,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches CR, LF, or CRLF
First code unit = 'a'
@ -732,7 +732,7 @@ No match
No match
/a\Rb/I,bsr=unicode,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches any Unicode newline
First code unit = 'a'
@ -750,7 +750,7 @@ Subject length lower bound = 3
0: a\x{0b}b
/a\R?b/I,bsr=anycrlf,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches CR, LF, or CRLF
First code unit = 'a'
@ -769,7 +769,7 @@ No match
No match
/a\R?b/I,bsr=unicode,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches any Unicode newline
First code unit = 'a'
@ -1408,22 +1408,22 @@ Failed: error 168 at offset 3: \c must be followed by a printable ASCII characte
2: \x{0d}
/[^\x{1234}]+/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Subject length lower bound = 1
/[^\x{1234}]+?/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Subject length lower bound = 1
/[^\x{1234}]++/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Subject length lower bound = 1
/[^\x{1234}]{2}/Ii,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
Subject length lower bound = 2
@ -1703,7 +1703,7 @@ Partial match: \x{0d}\x{0d}
------------------------------------------------------------------
/(?<=\x{1234}\x{1234})\bxy/I,utf
Capturing subpattern count = 0
Capture group count = 0
Max lookbehind = 2
Options: utf
First code unit = 'x'
@ -1768,7 +1768,7 @@ Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0x
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[\p{^L}]/IB
@ -1778,7 +1778,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[\P{L}]/IB
@ -1788,7 +1788,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[\P{^L}]/IB
@ -1798,7 +1798,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/[abc\p{L}\x{0660}]/IB,utf
@ -1808,7 +1808,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
@ -1819,7 +1819,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
1234
@ -1832,7 +1832,7 @@ Subject length lower bound = 1
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
Subject length lower bound = 1
1234
@ -2998,7 +2998,7 @@ Partial match: AA
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: caseless utf
First code unit = 'A' (caseless)
Last code unit = 'B' (caseless)
@ -3914,7 +3914,7 @@ No match
------------------------------------------------------------------
/^s?c/Iim,utf
Capturing subpattern count = 0
Capture group count = 0
Options: caseless multiline utf
First code unit at start or follows newline
Last code unit = 'c' (caseless)
@ -4889,4 +4889,31 @@ MK: ABC
# -------
# Test reference and errors in non-ASCII characters in group names
/(?'𑠅ABC'...)/I,utf
Capture group count = 1
Named capture groups:
𑠅ABC 1
Options: utf
Subject length lower bound = 3
abcde\=copy=𑠅ABC
0: abc
1: abc
C abc (3) 𑠅ABC (group 1)
# Bad ones
/(?'AB၌C'...)\g{AB၌C}/utf
Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?)
/(?'٠ABC'...)/utf
Failed: error 144 at offset 3: subpattern name must start with a non-digit
/(?'²ABC'...)/utf
Failed: error 162 at offset 3: subpattern name expected
/(?'X²ABC'...)/utf
Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?)
# End of testinput5
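
The accepted and rejected names in the tests above are consistent with a simple rule: every character must be a Unicode letter, a Unicode decimal digit, or an underscore, and the first character must not be a decimal digit. The sketch below approximates that rule with Python's `unicodedata` module. It is an illustration under that assumption, not PCRE2's implementation; the function name `check_group_name` and the returned strings are hypothetical, chosen to echo the three error messages shown above.

```python
import unicodedata


def check_group_name(name: str) -> str:
    """Classify a candidate group name under the assumed UTF naming rule."""
    if not name:
        return "name expected"
    first_cat = unicodedata.category(name[0])
    if first_cat == "Nd":
        # Decimal digit in first position (compare error 144 above).
        return "must start with a non-digit"
    if not (first_cat.startswith("L") or name[0] == "_"):
        # Not a letter or underscore at all (compare error 162 above).
        return "name expected"
    for ch in name[1:]:
        cat = unicodedata.category(ch)
        if not (cat.startswith("L") or cat == "Nd" or ch == "_"):
            # The name ends early, so the closing quote is not found
            # where expected (compare error 142 above).
            return "syntax error (missing terminator?)"
    return "ok"


for candidate in ["ABáC", "AB\u104cC", "\u0660ABC", "\u00b2ABC", "X\u00b2ABC"]:
    print(repr(candidate), "->", check_group_name(candidate))
```

Here U+104C is a Myanmar punctuation sign, U+0660 is ARABIC-INDIC DIGIT ZERO (a decimal digit), and U+00B2 is SUPERSCRIPT TWO, which Unicode classifies as a number but not as a decimal digit.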

testdata/testoutput6 vendored

@ -5978,7 +5978,7 @@ Partial match: 123
0: Content-Type:xxxyyyz
/^abc/Im,newline=lf
Capturing subpattern count = 0
Capture group count = 0
Options: multiline
Forced newline is LF
First code unit at start or follows newline
@ -6001,7 +6001,7 @@ No match
No match
/^abc/Im,newline=crlf
Capturing subpattern count = 0
Capture group count = 0
Options: multiline
Forced newline is CRLF
First code unit at start or follows newline
@ -6016,7 +6016,7 @@ No match
No match
/^abc/Im,newline=cr
Capturing subpattern count = 0
Capture group count = 0
Options: multiline
Forced newline is CR
First code unit at start or follows newline
@ -6031,7 +6031,7 @@ No match
No match
/.*/I,newline=lf
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Forced newline is LF
First code unit at start or follows newline
@ -6044,7 +6044,7 @@ Subject length lower bound = 0
0: abc\x0d
/.*/I,newline=cr
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Forced newline is CR
First code unit at start or follows newline
@ -6057,7 +6057,7 @@ Subject length lower bound = 0
0: abc
/.*/I,newline=crlf
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Forced newline is CRLF
First code unit at start or follows newline
@ -6070,7 +6070,7 @@ Subject length lower bound = 0
0: abc
/\w+(.)(.)?def/Is
Capturing subpattern count = 2
Capture group count = 2
Options: dotall
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
@ -6447,7 +6447,7 @@ No match
0: \x0aA
/a\Rb/I,bsr=anycrlf
Capturing subpattern count = 0
Capture group count = 0
\R matches CR, LF, or CRLF
First code unit = 'a'
Last code unit = 'b'
@ -6465,7 +6465,7 @@ No match
No match
/a\Rb/I,bsr=unicode
Capturing subpattern count = 0
Capture group count = 0
\R matches any Unicode newline
First code unit = 'a'
Last code unit = 'b'
@ -6482,7 +6482,7 @@ Subject length lower bound = 3
0: a\x0bb
/a\R?b/I,bsr=anycrlf
Capturing subpattern count = 0
Capture group count = 0
\R matches CR, LF, or CRLF
First code unit = 'a'
Last code unit = 'b'
@ -6500,7 +6500,7 @@ No match
No match
/a\R?b/I,bsr=unicode
Capturing subpattern count = 0
Capture group count = 0
\R matches any Unicode newline
First code unit = 'a'
Last code unit = 'b'
@ -6517,7 +6517,7 @@ Subject length lower bound = 2
0: a\x0bb
/a\R{2,4}b/I,bsr=anycrlf
Capturing subpattern count = 0
Capture group count = 0
\R matches CR, LF, or CRLF
First code unit = 'a'
Last code unit = 'b'
@ -6535,7 +6535,7 @@ No match
No match
/a\R{2,4}b/I,bsr=unicode
Capturing subpattern count = 0
Capture group count = 0
\R matches any Unicode newline
First code unit = 'a'
Last code unit = 'b'
@ -6831,7 +6831,7 @@ Partial match: +ab
0+ CBA
/(abc|def|xyz)/I
Capturing subpattern count = 1
Capture group count = 1
Starting code units: a d x
Subject length lower bound = 3
terhjk;abcdaadsfe
@ -6843,7 +6843,7 @@ Subject length lower bound = 3
No match
/(abc|def|xyz)/I,no_start_optimize
Capturing subpattern count = 1
Capture group count = 1
Options: no_start_optimize
Subject length lower bound = 0
terhjk;abcdaadsfe


@ -1030,7 +1030,7 @@ No match
No match
/a\Rb/I,bsr=anycrlf,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches CR, LF, or CRLF
First code unit = 'a'
@ -1049,7 +1049,7 @@ No match
No match
/a\Rb/I,bsr=unicode,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches any Unicode newline
First code unit = 'a'
@ -1067,7 +1067,7 @@ Subject length lower bound = 3
0: a\x{0b}b
/a\R?b/I,bsr=anycrlf,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches CR, LF, or CRLF
First code unit = 'a'
@ -1086,7 +1086,7 @@ No match
No match
/a\R?b/I,bsr=unicode,utf
Capturing subpattern count = 0
Capture group count = 0
Options: utf
\R matches any Unicode newline
First code unit = 'a'


@ -67,7 +67,7 @@ Memory allocation (code space): 10
2 2 Ket
4 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 14
4 4 Ket
6 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 26
10 10 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 22
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 22
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -904,7 +904,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket
81 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -938,7 +938,7 @@ Subject length lower bound = 0
43 43 Ket
45 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1011,7 +1011,7 @@ No match
133 133 Ket
135 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0


@ -67,7 +67,7 @@ Memory allocation (code space): 14
3 3 Ket
6 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 18
5 5 Ket
8 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 30
11 11 Ket
14 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 26
9 9 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 26
9 9 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
110 110 Ket
113 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
58 58 Ket
61 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
194 194 Ket
197 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0


@ -67,7 +67,7 @@ Memory allocation (code space): 14
3 3 Ket
6 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 18
5 5 Ket
8 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 30
11 11 Ket
14 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 26
9 9 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 26
9 9 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
110 110 Ket
113 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
58 58 Ket
61 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
194 194 Ket
197 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 20
2 2 Ket
4 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 28
4 4 Ket
6 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 52
10 10 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 44
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 44
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket
81 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
43 43 Ket
45 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
133 133 Ket
135 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 20
2 2 Ket
4 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 28
4 4 Ket
6 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 52
10 10 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 44
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 44
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket
81 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
43 43 Ket
45 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
133 133 Ket
135 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 20
2 2 Ket
4 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 28
4 4 Ket
6 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 52
10 10 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 44
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{d55c}
Last code unit = \x{c5b4}
@ -404,7 +404,7 @@ Memory allocation (code space): 44
8 8 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \x{65e5}
Last code unit = \x{8a9e}
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
79 79 Ket
81 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
43 43 Ket
45 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
133 133 Ket
135 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 7
3 3 Ket
6 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 9
5 5 Ket
8 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 18
14 14 Ket
17 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 19
15 15 Ket
18 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xed
Last code unit = \xb4
@ -404,7 +404,7 @@ Memory allocation (code space): 19
15 15 Ket
18 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xe6
Last code unit = \x9e
@ -904,7 +904,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
119 119 Ket
122 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -938,7 +938,7 @@ Subject length lower bound = 0
61 61 Ket
64 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1011,7 +1011,7 @@ No match
205 205 Ket
208 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 9
4 4 Ket
8 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 11
6 6 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 20
15 15 Ket
19 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 21
16 16 Ket
20 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xed
Last code unit = \xb4
@ -404,7 +404,7 @@ Memory allocation (code space): 21
16 16 Ket
20 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xe6
Last code unit = \x9e
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
150 150 Ket
154 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
76 76 Ket
80 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
266 266 Ket
270 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0

@ -67,7 +67,7 @@ Memory allocation (code space): 11
5 5 Ket
10 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
May match empty string
Options: extended
Subject length lower bound = 0
@ -80,7 +80,7 @@ Memory allocation (code space): 13
7 7 Ket
12 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: extended
First code unit = 'a'
Subject length lower bound = 1
@ -376,7 +376,7 @@ Memory allocation (code space): 22
16 16 Ket
21 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = 'A'
Last code unit = '.'
@ -390,7 +390,7 @@ Memory allocation (code space): 23
17 17 Ket
22 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xed
Last code unit = \xb4
@ -404,7 +404,7 @@ Memory allocation (code space): 23
17 17 Ket
22 End
------------------------------------------------------------------
Capturing subpattern count = 0
Capture group count = 0
Options: utf
First code unit = \xe6
Last code unit = \x9e
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
181 181 Ket
186 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -937,7 +937,7 @@ Subject length lower bound = 0
91 91 Ket
96 End
------------------------------------------------------------------
Capturing subpattern count = 1
Capture group count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
@ -1010,7 +1010,7 @@ No match
327 327 Ket
332 End
------------------------------------------------------------------
Capturing subpattern count = 10
Capture group count = 10
May match empty string
Subject length lower bound = 0

testdata/testoutput9 vendored
@ -215,7 +215,7 @@ Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
\) )* # optional trailing comment
/Ix
Capturing subpattern count = 0
Capture group count = 0
Contains explicit CR or LF match
Options: extended
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
@ -224,25 +224,25 @@ Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
Subject length lower bound = 3
/\h/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x09 \x20 \xa0
Subject length lower bound = 1
/\H/I
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/\v/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1
/\V/I
Capturing subpattern count = 0
Capture group count = 0
Subject length lower bound = 1
/\R/I
Capturing subpattern count = 0
Capture group count = 0
Starting code units: \x0a \x0b \x0c \x0d \x85
Subject length lower bound = 1