Allow non-ASCII in group names when UTF is set; revise group naming terminology
in documentation to use "capture group", as Perl does.
parent
a657d4cff8
commit
d7b10a57d1
|
@ -121,6 +121,9 @@ the option applies only to unrecognized or malformed escape sequences.
|
|||
tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
|
||||
processing or a crash could result.
|
||||
|
||||
30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
|
||||
names, as Perl does.
|
||||
|
||||
|
||||
Version 10.32 10-September-2018
|
||||
-------------------------------
|
||||
|
|
|
@ -27,8 +27,8 @@ DESCRIPTION
|
|||
</b><br>
|
||||
<P>
|
||||
This convenience function finds, for a compiled pattern, the first and last
|
||||
entries for a given name in the table that translates capturing parenthesis
|
||||
names into numbers.
|
||||
entries for a given name in the table that translates capture group names into
|
||||
numbers.
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>name</i> Name whose entries required
|
||||
|
|
|
@ -49,7 +49,7 @@ please consult the man page, in case the conversion went wrong.
|
|||
<li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
||||
<li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
||||
<li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
||||
<li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a>
|
||||
<li><a name="TOC37" href="#SEC37">DUPLICATE CAPTURE GROUP NAMES</a>
|
||||
<li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
||||
<li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
||||
<li><a name="TOC40" href="#SEC40">SEE ALSO</a>
|
||||
|
@ -1490,10 +1490,10 @@ independent of the setting of PCRE2_DOTALL.
|
|||
<pre>
|
||||
PCRE2_DUPNAMES
|
||||
</pre>
|
||||
If this bit is set, names used to identify capturing subpatterns need not be
|
||||
unique. This can be helpful for certain types of pattern when it is known that
|
||||
only one instance of the named subpattern can ever be matched. There are more
|
||||
details of named subpatterns below; see also the
|
||||
If this bit is set, names used to identify capture groups need not be unique.
|
||||
This can be helpful for certain types of pattern when it is known that only one
|
||||
instance of the named group can ever be matched. There are more details of
|
||||
named capture groups below; see also the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
documentation.
|
||||
<pre>
|
||||
|
@ -1526,11 +1526,11 @@ the end of the subject.
|
|||
If this bit is set, most white space characters in the pattern are totally
|
||||
ignored except when escaped or inside a character class. However, white space
|
||||
is not allowed within sequences such as (?> that introduce various
|
||||
parenthesized subpatterns, nor within numerical quantifiers such as {1,3}.
|
||||
Ignorable white space is permitted between an item and a following quantifier
|
||||
and between a quantifier and a following + that indicates possessiveness.
|
||||
PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be changed within
|
||||
a pattern by a (?x) option setting.
|
||||
parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable
|
||||
white space is permitted between an item and a following quantifier and between
|
||||
a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
|
||||
equivalent to Perl's /x option, and it can be changed within a pattern by a
|
||||
(?x) option setting.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
|
||||
|
@ -1606,7 +1606,7 @@ error.
|
|||
<pre>
|
||||
PCRE2_MATCH_UNSET_BACKREF
|
||||
</pre>
|
||||
If this option is set, a backreference to an unset subpattern group matches an
|
||||
If this option is set, a backreference to an unset capture group matches an
|
||||
empty string (by default this causes the current matching alternative to fail).
|
||||
A pattern such as (\1)(a) succeeds when this option is set (assuming it can
|
||||
find an "a" in the subject), whereas it fails by default, for Perl
|
||||
|
@ -1668,7 +1668,7 @@ If this option is set, it disables the use of numbered capturing parentheses in
|
|||
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
||||
were followed by ?: but named parentheses can still be used for capturing (and
|
||||
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
||||
Note that, when this option is set, references to capturing groups
|
||||
Note that, when this option is set, references to capture groups
|
||||
(backreferences or recursion/subroutine calls) may only refer to named groups,
|
||||
though the reference can be by name or by number.
|
||||
<pre>
|
||||
|
@ -1687,7 +1687,7 @@ purposes.
|
|||
If this option is set, it disables an optimization that is applied when .* is
|
||||
the first significant item in a top-level branch of a pattern, and all the
|
||||
other branches also start with .* or with \A or \G or ^. The optimization is
|
||||
automatically disabled for .* if it is inside an atomic group or a capturing
|
||||
automatically disabled for .* if it is inside an atomic group or a capture
|
||||
group that is the subject of a backreference, or if the pattern contains
|
||||
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
|
||||
automatically anchored if PCRE2_DOTALL is set for all the .* items and
|
||||
|
@ -2066,7 +2066,7 @@ When .* is the first significant item, anchoring is possible only when all the
|
|||
following are true:
|
||||
<pre>
|
||||
.* is not in an atomic group
|
||||
.* is not in a capturing group that is the subject of a backreference
|
||||
.* is not in a capture group that is the subject of a backreference
|
||||
PCRE2_DOTALL is in force for .*
|
||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
||||
|
@ -2077,12 +2077,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
|
|||
PCRE2_INFO_BACKREFMAX
|
||||
</pre>
|
||||
Return the number of the highest backreference in the pattern. The third
|
||||
argument should point to an <b>uint32_t</b> variable. Named subpatterns acquire
|
||||
numbers as well as names, and these count towards the highest backreference.
|
||||
Backreferences such as \4 or \g{12} match the captured characters of the
|
||||
given group, but in addition, the check that a capturing group is set in a
|
||||
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is
|
||||
returned if there are no backreferences.
|
||||
argument should point to an <b>uint32_t</b> variable. Named capture groups
|
||||
acquire numbers as well as names, and these count towards the highest
|
||||
backreference. Backreferences such as \4 or \g{12} match the captured
|
||||
characters of the given group, but in addition, the check that a capture
|
||||
group is set in a conditional group such as (?(3)a|b) is also a backreference.
|
||||
Zero is returned if there are no backreferences.
|
||||
<pre>
|
||||
PCRE2_INFO_BSR
|
||||
</pre>
|
||||
|
@ -2093,9 +2093,9 @@ that \R matches only CR, LF, or CRLF.
|
|||
<pre>
|
||||
PCRE2_INFO_CAPTURECOUNT
|
||||
</pre>
|
||||
Return the highest capturing subpattern number in the pattern. In patterns
|
||||
where (?| is not used, this is also the total number of capturing subpatterns.
|
||||
The third argument should point to an <b>uint32_t</b> variable.
|
||||
Return the highest capture group number in the pattern. In patterns where (?|
|
||||
is not used, this is also the total number of capture groups. The third
|
||||
argument should point to an <b>uint32_t</b> variable.
|
||||
<pre>
|
||||
PCRE2_INFO_DEPTHLIMIT
|
||||
</pre>
|
||||
|
@ -2143,7 +2143,7 @@ Return the size (in bytes) of the data frames that are used to remember
|
|||
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
|
||||
without the use of JIT. The third argument should point to a <b>size_t</b>
|
||||
variable. The frame size depends on the number of capturing parentheses in the
|
||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||
pattern. Each additional capture group adds two PCRE2_SIZE variables.
|
||||
<pre>
|
||||
PCRE2_INFO_HASBACKSLASHC
|
||||
</pre>
|
||||
|
@ -2267,20 +2267,20 @@ the parenthesis number. The rest of the entry is the corresponding name, zero
|
|||
terminated.
|
||||
</P>
|
||||
<P>
|
||||
The names are in alphabetical order. If (?| is used to create multiple groups
|
||||
with the same number, as described in the
|
||||
<a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
|
||||
The names are in alphabetical order. If (?| is used to create multiple capture
|
||||
groups with the same number, as described in the
|
||||
<a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
|
||||
in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
page, the groups may be given the same name, but there is only one entry in the
|
||||
table. Different names for groups of the same number are not permitted.
|
||||
</P>
|
||||
<P>
|
||||
Duplicate names for subpatterns with different numbers are permitted, but only
|
||||
if PCRE2_DUPNAMES is set. They appear in the table in the order in which they
|
||||
were found in the pattern. In the absence of (?| this is the order of
|
||||
Duplicate names for capture groups with different numbers are permitted, but
|
||||
only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
|
||||
they were found in the pattern. In the absence of (?| this is the order of
|
||||
increasing number; when (?| is used this is not necessarily the case because
|
||||
later subpatterns may have lower numbers.
|
||||
later capture groups may have lower numbers.
|
||||
</P>
|
||||
<P>
|
||||
As a simple example of the name/number table, consider the following pattern
|
||||
|
@ -2289,16 +2289,16 @@ space - including newlines - is ignored):
|
|||
<pre>
|
||||
(?<date> (?<year>(\d\d)?\d\d) - (?<month>\d\d) - (?<day>\d\d) )
|
||||
</pre>
|
||||
There are four named subpatterns, so the table has four entries, and each entry
|
||||
in the table is eight bytes long. The table is as follows, with non-printing
|
||||
bytes shown in hexadecimal, and undefined bytes shown as ??:
|
||||
There are four named capture groups, so the table has four entries, and each
|
||||
entry in the table is eight bytes long. The table is as follows, with
|
||||
non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:
|
||||
<pre>
|
||||
00 01 d a t e 00 ??
|
||||
00 05 d a y 00 ?? ??
|
||||
00 04 m o n t h 00
|
||||
00 02 y e a r 00 ??
|
||||
</pre>
|
||||
When writing code to extract data from named subpatterns using the
|
||||
When writing code to extract data from named capture groups using the
|
||||
name-to-number map, remember that the length of the entries is likely to be
|
||||
different for each compiled pattern.
|
||||
<pre>
|
||||
|
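The name table layout in the hunk above can be sketched in plain C without calling PCRE2 at all. This is only an illustration, assuming the 8-bit library layout shown in the example (a two-byte group number, most significant byte first, followed by a zero-terminated name) and the eight-byte entry size of that particular pattern. The helper `find_group_number` and the array `example_table` are hypothetical names, not part of the PCRE2 API; real code would obtain the table, entry count, and entry size from <b>pcre2_pattern_info()</b>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): scan a name table laid out as
   in the example and return the group number for `name`, or -1 if absent.
   Assumes the 8-bit layout: 2-byte big-endian number, then a zero-terminated
   name, padded to entry_size bytes. */
static int find_group_number(const unsigned char *table, size_t entry_count,
                             size_t entry_size, const char *name)
{
    /* Two number bytes, at least one name byte, and a terminator. */
    assert(entry_size > 3);
    for (size_t i = 0; i < entry_count; i++) {
        const unsigned char *entry = table + i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* The four-entry table from the example; '?' stands in for undefined bytes. */
static const unsigned char example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0, '?',
    0, 5, 'd', 'a', 'y', 0, '?', '?',
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0, '?',
};
```

The entry size is passed in rather than hard-coded because, as the text notes, it is likely to differ for each compiled pattern.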
@ -2741,12 +2741,12 @@ valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
|||
In general, a pattern matches a certain portion of the subject, and in
|
||||
addition, further substrings from the subject may be picked out by
|
||||
parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
|
||||
book, this is called "capturing" in what follows, and the phrase "capturing
|
||||
subpattern" or "capturing group" is used for a fragment of a pattern that picks
|
||||
out a substring. PCRE2 supports several other kinds of parenthesized subpattern
|
||||
that do not cause substrings to be captured. The <b>pcre2_pattern_info()</b>
|
||||
function can be used to find out how many capturing subpatterns there are in a
|
||||
compiled pattern.
|
||||
book, this is called "capturing" in what follows, and the phrase "capture
|
||||
group" (Perl terminology) is used for a fragment of a pattern that picks out a
|
||||
substring. PCRE2 supports several other kinds of parenthesized group that do
|
||||
not cause substrings to be captured. The <b>pcre2_pattern_info()</b> function
|
||||
can be used to find out how many capture groups there are in a compiled
|
||||
pattern.
|
||||
</P>
|
||||
<P>
|
||||
You can use auxiliary functions for accessing captured substrings
|
||||
|
@ -2795,9 +2795,8 @@ For example, if the pattern (?=ab\K) is matched against "ab", the start and
|
|||
end offset values for the match are 2 and 0.
|
||||
</P>
|
||||
<P>
|
||||
If a capturing subpattern group is matched repeatedly within a single match
|
||||
operation, it is the last portion of the subject that it matched that is
|
||||
returned.
|
||||
If a capture group is matched repeatedly within a single match operation, it is
|
||||
the last portion of the subject that it matched that is returned.
|
||||
</P>
|
||||
<P>
|
||||
If the ovector is too small to hold all the captured substring offsets, as much
|
||||
|
@ -2806,21 +2805,20 @@ substrings are not of interest, <b>pcre2_match()</b> may be called with a match
|
|||
data block whose ovector is of minimum length (that is, one pair).
|
||||
</P>
|
||||
<P>
|
||||
It is possible for capturing subpattern number <i>n+1</i> to match some part of
|
||||
the subject when subpattern <i>n</i> has not been used at all. For example, if
|
||||
the string "abc" is matched against the pattern (a|(z))(bc) the return from the
|
||||
function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this
|
||||
happens, both values in the offset pairs corresponding to unused subpatterns
|
||||
are set to PCRE2_UNSET.
|
||||
It is possible for capture group number <i>n+1</i> to match some part of the
|
||||
subject when group <i>n</i> has not been used at all. For example, if the string
|
||||
"abc" is matched against the pattern (a|(z))(bc) the return from the function
|
||||
is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
|
||||
values in the offset pairs corresponding to unused groups are set to
|
||||
PCRE2_UNSET.
|
||||
</P>
|
||||
<P>
|
||||
Offset values that correspond to unused subpatterns at the end of the
|
||||
expression are also set to PCRE2_UNSET. For example, if the string "abc" is
|
||||
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched.
|
||||
The return from the function is 2, because the highest used capturing
|
||||
subpattern number is 1. The offsets for the second and third capturing
|
||||
subpatterns (assuming the vector is large enough, of course) are set to
|
||||
PCRE2_UNSET.
|
||||
Offset values that correspond to unused groups at the end of the expression are
|
||||
also set to PCRE2_UNSET. For example, if the string "abc" is matched against
|
||||
the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
|
||||
function is 2, because the highest used capture group number is 1. The offsets
|
||||
for the second and third capture groups (assuming the vector is large
|
||||
enough, of course) are set to PCRE2_UNSET.
|
||||
</P>
|
||||
<P>
|
||||
Elements in the ovector that do not correspond to capturing parentheses in the
|
||||
|
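The handling of unset groups described in the hunk above can be sketched without linking against PCRE2. This is a minimal illustration under stated assumptions: `SKETCH_UNSET` is a stand-in for PCRE2_UNSET (which pcre2.h defines as the all-ones PCRE2_SIZE value), `size_t` stands in for PCRE2_SIZE, and `group_is_set` is a hypothetical helper rather than a library function. The ovector contents are written out by hand to match the (a|(z))(bc) example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PCRE2_UNSET so the sketch compiles without the PCRE2 headers. */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: report whether group n is set, given the positive
   return value rc of a successful match and the ovector of offset pairs. */
static int group_is_set(const size_t *ovector, int rc, int n)
{
    assert(rc > 0 && n >= 0);
    if (n >= rc)
        return 0;                /* beyond the highest used group */
    return ovector[2 * n] != SKETCH_UNSET;
}

/* Offsets for "abc" matched against (a|(z))(bc); the return value is 4. */
static const size_t example_ovector[8] = {
    0, 3,                        /* group 0: the whole match "abc" */
    0, 1,                        /* group 1: "a" */
    SKETCH_UNSET, SKETCH_UNSET,  /* group 2: (z) did not participate */
    1, 3                         /* group 3: "bc" */
};
```

Checking `n >= rc` first covers the trailing case as well: for (abc)(x(yz)?)? matched against "abc" the return value is 2, so groups 2 and 3 are reported as unset without reading their offsets.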
@ -2993,11 +2991,11 @@ as NULL.
|
|||
</pre>
|
||||
This error is returned when <b>pcre2_match()</b> detects a recursion loop within
|
||||
the pattern. Specifically, it means that either the whole pattern or a
|
||||
subpattern has been called recursively for the second time at the same position
|
||||
in the subject string. Some simple patterns that might do this are detected and
|
||||
faulted at compile time, but more complicated cases, in particular mutual
|
||||
recursions between two different subpatterns, cannot be detected until matching
|
||||
is attempted.
|
||||
capture group has been called recursively for the second time at the same
|
||||
position in the subject string. Some simple patterns that might do this are
|
||||
detected and faulted at compile time, but more complicated cases, in particular
|
||||
mutual recursions between two different groups, cannot be detected until
|
||||
matching is attempted.
|
||||
<a name="geterrormessage"></a></P>
|
||||
<br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
|
||||
<P>
|
||||
|
@ -3074,7 +3072,7 @@ The <b>pcre2_substring_copy_bynumber()</b> function copies a captured substring
|
|||
into a supplied buffer, whereas <b>pcre2_substring_get_bynumber()</b> copies it
|
||||
into new memory, obtained using the same memory allocation function that was
|
||||
used for the match data block. The first two arguments of these functions are a
|
||||
pointer to the match data block and a capturing group number.
|
||||
pointer to the match data block and a capture group number.
|
||||
</P>
|
||||
<P>
|
||||
The final arguments of <b>pcre2_substring_copy_bynumber()</b> are a pointer to
|
||||
|
@ -3150,9 +3148,9 @@ calling <b>pcre2_substring_list_free()</b>.
|
|||
</P>
|
||||
<P>
|
||||
If this function encounters a substring that is unset, which can happen when
|
||||
capturing subpattern number <i>n+1</i> matches some part of the subject, but
|
||||
subpattern <i>n</i> has not been used at all, it returns an empty string. This
|
||||
can be distinguished from a genuine zero-length substring by inspecting the
|
||||
capture group number <i>n+1</i> matches some part of the subject, but group
|
||||
<i>n</i> has not been used at all, it returns an empty string. This can be
|
||||
distinguished from a genuine zero-length substring by inspecting the
|
||||
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
||||
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
||||
<a name="extractbyname"></a></P>
|
||||
|
@ -3182,21 +3180,21 @@ For example, for this pattern:
|
|||
<pre>
|
||||
(a+)b(?<xxx>\d+)...
|
||||
</pre>
|
||||
the number of the subpattern called "xxx" is 2. If the name is known to be
|
||||
the number of the capture group called "xxx" is 2. If the name is known to be
|
||||
unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
|
||||
calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
|
||||
compiled pattern, and the second is the name. The yield of the function is the
|
||||
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
||||
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
||||
that name. Given the number, you can extract the substring directly from the
|
||||
ovector, or use one of the "bynumber" functions described above.
|
||||
group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
|
||||
PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
|
||||
Given the number, you can extract the substring directly from the ovector, or
|
||||
use one of the "bynumber" functions described above.
|
||||
</P>
|
||||
<P>
|
||||
For convenience, there are also "byname" functions that correspond to the
|
||||
"bynumber" functions, the only difference being that the second argument is a
|
||||
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
|
||||
names, these functions scan all the groups with the given name, and return the
|
||||
first named string that is set.
|
||||
captured substring from the first named group that is set.
|
||||
</P>
|
||||
<P>
|
||||
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
|
||||
|
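The "byname" rule for duplicate names in the hunk above (scan all groups with the given name and use the first one that is set) can be sketched as follows. This is an illustration only, assuming the caller already has the numbers of all groups sharing the name, in name-table order, and an ovector large enough to hold every group. `first_set_group` is a hypothetical helper and `SKETCH_UNSET` is a stand-in for PCRE2_UNSET; neither is part of PCRE2.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PCRE2_UNSET so the sketch compiles without the PCRE2 headers. */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: given the numbers of all groups that share one name,
   return the number of the first group that is set, or -1 if none is set
   (the situation in which the text says PCRE2_ERROR_UNSET is returned). */
static int first_set_group(const size_t *ovector, const int *numbers,
                           size_t count)
{
    assert(ovector != NULL && numbers != NULL);
    for (size_t i = 0; i < count; i++) {
        if (ovector[2 * numbers[i]] != SKETCH_UNSET)
            return numbers[i];
    }
    return -1;
}
```

For instance, with two groups numbered 1 and 2 sharing a name, where only group 2 took part in the match, the helper returns 2.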
@ -3207,13 +3205,13 @@ set, PCRE2_ERROR_UNSET is returned.
|
|||
</P>
|
||||
<P>
|
||||
<b>Warning:</b> If the pattern uses the (?| feature to set up multiple
|
||||
subpatterns with the same number, as described in the
|
||||
<a href="pcre2pattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
|
||||
capture groups with the same number, as described in the
|
||||
<a href="pcre2pattern.html#dupgroupnumber">section on duplicate group numbers</a>
|
||||
in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
page, you cannot use names to distinguish the different subpatterns, because
|
||||
page, you cannot use names to distinguish the different capture groups, because
|
||||
names are not included in the compiled code. The matching process uses only
|
||||
numbers. For this reason, the use of different names for subpatterns of the
|
||||
numbers. For this reason, the use of different names for groups with the
|
||||
same number causes an error at compile time.
|
||||
<a name="substitutions"></a></P>
|
||||
<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||
|
@ -3276,7 +3274,7 @@ length is in code units, not bytes.
|
|||
In the replacement string, which is interpreted as a UTF string in UTF mode,
|
||||
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
|
||||
dollar character is an escape character that can specify the insertion of
|
||||
characters from capturing groups or names from (*MARK) or other control verbs
|
||||
characters from capture groups or names from (*MARK) or other control verbs
|
||||
in the pattern. The following forms are always recognized:
|
||||
<pre>
|
||||
$$ insert a dollar character
|
||||
|
@ -3345,13 +3343,13 @@ efficient to allocate a large buffer and free the excess afterwards, instead of
|
|||
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
|
||||
</P>
|
||||
<P>
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
|
||||
not appear in the pattern to be treated as unset groups. This option should be
|
||||
used with care, because it means that a typo in a group name or number no
|
||||
longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
||||
</P>
|
||||
<P>
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown
|
||||
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
|
||||
strings when inserted as described above. If this option is not set, an attempt
|
||||
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
|
||||
|
@ -3379,7 +3377,7 @@ terminating a \Q quoted sequence) reverts to no case forcing. The sequences
|
|||
\u and \l force the next character (if it is a letter) to upper or lower
|
||||
case, respectively, and then the state automatically reverts to no case
|
||||
forcing. Case forcing applies to all inserted characters, including those from
|
||||
captured groups and letters within \Q...\E quoted sequences.
|
||||
capture groups and letters within \Q...\E quoted sequences.
|
||||
</P>
|
||||
<P>
|
||||
Note that case forcing sequences such as \U...\E do not nest. For example,
|
||||
|
@ -3388,7 +3386,8 @@ effect.
|
|||
</P>
|
||||
<P>
|
||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||
flexibility to group substitution. The syntax is similar to that used by Bash:
|
||||
flexibility to capture group substitution. The syntax is similar to that used
|
||||
by Bash:
|
||||
<pre>
|
||||
${<n>:-<string>}
|
||||
${<n>:+<string1>:<string2>}
|
||||
|
@ -3518,20 +3517,21 @@ PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
|
|||
output and the call to <b>pcre2_substitute()</b> exits, returning the number of
|
||||
matches so far.
|
||||
</P>
|
||||
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||
<br><a name="SEC37" href="#TOC1">DUPLICATE CAPTURE GROUP NAMES</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
||||
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
||||
subpatterns are not required to be unique. Duplicate names are always allowed
|
||||
for subpatterns with the same number, created by using the (?| feature. Indeed,
|
||||
if such subpatterns are named, they are required to use the same names.
|
||||
When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
|
||||
groups are not required to be unique. Duplicate names are always allowed for
|
||||
groups with the same number, created by using the (?| feature. Indeed, if such
|
||||
groups are named, they are required to use the same names.
|
||||
</P>
|
||||
<P>
|
||||
Normally, patterns with duplicate names are such that in any one match, only
|
||||
one of the named subpatterns participates. An example is shown in the
|
||||
Normally, patterns that use duplicate names are such that in any one match,
|
||||
only one of each set of identically-named groups participates. An example is
|
||||
shown in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
|
@ -3703,9 +3703,8 @@ the three matched strings are
|
|||
On success, the yield of the function is a number greater than zero, which is
|
||||
the number of matched substrings. The offsets of the substrings are returned in
|
||||
the ovector, and can be extracted by number in the same way as for
|
||||
<b>pcre2_match()</b>, but the numbers bear no relation to any capturing groups
|
||||
that may exist in the pattern, because DFA matching does not support group
|
||||
capture.
|
||||
<b>pcre2_match()</b>, but the numbers bear no relation to any capture groups
|
||||
that may exist in the pattern, because DFA matching does not support capturing.
|
||||
</P>
|
||||
<P>
|
||||
Calls to the convenience functions that extract substrings by name
|
||||
|
@ -3747,7 +3746,7 @@ a backreference.
|
|||
</pre>
|
||||
This return is given if <b>pcre2_dfa_match()</b> encounters a condition item
|
||||
that uses a backreference for the condition, or a test for recursion in a
|
||||
specific group. These are not supported.
|
||||
specific capture group. These are not supported.
|
||||
<pre>
|
||||
PCRE2_ERROR_DFA_WSSIZE
|
||||
</pre>
|
||||
|
@ -3756,9 +3755,9 @@ This return is given if <b>pcre2_dfa_match()</b> runs out of space in the
|
|||
<pre>
|
||||
PCRE2_ERROR_DFA_RECURSE
|
||||
</pre>
|
||||
When a recursive subpattern is processed, the matching function calls itself
|
||||
recursively, using private memory for the ovector and <i>workspace</i>. This
|
||||
error is given if the internal ovector is not large enough. This should be
|
||||
When a recursion or subroutine call is processed, the matching function calls
|
||||
itself recursively, using private memory for the ovector and <i>workspace</i>.
|
||||
This error is given if the internal ovector is not large enough. This should be
|
||||
extremely rare, as a vector of size 1000 is used.
|
||||
<pre>
|
||||
PCRE2_ERROR_DFA_BADRESTART
|
||||
|
@ -3785,7 +3784,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 04 January 2019
|
||||
Last updated: 04 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -151,7 +151,7 @@ branch, automatic anchoring occurs if all branches are anchorable.
|
|||
</P>
|
||||
<P>
|
||||
This optimization is disabled, however, if .* is in an atomic group or if there
|
||||
is a backreference to the capturing group in which it appears. It is also
|
||||
is a backreference to the capture group in which it appears. It is also
|
||||
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
|
||||
callouts does not affect it.
|
||||
</P>
|
||||
|
@ -354,8 +354,8 @@ callout before an assertion such as (?=ab) the length is 3. For an
|
|||
alternation bar or a closing parenthesis, the length is one, unless a closing
|
||||
parenthesis is followed by a quantifier, in which case its length is included.
|
||||
(This changed in release 10.23. In earlier releases, before an opening
|
||||
parenthesis the length was that of the entire subpattern, and before an
|
||||
alternation bar or a closing parenthesis the length was zero.)
|
||||
parenthesis the length was that of the entire group, and before an alternation
|
||||
bar or a closing parenthesis the length was zero.)
|
||||
</P>
|
||||
<P>
|
||||
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
|
||||
|
@ -471,9 +471,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 17 September 2018
|
||||
Last updated: 03 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -36,10 +36,9 @@ assertion just once). Perl allows some repeat quantifiers on other assertions,
|
|||
for example, \b* (but not \b{3}), but these do not seem to have any use.
|
||||
</P>
|
||||
<P>
|
||||
3. Capturing subpatterns that occur inside negative lookaround assertions are
|
||||
counted, but their entries in the offsets vector are set only when a negative
|
||||
assertion is a condition that has a matching branch (that is, the condition is
|
||||
false).
|
||||
3. Capture groups that occur inside negative lookaround assertions are counted,
|
||||
but their entries in the offsets vector are set only when a negative assertion
|
||||
is a condition that has a matching branch (that is, the condition is false).
|
||||
</P>
|
||||
<P>
|
||||
4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
|
||||
|
@ -94,13 +93,13 @@ to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
|
|||
into subroutine calls is now supported, as in Perl.
|
||||
</P>
|
||||
<P>
|
||||
9. If any of the backtracking control verbs are used in a subpattern that is
|
||||
called as a subroutine (whether or not recursively), their effect is confined
|
||||
to that subpattern; it does not extend to the surrounding pattern. This is not
|
||||
always the case in Perl. In particular, if (*THEN) is present in a group that
|
||||
is called as a subroutine, its action is limited to that group, even if the
|
||||
group does not contain any | characters. Note that such subpatterns are
|
||||
processed as anchored at the point where they are tested.
|
||||
9. If any of the backtracking control verbs are used in a group that is called
|
||||
as a subroutine (whether or not recursively), their effect is confined to that
|
||||
group; it does not extend to the surrounding pattern. This is not always the
|
||||
case in Perl. In particular, if (*THEN) is present in a group that is called as
|
||||
a subroutine, its action is limited to that group, even if the group does not
|
||||
contain any | characters. Note that such groups are processed as anchored
|
||||
at the point where they are tested.
|
||||
</P>
|
||||
<P>
|
||||
10. If a pattern contains more than one backtracking control verb, the first
|
||||
|
@ -120,22 +119,21 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
|
|||
"b".
|
||||
</P>
|
||||
<P>
|
||||
13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
|
||||
names is not as general as Perl's. This is a consequence of the fact that PCRE2
|
||||
works internally just with numbers, using an external table to translate
|
||||
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)),
|
||||
where the two capturing parentheses have the same number but different names,
|
||||
is not supported, and causes an error at compile time. If it were allowed, it
|
||||
would not be possible to distinguish which parentheses matched, because both
|
||||
names map to capturing subpattern number 1. To avoid this confusing situation,
|
||||
an error is given at compile time.
|
||||
13. PCRE2's handling of duplicate capture group numbers and names is not as
|
||||
general as Perl's. This is a consequence of the fact that PCRE2 works internally
|
||||
just with numbers, using an external table to translate between numbers and
|
||||
names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two
|
||||
capture groups have the same number but different names, is not supported, and
|
||||
causes an error at compile time. If it were allowed, it would not be possible
|
||||
to distinguish which group matched, because both names map to capture group
|
||||
number 1. To avoid this confusing situation, an error is given at compile time.
|
||||
</P>
|
||||
<P>
|
||||
14. Perl used to recognize comments in some places that PCRE2 does not, for
|
||||
example, between the ( and ? at the start of a subpattern. If the /x modifier
|
||||
is set, Perl allowed white space between ( and ? though the latest Perls give
|
||||
an error (for a while it was just deprecated). There may still be some cases
|
||||
where Perl behaves differently.
|
||||
example, between the ( and ? at the start of a group. If the /x modifier is
|
||||
set, Perl allowed white space between ( and ? though the latest Perls give an
|
||||
error (for a while it was just deprecated). There may still be some cases where
|
||||
Perl behaves differently.
|
||||
</P>
|
||||
<P>
|
||||
15. Perl, when in warning mode, gives warnings for character classes such as
|
||||
|
@ -235,9 +233,9 @@ Cambridge, England.
|
|||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 28 July 2018
|
||||
Last updated: 03 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -50,17 +50,17 @@ All values in repeating quantifiers must be less than 65536.
|
|||
The maximum length of a lookbehind assertion is 65535 characters.
|
||||
</P>
|
||||
<P>
|
||||
There is no limit to the number of parenthesized subpatterns, but there can be
|
||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||
order to limit the amount of system stack used at compile time. The default
|
||||
limit can be specified when PCRE2 is built; if not, the default is set to 250.
|
||||
An application can change this limit by calling pcre2_set_parens_nest_limit()
|
||||
to set the limit in a compile context.
|
||||
There is no limit to the number of parenthesized groups, but there can be no
|
||||
more than 65535 capture groups, and there is a limit to the depth of nesting of
|
||||
parenthesized subpatterns of all kinds. This is imposed in order to limit the
|
||||
amount of system stack used at compile time. The default limit can be specified
|
||||
when PCRE2 is built; if not, the default is set to 250. An application can
|
||||
change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
|
||||
a compile context.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of name for a named subpattern is 32 code units, and the
|
||||
maximum number of named subpatterns is 10000.
|
||||
The maximum length of name for a named capture group is 32 code units, and the
|
||||
maximum number of such groups is 10000.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
|
||||
|
@ -86,9 +86,9 @@ Cambridge, England.
|
|||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 30 March 2017
|
||||
Last updated: 02 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
File diff suppressed because it is too large
|
@ -31,9 +31,9 @@ of them.
|
|||
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
|
||||
so that most simple patterns do not use much memory for storing the compiled
|
||||
version. However, there is one case where the memory usage of a compiled
|
||||
pattern can be unexpectedly large. If a parenthesized subpattern has a
|
||||
quantifier with a minimum greater than 1 and/or a limited maximum, the whole
|
||||
subpattern is repeated in the compiled code. For example, the pattern
|
||||
pattern can be unexpectedly large. If a parenthesized group has a quantifier
|
||||
with a minimum greater than 1 and/or a limited maximum, the whole group is
|
||||
repeated in the compiled code. For example, the pattern
|
||||
<pre>
|
||||
(abc|def){2,4}
|
||||
</pre>
|
||||
|
@ -252,9 +252,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 25 April 2018
|
||||
Last updated: 03 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -424,20 +424,23 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
|||
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(...) capturing group
|
||||
(?<name>...) named capturing group (Perl)
|
||||
(?'name'...) named capturing group (Perl)
|
||||
(?P<name>...) named capturing group (Python)
|
||||
(?:...) non-capturing group
|
||||
(?|...) non-capturing group; reset group numbers for
|
||||
capturing groups in each alternative
|
||||
</PRE>
|
||||
(...) capture group
|
||||
(?<name>...) named capture group (Perl)
|
||||
(?'name'...) named capture group (Perl)
|
||||
(?P<name>...) named capture group (Python)
|
||||
(?:...) non-capture group
|
||||
(?|...) non-capture group; reset group numbers for
|
||||
capture groups in each alternative
|
||||
</pre>
|
||||
In non-UTF modes, names may contain underscores and ASCII letters and digits;
|
||||
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
|
||||
both cases, a name must not start with a digit.
|
||||
</P>
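The naming rule stated above for non-UTF modes can be sketched as a small check. This is an illustration only, not PCRE2's own code: the function name `group_name_ok_non_utf` and the constant `MAX_NAME_LEN` are hypothetical, the 32-code-unit limit is taken from the limits text elsewhere in this diff, and rejecting an empty name is an assumption of the sketch. The UTF-mode rule (any Unicode letter or decimal digit) is not covered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant: the limits text gives 32 code units as the
   maximum length of a capture group name. */
#define MAX_NAME_LEN 32

static bool is_ascii_letter(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Returns true if name satisfies the non-UTF rule: only underscores and
   ASCII letters and digits, not starting with a digit, and at most
   MAX_NAME_LEN code units. Empty names are rejected (an assumption). */
bool group_name_ok_non_utf(const char *name)
{
    if (name == NULL || name[0] == '\0') return false;
    if (is_ascii_digit((unsigned char)name[0])) return false;

    for (size_t i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!(c == '_' || is_ascii_letter(c) || is_ascii_digit(c)))
            return false;
        if (i + 1 > MAX_NAME_LEN) return false;
    }
    return true;
}
```

Under these assumptions, `year` and `_x1` are accepted, while `1abc` and `na-me` are rejected.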
|
||||
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
|
||||
<P>
|
||||
<pre>
|
||||
(?>...) atomic, non-capturing group
|
||||
(*atomic:...) atomic, non-capturing group
|
||||
(?>...) atomic non-capture group
|
||||
(*atomic:...) atomic non-capture group
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
|
||||
|
@ -465,7 +468,7 @@ of the group.
|
|||
Unsetting x or xx unsets both. Several options may be set at once, and a
|
||||
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
|
||||
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
|
||||
(?^in). An option setting may appear at the start of a non-capturing group, for
|
||||
(?^in). An option setting may appear at the start of a non-capture group, for
|
||||
example (?i:...).
|
||||
</P>
|
||||
<P>
|
||||
|
@ -565,19 +568,19 @@ Each top-level branch of a lookbehind must be of a fixed length.
|
|||
<P>
|
||||
<pre>
|
||||
(?R) recurse whole pattern
|
||||
(?n) call subpattern by absolute number
|
||||
(?+n) call subpattern by relative number
|
||||
(?-n) call subpattern by relative number
|
||||
(?&name) call subpattern by name (Perl)
|
||||
(?P>name) call subpattern by name (Python)
|
||||
\g<name> call subpattern by name (Oniguruma)
|
||||
\g'name' call subpattern by name (Oniguruma)
|
||||
\g<n> call subpattern by absolute number (Oniguruma)
|
||||
\g'n' call subpattern by absolute number (Oniguruma)
|
||||
\g<+n> call subpattern by relative number (PCRE2 extension)
|
||||
\g'+n' call subpattern by relative number (PCRE2 extension)
|
||||
\g<-n> call subpattern by relative number (PCRE2 extension)
|
||||
\g'-n' call subpattern by relative number (PCRE2 extension)
|
||||
(?n) call subroutine by absolute number
|
||||
(?+n) call subroutine by relative number
|
||||
(?-n) call subroutine by relative number
|
||||
(?&name) call subroutine by name (Perl)
|
||||
(?P>name) call subroutine by name (Python)
|
||||
\g<name> call subroutine by name (Oniguruma)
|
||||
\g'name' call subroutine by name (Oniguruma)
|
||||
\g<n> call subroutine by absolute number (Oniguruma)
|
||||
\g'n' call subroutine by absolute number (Oniguruma)
|
||||
\g<+n> call subroutine by relative number (PCRE2 extension)
|
||||
\g'+n' call subroutine by relative number (PCRE2 extension)
|
||||
\g<-n> call subroutine by relative number (PCRE2 extension)
|
||||
\g'-n' call subroutine by relative number (PCRE2 extension)
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC23" href="#TOC1">CONDITIONAL PATTERNS</a><br>
|
||||
|
@ -595,7 +598,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
|
|||
(?(R) overall recursion condition
|
||||
(?(Rn) specific numbered group recursion condition
|
||||
(?(R&name) specific named group recursion condition
|
||||
(?(DEFINE) define subpattern for reference
|
||||
(?(DEFINE) define groups for reference
|
||||
(?(VERSION[>]=n.m) test PCRE2 version
|
||||
(?(assert) assertion condition
|
||||
</pre>
|
||||
|
@ -657,9 +660,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 10 October 2018
|
||||
Last updated: 03 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -716,14 +716,14 @@ information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
|
|||
some typical examples:
|
||||
<pre>
|
||||
re> /(?i)(^a|^b)/m,info
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Compile options: multiline
|
||||
Overall options: caseless multiline
|
||||
First code unit at start or follows newline
|
||||
Subject length lower bound = 1
|
||||
|
||||
re> /(?i)abc/info
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: caseless
|
||||
First code unit = 'a' (caseless)
|
||||
|
@ -1353,8 +1353,8 @@ Testing substring extraction functions
|
|||
<P>
|
||||
The <b>copy</b> and <b>get</b> modifiers can be used to test the
|
||||
<b>pcre2_substring_copy_xxx()</b> and <b>pcre2_substring_get_xxx()</b> functions.
|
||||
They can be given more than once, and each can specify a group name or number,
|
||||
for example:
|
||||
They can be given more than once, and each can specify a capture group name or
|
||||
number, for example:
|
||||
<pre>
|
||||
abcd\=copy=1,copy=3,get=G1
|
||||
</pre>
|
||||
|
@ -2075,9 +2075,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 12 November 2018
|
||||
Last updated: 03 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
|
@ -38,10 +38,11 @@ UNICODE PROPERTY SUPPORT
|
|||
</b><br>
|
||||
<P>
|
||||
When PCRE2 is built with Unicode support, the escape sequences \p{..},
|
||||
\P{..}, and \X can be used. The Unicode properties that can be tested are
|
||||
limited to the general category properties such as Lu for an upper case letter
|
||||
or Nd for a decimal number, the Unicode script names such as Arabic or Han, and
|
||||
the derived properties Any and L&. Full lists are given in the
|
||||
\P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
|
||||
The Unicode properties that can be tested are limited to the general category
|
||||
properties such as Lu for an upper case letter or Nd for a decimal number, the
|
||||
Unicode script names such as Arabic or Han, and the derived properties Any and
|
||||
L&. Full lists are given in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
and
|
||||
<a href="pcre2syntax.html"><b>pcre2syntax</b></a>
|
||||
|
@ -73,11 +74,17 @@ In UTF modes, the dot metacharacter matches one UTF character instead of a
|
|||
single code unit.
|
||||
</P>
|
||||
<P>
|
||||
In UTF modes, capture group names are not restricted to ASCII, and may contain
|
||||
any Unicode letters and decimal digits, as well as underscore.
|
||||
</P>
|
||||
<P>
|
||||
The escape sequence \C can be used to match a single code unit in a UTF mode,
|
||||
but its use can lead to some strange effects because it breaks up multi-unit
|
||||
characters (see the description of \C in the
|
||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
documentation).
|
||||
documentation). For this reason, there is a build-time option that disables
|
||||
support for \C completely. There is also a less draconian compile-time option
|
||||
for locking out the use of \C when a pattern is compiled.
|
||||
</P>
|
||||
<P>
|
||||
The use of \C is not supported by the alternative matching function
|
||||
|
@ -410,9 +417,9 @@ Cambridge, England.
|
|||
REVISION
|
||||
</b><br>
|
||||
<P>
|
||||
Last updated: 12 October 2018
|
||||
Last updated: 03 February 2019
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
3281
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_SUBSTRING_NAMETABLE_SCAN 3 "21 October 2014" "PCRE2 10.00"
|
||||
.TH PCRE2_SUBSTRING_NAMETABLE_SCAN 3 "03 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -15,8 +15,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
|||
.rs
|
||||
.sp
|
||||
This convenience function finds, for a compiled pattern, the first and last
|
||||
entries for a given name in the table that translates capturing parenthesis
|
||||
names into numbers.
|
||||
entries for a given name in the table that translates capture group names into
|
||||
numbers.
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIname\fP Name whose entries required
|
||||
|
|
195
doc/pcre2api.3
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "04 January 2019" "PCRE2 10.33"
|
||||
.TH PCRE2API 3 "04 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -1429,10 +1429,10 @@ independent of the setting of PCRE2_DOTALL.
|
|||
.sp
|
||||
PCRE2_DUPNAMES
|
||||
.sp
|
||||
If this bit is set, names used to identify capturing subpatterns need not be
|
||||
unique. This can be helpful for certain types of pattern when it is known that
|
||||
only one instance of the named subpattern can ever be matched. There are more
|
||||
details of named subpatterns below; see also the
|
||||
If this bit is set, names used to identify capture groups need not be unique.
|
||||
This can be helpful for certain types of pattern when it is known that only one
|
||||
instance of the named group can ever be matched. There are more details of
|
||||
named capture groups below; see also the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
|
@ -1466,11 +1466,11 @@ the end of the subject.
|
|||
If this bit is set, most white space characters in the pattern are totally
|
||||
ignored except when escaped or inside a character class. However, white space
|
||||
is not allowed within sequences such as (?> that introduce various
|
||||
parenthesized subpatterns, nor within numerical quantifiers such as {1,3}.
|
||||
Ignorable white space is permitted between an item and a following quantifier
|
||||
and between a quantifier and a following + that indicates possessiveness.
|
||||
PCRE2_EXTENDED is equivalent to Perl's /x option, and it can be changed within
|
||||
a pattern by a (?x) option setting.
|
||||
parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable
|
||||
white space is permitted between an item and a following quantifier and between
|
||||
a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
|
||||
equivalent to Perl's /x option, and it can be changed within a pattern by a
|
||||
(?x) option setting.
|
||||
.P
|
||||
When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
|
||||
white space only those characters with code points less than 256 that are
|
||||
|
@ -1547,7 +1547,7 @@ error.
|
|||
.sp
|
||||
PCRE2_MATCH_UNSET_BACKREF
|
||||
.sp
|
||||
If this option is set, a backreference to an unset subpattern group matches an
|
||||
If this option is set, a backreference to an unset capture group matches an
|
||||
empty string (by default this causes the current matching alternative to fail).
|
||||
A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
|
||||
find an "a" in the subject), whereas it fails by default, for Perl
|
||||
|
@ -1608,7 +1608,7 @@ If this option is set, it disables the use of numbered capturing parentheses in
|
|||
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
|
||||
were followed by ?: but named parentheses can still be used for capturing (and
|
||||
they acquire numbers in the usual way). This is the same as Perl's /n option.
|
||||
Note that, when this option is set, references to capturing groups
|
||||
Note that, when this option is set, references to capture groups
|
||||
(backreferences or recursion/subroutine calls) may only refer to named groups,
|
||||
though the reference can be by name or by number.
|
||||
.sp
|
||||
|
@ -1627,7 +1627,7 @@ purposes.
|
|||
If this option is set, it disables an optimization that is applied when .* is
|
||||
the first significant item in a top-level branch of a pattern, and all the
|
||||
other branches also start with .* or with \eA or \eG or ^. The optimization is
|
||||
automatically disabled for .* if it is inside an atomic group or a capturing
|
||||
automatically disabled for .* if it is inside an atomic group or a capture
|
||||
group that is the subject of a backreference, or if the pattern contains
|
||||
(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
|
||||
automatically anchored if PCRE2_DOTALL is set for all the .* items and
|
||||
|
@ -2025,7 +2025,7 @@ following are true:
|
|||
.sp
|
||||
.* is not in an atomic group
|
||||
.\" JOIN
|
||||
.* is not in a capturing group that is the subject
|
||||
.* is not in a capture group that is the subject
|
||||
of a backreference
|
||||
PCRE2_DOTALL is in force for .*
|
||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||
|
@ -2037,12 +2037,12 @@ options returned for PCRE2_INFO_ALLOPTIONS.
|
|||
PCRE2_INFO_BACKREFMAX
|
||||
.sp
|
||||
Return the number of the highest backreference in the pattern. The third
|
||||
argument should point to an \fBuint32_t\fP variable. Named subpatterns acquire
|
||||
numbers as well as names, and these count towards the highest backreference.
|
||||
Backreferences such as \e4 or \eg{12} match the captured characters of the
|
||||
given group, but in addition, the check that a capturing group is set in a
|
||||
conditional subpattern such as (?(3)a|b) is also a backreference. Zero is
|
||||
returned if there are no backreferences.
|
||||
argument should point to an \fBuint32_t\fP variable. Named capture groups
|
||||
acquire numbers as well as names, and these count towards the highest
|
||||
backreference. Backreferences such as \e4 or \eg{12} match the captured
|
||||
characters of the given group, but in addition, the check that a capture
|
||||
group is set in a conditional group such as (?(3)a|b) is also a backreference.
|
||||
Zero is returned if there are no backreferences.
|
||||
.sp
|
||||
PCRE2_INFO_BSR
|
||||
.sp
|
||||
|
@ -2053,9 +2053,9 @@ that \eR matches only CR, LF, or CRLF.
|
|||
.sp
|
||||
PCRE2_INFO_CAPTURECOUNT
|
||||
.sp
|
||||
Return the highest capturing subpattern number in the pattern. In patterns
|
||||
where (?| is not used, this is also the total number of capturing subpatterns.
|
||||
The third argument should point to an \fBuint32_t\fP variable.
|
||||
Return the highest capture group number in the pattern. In patterns where (?|
|
||||
is not used, this is also the total number of capture groups. The third
|
||||
argument should point to an \fBuint32_t\fP variable.
|
||||
.sp
|
||||
PCRE2_INFO_DEPTHLIMIT
|
||||
.sp
|
||||
|
@ -2103,7 +2103,7 @@ Return the size (in bytes) of the data frames that are used to remember
|
|||
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
|
||||
without the use of JIT. The third argument should point to a \fBsize_t\fP
|
||||
variable. The frame size depends on the number of capturing parentheses in the
|
||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||
pattern. Each additional capture group adds two PCRE2_SIZE variables.
|
||||
.sp
|
||||
PCRE2_INFO_HASBACKSLASHC
|
||||
.sp
|
||||
|
@ -2224,11 +2224,11 @@ library, the pointer points to 32-bit code units, the first of which contains
|
|||
the parenthesis number. The rest of the entry is the corresponding name, zero
|
||||
terminated.
|
||||
.P
|
||||
The names are in alphabetical order. If (?| is used to create multiple groups
|
||||
with the same number, as described in the
|
||||
.\" HTML <a href="pcre2pattern.html#dupsubpatternnumber">
|
||||
The names are in alphabetical order. If (?| is used to create multiple capture
|
||||
groups with the same number, as described in the
|
||||
.\" HTML <a href="pcre2pattern.html#dupgroupnumber">
|
||||
.\" </a>
|
||||
section on duplicate subpattern numbers
|
||||
section on duplicate group numbers
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
|
@ -2237,11 +2237,11 @@ in the
|
|||
page, the groups may be given the same name, but there is only one entry in the
|
||||
table. Different names for groups of the same number are not permitted.
|
||||
.P
|
||||
Duplicate names for subpatterns with different numbers are permitted, but only
|
||||
if PCRE2_DUPNAMES is set. They appear in the table in the order in which they
|
||||
were found in the pattern. In the absence of (?| this is the order of
|
||||
Duplicate names for capture groups with different numbers are permitted, but
|
||||
only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
|
||||
they were found in the pattern. In the absence of (?| this is the order of
|
||||
increasing number; when (?| is used this is not necessarily the case because
|
||||
later subpatterns may have lower numbers.
|
||||
later capture groups may have lower numbers.
|
||||
.P
|
||||
As a simple example of the name/number table, consider the following pattern
|
||||
after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
|
||||
|
@ -2251,16 +2251,16 @@ space - including newlines - is ignored):
|
|||
(?<date> (?<year>(\ed\ed)?\ed\ed) -
|
||||
(?<month>\ed\ed) - (?<day>\ed\ed) )
|
||||
.sp
|
||||
There are four named subpatterns, so the table has four entries, and each entry
|
||||
in the table is eight bytes long. The table is as follows, with non-printing
|
||||
bytes shown in hexadecimal, and undefined bytes shown as ??:
|
||||
There are four named capture groups, so the table has four entries, and each
|
||||
entry in the table is eight bytes long. The table is as follows, with
|
||||
non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:
|
||||
.sp
|
||||
00 01 d a t e 00 ??
|
||||
00 05 d a y 00 ?? ??
|
||||
00 04 m o n t h 00
|
||||
00 02 y e a r 00 ??
|
||||
.sp
|
||||
When writing code to extract data from named subpatterns using the
|
||||
When writing code to extract data from named capture groups using the
|
||||
name-to-number map, remember that the length of the entries is likely to be
|
||||
different for each compiled pattern.
|
||||
.sp
|
||||
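The name table in the hunk above can be walked with a few lines of C. The sketch below hard-codes the four 8-byte entries from the example rather than calling PCRE2. In a real program the table pointer, entry count and entry size would normally come from pcre2_pattern_info() with PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT and PCRE2_INFO_NAMEENTRYSIZE. The names `lookup_group_number` and `example_table` are hypothetical, and the bytes shown as ?? are filled with an arbitrary 0xAA. The layout assumed is that of the 8-bit library: two bytes of group number, most significant first, followed by the zero-terminated name.

```c
#include <stddef.h>
#include <string.h>

/* Scan an 8-bit-library name table for name and return its capture group
   number, or -1 if the name is not present. Each entry is entry_size
   bytes: a 2-byte group number (most significant byte first) followed by
   the zero-terminated name. */
int lookup_group_number(const unsigned char *table, size_t count,
                        size_t entry_size, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        const unsigned char *entry = table + i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* The example table from the text; 0xAA stands in for undefined bytes. */
const unsigned char example_table[4 * 8] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0xAA,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0xAA, 0xAA,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0xAA
};
```

Calling `lookup_group_number(example_table, 4, 8, "month")` on this data yields 4. The entry size is passed in rather than fixed because, as the text notes, it is likely to differ from one compiled pattern to the next.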
|
@ -2740,12 +2740,12 @@ valid newline sequence and explicit \er or \en escapes appear in the pattern.
|
|||
In general, a pattern matches a certain portion of the subject, and in
|
||||
addition, further substrings from the subject may be picked out by
|
||||
parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's
|
||||
book, this is called "capturing" in what follows, and the phrase "capturing
|
||||
subpattern" or "capturing group" is used for a fragment of a pattern that picks
|
||||
out a substring. PCRE2 supports several other kinds of parenthesized subpattern
|
||||
that do not cause substrings to be captured. The \fBpcre2_pattern_info()\fP
|
||||
function can be used to find out how many capturing subpatterns there are in a
|
||||
compiled pattern.
|
||||
book, this is called "capturing" in what follows, and the phrase "capture
|
||||
group" (Perl terminology) is used for a fragment of a pattern that picks out a
|
||||
substring. PCRE2 supports several other kinds of parenthesized group that do
|
||||
not cause substrings to be captured. The \fBpcre2_pattern_info()\fP function
|
||||
can be used to find out how many capture groups there are in a compiled
|
||||
pattern.
|
||||
.P
|
||||
You can use auxiliary functions for accessing captured substrings
|
||||
.\" HTML <a href="#extractbynumber">
|
||||
|
@ -2798,30 +2798,28 @@ reported start of a successful match can be greater than the end of the match.
|
|||
For example, if the pattern (?=ab\eK) is matched against "ab", the start and
|
||||
end offset values for the match are 2 and 0.
|
||||
.P
|
||||
If a capturing subpattern group is matched repeatedly within a single match
|
||||
operation, it is the last portion of the subject that it matched that is
|
||||
returned.
|
||||
If a capture group is matched repeatedly within a single match operation, it is
|
||||
the last portion of the subject that it matched that is returned.
|
||||
.P
|
||||
If the ovector is too small to hold all the captured substring offsets, as much
|
||||
as possible is filled in, and the function returns a value of zero. If captured
|
||||
substrings are not of interest, \fBpcre2_match()\fP may be called with a match
|
||||
data block whose ovector is of minimum length (that is, one pair).
|
||||
.P
|
||||
It is possible for capturing subpattern number \fIn+1\fP to match some part of
|
||||
the subject when subpattern \fIn\fP has not been used at all. For example, if
|
||||
the string "abc" is matched against the pattern (a|(z))(bc) the return from the
|
||||
function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this
|
||||
happens, both values in the offset pairs corresponding to unused subpatterns
|
||||
are set to PCRE2_UNSET.
|
||||
.P
|
||||
Offset values that correspond to unused subpatterns at the end of the
|
||||
expression are also set to PCRE2_UNSET. For example, if the string "abc" is
|
||||
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched.
|
||||
The return from the function is 2, because the highest used capturing
|
||||
subpattern number is 1. The offsets for the second and third capturing
|
||||
subpatterns (assuming the vector is large enough, of course) are set to
|
||||
It is possible for capture group number \fIn+1\fP to match some part of the
|
||||
subject when group \fIn\fP has not been used at all. For example, if the string
|
||||
"abc" is matched against the pattern (a|(z))(bc) the return from the function
|
||||
is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
|
||||
values in the offset pairs corresponding to unused groups are set to
|
||||
PCRE2_UNSET.
|
||||
.P
|
||||
Offset values that correspond to unused groups at the end of the expression are
|
||||
also set to PCRE2_UNSET. For example, if the string "abc" is matched against
|
||||
the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the
|
||||
function is 2, because the highest used capture group number is 1. The offsets
|
||||
for the second and third capture groups (assuming the vector is large
|
||||
enough, of course) are set to PCRE2_UNSET.
|
||||
.P
|
||||
Elements in the ovector that do not correspond to capturing parentheses in the
|
||||
pattern are never changed. That is, if a pattern contains \fIn\fP capturing
|
||||
parentheses, no more than \fIovector[0]\fP to \fIovector[2n+1]\fP are set by
|
||||
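The unset-group behaviour described in the hunk above can be illustrated without the library. The sketch below hard-codes the offsets that matching "abc" against (a|(z))(bc) is described as producing: a return value of 4, with groups 1 and 3 set and group 2 unset. `UNSET_OFFSET` is a stand-in for PCRE2_UNSET, assumed here to be the all-ones value of size_t. `group_is_set` and `example_ovector` are hypothetical names, and a real program would obtain the vector from the match data block.

```c
#include <stddef.h>

/* Stand-in for PCRE2_UNSET (assumption: the all-ones size_t value). */
#define UNSET_OFFSET (~(size_t)0)

/* Returns 1 if capture group number group took part in the match.
   rc is the value returned by the match call: one more than the highest
   numbered group that was set. Groups at or beyond rc, and groups whose
   offset pair holds the unset marker, are reported as not set. */
int group_is_set(const size_t *ovector, int rc, int group)
{
    if (group < 0 || group >= rc) return 0;
    return ovector[2 * group] != UNSET_OFFSET;
}

/* Offsets for "abc" matched against (a|(z))(bc):
   pair 0 is the whole match, pair 1 is "a", pair 2 is unset, pair 3 is "bc". */
const size_t example_ovector[8] = {
    0, 3,
    0, 1,
    UNSET_OFFSET, UNSET_OFFSET,
    1, 3
};
```

Testing the offset against the unset marker, rather than comparing start and end offsets, is what separates an unset group from one that matched an empty string.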
|
@ -3006,11 +3004,11 @@ as NULL.
|
|||
.sp
|
||||
This error is returned when \fBpcre2_match()\fP detects a recursion loop within
|
||||
the pattern. Specifically, it means that either the whole pattern or a
|
||||
subpattern has been called recursively for the second time at the same position
|
||||
in the subject string. Some simple patterns that might do this are detected and
|
||||
faulted at compile time, but more complicated cases, in particular mutual
|
||||
recursions between two different subpatterns, cannot be detected until matching
|
||||
is attempted.
|
||||
capture group has been called recursively for the second time at the same
|
||||
position in the subject string. Some simple patterns that might do this are
|
||||
detected and faulted at compile time, but more complicated cases, in particular
|
||||
mutual recursions between two different groups, cannot be detected until
|
||||
matching is attempted.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="geterrormessage"></a>
|
||||
|
@ -3090,7 +3088,7 @@ The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring
|
|||
into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it
|
||||
into new memory, obtained using the same memory allocation function that was
|
||||
used for the match data block. The first two arguments of these functions are a
|
||||
pointer to the match data block and a capturing group number.
|
||||
pointer to the match data block and a capture group number.
|
||||
.P
|
||||
The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
|
||||
the buffer and a pointer to a variable that contains its length in code units.
|
||||
|
@ -3162,9 +3160,9 @@ could not be obtained. When the list is no longer needed, it should be freed by
|
|||
calling \fBpcre2_substring_list_free()\fP.
|
||||
.P
|
||||
If this function encounters a substring that is unset, which can happen when
|
||||
capturing subpattern number \fIn+1\fP matches some part of the subject, but
|
||||
subpattern \fIn\fP has not been used at all, it returns an empty string. This
|
||||
can be distinguished from a genuine zero-length substring by inspecting the
|
||||
capture group number \fIn+1\fP matches some part of the subject, but group
|
||||
\fIn\fP has not been used at all, it returns an empty string. This can be
|
||||
distinguished from a genuine zero-length substring by inspecting the
|
||||
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
||||
substrings, or by calling \fBpcre2_substring_length_bynumber()\fP.
|
||||
.
|
||||
|
@ -3194,20 +3192,20 @@ For example, for this pattern:
|
|||
.sp
|
||||
(a+)b(?<xxx>\ed+)...
|
||||
.sp
|
||||
the number of the subpattern called "xxx" is 2. If the name is known to be
|
||||
the number of the capture group called "xxx" is 2. If the name is known to be
|
||||
unique (PCRE2_DUPNAMES was not set), you can find the number from the name by
|
||||
calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
|
||||
compiled pattern, and the second is the name. The yield of the function is the
|
||||
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
||||
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
||||
that name. Given the number, you can extract the substring directly from the
|
||||
ovector, or use one of the "bynumber" functions described above.
|
||||
group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
|
||||
PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
|
||||
Given the number, you can extract the substring directly from the ovector, or
|
||||
use one of the "bynumber" functions described above.
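The three possible outcomes of a name-to-number lookup can be sketched as follows. This is an illustration only, assuming a flat table of (name, number) entries. The struct, the function number_from_name, and the SK_ error values are hypothetical. They do not reproduce the library's internal name table or its real error constants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error values for this sketch only. */
#define SK_NOSUBSTRING       (-1)   /* no group has this name */
#define SK_NOUNIQUESUBSTRING (-2)   /* more than one group has this name */

struct name_entry {
    const char *name;   /* group name */
    int number;         /* capture group number, always positive */
};

/* Return the group number for name, or a negative sketch error code. */
int number_from_name(const struct name_entry *table, size_t count,
                     const char *name)
{
    assert(table != NULL && name != NULL);

    int found = SK_NOSUBSTRING;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) != 0) continue;
        if (found != SK_NOSUBSTRING) return SK_NOUNIQUESUBSTRING;
        found = table[i].number;
    }
    return found;
}
```

With a table that maps "xxx" to 2, as in the pattern shown above, the lookup yields 2. A name that appears twice yields the "not unique" code, and an unknown name yields the "no substring" code.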
|
||||
.P
|
||||
For convenience, there are also "byname" functions that correspond to the
|
||||
"bynumber" functions, the only difference being that the second argument is a
|
||||
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
|
||||
names, these functions scan all the groups with the given name, and return the
|
||||
first named string that is set.
|
||||
captured substring from the first named group that is set.
|
||||
.P
|
||||
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
|
||||
returned. If all groups with the name have numbers that are greater than the
|
||||
|
@ -3216,18 +3214,18 @@ is at least one group with a slot in the ovector, but no group is found to be
|
|||
set, PCRE2_ERROR_UNSET is returned.
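The scan over duplicate names might look roughly like the sketch below. It is not the library's implementation: the table layout, the function first_set_group, and the SK_ codes are all hypothetical. The middle case is cut off by the hunk boundary in the text shown here, so treating it as "no group with this name has a slot in the ovector" is an assumption drawn from the surrounding sentences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_UNSET_OFFSET (~(size_t)0)  /* hypothetical unset marker */
#define SK_NOSUBSTRING  (-1)          /* no group has this name */
#define SK_UNAVAILABLE  (-2)          /* assumed: no named group has a slot */
#define SK_UNSET        (-3)          /* slots exist, but no group is set */

struct name_entry {
    const char *name;   /* group name */
    size_t number;      /* capture group number */
};

/* Scan every group called name, in table order, and return the number of
   the first one that is set, or a negative sketch error code. oveccount is
   the number of (start, end) pairs available in ovector. */
int first_set_group(const struct name_entry *table, size_t count,
                    const size_t *ovector, size_t oveccount,
                    const char *name)
{
    assert(table != NULL && ovector != NULL && name != NULL);

    int result = SK_NOSUBSTRING;
    for (size_t i = 0; i < count; i++) {
        size_t n = table[i].number;
        if (strcmp(table[i].name, name) != 0) continue;
        if (n >= oveccount) {
            /* Named group exists but lies beyond the ovector. */
            if (result == SK_NOSUBSTRING) result = SK_UNAVAILABLE;
            continue;
        }
        if (ovector[2 * n] != SK_UNSET_OFFSET) return (int)n;
        result = SK_UNSET;   /* has a slot, but did not participate */
    }
    return result;
}
```

The result starts at "no substring" and is only upgraded, so a group that has a slot but is unset always takes precedence over one that lies outside the ovector.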
|
||||
.P
|
||||
\fBWarning:\fP If the pattern uses the (?| feature to set up multiple
|
||||
subpatterns with the same number, as described in the
|
||||
.\" HTML <a href="pcre2pattern.html#dupsubpatternnumber">
|
||||
capture groups with the same number, as described in the
|
||||
.\" HTML <a href="pcre2pattern.html#dupgroupnumber">
|
||||
.\" </a>
|
||||
section on duplicate subpattern numbers
|
||||
section on duplicate group numbers
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
page, you cannot use names to distinguish the different subpatterns, because
|
||||
page, you cannot use names to distinguish the different capture groups, because
|
||||
names are not included in the compiled code. The matching process uses only
|
||||
numbers. For this reason, the use of different names for subpatterns of the
|
||||
numbers. For this reason, the use of different names for groups with the
|
||||
same number causes an error at compile time.
|
||||
.
|
||||
.
|
||||
|
@ -3288,7 +3286,7 @@ length is in code units, not bytes.
|
|||
In the replacement string, which is interpreted as a UTF string in UTF mode,
|
||||
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
|
||||
dollar character is an escape character that can specify the insertion of
|
||||
characters from capturing groups or names from (*MARK) or other control verbs
|
||||
characters from capture groups or names from (*MARK) or other control verbs
|
||||
in the pattern. The following forms are always recognized:
|
||||
.sp
|
||||
$$ insert a dollar character
|
||||
|
@ -3351,12 +3349,12 @@ operation is carried out twice. Depending on the application, it may be more
|
|||
efficient to allocate a large buffer and free the excess afterwards, instead of
|
||||
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capturing groups that do
|
||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
|
||||
not appear in the pattern to be treated as unset groups. This option should be
|
||||
used with care, because it means that a typo in a group name or number no
|
||||
longer causes the PCRE2_ERROR_NOSUBSTRING error.
|
||||
.P
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups (including unknown
|
||||
PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown
|
||||
groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
|
||||
strings when inserted as described above. If this option is not set, an attempt
|
||||
to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does
|
||||
|
@ -3381,14 +3379,15 @@ terminating a \eQ quoted sequence) reverts to no case forcing. The sequences
|
|||
\eu and \el force the next character (if it is a letter) to upper or lower
|
||||
case, respectively, and then the state automatically reverts to no case
|
||||
forcing. Case forcing applies to all inserted characters, including those from
|
||||
captured groups and letters within \eQ...\eE quoted sequences.
|
||||
capture groups and letters within \eQ...\eE quoted sequences.
|
||||
.P
|
||||
Note that case forcing sequences such as \eU...\eE do not nest. For example,
|
||||
the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
|
||||
effect.
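The non-nesting behaviour can be modelled with two pieces of state: the forcing that applies to the next character, and the forcing that is restored afterwards. The following is a minimal sketch under that assumption, handling ASCII literal text only. The function apply_case_forcing is hypothetical and does not show how pcre2_substitute() is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum forcing { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/* Hypothetical helper, ASCII only: copy in to out while applying the
   U, L, E, u and l escapes (each written after a backslash). Returns the
   output length, or -1 if out, whose size outsize includes the
   terminating NUL, is too small. */
int apply_case_forcing(const char *in, char *out, size_t outsize)
{
    enum forcing current = FORCE_NONE;  /* applies to the next character */
    enum forcing after = FORCE_NONE;    /* restored after that character */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0) return -1;

    while (*in != '\0') {
        if (in[0] == '\\') {
            int recognized = 1;
            switch (in[1]) {
            case 'U': current = after = FORCE_UPPER; break;
            case 'L': current = after = FORCE_LOWER; break;
            case 'E': current = after = FORCE_NONE; break;
            case 'u': current = FORCE_UPPER; after = FORCE_NONE; break;
            case 'l': current = FORCE_LOWER; after = FORCE_NONE; break;
            default:  recognized = 0; break;
            }
            if (recognized) { in += 2; continue; }
        }

        unsigned char c = (unsigned char)*in++;
        if (current == FORCE_UPPER) c = (unsigned char)toupper(c);
        else if (current == FORCE_LOWER) c = (unsigned char)tolower(c);
        current = after;            /* one-shot forcing ends here */

        if (n + 1 >= outsize) return -1;
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return (int)n;
}
```

Because a new forcing sequence simply overwrites the state instead of pushing onto a stack, the input from the example above produces "AAbbcc", and the trailing end marker finds nothing left to cancel.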
|
||||
.P
|
||||
The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
|
||||
flexibility to group substitution. The syntax is similar to that used by Bash:
|
||||
flexibility to capture group substitution. The syntax is similar to that used
|
||||
by Bash:
|
||||
.sp
|
||||
${<n>:-<string>}
|
||||
${<n>:+<string1>:<string2>}
|
||||
|
@ -3510,7 +3509,7 @@ output and the call to \fBpcre2_substitute()\fP exits, returning the number of
|
|||
matches so far.
|
||||
.
|
||||
.
|
||||
.SH "DUPLICATE SUBPATTERN NAMES"
|
||||
.SH "DUPLICATE CAPTURE GROUP NAMES"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
|
@ -3518,13 +3517,14 @@ matches so far.
|
|||
.B " PCRE2_SPTR \fIname\fP, PCRE2_SPTR *\fIfirst\fP, PCRE2_SPTR *\fIlast\fP);"
|
||||
.fi
|
||||
.P
|
||||
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
||||
subpatterns are not required to be unique. Duplicate names are always allowed
|
||||
for subpatterns with the same number, created by using the (?| feature. Indeed,
|
||||
if such subpatterns are named, they are required to use the same names.
|
||||
When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
|
||||
groups are not required to be unique. Duplicate names are always allowed for
|
||||
groups with the same number, created by using the (?| feature. Indeed, if such
|
||||
groups are named, they are required to use the same names.
|
||||
.P
|
||||
Normally, patterns with duplicate names are such that in any one match, only
|
||||
one of the named subpatterns participates. An example is shown in the
|
||||
Normally, patterns that use duplicate names are such that in any one match,
|
||||
only one of each set of identically-named groups participates. An example is
|
||||
shown in the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
|
@ -3705,9 +3705,8 @@ the three matched strings are
|
|||
On success, the yield of the function is a number greater than zero, which is
|
||||
the number of matched substrings. The offsets of the substrings are returned in
|
||||
the ovector, and can be extracted by number in the same way as for
|
||||
\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups
|
||||
that may exist in the pattern, because DFA matching does not support group
|
||||
capture.
|
||||
\fBpcre2_match()\fP, but the numbers bear no relation to any capture groups
|
||||
that may exist in the pattern, because DFA matching does not support capturing.
|
||||
.P
|
||||
Calls to the convenience functions that extract substrings by name
|
||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
||||
|
@ -3749,7 +3748,7 @@ a backreference.
|
|||
.sp
|
||||
This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
|
||||
that uses a backreference for the condition, or a test for recursion in a
|
||||
specific group. These are not supported.
|
||||
specific capture group. These are not supported.
|
||||
.sp
|
||||
PCRE2_ERROR_DFA_WSSIZE
|
||||
.sp
|
||||
|
@ -3758,9 +3757,9 @@ This return is given if \fBpcre2_dfa_match()\fP runs out of space in the
|
|||
.sp
|
||||
PCRE2_ERROR_DFA_RECURSE
|
||||
.sp
|
||||
When a recursive subpattern is processed, the matching function calls itself
|
||||
recursively, using private memory for the ovector and \fIworkspace\fP. This
|
||||
error is given if the internal ovector is not large enough. This should be
|
||||
When a recursion or subroutine call is processed, the matching function calls
|
||||
itself recursively, using private memory for the ovector and \fIworkspace\fP.
|
||||
This error is given if the internal ovector is not large enough. This should be
|
||||
extremely rare, as a vector of size 1000 is used.
|
||||
.sp
|
||||
PCRE2_ERROR_DFA_BADRESTART
|
||||
|
@ -3793,6 +3792,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 04 January 2019
|
||||
Last updated: 04 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2CALLOUT 3 "17 September 2018" "PCRE2 10.33"
|
||||
.TH PCRE2CALLOUT 3 "03 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -137,7 +137,7 @@ start only after an internal newline or at the beginning of the subject, and
|
|||
branch, automatic anchoring occurs if all branches are anchorable.
|
||||
.P
|
||||
This optimization is disabled, however, if .* is in an atomic group or if there
|
||||
is a backreference to the capturing group in which it appears. It is also
|
||||
is a backreference to the capture group in which it appears. It is also
|
||||
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
|
||||
callouts does not affect it.
|
||||
.P
|
||||
|
@ -331,8 +331,8 @@ callout before an assertion such as (?=ab) the length is 3. For an an
|
|||
alternation bar or a closing parenthesis, the length is one, unless a closing
|
||||
parenthesis is followed by a quantifier, in which case its length is included.
|
||||
(This changed in release 10.23. In earlier releases, before an opening
|
||||
parenthesis the length was that of the entire subpattern, and before an
|
||||
alternation bar or a closing parenthesis the length was zero.)
|
||||
parenthesis the length was that of the entire group, and before an alternation
|
||||
bar or a closing parenthesis the length was zero.)
|
||||
.P
|
||||
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
|
||||
help in distinguishing between different automatic callouts, which all have the
|
||||
|
@ -452,6 +452,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 17 September 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 03 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2COMPAT 3 "28 July 2018" "PCRE2 10.32"
|
||||
.TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
||||
|
@ -23,10 +23,9 @@ character is not "a" three times (in principle; PCRE2 optimizes this to run the
|
|||
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
||||
for example, \eb* (but not \eb{3}), but these do not seem to have any use.
|
||||
.P
|
||||
3. Capturing subpatterns that occur inside negative lookaround assertions are
|
||||
counted, but their entries in the offsets vector are set only when a negative
|
||||
assertion is a condition that has a matching branch (that is, the condition is
|
||||
false).
|
||||
3. Capture groups that occur inside negative lookaround assertions are counted,
|
||||
but their entries in the offsets vector are set only when a negative assertion
|
||||
is a condition that has a matching branch (that is, the condition is false).
|
||||
.P
|
||||
4. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu,
|
||||
\eU, and \eN when followed by a character name. \eN on its own, matching a
|
||||
|
@ -79,13 +78,13 @@ documentation for details.
|
|||
to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
|
||||
into subroutine calls is now supported, as in Perl.
|
||||
.P
|
||||
9. If any of the backtracking control verbs are used in a subpattern that is
|
||||
called as a subroutine (whether or not recursively), their effect is confined
|
||||
to that subpattern; it does not extend to the surrounding pattern. This is not
|
||||
always the case in Perl. In particular, if (*THEN) is present in a group that
|
||||
is called as a subroutine, its action is limited to that group, even if the
|
||||
group does not contain any | characters. Note that such subpatterns are
|
||||
processed as anchored at the point where they are tested.
|
||||
9. If any of the backtracking control verbs are used in a group that is called
|
||||
as a subroutine (whether or not recursively), their effect is confined to that
|
||||
group; it does not extend to the surrounding pattern. This is not always the
|
||||
case in Perl. In particular, if (*THEN) is present in a group that is called as
|
||||
a subroutine, its action is limited to that group, even if the group does not
|
||||
contain any | characters. Note that such groups are processed as anchored
|
||||
at the point where they are tested.
|
||||
.P
|
||||
10. If a pattern contains more than one backtracking control verb, the first
|
||||
one that is backtracked onto acts. For example, in the pattern
|
||||
|
@ -101,21 +100,20 @@ strings when part of a pattern is repeated. For example, matching "aba" against
|
|||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
|
||||
"b".
|
||||
.P
|
||||
13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
|
||||
names is not as general as Perl's. This is a consequence of the fact that PCRE2
|
||||
works internally just with numbers, using an external table to translate
|
||||
between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)),
|
||||
where the two capturing parentheses have the same number but different names,
|
||||
is not supported, and causes an error at compile time. If it were allowed, it
|
||||
would not be possible to distinguish which parentheses matched, because both
|
||||
names map to capturing subpattern number 1. To avoid this confusing situation,
|
||||
an error is given at compile time.
|
||||
13. PCRE2's handling of duplicate capture group numbers and names is not as
|
||||
general as Perl's. This is a consequence of the fact that PCRE2 works internally
|
||||
just with numbers, using an external table to translate between numbers and
|
||||
names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two
|
||||
capture groups have the same number but different names, is not supported, and
|
||||
causes an error at compile time. If it were allowed, it would not be possible
|
||||
to distinguish which group matched, because both names map to capture group
|
||||
number 1. To avoid this confusing situation, an error is given at compile time.
|
||||
.P
|
||||
14. Perl used to recognize comments in some places that PCRE2 does not, for
|
||||
example, between the ( and ? at the start of a subpattern. If the /x modifier
|
||||
is set, Perl allowed white space between ( and ? though the latest Perls give
|
||||
an error (for a while it was just deprecated). There may still be some cases
|
||||
where Perl behaves differently.
|
||||
example, between the ( and ? at the start of a group. If the /x modifier is
|
||||
set, Perl allowed white space between ( and ? though the latest Perls give an
|
||||
error (for a while it was just deprecated). There may still be some cases where
|
||||
Perl behaves differently.
|
||||
.P
|
||||
15. Perl, when in warning mode, gives warnings for character classes such as
|
||||
[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
|
||||
|
@ -200,6 +198,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 28 July 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 03 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2LIMITS 3 "30 March 2017" "PCRE2 10.30"
|
||||
.TH PCRE2LIMITS 3 "03 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "SIZE AND OTHER LIMITATIONS"
|
||||
|
@ -34,16 +34,16 @@ All values in repeating quantifiers must be less than 65536.
|
|||
.P
|
||||
The maximum length of a lookbehind assertion is 65535 characters.
|
||||
.P
|
||||
There is no limit to the number of parenthesized subpatterns, but there can be
|
||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||
order to limit the amount of system stack used at compile time. The default
|
||||
limit can be specified when PCRE2 is built; if not, the default is set to 250.
|
||||
An application can change this limit by calling pcre2_set_parens_nest_limit()
|
||||
to set the limit in a compile context.
|
||||
There is no limit to the number of parenthesized groups, but there can be no
|
||||
more than 65535 capture groups, and there is a limit to the depth of nesting of
|
||||
parenthesized subpatterns of all kinds. This is imposed in order to limit the
|
||||
amount of system stack used at compile time. The default limit can be specified
|
||||
when PCRE2 is built; if not, the default is set to 250. An application can
|
||||
change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
|
||||
a compile context.
|
||||
.P
|
||||
The maximum length of name for a named subpattern is 32 code units, and the
|
||||
maximum number of named subpatterns is 10000.
|
||||
The maximum length of name for a named capture group is 32 code units, and the
|
||||
maximum number of such groups is 10000.
|
||||
.P
|
||||
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
|
||||
is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
|
||||
|
@ -67,6 +67,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 30 March 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
Last updated: 02 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PERFORM 3 "25 April 2018" "PCRE2 10.32"
|
||||
.TH PCRE2PERFORM 3 "03 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 PERFORMANCE"
|
||||
|
@ -14,9 +14,9 @@ of them.
|
|||
Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
|
||||
so that most simple patterns do not use much memory for storing the compiled
|
||||
version. However, there is one case where the memory usage of a compiled
|
||||
pattern can be unexpectedly large. If a parenthesized subpattern has a
|
||||
quantifier with a minimum greater than 1 and/or a limited maximum, the whole
|
||||
subpattern is repeated in the compiled code. For example, the pattern
|
||||
pattern can be unexpectedly large. If a parenthesized group has a quantifier
|
||||
with a minimum greater than 1 and/or a limited maximum, the whole group is
|
||||
repeated in the compiled code. For example, the pattern
|
||||
.sp
|
||||
(abc|def){2,4}
|
||||
.sp
|
||||
|
@ -239,6 +239,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 25 April 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 03 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2SYNTAX 3 "10 October 2018" "PCRE2 10.33"
|
||||
.TH PCRE2SYNTAX 3 "03 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||
|
@ -398,20 +398,24 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
|||
.SH "CAPTURING"
|
||||
.rs
|
||||
.sp
|
||||
(...) capturing group
|
||||
(?<name>...) named capturing group (Perl)
|
||||
(?'name'...) named capturing group (Perl)
|
||||
(?P<name>...) named capturing group (Python)
|
||||
(?:...) non-capturing group
|
||||
(?|...) non-capturing group; reset group numbers for
|
||||
capturing groups in each alternative
|
||||
(...) capture group
|
||||
(?<name>...) named capture group (Perl)
|
||||
(?'name'...) named capture group (Perl)
|
||||
(?P<name>...) named capture group (Python)
|
||||
(?:...) non-capture group
|
||||
(?|...) non-capture group; reset group numbers for
|
||||
capture groups in each alternative
|
||||
.sp
|
||||
In non-UTF modes, names may contain underscores and ASCII letters and digits;
|
||||
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
|
||||
both cases, a name must not start with a digit.
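The non-UTF rule can be expressed as a short check. This is a sketch only, assuming plain ASCII ranges and the 32 code unit name limit quoted on the limits page. The real compiler consults its character tables instead, and the UTF case needs Unicode property data that is not reproduced here. The names is_valid_ascii_group_name and SK_MAX_NAME_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_NAME_SIZE 32   /* assumed limit, in code units */

/* Return 1 if name[0..len) is acceptable as a group name in a non-UTF
   mode under the assumptions above, otherwise 0. */
int is_valid_ascii_group_name(const char *name, size_t len)
{
    assert(name != NULL);

    if (len == 0 || len > SK_MAX_NAME_SIZE) return 0;
    if (name[0] >= '0' && name[0] <= '9') return 0;  /* no leading digit */

    for (size_t i = 0; i < len; i++) {
        char c = name[i];
        int ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                 (c >= '0' && c <= '9') || c == '_';
        if (!ok) return 0;
    }
    return 1;
}
```

Under these assumptions "xxx" and "_a1" are accepted, while "1abc" (leading digit) and "a-b" (hyphen) are rejected.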
|
||||
.
|
||||
.
|
||||
.SH "ATOMIC GROUPS"
|
||||
.rs
|
||||
.sp
|
||||
(?>...) atomic, non-capturing group
|
||||
(*atomic:...) atomic, non-capturing group
|
||||
(?>...) atomic non-capture group
|
||||
(*atomic:...) atomic non-capture group
|
||||
.
|
||||
.
|
||||
.SH "COMMENT"
|
||||
|
@ -439,7 +443,7 @@ of the group.
|
|||
Unsetting x or xx unsets both. Several options may be set at once, and a
|
||||
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
|
||||
only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
|
||||
(?^in). An option setting may appear at the start of a non-capturing group, for
|
||||
(?^in). An option setting may appear at the start of a non-capture group, for
|
||||
example (?i:...).
|
||||
.P
|
||||
The following are recognized only at the very start of a pattern or after one
|
||||
|
@ -542,19 +546,19 @@ Each top-level branch of a lookbehind must be of a fixed length.
|
|||
.rs
|
||||
.sp
|
||||
(?R) recurse whole pattern
|
||||
(?n) call subpattern by absolute number
|
||||
(?+n) call subpattern by relative number
|
||||
(?-n) call subpattern by relative number
|
||||
(?&name) call subpattern by name (Perl)
|
||||
(?P>name) call subpattern by name (Python)
|
||||
\eg<name> call subpattern by name (Oniguruma)
|
||||
\eg'name' call subpattern by name (Oniguruma)
|
||||
\eg<n> call subpattern by absolute number (Oniguruma)
|
||||
\eg'n' call subpattern by absolute number (Oniguruma)
|
||||
\eg<+n> call subpattern by relative number (PCRE2 extension)
|
||||
\eg'+n' call subpattern by relative number (PCRE2 extension)
|
||||
\eg<-n> call subpattern by relative number (PCRE2 extension)
|
||||
\eg'-n' call subpattern by relative number (PCRE2 extension)
|
||||
(?n) call subroutine by absolute number
|
||||
(?+n) call subroutine by relative number
|
||||
(?-n) call subroutine by relative number
|
||||
(?&name) call subroutine by name (Perl)
|
||||
(?P>name) call subroutine by name (Python)
|
||||
\eg<name> call subroutine by name (Oniguruma)
|
||||
\eg'name' call subroutine by name (Oniguruma)
|
||||
\eg<n> call subroutine by absolute number (Oniguruma)
|
||||
\eg'n' call subroutine by absolute number (Oniguruma)
|
||||
\eg<+n> call subroutine by relative number (PCRE2 extension)
|
||||
\eg'+n' call subroutine by relative number (PCRE2 extension)
|
||||
\eg<-n> call subroutine by relative number (PCRE2 extension)
|
||||
\eg'-n' call subroutine by relative number (PCRE2 extension)
|
||||
.
|
||||
.
|
||||
.SH "CONDITIONAL PATTERNS"
|
||||
|
@ -572,7 +576,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
|
|||
(?(R) overall recursion condition
|
||||
(?(Rn) specific numbered group recursion condition
|
||||
(?(R&name) specific named group recursion condition
|
||||
(?(DEFINE) define subpattern for reference
|
||||
(?(DEFINE) define groups for reference
|
||||
(?(VERSION[>]=n.m) test PCRE2 version
|
||||
(?(assert) assertion condition
|
||||
.sp
|
||||
|
@ -643,6 +647,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 10 October 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 03 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2TEST 1 "12 November 2018" "PCRE 10.33"
|
||||
.TH PCRE2TEST 1 "03 February 2019" "PCRE 10.33"
|
||||
.SH NAME
|
||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -672,14 +672,14 @@ information is obtained from the \fBpcre2_pattern_info()\fP function. Here are
|
|||
some typical examples:
|
||||
.sp
|
||||
re> /(?i)(^a|^b)/m,info
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Compile options: multiline
|
||||
Overall options: caseless multiline
|
||||
First code unit at start or follows newline
|
||||
Subject length lower bound = 1
|
||||
.sp
|
||||
re> /(?i)abc/info
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: caseless
|
||||
First code unit = 'a' (caseless)
|
||||
|
@ -1325,8 +1325,8 @@ current character is CR followed by LF, an advance of two characters occurs.
|
|||
.sp
|
||||
The \fBcopy\fP and \fBget\fP modifiers can be used to test the
|
||||
\fBpcre2_substring_copy_xxx()\fP and \fBpcre2_substring_get_xxx()\fP functions.
|
||||
They can be given more than once, and each can specify a group name or number,
|
||||
for example:
|
||||
They can be given more than once, and each can specify a capture group name or
|
||||
number, for example:
|
||||
.sp
|
||||
abcd\e=copy=1,copy=3,get=G1
|
||||
.sp
|
||||
|
@ -2056,6 +2056,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 12 November 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 03 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -646,14 +646,14 @@ PATTERN MODIFIERS
|
|||
are some typical examples:
|
||||
|
||||
re> /(?i)(^a|^b)/m,info
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Compile options: multiline
|
||||
Overall options: caseless multiline
|
||||
First code unit at start or follows newline
|
||||
Subject length lower bound = 1
|
||||
|
||||
re> /(?i)abc/info
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: caseless
|
||||
First code unit = 'a' (caseless)
|
||||
|
@ -1214,8 +1214,8 @@ SUBJECT MODIFIERS
|
|||
|
||||
The copy and get modifiers can be used to test the pcre2_sub-
|
||||
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
|
||||
given more than once, and each can specify a group name or number, for
|
||||
example:
|
||||
given more than once, and each can specify a capture group name or num-
|
||||
ber, for example:
|
||||
|
||||
abcd\=copy=1,copy=3,get=G1
|
||||
|
||||
|
@ -1887,5 +1887,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 12 November 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 03 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2UNICODE 3 "12 October 2018" "PCRE2 10.33"
|
||||
.TH PCRE2UNICODE 3 "03 February 2019" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions (revised API)
|
||||
.SH "UNICODE AND UTF SUPPORT"
|
||||
|
@ -27,10 +27,11 @@ case the library will be smaller.
|
|||
.rs
|
||||
.sp
|
||||
When PCRE2 is built with Unicode support, the escape sequences \ep{..},
|
||||
\eP{..}, and \eX can be used. The Unicode properties that can be tested are
|
||||
limited to the general category properties such as Lu for an upper case letter
|
||||
or Nd for a decimal number, the Unicode script names such as Arabic or Han, and
|
||||
the derived properties Any and L&. Full lists are given in the
|
||||
\eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting.
|
||||
The Unicode properties that can be tested are limited to the general category
|
||||
properties such as Lu for an upper case letter or Nd for a decimal number, the
|
||||
Unicode script names such as Arabic or Han, and the derived properties Any and
|
||||
L&. Full lists are given in the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
|
@ -62,13 +63,18 @@ individual code units.
|
|||
In UTF modes, the dot metacharacter matches one UTF character instead of a
|
||||
single code unit.
|
||||
.P
|
||||
In UTF modes, capture group names are not restricted to ASCII, and may contain
|
||||
any Unicode letters and decimal digits, as well as underscore.
|
||||
.P
|
||||
The escape sequence \eC can be used to match a single code unit in a UTF mode,
|
||||
but its use can lead to some strange effects because it breaks up multi-unit
|
||||
characters (see the description of \eC in the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
documentation).
|
||||
documentation). For this reason, there is a build-time option that disables
|
||||
support for \eC completely. There is also a less draconian compile-time option
|
||||
for locking out the use of \eC when a pattern is compiled.
|
||||
.P
|
||||
The use of \eC is not supported by the alternative matching function
|
||||
\fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character
|
||||
|
@ -387,6 +393,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 12 October 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
Last updated: 03 February 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -2194,6 +2194,7 @@ so it is simplest just to return both.
|
|||
Arguments:
|
||||
ptrptr points to the character pointer variable
|
||||
ptrend points to the end of the input string
|
||||
utf true if the input is UTF-encoded
|
||||
terminator the terminator of a subpattern name must be this
|
||||
offsetptr where to put the offset from the start of the pattern
|
||||
nameptr where to put a pointer to the name in the input
|
||||
|
@ -2206,13 +2207,12 @@ Returns: TRUE if a name was read
|
|||
*/
|
||||
|
||||
static BOOL
|
||||
read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t terminator,
|
||||
read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, BOOL utf, uint32_t terminator,
|
||||
PCRE2_SIZE *offsetptr, PCRE2_SPTR *nameptr, uint32_t *namelenptr,
|
||||
int *errorcodeptr, compile_block *cb)
|
||||
{
|
||||
PCRE2_SPTR ptr = *ptrptr;
|
||||
BOOL is_group = (*ptr != CHAR_ASTERISK);
|
||||
uint32_t namelen = 0;
|
||||
|
||||
if (++ptr >= ptrend) /* No characters in name */
|
||||
{
|
||||
|
@ -2221,35 +2221,74 @@ if (++ptr >= ptrend) /* No characters in name */
|
|||
goto FAILED;
|
||||
}
|
||||
|
||||
/* A group name must not start with a digit. If either of the others start with
|
||||
a digit it just won't be recognized. */
|
||||
|
||||
if (is_group && IS_DIGIT(*ptr))
|
||||
{
|
||||
*errorcodeptr = ERR44;
|
||||
goto FAILED;
|
||||
}
|
||||
|
||||
*nameptr = ptr;
|
||||
*offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern);
|
||||
|
||||
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
|
||||
/* In UTF mode, a group name may contain letters and decimal digits as defined
|
||||
by Unicode properties, and underscores, but must not start with a digit. */
|
||||
|
||||
#ifdef SUPPORT_UNICODE
|
||||
if (utf && is_group)
|
||||
{
|
||||
ptr++;
|
||||
namelen++;
|
||||
if (namelen > MAX_NAME_SIZE)
|
||||
uint32_t c, type;
|
||||
|
||||
GETCHAR(c, ptr);
|
||||
type = UCD_CHARTYPE(c);
|
||||
|
||||
if (type == ucp_Nd)
|
||||
{
|
||||
*errorcodeptr = ERR48;
|
||||
*errorcodeptr = ERR44;
|
||||
goto FAILED;
|
||||
}
|
||||
|
||||
for(;;)
|
||||
{
|
||||
if (type != ucp_Nd && PRIV(ucp_gentype)[type] != ucp_L &&
|
||||
c != CHAR_UNDERSCORE) break;
|
||||
ptr++;
|
||||
FORWARDCHAR(ptr);
|
||||
if (ptr >= ptrend) break;
|
||||
GETCHAR(c, ptr);
|
||||
type = UCD_CHARTYPE(c);
|
||||
}
|
||||
}
|
||||
else
|
||||
#else
|
||||
(void)utf; /* Avoid compiler warning */
|
||||
#endif /* SUPPORT_UNICODE */
|
||||
|
||||
/* Handle non-group names and group names in non-UTF modes. A group name must
|
||||
not start with a digit. If either of the others start with a digit it just
|
||||
won't be recognized. */
|
||||
|
||||
{
|
||||
if (is_group && IS_DIGIT(*ptr))
|
||||
{
|
||||
*errorcodeptr = ERR44;
|
||||
goto FAILED;
|
||||
}
|
||||
|
||||
while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
|
||||
{
|
||||
ptr++;
|
||||
}
|
||||
}
|
||||
|
||||
/* Check name length */
|
||||
|
||||
if (ptr > *nameptr + MAX_NAME_SIZE)
|
||||
{
|
||||
*errorcodeptr = ERR48;
|
||||
goto FAILED;
|
||||
}
|
||||
*namelenptr = ptr - *nameptr;
|
||||
|
||||
/* Subpattern names must not be empty, and their terminator is checked here.
|
||||
(What follows a verb or alpha assertion name is checked separately.) */
|
||||
|
||||
if (is_group)
|
||||
{
|
||||
if (namelen == 0)
|
||||
if (ptr == *nameptr)
|
||||
{
|
||||
*errorcodeptr = ERR62; /* Subpattern name expected */
|
||||
goto FAILED;
|
||||
|
@ -2262,7 +2301,6 @@ if (is_group)
|
|||
ptr++;
|
||||
}
|
||||
|
||||
*namelenptr = namelen;
|
||||
*ptrptr = ptr;
|
||||
return TRUE;
|
||||
|
||||
|
@ -2981,7 +3019,7 @@ while (ptr < ptrend)
|
|||
|
||||
/* Not a numerical recursion */
|
||||
|
||||
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen,
|
||||
if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
|
||||
&errorcode, cb)) goto ESCAPE_FAILED;
|
||||
|
||||
/* \k and \g when used with braces are back references, whereas \g used
|
||||
|
@ -3554,8 +3592,8 @@ while (ptr < ptrend)
|
|||
uint32_t meta;
|
||||
|
||||
vn = alasnames;
|
||||
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
|
||||
cb)) goto FAILED;
|
||||
if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen,
|
||||
&errorcode, cb)) goto FAILED;
|
||||
if (ptr >= ptrend || *ptr != CHAR_COLON)
|
||||
{
|
||||
errorcode = ERR95; /* Malformed */
|
||||
|
@ -3651,8 +3689,8 @@ while (ptr < ptrend)
|
|||
else
|
||||
{
|
||||
vn = verbnames;
|
||||
if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
|
||||
cb)) goto FAILED;
|
||||
if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen,
|
||||
&errorcode, cb)) goto FAILED;
|
||||
if (ptr >= ptrend || (*ptr != CHAR_COLON &&
|
||||
*ptr != CHAR_RIGHT_PARENTHESIS))
|
||||
{
|
||||
|
@ -3907,7 +3945,7 @@ while (ptr < ptrend)
|
|||
errorcode = ERR41;
|
||||
goto FAILED;
|
||||
}
|
||||
if (!read_name(&ptr, ptrend, CHAR_RIGHT_PARENTHESIS, &offset, &name,
|
||||
if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name,
|
||||
&namelen, &errorcode, cb)) goto FAILED;
|
||||
*parsed_pattern++ = META_BACKREF_BYNAME;
|
||||
*parsed_pattern++ = namelen;
|
||||
|
@ -3967,7 +4005,7 @@ while (ptr < ptrend)
|
|||
|
||||
case CHAR_AMPERSAND:
|
||||
RECURSE_BY_NAME:
|
||||
if (!read_name(&ptr, ptrend, CHAR_RIGHT_PARENTHESIS, &offset, &name,
|
||||
if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name,
|
||||
&namelen, &errorcode, cb)) goto FAILED;
|
||||
*parsed_pattern++ = META_RECURSE_BYNAME;
|
||||
*parsed_pattern++ = namelen;
|
||||
|
@ -4215,7 +4253,7 @@ while (ptr < ptrend)
|
|||
terminator = CHAR_RIGHT_PARENTHESIS;
|
||||
ptr--; /* Point to char before name */
|
||||
}
|
||||
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen,
|
||||
if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
|
||||
&errorcode, cb)) goto FAILED;
|
||||
|
||||
/* Handle (?(R&name) */
|
||||
|
@ -4349,7 +4387,7 @@ while (ptr < ptrend)
|
|||
terminator = CHAR_APOSTROPHE; /* Terminator */
|
||||
|
||||
DEFINE_NAME:
|
||||
if (!read_name(&ptr, ptrend, terminator, &offset, &name, &namelen,
|
||||
if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen,
|
||||
&errorcode, cb)) goto FAILED;
|
||||
|
||||
/* We have a name for this capturing group. It is also assigned a number,
|
||||
|
|
|
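Reviewer note on the read_name() hunks above: the reworked function scans the name first and applies the length and emptiness checks afterwards, using pointer differences instead of a running counter. The following is a minimal standalone sketch of the non-UTF branch only, not the PCRE2 implementation. The function name, the status enum, and the SKETCH_MAX_NAME_SIZE value of 32 are assumptions made for illustration, and isalnum() stands in for the ctype_word table lookup.

```c
#include <ctype.h>
#include <stddef.h>

#define SKETCH_MAX_NAME_SIZE 32   /* assumed limit, counted in code units */

/* Hypothetical status codes; they loosely mirror ERR44, ERR48 and ERR62. */
enum sketch_status {
  SKETCH_OK,
  SKETCH_STARTS_WITH_DIGIT,
  SKETCH_TOO_LONG,
  SKETCH_EMPTY
};

/* Scan an ASCII group name that starts at s and must end before end.
   On success, *len receives the name length in code units (bytes here). */
static enum sketch_status
sketch_scan_name(const char *s, const char *end, size_t *len)
{
  const char *p = s;

  /* A group name must not start with a digit. */
  if (p < end && isdigit((unsigned char)*p)) return SKETCH_STARTS_WITH_DIGIT;

  /* Consume word characters: letters, digits and underscore. */
  while (p < end && (isalnum((unsigned char)*p) || *p == '_')) p++;

  /* Length is checked after the scan, as a pointer difference. */
  if ((size_t)(p - s) > SKETCH_MAX_NAME_SIZE) return SKETCH_TOO_LONG;

  /* An empty name is rejected last. */
  if (p == s) return SKETCH_EMPTY;

  *len = (size_t)(p - s);
  return SKETCH_OK;
}
```

The caller is still expected to check the terminator that follows the name, which the sketch leaves out.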
@ -95,7 +95,7 @@ static const unsigned char compile_error_texts[] =
|
|||
/* 25 */
|
||||
"lookbehind assertion is not fixed length\0"
|
||||
"a relative value of zero is not allowed\0"
|
||||
"conditional group contains more than two branches\0"
|
||||
"conditional subpattern contains more than two branches\0"
|
||||
"assertion expected after (?( or (?(?C)\0"
|
||||
"digit expected after (?+ or (?-\0"
|
||||
/* 30 */
|
||||
|
@ -113,21 +113,21 @@ static const unsigned char compile_error_texts[] =
|
|||
/* 40 */
|
||||
"invalid escape sequence in (*VERB) name\0"
|
||||
"unrecognized character after (?P\0"
|
||||
"syntax error in subpattern name (missing terminator)\0"
|
||||
"syntax error in subpattern name (missing terminator?)\0"
|
||||
"two named subpatterns have the same name (PCRE2_DUPNAMES not set)\0"
|
||||
"group name must start with a non-digit\0"
|
||||
"subpattern name must start with a non-digit\0"
|
||||
/* 45 */
|
||||
"this version of PCRE2 does not have support for \\P, \\p, or \\X\0"
|
||||
"malformed \\P or \\p sequence\0"
|
||||
"unknown property name after \\P or \\p\0"
|
||||
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)\0"
|
||||
"subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " code units)\0"
|
||||
"too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0"
|
||||
/* 50 */
|
||||
"invalid range in character class\0"
|
||||
"octal value is greater than \\377 in 8-bit non-UTF-8 mode\0"
|
||||
"internal error: overran compiling workspace\0"
|
||||
"internal error: previously-checked referenced subpattern not found\0"
|
||||
"DEFINE group contains more than one branch\0"
|
||||
"DEFINE subpattern contains more than one branch\0"
|
||||
/* 55 */
|
||||
"missing opening brace after \\o\0"
|
||||
"internal error: unknown newline setting\0"
|
||||
|
@ -137,7 +137,7 @@ static const unsigned char compile_error_texts[] =
|
|||
"obsolete error (should not occur)\0" /* Was the above */
|
||||
/* 60 */
|
||||
"(*VERB) not recognized or malformed\0"
|
||||
"group number is too big\0"
|
||||
"subpattern number is too big\0"
|
||||
"subpattern name expected\0"
|
||||
"internal error: parsed pattern overflow\0"
|
||||
"non-octal character in \\o{} (closing brace missing?)\0"
|
||||
|
|
|
@ -3049,13 +3049,14 @@ return yield;
|
|||
|
||||
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
/*************************************************
|
||||
* Convert character value to UTF-8 *
|
||||
*************************************************/
|
||||
|
||||
/* This function takes an integer value in the range 0 - 0x7fffffff
|
||||
and encodes it as a UTF-8 character in 0 to 6 bytes.
|
||||
and encodes it as a UTF-8 character in 0 to 6 bytes. It is needed even when the
|
||||
8-bit library is not supported, to generate UTF-8 output for non-ASCII
|
||||
characters.
|
||||
|
||||
Arguments:
|
||||
cvalue the character value
|
||||
|
@ -3081,7 +3082,6 @@ for (j = i; j > 0; j--)
|
|||
*utf8bytes = utf8_table2[i] | cvalue;
|
||||
return i + 1;
|
||||
}
|
||||
#endif /* SUPPORT_PCRE2_8 */
|
||||
|
||||
|
||||
|
||||
|
@ -4374,6 +4374,7 @@ static int
|
|||
show_pattern_info(void)
|
||||
{
|
||||
uint32_t compile_options, overall_options, extra_options;
|
||||
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
|
||||
|
||||
if ((pat_patctl.control & (CTL_BINCODE|CTL_FULLBINCODE)) != 0)
|
||||
{
|
||||
|
@ -4463,7 +4464,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
|
|||
!= 0)
|
||||
return PR_ABEND;
|
||||
|
||||
fprintf(outfile, "Capturing subpattern count = %d\n", capture_count);
|
||||
fprintf(outfile, "Capture group count = %d\n", capture_count);
|
||||
|
||||
if (backrefmax > 0)
|
||||
fprintf(outfile, "Max back reference = %d\n", backrefmax);
|
||||
|
@ -4482,14 +4483,60 @@ if ((pat_patctl.control & CTL_INFO) != 0)
|
|||
|
||||
if (namecount > 0)
|
||||
{
|
||||
fprintf(outfile, "Named capturing subpatterns:\n");
|
||||
fprintf(outfile, "Named capture groups:\n");
|
||||
for (; namecount > 0; namecount--)
|
||||
{
|
||||
int imm2_size = test_mode == PCRE8_MODE ? 2 : 1;
|
||||
uint32_t length = (uint32_t)STRLEN(nametable + imm2_size);
|
||||
fprintf(outfile, " ");
|
||||
PCHARSV(nametable, imm2_size, length, FALSE, outfile);
|
||||
|
||||
/* In UTF mode the name may be a UTF string containing non-ASCII
|
||||
letters and digits. We must output it as a UTF-8 string. In non-UTF mode,
|
||||
use the normal string printing functions, which use escapes for all
|
||||
non-ASCII characters. */
|
||||
|
||||
if (utf)
|
||||
{
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == PCRE32_MODE)
|
||||
{
|
||||
PCRE2_SPTR32 nameptr = (PCRE2_SPTR32)nametable + imm2_size;
|
||||
while (*nameptr != 0)
|
||||
{
|
||||
uint8_t u8buff[6];
|
||||
int len = ord2utf8(*nameptr++, u8buff);
|
||||
fprintf(outfile, "%.*s", len, u8buff);
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_16
|
||||
if (test_mode == PCRE16_MODE)
|
||||
{
|
||||
PCRE2_SPTR16 nameptr = (PCRE2_SPTR16)nametable + imm2_size;
|
||||
while (*nameptr != 0)
|
||||
{
|
||||
int len;
|
||||
uint8_t u8buff[6];
|
||||
uint32_t c = *nameptr++ & 0xffff;
|
||||
if (c >= 0xD800 && c < 0xDC00)
|
||||
c = ((c & 0x3ff) << 10) + (*nameptr++ & 0x3ff) + 0x10000;
|
||||
len = ord2utf8(c, u8buff);
|
||||
fprintf(outfile, "%.*s", len, u8buff);
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == PCRE8_MODE)
|
||||
fprintf(outfile, "%s", (PCRE2_SPTR8)nametable + imm2_size);
|
||||
#endif
|
||||
}
|
||||
else /* Not UTF mode */
|
||||
{
|
||||
PCHARSV(nametable, imm2_size, length, FALSE, outfile);
|
||||
}
|
||||
|
||||
while (length++ < nameentrysize - imm2_size) putc(' ', outfile);
|
||||
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == PCRE32_MODE)
|
||||
fprintf(outfile, "%3d\n", (int)(((PCRE2_SPTR32)nametable)[0]));
|
||||
|
@ -4503,6 +4550,7 @@ if ((pat_patctl.control & CTL_INFO) != 0)
|
|||
fprintf(outfile, "%3d\n", (int)(
|
||||
((((PCRE2_SPTR8)nametable)[0]) << 8) | ((PCRE2_SPTR8)nametable)[1]));
|
||||
#endif
|
||||
|
||||
nametable = (void*)((PCRE2_SPTR8)nametable + nameentrysize * code_unit_size);
|
||||
}
|
||||
}
|
||||
|
|
|
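Reviewer note on the show_pattern_info() hunk above: in 16-bit mode the new code combines a surrogate pair into one code point before handing it to ord2utf8(). The sketch below reproduces that arithmetic in a standalone form. Both function names are hypothetical, and the encoder is a simplified stand-in that handles only code points up to 0x10FFFF (at most 4 bytes), whereas the comment in the diff says ord2utf8() can emit up to 6 bytes.

```c
#include <stdint.h>

/* Read one code point from a UTF-16 string and advance *pp past it.
   Assumes well-formed input: a high surrogate is always followed by
   a low surrogate. */
static uint32_t sketch_next_utf16(const uint16_t **pp)
{
  uint32_t c = *(*pp)++;
  if (c >= 0xD800 && c < 0xDC00)   /* high surrogate: combine with next unit */
    c = ((c & 0x3ff) << 10) + (uint32_t)(*(*pp)++ & 0x3ff) + 0x10000;
  return c;
}

/* Encode a code point (<= 0x10FFFF) as UTF-8 and return the byte count. */
static int sketch_to_utf8(uint32_t c, uint8_t *out)
{
  if (c < 0x80) {
    out[0] = (uint8_t)c;
    return 1;
  }
  if (c < 0x800) {
    out[0] = (uint8_t)(0xC0 | (c >> 6));
    out[1] = (uint8_t)(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000) {
    out[0] = (uint8_t)(0xE0 | (c >> 12));
    out[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (c & 0x3F));
    return 3;
  }
  out[0] = (uint8_t)(0xF0 | (c >> 18));
  out[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
  out[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
  out[3] = (uint8_t)(0x80 | (c & 0x3F));
  return 4;
}
```

Printing the result with a "%.*s" format and the returned length, as the diff does, avoids relying on a terminating zero in the byte buffer.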
@ -481,4 +481,12 @@
|
|||
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
|
||||
123abcáyzabcdef789abcሴqr
|
||||
|
||||
# Check name length with non-ASCII characters
|
||||
|
||||
/(?'ABáC678901234567890123456789012'...)/utf
|
||||
|
||||
/(?'ABáC6789012345678901234567890123'...)/utf
|
||||
|
||||
/(?'ABZC6789012345678901234567890123'...)/utf
|
||||
|
||||
# End of testinput10
|
||||
|
|
|
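Reviewer note on the three name-length tests added to testinput10 above: the reworded error text says the limit is counted in code units, not characters, and the test names appear to be built around that distinction. The sketch below counts both for the three names. It assumes an 8-bit UTF-8 build and a limit of 32 code units; the helper names and the SKETCH_MAX_NAME_SIZE constant are hypothetical and are not taken from the diff.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_NAME_SIZE 32   /* assumed limit, in 8-bit code units */

/* Count characters in a valid UTF-8 string by skipping continuation bytes. */
static size_t sketch_utf8_chars(const char *s)
{
  size_t n = 0;
  for (; *s != 0; s++)
    if (((unsigned char)*s & 0xC0) != 0x80) n++;
  return n;
}

/* Nonzero if the name fits the limit when measured in code units (bytes). */
static int sketch_name_fits(const char *name)
{
  return strlen(name) <= SKETCH_MAX_NAME_SIZE;
}

/* The three names from the tests; "\xC3\xA1" is the UTF-8 encoding of
   U+00E1. The literals are split so the hex escape does not absorb the
   following 'C'. */
static const char sketch_name1[] = "AB\xC3\xA1" "C678901234567890123456789012";
static const char sketch_name2[] = "AB\xC3\xA1" "C6789012345678901234567890123";
static const char sketch_name3[] = "ABZC6789012345678901234567890123";
```

Under these assumptions the first name is 31 characters but 32 code units, the second is 32 characters but 33 code units, and the third is 32 of each, so only the second would exceed the limit.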
@ -2457,4 +2457,27 @@
|
|||
|
||||
# -------
|
||||
|
||||
# Test group names containing non-ASCII letters and digits
|
||||
|
||||
/(?'ABáC'...)\g{ABáC}/utf
|
||||
abcabcdefg
|
||||
|
||||
/(?'XʰABC'...)/utf
|
||||
xyzpq
|
||||
|
||||
/(?'XאABC'...)/utf
|
||||
12345
|
||||
|
||||
/(?'XᾈABC'...)/utf
|
||||
%^&*(...
|
||||
|
||||
/(?'𐨐ABC'...)/utf
|
||||
abcde
|
||||
|
||||
/^(?'אABC'...)(?&אABC)(?P=אABC)/utf
|
||||
123123123456
|
||||
|
||||
/^(?'אABC'...)(?&אABC)/utf
|
||||
123123123456
|
||||
|
||||
# End of testinput4
|
||||
|
|
|
@ -2149,4 +2149,19 @@
|
|||
|
||||
# -------
|
||||
|
||||
# Test reference and errors in non-ASCII characters in group names
|
||||
|
||||
/(?'𑠅ABC'...)/I,utf
|
||||
abcde\=copy=𑠅ABC
|
||||
|
||||
# Bad ones
|
||||
|
||||
/(?'AB၌C'...)\g{AB၌C}/utf
|
||||
|
||||
/(?'٠ABC'...)/utf
|
||||
|
||||
/(?'²ABC'...)/utf
|
||||
|
||||
/(?'X²ABC'...)/utf
|
||||
|
||||
# End of testinput5
|
||||
|
|
|
@ -248,7 +248,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = \x80
|
||||
|
@ -261,7 +261,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xe1
|
||||
Last code unit = \x80
|
||||
|
@ -274,7 +274,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xf0
|
||||
Last code unit = \x80
|
||||
|
@ -287,7 +287,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xf4
|
||||
Last code unit = \x80
|
||||
|
@ -300,7 +300,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xf4
|
||||
Last code unit = \xbf
|
||||
|
@ -313,7 +313,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc3
|
||||
Last code unit = \xbf
|
||||
|
@ -326,7 +326,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = \x80
|
||||
|
@ -339,7 +339,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc2
|
||||
Last code unit = \x80
|
||||
|
@ -352,7 +352,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc3
|
||||
Last code unit = \xbf
|
||||
|
@ -365,7 +365,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xed
|
||||
Last code unit = \xb4
|
||||
|
@ -380,7 +380,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xe6
|
||||
Last code unit = \x9e
|
||||
|
@ -395,7 +395,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc2
|
||||
Last code unit = \x80
|
||||
|
@ -408,7 +408,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc2
|
||||
Last code unit = \x84
|
||||
|
@ -421,7 +421,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = \x84
|
||||
|
@ -434,7 +434,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xe0
|
||||
Last code unit = \xa1
|
||||
|
@ -447,7 +447,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xf0
|
||||
Last code unit = \xab
|
||||
|
@ -460,7 +460,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -495,7 +495,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = \x80
|
||||
|
@ -514,7 +514,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: x \xc4
|
||||
Subject length lower bound = 1
|
||||
|
@ -531,7 +531,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a x \xc4
|
||||
Subject length lower bound = 1
|
||||
|
@ -548,7 +548,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a x \xc4
|
||||
Subject length lower bound = 1
|
||||
|
@ -566,7 +566,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: x \xc4
|
||||
Subject length lower bound = 1
|
||||
|
@ -578,7 +578,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = \x80
|
||||
|
@ -592,7 +592,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Last code unit = \x80
|
||||
|
@ -606,7 +606,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Last code unit = \x81
|
||||
|
@ -619,7 +619,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[\x{100}]/IB,utf
|
||||
|
@ -629,7 +629,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = \x80
|
||||
|
@ -648,7 +648,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc3
|
||||
Last code unit = \xbf
|
||||
|
@ -663,7 +663,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -678,14 +678,14 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = 'z'
|
||||
Subject length lower bound = 7
|
||||
|
||||
/\777/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc7
|
||||
Last code unit = \xbf
|
||||
|
@ -703,7 +703,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = \x80
|
||||
|
@ -717,7 +717,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc4
|
||||
Last code unit = 'X'
|
||||
|
@ -761,7 +761,7 @@ No match
|
|||
0: \x{1234}
|
||||
|
||||
/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: utf
|
||||
\R matches any Unicode newline
|
||||
|
@ -771,7 +771,7 @@ Last code unit = 'b'
|
|||
Subject length lower bound = 3
|
||||
|
||||
/\h/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3
|
||||
Subject length lower bound = 1
|
||||
|
@ -795,7 +795,7 @@ Subject length lower bound = 1
|
|||
0: \x{3000}
|
||||
|
||||
/\v/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
|
||||
Subject length lower bound = 1
|
||||
|
@ -813,7 +813,7 @@ Subject length lower bound = 1
|
|||
0: \x{2028}
|
||||
|
||||
/\h*A/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
|
||||
Last code unit = 'A'
|
||||
|
@ -822,21 +822,21 @@ Subject length lower bound = 1
|
|||
0: A
|
||||
|
||||
/\v+A/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
|
||||
Last code unit = 'A'
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\s?xxx\s/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
|
||||
Last code unit = 'x'
|
||||
Subject length lower bound = 4
|
||||
|
||||
/\sxxx\s/I,utf,tables=2
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
|
||||
Last code unit = 'x'
|
||||
|
@ -847,7 +847,7 @@ Subject length lower bound = 5
|
|||
0: \x{a0}xxx\x{85}
|
||||
|
||||
/\S \S/I,utf,tables=2
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
|
||||
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
|
||||
|
@ -883,25 +883,25 @@ Error -36 (bad UTF-8 offset)
|
|||
No match
|
||||
|
||||
/\x{1234}+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xe1
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}+?/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xe1
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}++/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xe1
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}{2}/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xe1
|
||||
Subject length lower bound = 2
|
||||
|
@ -913,7 +913,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -925,14 +925,14 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'X'
|
||||
Last code unit = \x80
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\R/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
|
||||
Subject length lower bound = 1
|
||||
|
@ -944,7 +944,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xc7
|
||||
Last code unit = \xbf
|
||||
|
@ -1105,7 +1105,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = 'A' (caseless)
|
||||
Subject length lower bound = 5
|
||||
|
@ -1117,7 +1117,7 @@ Subject length lower bound = 5
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = \xb0
|
||||
|
@ -1130,7 +1130,7 @@ Subject length lower bound = 5
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = \xb0
|
||||
|
@ -1143,14 +1143,14 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = 'A' (caseless)
|
||||
Last code unit = 'B' (caseless)
|
||||
Subject length lower bound = 3
|
||||
|
||||
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xd0 \xd1
|
||||
Subject length lower bound = 17
|
||||
|
@ -1176,17 +1176,17 @@ Subject length lower bound = 17
|
|||
------------------------------------------------------------------
|
||||
|
||||
/\h/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x09 \x20 \xa0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\v/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\R/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -1199,7 +1199,7 @@ Subject length lower bound = 1
|
|||
------------------------------------------------------------------
|
||||
|
||||
/\x{212a}+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: K k \xe2
|
||||
Subject length lower bound = 1
|
||||
|
@ -1207,7 +1207,7 @@ Subject length lower bound = 1
|
|||
0: KKkk\x{212a}
|
||||
|
||||
/s+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: S s \xc5
|
||||
Subject length lower bound = 1
|
||||
|
@ -1222,7 +1222,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: A \xc4
|
||||
Last code unit = 'A'
|
||||
|
@ -1239,7 +1239,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
|
||||
Subject length lower bound = 1
|
||||
|
@ -1251,7 +1251,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: Z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
|
||||
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
|
||||
|
@ -1273,7 +1273,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
|
||||
\xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
|
||||
|
@ -1289,7 +1289,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: - ] a d z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
|
||||
\xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
|
||||
|
@ -1314,7 +1314,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a b \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
|
||||
\xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
|
||||
|
@ -1332,7 +1332,7 @@ Subject length lower bound = 7
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4
|
||||
Subject length lower bound = 1
|
||||
|
@ -1345,7 +1345,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
|
||||
Subject length lower bound = 1
|
||||
|
@ -1358,7 +1358,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
|
@ -1373,7 +1373,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -1395,7 +1395,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
|
||||
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
|
||||
|
@ -1416,7 +1416,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -1435,7 +1435,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
|
||||
\xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
|
||||
|
@ -1462,7 +1462,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
|
||||
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
|
||||
|
@ -1503,7 +1503,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
|
||||
\xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
|
||||
|
@ -1520,7 +1520,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xce \xcf
|
||||
Last code unit = 'B' (caseless)
|
||||
|
@ -1531,7 +1531,7 @@ Subject length lower bound = 2
|
|||
Failed: error -3: UTF-8 error: 1 byte missing at end
|
||||
|
||||
/(?<=(a)(?-1))x/I,utf
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max lookbehind = 2
|
||||
Options: utf
|
||||
First code unit = 'x'
|
||||
|
@ -1579,7 +1579,7 @@ Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP),
|
|||
# but subjects containing them must not be UTF-checked.
|
||||
|
||||
/\x{d800}/I,utf,allow_surrogate_escapes
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Extra options: allow_surrogate_escapes
|
||||
First code unit = \xed
|
||||
|
@ -1602,7 +1602,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: utf
|
||||
Overall options: anchored utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
|
@ -1636,4 +1636,13 @@ No match
4(2) Old 22 22 "" New 28 30 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr

# Check name length with non-ASCII characters

/(?'ABáC678901234567890123456789012'...)/utf

/(?'ABáC6789012345678901234567890123'...)/utf
Failed: error 148 at offset 36: subpattern name is too long (maximum 32 code units)

/(?'ABZC6789012345678901234567890123'...)/utf

# End of testinput10
|
|
|
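Note on the name-length tests added at the end of testinput10: the limit in the error message is counted in code units, not characters. The sketch below is an illustration only, not part of the commit. It assumes these tests run against the 8-bit library, where one code unit is one UTF-8 byte; `MAX_NAME_CODE_UNITS` and `utf8_code_units` are hypothetical names introduced here, with the value 32 taken from the error message above. It shows why the 32-character name containing "á" is rejected while the 32-character all-ASCII name is accepted.

```python
# Illustration only: count UTF-8 code units (bytes) in the three group names
# used by the tests above. Assumes the 8-bit library, where a code unit is a byte.

MAX_NAME_CODE_UNITS = 32  # limit quoted in the "error 148" message (assumed constant name)


def utf8_code_units(name: str) -> int:
    """Return the number of UTF-8 code units (bytes) needed to encode name."""
    return len(name.encode("utf-8"))


names = [
    "AB\u00e1C678901234567890123456789012",   # 31 characters, "á" takes 2 bytes
    "AB\u00e1C6789012345678901234567890123",  # 32 characters, "á" takes 2 bytes
    "ABZC6789012345678901234567890123",       # 32 characters, all ASCII
]

# Each entry: (characters, code units, fits within the limit?)
results = [
    (len(n), utf8_code_units(n), utf8_code_units(n) <= MAX_NAME_CODE_UNITS)
    for n in names
]

for chars, units, ok in results:
    print(f"{chars} characters, {units} code units: {'ok' if ok else 'too long'}")
```

Under the same assumption, the reported offset 36 is consistent with this count: the three code units of `(?'` followed by the 33 code units of the rejected name.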
@ -13,11 +13,11 @@
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{100}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -215,7 +215,7 @@ Subject length lower bound = 1
|
|||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* # optional trailing comment
|
||||
/Ix
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
|
||||
|
@ -260,7 +260,7 @@ Subject length lower bound = 3
|
|||
------------------------------------------------------------------
|
||||
|
||||
/\h+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x09 \x20 \xa0 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
|
||||
|
@ -275,7 +275,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x09 \x20 \xa0 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
|
||||
|
@ -284,7 +284,7 @@ Subject length lower bound = 1
|
|||
0: \x{200a}\xa0\x{2000}
|
||||
|
||||
/\H+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
|
||||
0: \x{167f}\x{1681}\x{180d}\x{180f}
|
||||
|
@ -306,7 +306,7 @@ Subject length lower bound = 1
|
|||
0: \x9f\xa1\x{2fff}\x{3001}
|
||||
|
||||
/\v+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{2027}\x{2030}\x{2028}\x{2029}
|
||||
|
@ -321,7 +321,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{2027}\x{2030}\x{2028}\x{2029}
|
||||
|
@ -330,7 +330,7 @@ Subject length lower bound = 1
|
|||
0: \x85\x0a\x0b\x0c\x0d
|
||||
|
||||
/\V+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
\x{2028}\x{2029}\x{2027}\x{2030}
|
||||
0: \x{2027}\x{2030}
|
||||
|
@ -344,7 +344,7 @@ Subject length lower bound = 1
|
|||
0: \x09\x0e\x84\x86
|
||||
|
||||
/\R+/I,bsr=unicode
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches any Unicode newline
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -354,7 +354,7 @@ Subject length lower bound = 1
|
|||
0: \x85\x0a\x0b\x0c\x0d
|
||||
|
||||
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \x{d800}
|
||||
Last code unit = \x{dd00}
|
||||
Subject length lower bound = 6
|
||||
|
@ -600,7 +600,7 @@ Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
|
||||
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
|
||||
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
|
||||
|
@ -624,7 +624,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
|
||||
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
|
||||
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
|
||||
|
|
|
@ -13,11 +13,11 @@
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{100}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -215,7 +215,7 @@ Subject length lower bound = 1
|
|||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* # optional trailing comment
|
||||
/Ix
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
|
||||
|
@ -260,7 +260,7 @@ Subject length lower bound = 3
|
|||
------------------------------------------------------------------
|
||||
|
||||
/\h+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x09 \x20 \xa0 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
|
||||
|
@ -275,7 +275,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x09 \x20 \xa0 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000}
|
||||
|
@ -284,7 +284,7 @@ Subject length lower bound = 1
|
|||
0: \x{200a}\xa0\x{2000}
|
||||
|
||||
/\H+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f}
|
||||
0: \x{167f}\x{1681}\x{180d}\x{180f}
|
||||
|
@ -306,7 +306,7 @@ Subject length lower bound = 1
|
|||
0: \x9f\xa1\x{2fff}\x{3001}
|
||||
|
||||
/\v+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{2027}\x{2030}\x{2028}\x{2029}
|
||||
|
@ -321,7 +321,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
\x{2027}\x{2030}\x{2028}\x{2029}
|
||||
|
@ -330,7 +330,7 @@ Subject length lower bound = 1
|
|||
0: \x85\x0a\x0b\x0c\x0d
|
||||
|
||||
/\V+/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
\x{2028}\x{2029}\x{2027}\x{2030}
|
||||
0: \x{2027}\x{2030}
|
||||
|
@ -344,7 +344,7 @@ Subject length lower bound = 1
|
|||
0: \x09\x0e\x84\x86
|
||||
|
||||
/\R+/I,bsr=unicode
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches any Unicode newline
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -354,7 +354,7 @@ Subject length lower bound = 1
|
|||
0: \x85\x0a\x0b\x0c\x0d
|
||||
|
||||
/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \x{d800}
|
||||
Last code unit = \x{dd00}
|
||||
Subject length lower bound = 6
|
||||
|
@ -558,19 +558,19 @@ Failed: error 134 at offset 12: character code point value in \x{} or \o{} is to
|
|||
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
|
||||
|
||||
/\x{7fffffff}\x{7fffffff}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \x{7fffffff}
|
||||
Last code unit = \x{7fffffff}
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\x{80000000}\x{80000000}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \x{80000000}
|
||||
Last code unit = \x{80000000}
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\x{ffffffff}\x{ffffffff}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \x{ffffffff}
|
||||
Last code unit = \x{ffffffff}
|
||||
Subject length lower bound = 2
|
||||
|
@ -588,7 +588,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless
|
||||
First code unit = \x{400000}
|
||||
Last code unit = \x{800000}
|
||||
|
@ -603,7 +603,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b
|
||||
\x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a
|
||||
\x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9
|
||||
|
@ -627,7 +627,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
|
||||
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
|
||||
\x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
|
||||
|
|
|
@ -18,7 +18,7 @@
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{ffff}
|
||||
Subject length lower bound = 1
|
||||
|
@ -30,7 +30,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d800}
|
||||
Last code unit = \x{dc00}
|
||||
|
@ -43,7 +43,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -55,7 +55,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{1000}
|
||||
Subject length lower bound = 1
|
||||
|
@ -67,7 +67,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d800}
|
||||
Last code unit = \x{dc00}
|
||||
|
@ -80,7 +80,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{dbc0}
|
||||
Last code unit = \x{dc00}
|
||||
|
@ -93,7 +93,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{dbff}
|
||||
Last code unit = \x{dfff}
|
||||
|
@ -106,7 +106,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -118,7 +118,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -130,7 +130,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x80
|
||||
Subject length lower bound = 1
|
||||
|
@ -142,7 +142,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -154,7 +154,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -169,7 +169,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -184,7 +184,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x80
|
||||
Subject length lower bound = 1
|
||||
|
@ -196,7 +196,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x84
|
||||
Subject length lower bound = 1
|
||||
|
@ -208,7 +208,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{104}
|
||||
Subject length lower bound = 1
|
||||
|
@ -220,7 +220,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{861}
|
||||
Subject length lower bound = 1
|
||||
|
@ -232,7 +232,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d844}
|
||||
Last code unit = \x{deab}
|
||||
|
@ -245,7 +245,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -281,7 +281,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = \x{100}
|
||||
|
@ -300,7 +300,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -317,7 +317,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -334,7 +334,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -352,7 +352,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -364,7 +364,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -377,7 +377,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Last code unit = \x{100}
|
||||
|
@ -391,7 +391,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Last code unit = \x{101}
|
||||
|
@ -404,7 +404,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[\x{100}]/IB,utf
|
||||
|
@ -414,7 +414,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -432,7 +432,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -446,7 +446,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -461,14 +461,14 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = 'z'
|
||||
Subject length lower bound = 7
|
||||
|
||||
/\777/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{1ff}
|
||||
Subject length lower bound = 1
|
||||
|
@ -485,7 +485,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = \x{200}
|
||||
|
@ -499,7 +499,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = 'X'
|
||||
|
@ -547,7 +547,7 @@ Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
|
|||
0: \x{11234}
|
||||
|
||||
/(*UTF)\x{11234}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: utf
|
||||
First code unit = \x{d804}
|
||||
|
@ -565,7 +565,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
|
|||
abcd\x{11234}pqr
|
||||
|
||||
/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: utf
|
||||
\R matches any Unicode newline
|
||||
|
@ -578,7 +578,7 @@ Subject length lower bound = 3
|
|||
Failed: error 160 at offset 14: (*VERB) not recognized or malformed
|
||||
|
||||
/\h/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x20 \xa0 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -602,7 +602,7 @@ Subject length lower bound = 1
|
|||
0: \x{3000}
|
||||
|
||||
/\v/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -620,7 +620,7 @@ Subject length lower bound = 1
|
|||
0: \x{2028}
|
||||
|
||||
/\h*A/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x20 A \xa0 \xff
|
||||
Last code unit = 'A'
|
||||
|
@ -631,7 +631,7 @@ Subject length lower bound = 1
|
|||
0: \x{2000}A
|
||||
|
||||
/\R*A/I,bsr=unicode,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches any Unicode newline
|
||||
Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff
|
||||
|
@ -643,21 +643,21 @@ Subject length lower bound = 1
|
|||
0: \x{2028}A
|
||||
|
||||
/\v+A/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Last code unit = 'A'
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\s?xxx\s/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
|
||||
Last code unit = 'x'
|
||||
Subject length lower bound = 4
|
||||
|
||||
/\sxxx\s/I,utf,tables=2
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
|
||||
Last code unit = 'x'
|
||||
|
@ -668,7 +668,7 @@ Subject length lower bound = 5
|
|||
0: \x{a0}xxx\x{85}
|
||||
|
||||
/\S \S/I,utf,tables=2
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
|
||||
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
|
||||
|
@ -708,25 +708,25 @@ Failed: error -33: bad offset value
|
|||
Failed: error -33: bad offset value
|
||||
|
||||
/\x{1234}+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}+?/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}++/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}{2}/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Last code unit = \x{1234}
|
||||
|
@ -739,7 +739,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -751,14 +751,14 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'X'
|
||||
Last code unit = \x{200}
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\R/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -936,7 +936,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = 'A' (caseless)
|
||||
Last code unit = \x{1fb0} (caseless)
|
||||
|
@ -949,7 +949,7 @@ Subject length lower bound = 5
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = \x{1fb0}
|
||||
|
@ -962,7 +962,7 @@ Subject length lower bound = 5
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = \x{1fb0}
|
||||
|
@ -975,14 +975,14 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = 'A' (caseless)
|
||||
Last code unit = \x{1fb0} (caseless)
|
||||
Subject length lower bound = 3
|
||||
|
||||
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{401} (caseless)
|
||||
Last code unit = \x{42f} (caseless)
|
||||
|
@ -1017,7 +1017,7 @@ Subject length lower bound = 17
|
|||
------------------------------------------------------------------
|
||||
|
||||
/\x{212a}+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: K k \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1025,7 +1025,7 @@ Subject length lower bound = 1
|
|||
0: KKkk\x{212a}
|
||||
|
||||
/s+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: S s \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1048,7 +1048,7 @@ Failed: error 134 at offset 10: character code point value in \x{} or \o{} is to
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: A \xff
|
||||
Last code unit = 'A'
|
||||
|
@ -1065,7 +1065,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1077,7 +1077,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: Z \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1095,7 +1095,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87
|
||||
\x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96
|
||||
|
@ -1115,7 +1115,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: - ] a d z \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1136,7 +1136,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a b \xff
|
||||
Last code unit = 'z'
|
||||
|
@ -1150,7 +1150,7 @@ Subject length lower bound = 7
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1163,7 +1163,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1176,7 +1176,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
|
@ -1191,7 +1191,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -1217,7 +1217,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
|
||||
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
|
||||
|
@ -1243,7 +1243,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -1266,7 +1266,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1289,7 +1289,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
|
||||
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
|
||||
|
@ -1335,7 +1335,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
|
||||
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
|
||||
|
@ -1357,7 +1357,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xff
|
||||
Last code unit = 'B' (caseless)
|
||||
|
@ -1443,7 +1443,7 @@ Failed: error 191 at offset 0: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowe
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: utf
|
||||
Overall options: anchored utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
|
|
|
@ -18,7 +18,7 @@
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{ffff}
|
||||
Subject length lower bound = 1
|
||||
|
@ -30,7 +30,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{10000}
|
||||
Subject length lower bound = 1
|
||||
|
@ -42,7 +42,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -54,7 +54,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{1000}
|
||||
Subject length lower bound = 1
|
||||
|
@ -66,7 +66,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{10000}
|
||||
Subject length lower bound = 1
|
||||
|
@ -78,7 +78,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100000}
|
||||
Subject length lower bound = 1
|
||||
|
@ -90,7 +90,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{10ffff}
|
||||
Subject length lower bound = 1
|
||||
|
@ -102,7 +102,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -114,7 +114,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -126,7 +126,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x80
|
||||
Subject length lower bound = 1
|
||||
|
@ -138,7 +138,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -150,7 +150,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -165,7 +165,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -180,7 +180,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x80
|
||||
Subject length lower bound = 1
|
||||
|
@ -192,7 +192,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x84
|
||||
Subject length lower bound = 1
|
||||
|
@ -204,7 +204,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{104}
|
||||
Subject length lower bound = 1
|
||||
|
@ -216,7 +216,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{861}
|
||||
Subject length lower bound = 1
|
||||
|
@ -228,7 +228,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{212ab}
|
||||
Subject length lower bound = 1
|
||||
|
@ -240,7 +240,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -276,7 +276,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = \x{100}
|
||||
|
@ -295,7 +295,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -312,7 +312,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -329,7 +329,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -347,7 +347,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: x \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -359,7 +359,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -372,7 +372,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Last code unit = \x{100}
|
||||
|
@ -386,7 +386,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Last code unit = \x{101}
|
||||
|
@ -399,7 +399,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[\x{100}]/IB,utf
|
||||
|
@ -409,7 +409,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Subject length lower bound = 1
|
||||
|
@ -427,7 +427,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -441,7 +441,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -456,14 +456,14 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = 'z'
|
||||
Subject length lower bound = 7
|
||||
|
||||
/\777/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{1ff}
|
||||
Subject length lower bound = 1
|
||||
|
@ -480,7 +480,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = \x{200}
|
||||
|
@ -494,7 +494,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{100}
|
||||
Last code unit = 'X'
|
||||
|
@ -542,7 +542,7 @@ Failed: error 160 at offset 7: (*VERB) not recognized or malformed
|
|||
abcd\x{11234}pqr
|
||||
|
||||
/(*UTF)\x{11234}/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: utf
|
||||
First code unit = \x{11234}
|
||||
|
@ -562,7 +562,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
|
|||
Failed: error 160 at offset 14: (*VERB) not recognized or malformed
|
||||
|
||||
/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: utf
|
||||
\R matches any Unicode newline
|
||||
|
@ -572,7 +572,7 @@ Last code unit = 'b'
|
|||
Subject length lower bound = 3
|
||||
|
||||
/\h/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x20 \xa0 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -596,7 +596,7 @@ Subject length lower bound = 1
|
|||
0: \x{3000}
|
||||
|
||||
/\v/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -614,7 +614,7 @@ Subject length lower bound = 1
|
|||
0: \x{2028}
|
||||
|
||||
/\h*A/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x20 A \xa0 \xff
|
||||
Last code unit = 'A'
|
||||
|
@ -625,7 +625,7 @@ Subject length lower bound = 1
|
|||
0: \x{2000}A
|
||||
|
||||
/\R*A/I,bsr=unicode,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches any Unicode newline
|
||||
Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff
|
||||
|
@ -637,21 +637,21 @@ Subject length lower bound = 1
|
|||
0: \x{2028}A
|
||||
|
||||
/\v+A/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Last code unit = 'A'
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\s?xxx\s/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
|
||||
Last code unit = 'x'
|
||||
Subject length lower bound = 4
|
||||
|
||||
/\sxxx\s/I,utf,tables=2
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
|
||||
Last code unit = 'x'
|
||||
|
@ -662,7 +662,7 @@ Subject length lower bound = 5
|
|||
0: \x{a0}xxx\x{85}
|
||||
|
||||
/\S \S/I,utf,tables=2
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
|
||||
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
|
||||
|
@ -702,25 +702,25 @@ Failed: error -33: bad offset value
|
|||
Failed: error -33: bad offset value
|
||||
|
||||
/\x{1234}+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}+?/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}++/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\x{1234}{2}/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{1234}
|
||||
Last code unit = \x{1234}
|
||||
|
@ -733,7 +733,7 @@ Subject length lower bound = 2
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -745,14 +745,14 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'X'
|
||||
Last code unit = \x{200}
|
||||
Subject length lower bound = 2
|
||||
|
||||
/\R/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -930,7 +930,7 @@ Failed: error 174 at offset 0: using UTF is disabled by the application
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = 'A' (caseless)
|
||||
Last code unit = \x{1fb0} (caseless)
|
||||
|
@ -943,7 +943,7 @@ Subject length lower bound = 5
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = \x{1fb0}
|
||||
|
@ -956,7 +956,7 @@ Subject length lower bound = 5
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = \x{1fb0}
|
||||
|
@ -969,14 +969,14 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = 'A' (caseless)
|
||||
Last code unit = \x{1fb0} (caseless)
|
||||
Subject length lower bound = 3
|
||||
|
||||
/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = \x{401} (caseless)
|
||||
Last code unit = \x{42f} (caseless)
|
||||
|
@ -1011,7 +1011,7 @@ Subject length lower bound = 17
|
|||
------------------------------------------------------------------
|
||||
|
||||
/\x{212a}+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: K k \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1019,7 +1019,7 @@ Subject length lower bound = 1
|
|||
0: KKkk\x{212a}
|
||||
|
||||
/s+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: S s \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1042,7 +1042,7 @@ Failed: error 134 at offset 10: character code point value in \x{} or \o{} is to
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: A \xff
|
||||
Last code unit = 'A'
|
||||
|
@ -1059,7 +1059,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1071,7 +1071,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: Z \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1089,7 +1089,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87
|
||||
\x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96
|
||||
|
@ -1109,7 +1109,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: - ] a d z \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1130,7 +1130,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Starting code units: a b \xff
|
||||
Last code unit = 'z'
|
||||
|
@ -1144,7 +1144,7 @@ Subject length lower bound = 7
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1157,7 +1157,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1170,7 +1170,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
|
@ -1185,7 +1185,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -1211,7 +1211,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
|
||||
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
|
||||
|
@ -1237,7 +1237,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
|
||||
|
@ -1260,7 +1260,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xff
|
||||
Subject length lower bound = 1
|
||||
|
@ -1283,7 +1283,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
|
||||
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
|
||||
|
@ -1329,7 +1329,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86
|
||||
\x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95
|
||||
|
@ -1351,7 +1351,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Starting code units: \xff
|
||||
Last code unit = 'B' (caseless)
|
||||
|
@ -1418,7 +1418,7 @@ No match
|
|||
# errors in 16-bit mode.
|
||||
|
||||
/\x{d800}/I,utf,allow_surrogate_escapes
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Extra options: allow_surrogate_escapes
|
||||
First code unit = \x{d800}
|
||||
|
@ -1440,7 +1440,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: utf
|
||||
Overall options: anchored utf
|
||||
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
|
||||
|
|
|
@ -7,7 +7,7 @@
|
|||
# (2) Other tests that must not be run with JIT.
|
||||
|
||||
/(a+)*zz/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Starting code units: a z
|
||||
Last code unit = 'z'
|
||||
Subject length lower bound = 2
|
||||
|
@ -24,7 +24,7 @@ Minimum depth limit = 30
|
|||
No match
|
||||
|
||||
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
/* this is a C style comment */\=find_limits
|
||||
|
@ -117,7 +117,7 @@ Failed: error 160 at offset 17: (*VERB) not recognized or malformed
|
|||
Failed: error 160 at offset 24: (*VERB) not recognized or malformed
|
||||
|
||||
/(*LIMIT_DEPTH=4294967280)abc/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Depth limit = 4294967280
|
||||
First code unit = 'a'
|
||||
Last code unit = 'c'
|
||||
|
@ -137,7 +137,7 @@ Failed: error -47: match limit exceeded
|
|||
Failed: error -53: matching depth limit exceeded
|
||||
|
||||
/(*LIMIT_MATCH=3000)(a+)*zz/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Match limit = 3000
|
||||
Starting code units: a z
|
||||
Last code unit = 'z'
|
||||
|
@ -150,7 +150,7 @@ Failed: error -47: match limit exceeded
|
|||
Failed: error -47: match limit exceeded
|
||||
|
||||
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Match limit = 3000
|
||||
Starting code units: a z
|
||||
Last code unit = 'z'
|
||||
|
@ -160,7 +160,7 @@ Subject length lower bound = 2
|
|||
Failed: error -47: match limit exceeded
|
||||
|
||||
/(*LIMIT_MATCH=60000)(a+)*zz/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Match limit = 60000
|
||||
Starting code units: a z
|
||||
Last code unit = 'z'
|
||||
|
@ -173,7 +173,7 @@ No match
|
|||
Failed: error -47: match limit exceeded
|
||||
|
||||
/(*LIMIT_DEPTH=10)(a+)*zz/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Depth limit = 10
|
||||
Starting code units: a z
|
||||
Last code unit = 'z'
|
||||
|
@ -186,7 +186,7 @@ Failed: error -53: matching depth limit exceeded
|
|||
Failed: error -53: matching depth limit exceeded
|
||||
|
||||
/(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Depth limit = 1000
|
||||
Starting code units: a z
|
||||
Last code unit = 'z'
|
||||
|
@ -196,7 +196,7 @@ Subject length lower bound = 2
|
|||
No match
|
||||
|
||||
/(*LIMIT_DEPTH=1000)(a+)*zz/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Depth limit = 1000
|
||||
Starting code units: a z
|
||||
Last code unit = 'z'
|
||||
|
@ -269,14 +269,14 @@ Failed: error -52: nested recursion at the same subject position
|
|||
# when JIT is used.
|
||||
|
||||
/(?R)/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
abcd
|
||||
Failed: error -52: nested recursion at the same subject position
|
||||
|
||||
/(a|(?R))/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
abcd
|
||||
|
@ -286,7 +286,7 @@ Subject length lower bound = 0
|
|||
Failed: error -52: nested recursion at the same subject position
|
||||
|
||||
/(ab|(bc|(de|(?R))))/I
|
||||
Capturing subpattern count = 3
|
||||
Capture group count = 3
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
abcd
|
||||
|
@ -296,7 +296,7 @@ Subject length lower bound = 0
|
|||
Failed: error -52: nested recursion at the same subject position
|
||||
|
||||
/(ab|(bc|(de|(?1))))/I
|
||||
Capturing subpattern count = 3
|
||||
Capture group count = 3
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
abcd
|
||||
|
@ -306,7 +306,7 @@ Subject length lower bound = 0
|
|||
Failed: error -52: nested recursion at the same subject position
|
||||
|
||||
/x(ab|(bc|(de|(?1)x)x)x)/I
|
||||
Capturing subpattern count = 3
|
||||
Capture group count = 3
|
||||
First code unit = 'x'
|
||||
Subject length lower bound = 3
|
||||
xab123
|
||||
|
@ -352,7 +352,7 @@ Failed: error -52: nested recursion at the same subject position
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Subject length lower bound = 1
|
||||
abcd
|
||||
Failed: error -52: nested recursion at the same subject position
|
||||
|
@ -367,7 +367,7 @@ Failed: error -52: nested recursion at the same subject position
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: no_auto_possess
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
|
@ -390,7 +390,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: <none>
|
||||
Overall options: no_auto_possess
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
|
|
|
@ -3,14 +3,14 @@
|
|||
# are different without JIT.
|
||||
|
||||
/abc/I,jit,jitverify
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = 'a'
|
||||
Last code unit = 'c'
|
||||
Subject length lower bound = 3
|
||||
JIT support is not available in this version of PCRE2
|
||||
|
||||
/a*/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
File diff suppressed because one or more lines are too long
File diff suppressed because it is too large
|
@ -32,9 +32,9 @@
|
|||
#load testsaved2
|
||||
|
||||
#pop info
|
||||
Capturing subpattern count = 2
|
||||
Capture group count = 2
|
||||
Max back reference = 2
|
||||
Named capturing subpatterns:
|
||||
Named capture groups:
|
||||
n 1
|
||||
n 2
|
||||
Options: dupnames
|
||||
|
@ -66,8 +66,8 @@ No match, mark = A
|
|||
4: A
|
||||
|
||||
#pop info
|
||||
Capturing subpattern count = 4
|
||||
Named capturing subpatterns:
|
||||
Capture group count = 4
|
||||
Named capture groups:
|
||||
ADDR 2
|
||||
ADDRESS_PAT 4
|
||||
NAME 1
|
||||
|
|
|
@ -79,7 +79,7 @@
|
|||
Failed: error 183 at offset 4: using \C is disabled by the application
|
||||
|
||||
/ab\Cde/info
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Contains \C
|
||||
First code unit = 'a'
|
||||
Last code unit = 'e'
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
# in some widths and not in others.
|
||||
|
||||
/ab\Cde/utf,info
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Contains \C
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
# in some widths and not in others.
|
||||
|
||||
/ab\Cde/utf,info
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Contains \C
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
|
|
|
@ -4,7 +4,7 @@
|
|||
# in some widths and not in others.
|
||||
|
||||
/ab\Cde/utf,info
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Contains \C
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
|
|
|
@ -78,13 +78,13 @@ No match
|
|||
0: école
|
||||
|
||||
/\w/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\w/I,locale=fr_FR
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
|
||||
|
@ -153,7 +153,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
|
||||
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
|
||||
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í
|
||||
|
|
|
@ -78,13 +78,13 @@ No match
|
|||
0: école
|
||||
|
||||
/\w/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\w/I,locale=fr_FR
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
|
||||
|
@ -153,7 +153,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
|
||||
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
|
||||
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í
|
||||
|
|
|
@ -78,13 +78,13 @@ No match
|
|||
0: école
|
||||
|
||||
/\w/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\w/I,locale=fr_FR
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â
|
||||
|
@ -153,7 +153,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
|
||||
a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç
|
||||
È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í
|
||||
|
|
|
@ -3975,4 +3975,41 @@ No match

# -------

# Test group names containing non-ASCII letters and digits

/(?'ABáC'...)\g{ABáC}/utf
abcabcdefg
0: abcabc
1: abc

/(?'XʰABC'...)/utf
xyzpq
0: xyz
1: xyz

/(?'XאABC'...)/utf
12345
0: 123
1: 123

/(?'XᾈABC'...)/utf
%^&*(...
0: %^&
1: %^&

/(?'𐨐ABC'...)/utf
abcde
0: abc
1: abc

/^(?'אABC'...)(?&אABC)(?P=אABC)/utf
123123123456
0: 123123123
1: 123

/^(?'אABC'...)(?&אABC)/utf
123123123456
0: 123123
1: 123

# End of testinput4
|
|
|
@ -147,7 +147,7 @@ Failed: error 173 at offset 9: disallowed Unicode code point (>= 0xd800 && <= 0x
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -164,7 +164,7 @@ Subject length lower bound = 4
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Last code unit = 'X'
|
||||
Subject length lower bound = 4
|
||||
|
@ -179,7 +179,7 @@ Subject length lower bound = 4
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 3
|
||||
\x{212ab}\x{212ab}\x{212ab}\x{861}
|
||||
|
@ -193,7 +193,7 @@ Subject length lower bound = 3
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: utf
|
||||
Overall options: anchored utf
|
||||
Starting code units: a b
|
||||
|
@ -238,7 +238,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: utf
|
||||
Subject length lower bound = 0
|
||||
|
@ -251,7 +251,7 @@ Subject length lower bound = 0
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -264,7 +264,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'a'
|
||||
Last code unit = 'b'
|
||||
|
@ -291,7 +291,7 @@ No match
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
First code unit = \xff
|
||||
Subject length lower bound = 1
|
||||
>\xff<
|
||||
|
@ -304,7 +304,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[Ä-Ü]/utf
|
||||
|
@ -343,7 +343,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: utf
|
||||
Last code unit = 'z'
|
||||
Subject length lower bound = 7
|
||||
|
@ -363,7 +363,7 @@ Subject length lower bound = 7
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 2
|
||||
Capture group count = 2
|
||||
May match empty string
|
||||
Options: utf
|
||||
Subject length lower bound = 0
|
||||
|
@ -394,7 +394,7 @@ Subject length lower bound = 0
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 2
|
||||
Capture group count = 2
|
||||
May match empty string
|
||||
Options: utf
|
||||
Subject length lower bound = 0
|
||||
|
@ -414,7 +414,7 @@ Subject length lower bound = 0
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 2
|
||||
Capture group count = 2
|
||||
May match empty string
|
||||
Options: utf
|
||||
Subject length lower bound = 0
|
||||
|
@ -445,7 +445,7 @@ Subject length lower bound = 0
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 2
|
||||
Capture group count = 2
|
||||
May match empty string
|
||||
Options: utf
|
||||
Subject length lower bound = 0
|
||||
|
@ -471,7 +471,7 @@ Subject length lower bound = 0
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Compile options: no_start_optimize utf
|
||||
Overall options: anchored no_start_optimize utf
|
||||
Subject length lower bound = 0
|
||||
|
@ -713,7 +713,7 @@ No match
|
|||
0: \x{1ec5}
|
||||
|
||||
/a\Rb/I,bsr=anycrlf,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches CR, LF, or CRLF
|
||||
First code unit = 'a'
|
||||
|
@ -732,7 +732,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\Rb/I,bsr=unicode,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches any Unicode newline
|
||||
First code unit = 'a'
|
||||
|
@ -750,7 +750,7 @@ Subject length lower bound = 3
|
|||
0: a\x{0b}b
|
||||
|
||||
/a\R?b/I,bsr=anycrlf,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches CR, LF, or CRLF
|
||||
First code unit = 'a'
|
||||
|
@ -769,7 +769,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\R?b/I,bsr=unicode,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches any Unicode newline
|
||||
First code unit = 'a'
|
||||
|
@ -1408,22 +1408,22 @@ Failed: error 168 at offset 3: \c must be followed by a printable ASCII characte
|
|||
2: \x{0d}
|
||||
|
||||
/[^\x{1234}]+/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[^\x{1234}]+?/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[^\x{1234}]++/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[^\x{1234}]{2}/Ii,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
Subject length lower bound = 2
|
||||
|
||||
|
@ -1703,7 +1703,7 @@ Partial match: \x{0d}\x{0d}
|
|||
------------------------------------------------------------------
|
||||
|
||||
/(?<=\x{1234}\x{1234})\bxy/I,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Max lookbehind = 2
|
||||
Options: utf
|
||||
First code unit = 'x'
|
||||
|
@ -1768,7 +1768,7 @@ Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0x
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[\p{^L}]/IB
|
||||
|
@ -1778,7 +1778,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[\P{L}]/IB
|
||||
|
@ -1788,7 +1788,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[\P{^L}]/IB
|
||||
|
@ -1798,7 +1798,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/[abc\p{L}\x{0660}]/IB,utf
|
||||
|
@ -1808,7 +1808,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
@ -1819,7 +1819,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
1234
|
||||
|
@ -1832,7 +1832,7 @@ Subject length lower bound = 1
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
Subject length lower bound = 1
|
||||
1234
|
||||
|
@ -2998,7 +2998,7 @@ Partial match: AA
|
|||
Ket
|
||||
End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless utf
|
||||
First code unit = 'A' (caseless)
|
||||
Last code unit = 'B' (caseless)
|
||||
|
@ -3914,7 +3914,7 @@ No match
|
|||
------------------------------------------------------------------
|
||||
|
||||
/^s?c/Iim,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: caseless multiline utf
|
||||
First code unit at start or follows newline
|
||||
Last code unit = 'c' (caseless)
|
||||
|
@ -4889,4 +4889,31 @@ MK: ABC

# -------

# Test reference and errors in non-ASCII characters in group names

/(?'𑠅ABC'...)/I,utf
Capture group count = 1
Named capture groups:
𑠅ABC 1
Options: utf
Subject length lower bound = 3
abcde\=copy=𑠅ABC
0: abc
1: abc
C abc (3) 𑠅ABC (group 1)

# Bad ones

/(?'AB၌C'...)\g{AB၌C}/utf
Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?)

/(?'٠ABC'...)/utf
Failed: error 144 at offset 3: subpattern name must start with a non-digit

/(?'²ABC'...)/utf
Failed: error 162 at offset 3: subpattern name expected

/(?'X²ABC'...)/utf
Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?)

# End of testinput5
|
|
|
@ -5978,7 +5978,7 @@ Partial match: 123
|
|||
0: Content-Type:xxxyyyz
|
||||
|
||||
/^abc/Im,newline=lf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: multiline
|
||||
Forced newline is LF
|
||||
First code unit at start or follows newline
|
||||
|
@ -6001,7 +6001,7 @@ No match
|
|||
No match
|
||||
|
||||
/^abc/Im,newline=crlf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: multiline
|
||||
Forced newline is CRLF
|
||||
First code unit at start or follows newline
|
||||
|
@ -6016,7 +6016,7 @@ No match
|
|||
No match
|
||||
|
||||
/^abc/Im,newline=cr
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: multiline
|
||||
Forced newline is CR
|
||||
First code unit at start or follows newline
|
||||
|
@ -6031,7 +6031,7 @@ No match
|
|||
No match
|
||||
|
||||
/.*/I,newline=lf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Forced newline is LF
|
||||
First code unit at start or follows newline
|
||||
|
@ -6044,7 +6044,7 @@ Subject length lower bound = 0
|
|||
0: abc\x0d
|
||||
|
||||
/.*/I,newline=cr
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Forced newline is CR
|
||||
First code unit at start or follows newline
|
||||
|
@ -6057,7 +6057,7 @@ Subject length lower bound = 0
|
|||
0: abc
|
||||
|
||||
/.*/I,newline=crlf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Forced newline is CRLF
|
||||
First code unit at start or follows newline
|
||||
|
@ -6070,7 +6070,7 @@ Subject length lower bound = 0
|
|||
0: abc
|
||||
|
||||
/\w+(.)(.)?def/Is
|
||||
Capturing subpattern count = 2
|
||||
Capture group count = 2
|
||||
Options: dotall
|
||||
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
|
||||
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
||||
|
@ -6447,7 +6447,7 @@ No match
|
|||
0: \x0aA
|
||||
|
||||
/a\Rb/I,bsr=anycrlf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches CR, LF, or CRLF
|
||||
First code unit = 'a'
|
||||
Last code unit = 'b'
|
||||
|
@ -6465,7 +6465,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\Rb/I,bsr=unicode
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches any Unicode newline
|
||||
First code unit = 'a'
|
||||
Last code unit = 'b'
|
||||
|
@ -6482,7 +6482,7 @@ Subject length lower bound = 3
|
|||
0: a\x0bb
|
||||
|
||||
/a\R?b/I,bsr=anycrlf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches CR, LF, or CRLF
|
||||
First code unit = 'a'
|
||||
Last code unit = 'b'
|
||||
|
@ -6500,7 +6500,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\R?b/I,bsr=unicode
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches any Unicode newline
|
||||
First code unit = 'a'
|
||||
Last code unit = 'b'
|
||||
|
@ -6517,7 +6517,7 @@ Subject length lower bound = 2
|
|||
0: a\x0bb
|
||||
|
||||
/a\R{2,4}b/I,bsr=anycrlf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches CR, LF, or CRLF
|
||||
First code unit = 'a'
|
||||
Last code unit = 'b'
|
||||
|
@ -6535,7 +6535,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\R{2,4}b/I,bsr=unicode
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
\R matches any Unicode newline
|
||||
First code unit = 'a'
|
||||
Last code unit = 'b'
|
||||
|
@ -6831,7 +6831,7 @@ Partial match: +ab
|
|||
0+ CBA
|
||||
|
||||
/(abc|def|xyz)/I
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Starting code units: a d x
|
||||
Subject length lower bound = 3
|
||||
terhjk;abcdaadsfe
|
||||
|
@ -6843,7 +6843,7 @@ Subject length lower bound = 3
|
|||
No match
|
||||
|
||||
/(abc|def|xyz)/I,no_start_optimize
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Options: no_start_optimize
|
||||
Subject length lower bound = 0
|
||||
terhjk;abcdaadsfe
|
||||
|
|
|
@ -1030,7 +1030,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\Rb/I,bsr=anycrlf,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches CR, LF, or CRLF
|
||||
First code unit = 'a'
|
||||
|
@ -1049,7 +1049,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\Rb/I,bsr=unicode,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches any Unicode newline
|
||||
First code unit = 'a'
|
||||
|
@ -1067,7 +1067,7 @@ Subject length lower bound = 3
|
|||
0: a\x{0b}b
|
||||
|
||||
/a\R?b/I,bsr=anycrlf,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches CR, LF, or CRLF
|
||||
First code unit = 'a'
|
||||
|
@ -1086,7 +1086,7 @@ No match
|
|||
No match
|
||||
|
||||
/a\R?b/I,bsr=unicode,utf
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
\R matches any Unicode newline
|
||||
First code unit = 'a'
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 10
|
|||
2 2 Ket
|
||||
4 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 14
|
|||
4 4 Ket
|
||||
6 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 26
|
|||
10 10 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 22
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 22
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -904,7 +904,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
79 79 Ket
|
||||
81 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -938,7 +938,7 @@ Subject length lower bound = 0
|
|||
43 43 Ket
|
||||
45 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1011,7 +1011,7 @@ No match
|
|||
133 133 Ket
|
||||
135 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 14
|
|||
3 3 Ket
|
||||
6 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 18
|
|||
5 5 Ket
|
||||
8 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 30
|
|||
11 11 Ket
|
||||
14 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 26
|
|||
9 9 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 26
|
|||
9 9 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
110 110 Ket
|
||||
113 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -937,7 +937,7 @@ Subject length lower bound = 0
|
|||
58 58 Ket
|
||||
61 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1010,7 +1010,7 @@ No match
|
|||
194 194 Ket
|
||||
197 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 14
|
|||
3 3 Ket
|
||||
6 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 18
|
|||
5 5 Ket
|
||||
8 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 30
|
|||
11 11 Ket
|
||||
14 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 26
|
|||
9 9 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 26
|
|||
9 9 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
110 110 Ket
|
||||
113 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -937,7 +937,7 @@ Subject length lower bound = 0
|
|||
58 58 Ket
|
||||
61 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1010,7 +1010,7 @@ No match
|
|||
194 194 Ket
|
||||
197 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 20
|
|||
2 2 Ket
|
||||
4 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 28
|
|||
4 4 Ket
|
||||
6 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 52
|
|||
10 10 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 44
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 44
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
79 79 Ket
|
||||
81 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -937,7 +937,7 @@ Subject length lower bound = 0
|
|||
43 43 Ket
|
||||
45 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1010,7 +1010,7 @@ No match
|
|||
133 133 Ket
|
||||
135 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 20
|
|||
2 2 Ket
|
||||
4 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 28
|
|||
4 4 Ket
|
||||
6 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 52
|
|||
10 10 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 44
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 44
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
79 79 Ket
|
||||
81 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -937,7 +937,7 @@ Subject length lower bound = 0
|
|||
43 43 Ket
|
||||
45 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1010,7 +1010,7 @@ No match
|
|||
133 133 Ket
|
||||
135 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 20
|
|||
2 2 Ket
|
||||
4 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 28
|
|||
4 4 Ket
|
||||
6 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 52
|
|||
10 10 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 44
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{d55c}
|
||||
Last code unit = \x{c5b4}
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 44
|
|||
8 8 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \x{65e5}
|
||||
Last code unit = \x{8a9e}
|
||||
|
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
79 79 Ket
|
||||
81 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -937,7 +937,7 @@ Subject length lower bound = 0
|
|||
43 43 Ket
|
||||
45 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1010,7 +1010,7 @@ No match
|
|||
133 133 Ket
|
||||
135 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 7
|
|||
3 3 Ket
|
||||
6 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 9
|
|||
5 5 Ket
|
||||
8 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 18
|
|||
14 14 Ket
|
||||
17 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 19
|
|||
15 15 Ket
|
||||
18 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xed
|
||||
Last code unit = \xb4
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 19
|
|||
15 15 Ket
|
||||
18 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xe6
|
||||
Last code unit = \x9e
|
||||
|
@ -904,7 +904,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
119 119 Ket
|
||||
122 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -938,7 +938,7 @@ Subject length lower bound = 0
|
|||
61 61 Ket
|
||||
64 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1011,7 +1011,7 @@ No match
|
|||
205 205 Ket
|
||||
208 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 9
|
|||
4 4 Ket
|
||||
8 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 11
|
|||
6 6 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 20
|
|||
15 15 Ket
|
||||
19 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 21
|
|||
16 16 Ket
|
||||
20 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xed
|
||||
Last code unit = \xb4
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 21
|
|||
16 16 Ket
|
||||
20 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xe6
|
||||
Last code unit = \x9e
|
||||
|
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
150 150 Ket
|
||||
154 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -937,7 +937,7 @@ Subject length lower bound = 0
|
|||
76 76 Ket
|
||||
80 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1010,7 +1010,7 @@ No match
|
|||
266 266 Ket
|
||||
270 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -67,7 +67,7 @@ Memory allocation (code space): 11
|
|||
5 5 Ket
|
||||
10 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
May match empty string
|
||||
Options: extended
|
||||
Subject length lower bound = 0
|
||||
|
@ -80,7 +80,7 @@ Memory allocation (code space): 13
|
|||
7 7 Ket
|
||||
12 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: extended
|
||||
First code unit = 'a'
|
||||
Subject length lower bound = 1
|
||||
|
@ -376,7 +376,7 @@ Memory allocation (code space): 22
|
|||
16 16 Ket
|
||||
21 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = 'A'
|
||||
Last code unit = '.'
|
||||
|
@ -390,7 +390,7 @@ Memory allocation (code space): 23
|
|||
17 17 Ket
|
||||
22 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xed
|
||||
Last code unit = \xb4
|
||||
|
@ -404,7 +404,7 @@ Memory allocation (code space): 23
|
|||
17 17 Ket
|
||||
22 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Options: utf
|
||||
First code unit = \xe6
|
||||
Last code unit = \x9e
|
||||
|
@ -903,7 +903,7 @@ Failed: error 186 at offset 12820: regular expression is too complicated
|
|||
181 181 Ket
|
||||
186 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -937,7 +937,7 @@ Subject length lower bound = 0
|
|||
91 91 Ket
|
||||
96 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 1
|
||||
Capture group count = 1
|
||||
Max back reference = 1
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
@ -1010,7 +1010,7 @@ No match
|
|||
327 327 Ket
|
||||
332 End
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 10
|
||||
Capture group count = 10
|
||||
May match empty string
|
||||
Subject length lower bound = 0
|
||||
|
||||
|
|
|
@ -215,7 +215,7 @@ Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too
|
|||
(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )*
|
||||
\) )* # optional trailing comment
|
||||
/Ix
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Contains explicit CR or LF match
|
||||
Options: extended
|
||||
Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
|
||||
|
@ -224,25 +224,25 @@ Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8
|
|||
Subject length lower bound = 3
|
||||
|
||||
/\h/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x09 \x20 \xa0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\H/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\v/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\V/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Subject length lower bound = 1
|
||||
|
||||
/\R/I
|
||||
Capturing subpattern count = 0
|
||||
Capture group count = 0
|
||||
Starting code units: \x0a \x0b \x0c \x0d \x85
|
||||
Subject length lower bound = 1
|
||||
|
||||
|
|