Major refactoring of pcre2_compile.c; see ChangeLog and HACKING.
parent dda1e79060
commit 99264dfc23

40 ChangeLog
@@ -14,6 +14,46 @@ cause all characters greater than 255 to match, whatever else is in the class.
 There was a bug that caused this not to happen if a Unicode property item was
 added to such a class, for example [\D\P{Nd}] or [\W\pL].
 
+3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
+checking is now done in the pre-pass that identifies capturing groups. This has
+reduced the amount of duplication and made the code tidier. While doing this,
+some minor bugs and Perl incompatibilities were fixed, including:
+
+(a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead
+of giving an invalid quantifier error.
+(b) {0} can now be used after a group in a lookbehind assertion; previously
+this caused an "assertion is not fixed length" error.
+(c) Perl always treats (?(DEFINE) as a "define" group, even if a group with
+the name "DEFINE" exists. PCRE2 now does likewise.
+(d) A recursion condition test such as (?(R2)...) must now refer to an
+existing subpattern.
+
+One effect of the refactoring is that some error numbers and messages have
+changed, and the pattern offset given for compiling errors is not always the
+right-most character that has been read. In particular, for a variable-length
+lookbehind assertion it now points to the start of the assertion. Another
+change is that when a callout appears before a group, the "length of next
+pattern item" that is passed now just gives the length of the opening
+parenthesis item, not the length of the whole group. A length of zero is now
+given only for a callout at the end of the pattern. Automatic callouts are no
+longer inserted before and after explicit callouts in the pattern.
+
+4. Back references are now permitted in lookbehind assertions when there are
+no duplicated group numbers (that is, (?| has not been used), and, if the
+reference is by name, there is only one group of that name. The referenced
+group must, of course, be of fixed length.
+
+5. pcre2test has been upgraded so that, when run under valgrind with valgrind
+support enabled, reading past the end of the pattern is detected, both when
+compiling and during callout processing.
+
+6. \g{+<number>} (e.g. \g{+2}) is now supported. It is a "forward back
+reference" and can be useful in repetitions (compare \g{-<number>}). Perl does
+not recognize this syntax.
+
+7. Automatic callouts are no longer generated before and after callouts in the
+pattern.
+
 
 Version 10.22 29-July-2016
 --------------------------
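Item 6 is terse about how a relative reference is resolved. As an illustration only, the C sketch below maps \g{-n} and \g{+n} to absolute group numbers. The helper name and its `groups_started` argument are hypothetical, and the rule that +n counts forward from the groups already opened is our reading of "forward back reference", not code taken from pcre2_compile.c.

```c
#include <assert.h>

/* Hypothetical helper (not part of PCRE2): convert a relative group
   reference such as \g{-1} or \g{+2} into an absolute group number.
   groups_started is the number of capturing groups whose opening
   parenthesis has already been read when the reference is met.
   A result <= 0 means the reference is invalid. */
static int resolve_relative_ref(int groups_started, char sign, int n)
{
  assert(n > 0);
  assert(sign == '+' || sign == '-');
  if (sign == '-')
    return groups_started - n + 1;   /* \g{-1}: most recently started group */
  return groups_started + n;         /* \g{+1}: next group to be started */
}
```

For instance, in (a)\g{+1}(b) one group has been opened when \g{+1} is read, so the helper returns 2, the (b) group.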

241 HACKING
@@ -7,8 +7,8 @@ but with a revised (and incompatible) API. To avoid confusion, the original
 library is referred to as PCRE1 below. For information about testing PCRE2, see
 the pcre2test documentation and the comment at the head of the RunTest file.
 
-PCRE1 releases were up to 8.3x when PCRE2 was developed. The 8.xx series will
-continue for bugfixes if necessary. PCRE2 releases started at 10.00 to avoid
+PCRE1 releases were up to 8.3x when PCRE2 was developed, and later bug fix
+releases remain in the 8.xx series. PCRE2 releases started at 10.00 to avoid
 confusion with PCRE1.
 
 
@@ -16,19 +16,20 @@ Historical note 1
 -----------------
 
 Many years ago I implemented some regular expression functions to an algorithm
-suggested by Martin Richards. These were not Unix-like in form, and were quite
-restricted in what they could do by comparison with Perl. The interesting part
-about the algorithm was that the amount of space required to hold the compiled
-form of an expression was known in advance. The code to apply an expression did
-not operate by backtracking, as the original Henry Spencer code and current
-PCRE2 and Perl code does, but instead checked all possibilities simultaneously
-by keeping a list of current states and checking all of them as it advanced
-through the subject string. In the terminology of Jeffrey Friedl's book, it was
-a "DFA algorithm", though it was not a traditional Finite State Machine (FSM).
-When the pattern was all used up, all remaining states were possible matches,
-and the one matching the longest subset of the subject string was chosen. This
-did not necessarily maximize the individual wild portions of the pattern, as is
-expected in Unix and Perl-style regular expressions.
+suggested by Martin Richards. The rather simple patterns were not Unix-like in
+form, and were quite restricted in what they could do by comparison with Perl.
+The interesting part about the algorithm was that the amount of space required
+to hold the compiled form of an expression was known in advance. The code to
+apply an expression did not operate by backtracking, as the original Henry
+Spencer code and current PCRE2 and Perl code does, but instead checked all
+possibilities simultaneously by keeping a list of current states and checking
+all of them as it advanced through the subject string. In the terminology of
+Jeffrey Friedl's book, it was a "DFA algorithm", though it was not a
+traditional Finite State Machine (FSM). When the pattern was all used up, all
+remaining states were possible matches, and the one matching the longest subset
+of the subject string was chosen. This did not necessarily maximize the
+individual wild portions of the pattern, as is expected in Unix and Perl-style
+regular expressions.
 
 
 Historical note 2
@@ -85,7 +86,7 @@ had become very complicated and hard to maintain. Indeed one of the early
 things I did for 6.8 was to fix Yet Another Bug in the memory computation. Then
 I had a flash of inspiration as to how I could run the real compile function in
 a "fake" mode that enables it to compute how much memory it would need, while
-actually only ever using a few hundred bytes of working memory, and without too
+in most cases only ever using a small amount of working memory, and without too
 many tests of the mode that might slow it down. So I refactored the compiling
 functions to work this way. This got rid of about 600 lines of source. It
 should make future maintenance and development easier. As this was such a major
@@ -104,20 +105,204 @@ system stack used by the compile function, which uses recursive function calls
 for nested parenthesized groups. This is a safety feature for environments with
 small stacks where the patterns are provided by users.
 
-History repeated itself for release 10.20. A number of bugs relating to named
-subpatterns had been discovered by fuzzers. Most of these were related to the
-handling of forward references when it was not known if the named pattern was
+
+Yet another pattern scan
+------------------------
+
+History repeated itself for PCRE2 release 10.20. A number of bugs relating to
+named subpatterns had been discovered by fuzzers. Most of these were related to
+the handling of forward references when it was not known if the named group was
 unique. (References to non-unique names use a different opcode and more
 memory.) The use of duplicate group numbers (the (?| facility) also caused
 issues.
 
-To get around these problems I adopted a new approach by adding a third pass,
-really a "pre-pass", over the pattern, which does nothing other than identify
+To get around these problems I adopted a new approach by adding a third pass
+over the pattern (really a "pre-pass"), which did nothing other than identify
 all the named subpatterns and their corresponding group numbers. This means
-that the actual compile (both pre-pass and real compile) have full knowledge of
-group names and numbers throughout. Several dozen lines of messy code were
-eliminated, though the new pre-pass is not short (skipping over [] classes is
-complicated).
+that the actual compile (both the memory-computing dummy run and the real
+compile) has full knowledge of group names and numbers throughout. Several
+dozen lines of messy code were eliminated, though the new pre-pass was not
+short. In particular, parsing and skipping over [] classes is complicated.
 
+While working on 10.22 I realized that I could simplify yet again by moving
+more of the parsing into the pre-pass, thus avoiding doing it in two places, so
+after 10.22 was released, the code underwent yet another big refactoring. This
+is how it is from 10.23 onwards:
+
+The function called parse_regex() scans the pattern characters, parsing them
+into literal data and meta characters. It converts escapes such as \x{123}
+into literals, handles \Q...\E, and skips over comments and non-significant
+white space. The result of the scanning is put into a vector of 32-bit unsigned
+integers. Values less than 0x80000000 are literal data. Higher values represent
+meta-characters. The top 16-bits of such values identify the meta-character,
+and these are given names such as META_CAPTURE. The lower 16-bits are available
+for data, for example, the capturing group number. The only situation in which
+literal data values greater than 0x7fffffff can appear is when the 32-bit
+library is running in non-UTF mode. This is handled by having a special
+meta-character that is followed by the 32-bit data value.
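The element layout described above can be sketched in C as follows. This is an illustration under stated assumptions: META_END's value is taken from the text, but the macro and function names and the META_CAPTURE_DEMO code are placeholders of our own, not necessarily what pcre2_compile.c uses.

```c
#include <assert.h>
#include <stdint.h>

/* META_END's value comes from HACKING; META_CAPTURE_DEMO is a made-up
   placeholder code, not the value used in pcre2_compile.c. */
#define META_END           0x80000000u
#define META_CAPTURE_DEMO  0x80010000u

#define META_CODE(x) ((x) & 0xffff0000u)  /* top 16 bits: which meta item */
#define META_DATA(x) ((x) & 0x0000ffffu)  /* low 16 bits: e.g. group number */

/* Values below 0x80000000 are literal characters. */
static int is_literal(uint32_t element)
{
  return element < META_END;
}

/* Build a meta element carrying up to 16 bits of data. */
static uint32_t make_meta(uint32_t code, uint32_t data)
{
  assert(code >= META_END && META_DATA(code) == 0);
  assert(data <= 0xffffu);
  return code | data;
}
```

A literal such as 'A' (0x41) is stored as itself, while a capture of group 3 becomes META_CAPTURE_DEMO | 3. A literal at or above 0x80000000 cannot be stored this way, which is why the non-UTF 32-bit library needs the special meta-character mentioned above.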
+
+The size of the parsed pattern vector, when auto-callouts are not enabled, is
+bounded by the length of the pattern (with one exception). The code is written
+so that each item in the pattern uses no more vector elements than the number
+of code units in the item itself. The exception is the aforementioned large
+32-bit number handling. For this reason, 32-bit non-UTF patterns are scanned in
+advance to check for such values. When auto-callouts are enabled, the generous
+assumption is made that there will be a callout for each pattern code unit
+(which of course is only actually true if all code units are literals) plus one
+at the end. There is a default parsed pattern vector on the stack, but if this
+is not big enough, heap memory is used.
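The sizing rule and the stack-or-heap choice can be sketched like this. The bound follows the text (one element per code unit, plus one extra element per oversized 32-bit literal, which needs a meta item and the value), but the extra element for META_END, the DEFAULT_PARSED_SIZE constant, the callout_elems parameter and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_PARSED_SIZE 1024   /* hypothetical size of the on-stack vector */

/* Upper bound on parsed-pattern elements: one per pattern code unit, one
   extra per literal >= 0x80000000 (found by the 32-bit non-UTF pre-scan),
   plus one for the terminating META_END (our assumption). With
   auto-callouts, allow one callout per code unit plus one at the end;
   callout_elems is a parameter because its value is not fixed here. */
static size_t parsed_size_bound(size_t pattern_len, size_t big_values,
                                int auto_callout, size_t callout_elems)
{
  size_t n = pattern_len + big_values + 1;
  if (auto_callout)
    n += (pattern_len + 1) * callout_elems;
  return n;
}

/* Use the caller's stack vector when it is big enough, otherwise the heap.
   *heap_block receives the malloc'd pointer (or NULL) so it can be freed. */
static uint32_t *get_parsed_vector(uint32_t *stack_vector, size_t needed,
                                   uint32_t **heap_block)
{
  *heap_block = NULL;
  if (needed <= DEFAULT_PARSED_SIZE)
    return stack_vector;
  *heap_block = malloc(needed * sizeof(uint32_t));
  return *heap_block;   /* NULL if the allocation failed */
}
```

Keeping the common case on the stack avoids a heap allocation for typical short patterns; the caller frees `*heap_block` only when it is non-NULL.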
+
+As before, the actual compiling function is run twice, the first time to
+determine the amount of memory needed for the final compiled pattern. It
+now processes the parsed pattern vector, not the pattern itself, although some
+of the parsed items refer to strings in the pattern - for example, group
+names. As escapes and comments have already been processed, the code is a bit
+simpler than before.
+
+Most errors can be diagnosed during the parsing scan. For those that cannot
+(for example, "lookbehind assertion is not fixed length"), the parsed code
+contains offsets into the pattern so that the actual compiling code can
+identify where errors occur.
+
+
+The elements of the parsed pattern vector
+-----------------------------------------
+
+The word "offset" below means a code unit offset into the pattern. When
+PCRE2_SIZE (which is usually size_t) is no bigger than uint32_t, an offset is
+stored in a single parsed pattern element. Otherwise (typically on 64-bit
+systems) it occupies two elements. The following meta items occupy just one
+element, with no data:
+
+  META_ACCEPT           (*ACCEPT)
+  META_ALT              | alternation
+  META_ASTERISK         *
+  META_ASTERISK_PLUS    *+
+  META_ASTERISK_QUERY   *?
+  META_ATOMIC           (?> start of atomic group
+  META_CIRCUMFLEX       ^ metacharacter
+  META_CLASS            [ start of non-empty class
+  META_CLASS_EMPTY      [] empty class - only with PCRE2_ALLOW_EMPTY_CLASS
+  META_CLASS_EMPTY_NOT  [^] negative empty class - ditto
+  META_CLASS_END        ] end of non-empty class
+  META_CLASS_NOT        [^ start non-empty negative class
+  META_COMMIT           (*COMMIT)
+  META_DOLLAR           $ metacharacter
+  META_DOT              . metacharacter
+  META_END              End of pattern (this value is 0x80000000)
+  META_FAIL             (*FAIL)
+  META_KET              ) closing parenthesis
+  META_LOOKAHEAD        (?= start of lookahead
+  META_LOOKAHEADNOT     (?! start of negative lookahead
+  META_NOCAPTURE        (?: no capture parens
+  META_PLUS             +
+  META_PLUS_PLUS        ++
+  META_PLUS_QUERY       +?
+  META_PRUNE            (*PRUNE) - no argument
+  META_QUERY            ?
+  META_QUERY_PLUS       ?+
+  META_QUERY_QUERY      ??
+  META_RANGE_ESCAPED    hyphen in class range with at least one escape
+  META_RANGE_LITERAL    hyphen in class range defined literally
+  META_SKIP             (*SKIP) - no argument
+  META_THEN             (*THEN) - no argument
+
+The two RANGE values occur only in character classes. They are positioned
+between two literals that define the start and end of the range. In an EBCDIC
+environment it is necessary to know whether either of the range values was
+specified as an escape. In an ASCII/Unicode environment the distinction is not
+relevant.
+
+The following have data in the lower 16 bits, and may be followed by other data
+elements:
+
+  META_BACKREF
+  META_CAPTURE
+  META_ESCAPE
+  META_RECURSE
+
+META_BACKREF, META_CAPTURE, and META_RECURSE have the capture group number as
+their data in the lower 16 bits of the element.
+
+META_BACKREF is followed by an offset if the back reference group number is 10
+or more. The offsets of the first occurrences of references to groups whose
+numbers are less than 10 are put in cb->small_ref_offset[] (only the first
+occurrence is useful). On 64-bit systems this avoids using more than two parsed
+pattern elements for items such as \3. The offset is used when an error is
+given for a reference to a non-existent group.
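A sketch of the two offset rules above: one or two elements depending on the width of the offset type, and the small_ref_offset[] side table for groups 1 to 9. Here demo_offset stands in for PCRE2_SIZE, the high-then-low element order is our choice, and demo_cb, put_backref() and META_BACKREF_DEMO are hypothetical names, not PCRE2's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef size_t demo_offset;             /* stands in for PCRE2_SIZE */
#define META_BACKREF_DEMO 0x80020000u   /* placeholder meta code */

/* Append an offset: one element when size_t fits in 32 bits, otherwise
   two (high half first; the order is our choice for this sketch). */
static uint32_t *put_offset(uint32_t *p, demo_offset off)
{
#if SIZE_MAX > UINT32_MAX
  *p++ = (uint32_t)(off >> 32);
  *p++ = (uint32_t)(off & 0xffffffffu);
#else
  *p++ = (uint32_t)off;
#endif
  return p;
}

/* Read back an offset written by put_offset(). */
static const uint32_t *get_offset(const uint32_t *p, demo_offset *off)
{
#if SIZE_MAX > UINT32_MAX
  *off = ((demo_offset)p[0] << 32) | p[1];
  return p + 2;
#else
  *off = p[0];
  return p + 1;
#endif
}

/* Hypothetical stand-in for the compile block's small_ref_offset[]. */
typedef struct {
  demo_offset small_ref_offset[10];
  int seen[10];
} demo_cb;

/* Emit META_BACKREF: groups 1-9 record only their first offset in the
   side table; groups 10 and above carry the offset inline. */
static uint32_t *put_backref(uint32_t *p, demo_cb *cb, unsigned group,
                             demo_offset off)
{
  assert(group > 0 && group <= 0xffffu);
  *p++ = META_BACKREF_DEMO | group;
  if (group < 10) {
    if (!cb->seen[group]) {
      cb->seen[group] = 1;
      cb->small_ref_offset[group] = off;
    }
  } else {
    p = put_offset(p, off);
  }
  return p;
}
```

With this split, \3 costs a single element whatever the width of size_t, while \12 costs two or three.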
+
+META_RECURSE is always followed by an offset, for use in error messages.
+
+META_ESCAPE has an ESC_xxx value as its data. For ESC_P and ESC_p, the next
+element contains the 16-bit type and data property values, packed together.
+ESC_g and ESC_k are used only for named references - numerical ones are turned
+into META_RECURSE or META_BACKREF as appropriate. They are followed by a length
+and an offset into the pattern to specify the name.
+
+The following have one data item that follows in the next vector element:
+
+  META_BIGVALUE         Next is a literal >= META_END
+  META_OPTIONS          (?i) and friends (data is new option bits)
+  META_POSIX            POSIX class item (data identifies the class)
+  META_POSIX_NEG        negative POSIX class item (ditto)
+
+The following are followed by a length element, then a number of character code
+values (which should match with the length):
+
+  META_MARK             (*MARK:xxxx)
+  META_PRUNE_ARG        (*PRUNE:xxx)
+  META_SKIP_ARG         (*SKIP:xxxx)
+  META_THEN_ARG         (*THEN:xxxx)
+
+The following are followed by a length element, then an offset in the pattern
+that identifies the name:
+
+  META_COND_NAME        (?(<name>) or (?('name') or (?(name)
+  META_COND_RNAME       (?(R&name)
+  META_COND_RNUMBER     (?(Rdigits)
+  META_RECURSE_BYNAME   (?&name)
+  META_BACKREF_BYNAME   \k'name'
+
+META_COND_RNUMBER is used for names that start with R and continue with digits,
+because this is an ambiguous case. It could be a back reference to a group with
+that name, or it could be a recursion test on a numbered group.
+
+This one is followed by an offset, for use in error messages, then a number:
+
+  META_COND_NUMBER      (?([+-]digits)
+
+The following are followed just by an offset, for use in error messages:
+
+  META_COND_ASSERT      (?(?assertion)
+  META_COND_DEFINE      (?(DEFINE)
+  META_LOOKBEHIND       (?<=
+  META_LOOKBEHINDNOT    (?<!
+
+In fact, META_COND_ASSERT is used for any group starting (?( that does not
+match any of the other META_COND cases. The check that this group is an
+assertion (optionally preceded by a callout) happens at compile time.
+
+The following are followed by two values, the minimum and maximum. Repeat
+values are limited to 65535 (MAX_REPEAT). A maximum value of "unlimited" is
+represented by UNLIMITED_REPEAT, which is bigger than MAX_REPEAT:
+
+  META_MINMAX           {n,m} repeat
+  META_MINMAX_PLUS      {n,m}+ repeat
+  META_MINMAX_QUERY     {n,m}? repeat
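How a {n,m} quantifier might be packed into three elements under the limits just stated. The actual value of UNLIMITED_REPEAT is not given here, so MAX_REPEAT + 1 is an assumption, as is the placeholder meta code. The out-of-order check mirrors PCRE2's rejection of quantifiers such as {5,2}, but put_minmax() itself is a hypothetical function.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REPEAT        65535u
#define UNLIMITED_REPEAT  (MAX_REPEAT + 1u)   /* assumed; text only says "bigger" */
#define META_MINMAX_DEMO  0x80030000u         /* placeholder meta code */

/* Encode {min,max} (or {min,} when has_max is 0) as three elements:
   the meta item, the minimum and the maximum. Returns the number of
   elements written, or 0 if a value exceeds MAX_REPEAT or the numbers
   are out of order. */
static int put_minmax(uint32_t *p, uint32_t min, int has_max, uint32_t max)
{
  if (min > MAX_REPEAT)
    return 0;
  if (!has_max)
    max = UNLIMITED_REPEAT;
  else if (max > MAX_REPEAT || min > max)
    return 0;
  p[0] = META_MINMAX_DEMO;
  p[1] = min;
  p[2] = max;
  return 3;
}
```

Because UNLIMITED_REPEAT lies outside the legal range of repeat values, it can never be confused with an explicit maximum.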
+
+This one is followed by three elements. The first is 0 for '>' and 1 for '>=';
+the next two are the major and minor numbers:
+
+  META_COND_VERSION     (?(VERSION<op>x.y)
+
+Callouts are converted into one of two items:
+
+  META_CALLOUT_NUMBER   (?C with numerical argument
+  META_CALLOUT_STRING   (?C with string argument
+
+In both cases, the next two elements contain the offset and length of the next
+item in the pattern. Then there is either one callout number, or a length and
+an offset for the string argument. The length includes both delimiters.
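Reading back a numbered callout under the layout just described, as a small C sketch. It assumes the offset and length each fit in one element (as "the next two elements" suggests) and uses a placeholder meta code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define META_CALLOUT_NUMBER_DEMO 0x80040000u   /* placeholder meta code */

typedef struct {
  uint32_t next_offset;   /* offset of the item after the callout */
  uint32_t next_length;   /* length of that item */
  uint32_t number;        /* n in (?Cn) */
} demo_callout;

/* Read a numbered callout laid out as: meta, offset, length, number.
   Returns a pointer to the element after the callout. */
static const uint32_t *read_callout_number(const uint32_t *p, demo_callout *c)
{
  assert(p[0] == META_CALLOUT_NUMBER_DEMO);
  c->next_offset = p[1];
  c->next_length = p[2];
  c->number = p[3];
  return p + 4;
}
```

The test values correspond to A(?C3)B, where B starts at offset 6 and has length 1.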
+
 
 Traditional matching function
@@ -606,4 +791,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
 correct length, in order to catch updating errors.
 
 Philip Hazel
-June 2016
+September 2016

@@ -65,6 +65,7 @@ dist_html_DATA = \
 doc/html/pcre2_set_character_tables.html \
 doc/html/pcre2_set_compile_recursion_guard.html \
 doc/html/pcre2_set_match_limit.html \
+doc/html/pcre2_set_max_pattern_length.html \
 doc/html/pcre2_set_offset_limit.html \
 doc/html/pcre2_set_newline.html \
 doc/html/pcre2_set_parens_nest_limit.html \
@@ -146,6 +147,7 @@ dist_man_MANS = \
 doc/pcre2_set_character_tables.3 \
 doc/pcre2_set_compile_recursion_guard.3 \
 doc/pcre2_set_match_limit.3 \
+doc/pcre2_set_max_pattern_length.3 \
 doc/pcre2_set_offset_limit.3 \
 doc/pcre2_set_newline.3 \
 doc/pcre2_set_parens_nest_limit.3 \

2 RunTest
@@ -502,7 +502,7 @@ for bmode in "$test8" "$test16" "$test32"; do
 for opt in "" $jitopt; do
 $sim $valgrind ${opt:+$vjs} ./pcre2test -q $test2stack $bmode $opt $testdata/testinput2 testtry
 if [ $? = 0 ] ; then
-$sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -63,-62,-2,-1,0,100,188,189 >>testtry
+$sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -63,-62,-2,-1,0,100,188,189,190 >>testtry
 checkresult $? 2 "$opt"
 else
 echo " "

@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "17 June 2016" "PCRE2 10.22"
+.TH PCRE2API 3 "30 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -693,7 +693,8 @@ functions, \fIpcre2_match()\fP and \fIpcre2_dfa_match()\fP.
 .sp
 This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
 depth of parenthesis nesting in a pattern. This limit stops rogue patterns
-using up too much system stack when being compiled.
+using up too much system stack when being compiled. The limit applies to
+parentheses of all kinds, not just capturing parentheses.
 .sp
 .nf
 .B int pcre2_set_compile_recursion_guard(pcre2_compile_context *\fIccontext\fP,
@@ -1091,7 +1092,13 @@ NULL immediately. Otherwise, the variables to which these point are set to an
 error code and an offset (number of code units) within the pattern,
 respectively, when \fBpcre2_compile()\fP returns NULL because a compilation
 error has occurred. The values are not defined when compilation is successful
 and \fBpcre2_compile()\fP returns a non-NULL value.
+.P
+The value returned in \fIerroroffset\fP is an indication of where in the
+pattern the error occurred. It is not necessarily the furthest point in the
+pattern that was read. For example, after the error "lookbehind assertion is
+not fixed length", the error offset points to the start of the failing
+assertion.
 .P
 The \fBpcre2_get_error_message()\fP function (see "Obtaining a textual error
 message"
@@ -1184,8 +1191,8 @@ recognized, exactly as in the rest of the pattern.
 PCRE2_AUTO_CALLOUT
 .sp
 If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
-all with number 255, before each pattern item. For discussion of the callout
-facility, see the
+all with number 255, before each pattern item, except immediately before or
+after a callout in the pattern. For discussion of the callout facility, see the
 .\" HREF
 \fBpcre2callout\fP
 .\"
@@ -3292,6 +3299,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 17 June 2016
+Last updated: 30 September 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "23 March 2015" "PCRE2 10.20"
+.TH PCRE2CALLOUT 3 "29 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -40,11 +40,20 @@ two callout points:
 .sp
 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
 automatically inserts callouts, all with number 255, before each item in the
-pattern. For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+pattern except for immediately before or after a callout item in the pattern.
+For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+.sp
+A(?C3)B
+.sp
+it is processed as if it were
+.sp
+(?C255)A(?C3)B(?C255)
+.sp
+Here is a more complicated example:
 .sp
 A(\ed{2}|--)
 .sp
-it is processed as if it were
+With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
 .sp
 (?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
 .sp
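The insertion rule documented in this hunk can be modelled in a few lines of C. This is a toy, not PCRE2's implementation: it only understands single-character literals and numbered callouts such as (?C3) whose text contains no other ')', and add_auto_callouts() and append() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Append n bytes of s to out, keeping it NUL-terminated.
   Returns -1 if there is not enough room. */
static int append(char *out, size_t outsize, size_t *used,
                  const char *s, size_t n)
{
  if (*used + n + 1 > outsize)
    return -1;
  memcpy(out + *used, s, n);
  *used += n;
  out[*used] = '\0';
  return 0;
}

/* Toy PCRE2_AUTO_CALLOUT: put (?C255) before each literal and at the end,
   except immediately before or after an explicit callout.
   Returns 0 on success, -1 on overflow or an unterminated callout. */
static int add_auto_callouts(const char *in, char *out, size_t outsize)
{
  static const char auto_co[] = "(?C255)";
  size_t used = 0;
  int after_callout = 0;

  if (outsize == 0)
    return -1;
  out[0] = '\0';
  while (*in != '\0') {
    if (strncmp(in, "(?C", 3) == 0) {           /* explicit callout */
      const char *close = strchr(in, ')');
      if (close == NULL)
        return -1;
      if (append(out, outsize, &used, in, (size_t)(close - in + 1)) != 0)
        return -1;
      in = close + 1;
      after_callout = 1;
    } else {                                     /* single-character literal */
      if (!after_callout) {
        if (append(out, outsize, &used, auto_co, sizeof auto_co - 1) != 0)
          return -1;
      }
      if (append(out, outsize, &used, in, 1) != 0)
        return -1;
      in++;
      after_callout = 0;
    }
  }
  /* Final automatic callout, unless the pattern ended with a callout. */
  if (!after_callout) {
    if (append(out, outsize, &used, auto_co, sizeof auto_co - 1) != 0)
      return -1;
  }
  return 0;
}
```

Run on A(?C3)B it produces (?C255)A(?C3)B(?C255), matching the A(?C3)B example in this documentation change.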
@@ -91,10 +100,10 @@ with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string
 No match
 .sp
 This indicates that when matching [bc] fails, there is no backtracking into a+
-and therefore the callouts that would be taken for the backtracks do not occur.
-You can disable the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS to
-\fBpcre2_compile()\fP, or starting the pattern with (*NO_AUTO_POSSESS). In this
-case, the output changes to this:
+(because it is being treated as a++) and therefore the callouts that would be
+taken for the backtracks do not occur. You can disable the auto-possessify
+feature by passing PCRE2_NO_AUTO_POSSESS to \fBpcre2_compile()\fP, or starting
+the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this:
 .sp
 --->aaaa
 +0 ^ a+
@@ -220,8 +229,8 @@ but the intention is never to remove any of the existing fields.
 .sp
 For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP
 contains the number of the callout, in the range 0-255. This is the number
-that follows (?C for manual callouts; it is 255 for automatically generated
-callouts.
+that follows (?C for callouts that are part of the pattern; it is 255 for
+automatically generated callouts.
 .
 .
 .SS "Fields for string callouts"
@@ -286,10 +295,15 @@ The \fIpattern_position\fP field contains the offset in the pattern string to
 the next item to be matched.
 .P
 The \fInext_item_length\fP field contains the length of the next item to be
-matched in the pattern string. When the callout immediately precedes an
-alternation bar, a closing parenthesis, or the end of the pattern, the length
-is zero. When the callout precedes an opening parenthesis, the length is that
-of the entire subpattern.
+processed in the pattern string. When the callout is at the end of the pattern,
+the length is zero. When the callout precedes an opening parenthesis, the
+length includes meta characters that follow the parenthesis. For example, in a
+callout before an assertion such as (?=ab) the length is 3. For an
+alternation bar or a closing parenthesis, the length is one, unless a closing
+parenthesis is followed by a quantifier, in which case its length is included.
+(This changed in release 10.23. In earlier releases, before an opening
+parenthesis the length was that of the entire subpattern, and before an
+alternation bar or a closing parenthesis the length was zero.)
 .P
 The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
 help in distinguishing between different automatic callouts, which all have the
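The revised next_item_length rules lend themselves to a small model. The C function below is hypothetical and deliberately partial: it covers end of pattern, alternation bars, closing parentheses with a simple *, + or ? quantifier, a few (? openers, and single-character literals, and does not handle {n,m}, named groups or escapes. Only the (?=ab) case (length 3) is taken directly from the text; the other expected values follow from our reading of the rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the 10.23 next_item_length rules. p points at the pattern
   item that follows a callout. */
static size_t next_item_length(const char *p)
{
  static const char *const openers[] =
    { "(?<=", "(?<!", "(?:", "(?=", "(?!", "(?>" };
  size_t i, len;

  if (*p == '\0')
    return 0;                              /* callout at end of pattern */
  if (*p == '|')
    return 1;                              /* alternation bar */
  if (*p == ')') {                         /* closing parenthesis ... */
    len = 1;
    if (p[1] == '*' || p[1] == '+' || p[1] == '?') {
      len++;                               /* ... plus its quantifier */
      if (p[2] == '+' || p[2] == '?')
        len++;                             /* possessive or lazy suffix */
    }
    return len;
  }
  if (*p == '(') {
    for (i = 0; i < sizeof openers / sizeof openers[0]; i++) {
      len = strlen(openers[i]);
      if (strncmp(p, openers[i], len) == 0)
        return len;                        /* ( plus following meta chars */
    }
    return 1;                              /* plain capturing ( */
  }
  return 1;                                /* single-character literal */
}
```

Note how a group now reports only the length of its opening item, not of the whole subpattern as before 10.23.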
@@ -382,6 +396,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 March 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 29 September 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "15 March 2015" "PCRE2 10.20"
+.TH PCRE2COMPAT 3 "30 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@@ -96,7 +96,7 @@ processed as anchored at the point where they are tested.
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
-same as PCRE2, but there are examples where it differs.
+same as PCRE2, but there are cases where it differs.
 .P
 11. Most backtracking verbs in assertions have their normal actions. They are
 not confined to the assertion.
@ -116,10 +116,11 @@ would not be possible to distinguish which parentheses matched, because both
|
||||||
names map to capturing subpattern number 1. To avoid this confusing situation,
|
names map to capturing subpattern number 1. To avoid this confusing situation,
|
||||||
an error is given at compile time.
|
an error is given at compile time.
|
||||||
.P
|
.P
|
||||||
14. Perl recognizes comments in some places that PCRE2 does not, for example,
|
14. Perl used to recognize comments in some places that PCRE2 does not, for
|
||||||
between the ( and ? at the start of a subpattern. If the /x modifier is set,
|
example, between the ( and ? at the start of a subpattern. If the /x modifier
|
||||||
Perl allows white space between ( and ? (though current Perls warn that this is
|
is set, Perl allowed white space between ( and ? though the latest Perls give
|
||||||
deprecated) but PCRE2 never does, even if the PCRE2_EXTENDED option is set.
|
an error (for a while it was just deprecated). There may still be some cases
|
||||||
|
where Perl behaves differently.
|
||||||
.P
|
.P
|
||||||
15. Perl, when in warning mode, gives warnings for character classes such as
|
15. Perl, when in warning mode, gives warnings for character classes such as
|
||||||
[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
|
[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
|
||||||
|
@ -139,35 +140,39 @@ list is with respect to Perl 5.10:
|
||||||
.sp
|
.sp
|
||||||
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
|
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
|
||||||
each alternative branch of a lookbehind assertion can match a different length
|
each alternative branch of a lookbehind assertion can match a different length
|
||||||
of string. Perl requires them all to have the same length.
|
of string. Perl requires them all to have the same length.
|
||||||
.sp
|
.sp
|
||||||
(b) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
|
(b) From PCRE2 10.23, back references to groups of fixed length are supported
|
||||||
|
in lookbehinds, provided that there is no possibility of referencing a
|
||||||
|
non-unique number or name. Perl does not support back references in lookbehinds.
|
||||||
|
.sp
|
||||||
|
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
|
||||||
meta-character matches only at the very end of the string.
|
meta-character matches only at the very end of the string.
|
||||||
.sp
|
.sp
|
||||||
(c) A backslash followed by a letter with no special meaning is faulted. (Perl
|
(d) A backslash followed by a letter with no special meaning is faulted. (Perl
|
||||||
can be made to issue a warning.)
|
can be made to issue a warning.)
|
||||||
.sp
|
.sp
|
||||||
(d) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
|
(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||||
inverted, that is, by default they are not greedy, but if followed by a
|
inverted, that is, by default they are not greedy, but if followed by a
|
||||||
question mark they are.
|
question mark they are.
|
||||||
.sp
|
.sp
|
||||||
(e) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
|
(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
|
||||||
only at the first matching position in the subject string.
|
only at the first matching position in the subject string.
|
||||||
.sp
|
.sp
|
||||||
(f) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
|
(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
|
||||||
PCRE2_NO_AUTO_CAPTURE options have no Perl equivalents.
|
PCRE2_NO_AUTO_CAPTURE options have no Perl equivalents.
|
||||||
.sp
|
.sp
|
||||||
(g) The \eR escape sequence can be restricted to match only CR, LF, or CRLF
|
(h) The \eR escape sequence can be restricted to match only CR, LF, or CRLF
|
||||||
by the PCRE2_BSR_ANYCRLF option.
|
by the PCRE2_BSR_ANYCRLF option.
|
||||||
.sp
|
.sp
|
||||||
(h) The callout facility is PCRE2-specific.
|
(i) The callout facility is PCRE2-specific.
|
||||||
.sp
|
.sp
|
||||||
(i) The partial matching facility is PCRE2-specific.
|
(j) The partial matching facility is PCRE2-specific.
|
||||||
.sp
|
.sp
|
||||||
(j) The alternative matching function (\fBpcre2_dfa_match()\fP matches in a
|
(k) The alternative matching function (\fBpcre2_dfa_match()\fP matches in a
|
||||||
different way and is not Perl-compatible.
|
different way and is not Perl-compatible.
|
||||||
.sp
|
.sp
|
||||||
(k) PCRE2 recognizes some special sequences such as (*CR) at the start of
|
(l) PCRE2 recognizes some special sequences such as (*CR) at the start of
|
||||||
a pattern that set overall options that cannot be changed within the pattern.
|
a pattern that set overall options that cannot be changed within the pattern.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -185,6 +190,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 15 March 2015
|
Last updated: 30 September 2016
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2016 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2LIMITS 3 "05 November 2015" "PCRE2 10.21"
|
.TH PCRE2LIMITS 3 "29 September 2016" "PCRE2 10.23"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "SIZE AND OTHER LIMITATIONS"
|
.SH "SIZE AND OTHER LIMITATIONS"
|
||||||
|
@ -46,19 +46,19 @@ The maximum length of a lookbehind assertion is 65535 characters.
|
||||||
There is no limit to the number of parenthesized subpatterns, but there can be
|
There is no limit to the number of parenthesized subpatterns, but there can be
|
||||||
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
no more than 65535 capturing subpatterns. There is, however, a limit to the
|
||||||
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
|
||||||
order to limit the amount of system stack used at compile time. The limit can
|
order to limit the amount of system stack used at compile time. The default
|
||||||
be specified when PCRE2 is built; the default is 250.
|
limit can be specified when PCRE2 is built; the default default is 250. An
|
||||||
.P
|
application can change this limit by calling pcre2_set_parens_nest_limit() to
|
||||||
There is a limit to the number of forward references to subsequent subpatterns
|
set the limit in a compile context.
|
||||||
of around 200,000. Repeated forward references with fixed upper limits, for
|
|
||||||
example, (?2){0,100} when subpattern number 2 is to the right, are included in
|
|
||||||
the count. There is no limit to the number of backward references.
|
|
||||||
.P
|
.P
|
||||||
The maximum length of name for a named subpattern is 32 code units, and the
|
The maximum length of name for a named subpattern is 32 code units, and the
|
||||||
maximum number of named subpatterns is 10000.
|
maximum number of named subpatterns is 10000.
|
||||||
.P
|
.P
|
||||||
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
|
The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
|
||||||
is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
|
is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
|
||||||
|
.P
|
||||||
|
The maximum length of a string argument to a callout is the largest number a
|
||||||
|
32-bit unsigned integer can hold.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH AUTHOR
|
.SH AUTHOR
|
||||||
|
@ -75,6 +75,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 05 November 2015
|
Last updated: 29 September 2016
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2016 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "20 June 2016" "PCRE2 10.22"
|
.TH PCRE2PATTERN 3 "30 September 2016" "PCRE2 10.23"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -508,9 +508,9 @@ by code point, as described in the previous section.
|
||||||
.SS "Absolute and relative back references"
|
.SS "Absolute and relative back references"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The sequence \eg followed by an unsigned or a negative number, optionally
|
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
||||||
enclosed in braces, is an absolute or relative back reference. A named back
|
in braces, is an absolute or relative back reference. A named back reference
|
||||||
reference can be coded as \eg{name}. Back references are discussed
|
can be coded as \eg{name}. Back references are discussed
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
later,
|
later,
|
||||||
|
@ -1325,13 +1325,33 @@ when matching character classes, whatever line-ending sequence is in use, and
|
||||||
whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A
|
whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A
|
||||||
class such as [^a] always matches one of these characters.
|
class such as [^a] always matches one of these characters.
|
||||||
.P
|
.P
|
||||||
|
The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
|
||||||
|
\eV, \ew, and \eW may appear in a character class, and add the characters that
|
||||||
|
they match to the class. For example, [\edABCDEF] matches any hexadecimal
|
||||||
|
digit. In UTF modes, the PCRE2_UCP option affects the meanings of \ed, \es, \ew
|
||||||
|
and their upper case partners, just as it does when they appear outside a
|
||||||
|
character class, as described in the section entitled
|
||||||
|
.\" HTML <a href="#genericchartypes">
|
||||||
|
.\" </a>
|
||||||
|
"Generic character types"
|
||||||
|
.\"
|
||||||
|
above. The escape sequence \eb has a different meaning inside a character
|
||||||
|
class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
|
||||||
|
are not special inside a character class. Like any other unrecognized escape
|
||||||
|
sequences, they cause an error.
|
||||||
|
.P
|
||||||
The minus (hyphen) character can be used to specify a range of characters in a
|
The minus (hyphen) character can be used to specify a range of characters in a
|
||||||
character class. For example, [d-m] matches any letter between d and m,
|
character class. For example, [d-m] matches any letter between d and m,
|
||||||
inclusive. If a minus character is required in a class, it must be escaped with
|
inclusive. If a minus character is required in a class, it must be escaped with
|
||||||
a backslash or appear in a position where it cannot be interpreted as
|
a backslash or appear in a position where it cannot be interpreted as
|
||||||
indicating a range, typically as the first or last character in the class, or
|
indicating a range, typically as the first or last character in the class,
|
||||||
immediately after a range. For example, [b-d-z] matches letters in the range b
|
or immediately after a range. For example, [b-d-z] matches letters in the range
|
||||||
to d, a hyphen character, or z.
|
b to d, a hyphen character, or z.
|
||||||
|
.P
|
||||||
|
Perl treats a hyphen as a literal if it appears before a POSIX class (see
|
||||||
|
below) or a character type escape such as \ed, but gives a warning in its
|
||||||
|
warning mode, as this is most likely a user error. As PCRE2 has no facility for
|
||||||
|
warning, an error is given in these cases.
|
||||||
.P
|
.P
|
||||||
It is not possible to have the literal character "]" as the end character of a
|
It is not possible to have the literal character "]" as the end character of a
|
||||||
range. A pattern such as [W-]46] is interpreted as a class of two characters
|
range. A pattern such as [W-]46] is interpreted as a class of two characters
|
||||||
|
@ -1341,11 +1361,6 @@ the end of range, so [W-\e]46] is interpreted as a class containing a range
|
||||||
followed by two other characters. The octal or hexadecimal representation of
|
followed by two other characters. The octal or hexadecimal representation of
|
||||||
"]" can also be used to end a range.
|
"]" can also be used to end a range.
|
||||||
.P
|
.P
|
||||||
An error is generated if a POSIX character class (see below) or an escape
|
|
||||||
sequence other than one that defines a single character appears at a point
|
|
||||||
where a range ending character is expected. For example, [z-\exff] is valid,
|
|
||||||
but [A-\ed] and [A-[:digit:]] are not.
|
|
||||||
.P
|
|
||||||
Ranges normally include all code points between the start and end characters,
|
Ranges normally include all code points between the start and end characters,
|
||||||
inclusive. They can also be used for code points specified numerically, for
|
inclusive. They can also be used for code points specified numerically, for
|
||||||
example [\e000-\e037]. Ranges can include any characters that are valid for the
|
example [\e000-\e037]. Ranges can include any characters that are valid for the
|
||||||
|
@ -1365,21 +1380,6 @@ matches the letters in either case. For example, [W-c] is equivalent to
|
||||||
tables for a French locale are in use, [\exc8-\excb] matches accented E
|
tables for a French locale are in use, [\exc8-\excb] matches accented E
|
||||||
characters in both cases.
|
characters in both cases.
|
||||||
.P
|
.P
|
||||||
The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
|
|
||||||
\eV, \ew, and \eW may appear in a character class, and add the characters that
|
|
||||||
they match to the class. For example, [\edABCDEF] matches any hexadecimal
|
|
||||||
digit. In UTF modes, the PCRE2_UCP option affects the meanings of \ed, \es, \ew
|
|
||||||
and their upper case partners, just as it does when they appear outside a
|
|
||||||
character class, as described in the section entitled
|
|
||||||
.\" HTML <a href="#genericchartypes">
|
|
||||||
.\" </a>
|
|
||||||
"Generic character types"
|
|
||||||
.\"
|
|
||||||
above. The escape sequence \eb has a different meaning inside a character
|
|
||||||
class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
|
|
||||||
are not special inside a character class. Like any other unrecognized escape
|
|
||||||
sequences, they cause an error.
|
|
||||||
.P
|
|
||||||
A circumflex can conveniently be used with the upper case character types to
|
A circumflex can conveniently be used with the upper case character types to
|
||||||
specify a more restricted set of characters than the matching lower case type.
|
specify a more restricted set of characters than the matching lower case type.
|
||||||
For example, the class [^\eW_] matches any letter or digit, but not underscore,
|
For example, the class [^\eW_] matches any letter or digit, but not underscore,
|
||||||
|
@ -2096,9 +2096,9 @@ no such problem when named parentheses are used. A back reference to any
|
||||||
subpattern is possible using named parentheses (see below).
|
subpattern is possible using named parentheses (see below).
|
||||||
.P
|
.P
|
||||||
Another way of avoiding the ambiguity inherent in the use of digits following a
|
Another way of avoiding the ambiguity inherent in the use of digits following a
|
||||||
backslash is to use the \eg escape sequence. This escape must be followed by an
|
backslash is to use the \eg escape sequence. This escape must be followed by a
|
||||||
unsigned number or a negative number, optionally enclosed in braces. These
|
signed or unsigned number, optionally enclosed in braces. These examples are
|
||||||
examples are all identical:
|
all identical:
|
||||||
.sp
|
.sp
|
||||||
(ring), \e1
|
(ring), \e1
|
||||||
(ring), \eg1
|
(ring), \eg1
|
||||||
|
@ -2106,8 +2106,7 @@ examples are all identical:
|
||||||
.sp
|
.sp
|
||||||
An unsigned number specifies an absolute reference without the ambiguity that
|
An unsigned number specifies an absolute reference without the ambiguity that
|
||||||
is present in the older syntax. It is also useful when literal digits follow
|
is present in the older syntax. It is also useful when literal digits follow
|
||||||
the reference. A negative number is a relative reference. Consider this
|
the reference. A signed number is a relative reference. Consider this example:
|
||||||
example:
|
|
||||||
.sp
|
.sp
|
||||||
(abc(def)ghi)\eg{-1}
|
(abc(def)ghi)\eg{-1}
|
||||||
.sp
|
.sp
|
||||||
|
@ -2117,6 +2116,10 @@ Similarly, \eg{-2} would be equivalent to \e1. The use of relative references
|
||||||
can be helpful in long patterns, and also in patterns that are created by
|
can be helpful in long patterns, and also in patterns that are created by
|
||||||
joining together fragments that contain references within themselves.
|
joining together fragments that contain references within themselves.
|
||||||
.P
|
.P
|
||||||
|
The sequence \eg{+1} is a reference to the next capturing subpattern. This kind
|
||||||
|
of forward reference can be useful in patterns that repeat. Perl does not
|
||||||
|
support the use of + in this way.
|
||||||
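The resolution rule for \g{-n} and \g{+n} can be sketched as a small C function. This is a hypothetical helper, not part of the PCRE2 API or of pcre2_compile.c. It assumes the caller has already counted the capturing groups opened before the \g escape. It does not check that a forward reference points at a group that actually exists later in the pattern.

```c
#include <assert.h>

/* Hypothetical helper, not part of the PCRE2 API: resolve a relative back
   reference \g{n} (n negative, or positive for the PCRE2 extension) to an
   absolute group number.

   groups_started is the number of capturing groups whose opening
   parenthesis appears before the \g escape. Returns 0 when the result
   cannot be a valid group number. A real compiler would also check that a
   forward reference (n > 0) names a group that exists later on. */
static int resolve_relative_ref(int groups_started, int n)
{
  int absolute;

  assert(groups_started >= 0);
  if (n == 0) return 0;                  /* zero is rejected by this sketch */
  if (n < 0)
    absolute = groups_started + n + 1;   /* -1 = most recently started group */
  else
    absolute = groups_started + n;       /* +1 = next group to be started */
  return absolute >= 1 ? absolute : 0;
}
```

In (abc(def)ghi)\g{-1}, two groups have started before \g. So resolve_relative_ref(2, -1) gives 2, and resolve_relative_ref(2, -2) gives 1, which is consistent with \g{-2} being equivalent to \1 there.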
|
.P
|
||||||
A back reference matches whatever actually matched the capturing subpattern in
|
A back reference matches whatever actually matched the capturing subpattern in
|
||||||
the current subject string, rather than anything matching the subpattern
|
the current subject string, rather than anything matching the subpattern
|
||||||
itself (see
|
itself (see
|
||||||
|
@ -2321,23 +2324,34 @@ temporarily move the current position back by the fixed length and then try to
|
||||||
match. If there are insufficient characters before the current position, the
|
match. If there are insufficient characters before the current position, the
|
||||||
assertion fails.
|
assertion fails.
|
||||||
.P
|
.P
|
||||||
In a UTF mode, PCRE2 does not allow the \eC escape (which matches a single code
|
In UTF-8 and UTF-16 modes, PCRE2 does not allow the \eC escape (which matches a
|
||||||
unit even in a UTF mode) to appear in lookbehind assertions, because it makes
|
single code unit even in a UTF mode) to appear in lookbehind assertions,
|
||||||
it impossible to calculate the length of the lookbehind. The \eX and \eR
|
because it makes it impossible to calculate the length of the lookbehind. The
|
||||||
escapes, which can match different numbers of code units, are also not
|
\eX and \eR escapes, which can match different numbers of code units, are never
|
||||||
permitted.
|
permitted in lookbehinds.
|
||||||
.P
|
.P
|
||||||
.\" HTML <a href="#subpatternsassubroutines">
|
.\" HTML <a href="#subpatternsassubroutines">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
"Subroutine"
|
"Subroutine"
|
||||||
.\"
|
.\"
|
||||||
calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
|
calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
|
||||||
as the subpattern matches a fixed-length string.
|
as the subpattern matches a fixed-length string. However,
|
||||||
.\" HTML <a href="#recursion">
|
.\" HTML <a href="#recursion">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
Recursion,
|
recursion,
|
||||||
.\"
|
.\"
|
||||||
however, is not supported.
|
that is, a "subroutine" call into a group that is already active,
|
||||||
|
is not supported.
|
||||||
|
.P
|
||||||
|
Perl does not support back references in lookbehinds. PCRE2 does support them,
|
||||||
|
but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
|
||||||
|
must not be set, there must be no use of (?| in the pattern (it creates
|
||||||
|
duplicate subpattern numbers), and if the back reference is by name, the name
|
||||||
|
must be unique. Of course, the referenced subpattern must itself be of fixed
|
||||||
|
length. The following pattern matches words containing at least two characters
|
||||||
|
that begin and end with the same character:
|
||||||
|
.sp
|
||||||
|
\eb(\ew)\ew++(?<=\e1)
|
||||||
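The following hand-written C simulation of \b(\w)\w++(?<=\1) makes the lookbehind mechanism concrete. It only illustrates the "step back by a fixed length and compare" rule and is not how PCRE2 implements matching. It assumes \w means ASCII letters, digits and underscore (C locale). It also leaves the \b check to the caller, who must pass the index of the start of a word.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* \w assumed to be ASCII letters, digits and underscore (C locale). */
static int is_word(char c)
{
  return isalnum((unsigned char)c) || c == '_';
}

/* Fixed-length lookbehind: move back len characters from pos and compare.
   If there are not enough characters before pos, the assertion fails. */
static int lookbehind_matches(const char *subject, size_t pos,
                              const char *text, size_t len)
{
  assert(pos <= strlen(subject));
  if (pos < len) return 0;
  return memcmp(subject + pos - len, text, len) == 0;
}

/* Hand-simulation of \b(\w)\w++(?<=\1) starting at the beginning of a word
   (the \b check itself is left to the caller). */
static int same_first_last(const char *subject, size_t start)
{
  size_t pos = start;
  const char *group1;

  if (!is_word(subject[pos])) return 0;
  group1 = subject + pos;                /* (\w): group 1 is one character */
  pos++;
  if (!is_word(subject[pos])) return 0;  /* \w++ needs at least one more */
  while (is_word(subject[pos])) pos++;   /* possessive: never gives back */
  /* (?<=\1): group 1 has fixed length 1, so step back one character */
  return lookbehind_matches(subject, pos, group1, 1);
}
```

Group 1 always captures exactly one character. That gives the lookbehind (?<=\1) a known length of 1, which is what makes the back reference acceptable inside it.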
.P
|
.P
|
||||||
Possessive quantifiers can be used in conjunction with lookbehind assertions to
|
Possessive quantifiers can be used in conjunction with lookbehind assertions to
|
||||||
specify efficient matching of fixed-length strings at the end of subject
|
specify efficient matching of fixed-length strings at the end of subject
|
||||||
|
@ -2476,7 +2490,9 @@ This makes the fragment independent of the parentheses in the larger pattern.
|
||||||
.sp
|
.sp
|
||||||
Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used
|
Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used
|
||||||
subpattern by name. For compatibility with earlier versions of PCRE1, which had
|
subpattern by name. For compatibility with earlier versions of PCRE1, which had
|
||||||
this facility before Perl, the syntax (?(name)...) is also recognized.
|
this facility before Perl, the syntax (?(name)...) is also recognized. Note,
|
||||||
|
however, that undelimited names consisting of the letter R followed by digits
|
||||||
|
are ambiguous (see the following section).
|
||||||
.P
|
.P
|
||||||
Rewriting the above example to use a named subpattern gives this:
|
Rewriting the above example to use a named subpattern gives this:
|
||||||
.sp
|
.sp
|
||||||
|
@ -2490,33 +2506,55 @@ matched.
|
||||||
.SS "Checking for pattern recursion"
|
.SS "Checking for pattern recursion"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
If the condition is the string (R), and there is no subpattern with the name R,
|
"Recursion" in this sense refers to any subroutine-like call from one part of
|
||||||
the condition is true if a recursive call to the whole pattern or any
|
the pattern to another, whether or not it is actually recursive. See the
|
||||||
subpattern has been made. If digits or a name preceded by ampersand follow the
|
sections entitled
|
||||||
letter R, for example:
|
|
||||||
.sp
|
|
||||||
(?(R3)...) or (?(R&name)...)
|
|
||||||
.sp
|
|
||||||
the condition is true if the most recent recursion is into a subpattern whose
|
|
||||||
number or name is given. This condition does not check the entire recursion
|
|
||||||
stack. If the name used in a condition of this kind is a duplicate, the test is
|
|
||||||
applied to all subpatterns of the same name, and is true if any one of them is
|
|
||||||
the most recent recursion.
|
|
||||||
.P
|
|
||||||
At "top level", all these recursion test conditions are false.
|
|
||||||
.\" HTML <a href="#recursion">
|
.\" HTML <a href="#recursion">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
The syntax for recursive patterns
|
"Recursive patterns"
|
||||||
.\"
|
.\"
|
||||||
is described below.
|
and
|
||||||
|
.\" HTML <a href="#subpatternsassubroutines">
|
||||||
|
.\" </a>
|
||||||
|
"Subpatterns as subroutines"
|
||||||
|
.\"
|
||||||
|
below for details of recursion and subpattern calls.
|
||||||
|
.P
|
||||||
|
If a condition is the string (R), and there is no subpattern with the name R,
|
||||||
|
the condition is true if matching is currently in a recursion or subroutine
|
||||||
|
call to the whole pattern or any subpattern. If digits follow the letter R, and
|
||||||
|
there is no subpattern with that name, the condition is true if the most recent
|
||||||
|
call is into a subpattern with the given number, which must exist somewhere in
|
||||||
|
the overall pattern. This is a contrived example that is equivalent to a+b:
|
||||||
|
.sp
|
||||||
|
((?(R1)a+|(?1)b))
|
||||||
|
.sp
|
||||||
|
However, in both cases, if there is a subpattern with a matching name, the
|
||||||
|
condition tests for its being set, as described in the section above, instead
|
||||||
|
of testing for recursion. For example, creating a group with the name R1 by
|
||||||
|
adding (?<R1>) to the above pattern completely changes its meaning.
|
||||||
|
.P
|
||||||
|
If a name preceded by ampersand follows the letter R, for example:
|
||||||
|
.sp
|
||||||
|
(?(R&name)...)
|
||||||
|
.sp
|
||||||
|
the condition is true if the most recent recursion is into a subpattern of that
|
||||||
|
name (which must exist within the pattern).
|
||||||
|
.P
|
||||||
|
This condition does not check the entire recursion stack. It tests only the
|
||||||
|
current level. If the name used in a condition of this kind is a duplicate, the
|
||||||
|
test is applied to all subpatterns of the same name, and is true if any one of
|
||||||
|
them is the most recent recursion.
|
||||||
|
.P
|
||||||
|
At "top level", all these recursion test conditions are false.
|
||||||
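The decision order for (?(R), (?(Rn), (?(R&name) and (?(DEFINE) can be summarized as a classification function. This is a hypothetical sketch of the rules described in this section, not PCRE2's own code, and names such as classify_condition are invented for illustration. Numeric, <name>, 'name', VERSION and assertion conditions are not modelled. They end up as COND_OTHER, or as COND_NAMED_REF if the text happens to match a group name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum cond_kind {
  COND_DEFINE,          /* (?(DEFINE) */
  COND_NAMED_REF,       /* test whether a named group is set */
  COND_RECURSE_ANY,     /* (?(R) */
  COND_RECURSE_NUMBER,  /* (?(Rn) */
  COND_RECURSE_NAME,    /* (?(R&name) */
  COND_OTHER
};

static int name_exists(const char *name, const char *const *names,
                       size_t count)
{
  size_t i;
  for (i = 0; i < count; i++)
    if (strcmp(names[i], name) == 0) return 1;
  return 0;
}

/* cond is the text between "(?(" and ")"; names lists the group names
   defined anywhere in the pattern. */
static enum cond_kind classify_condition(const char *cond,
                                         const char *const *names,
                                         size_t count)
{
  const char *p;

  assert(cond != NULL);
  /* DEFINE is always a define group, even if a group is named DEFINE. */
  if (strcmp(cond, "DEFINE") == 0) return COND_DEFINE;
  if (cond[0] == 'R' && cond[1] == '&') return COND_RECURSE_NAME;
  if (cond[0] == 'R') {
    p = cond + 1;
    while (isdigit((unsigned char)*p)) p++;
    if (*p == '\0') {
      /* "R" or "R" plus digits: an existing group of that name wins. */
      if (name_exists(cond, names, count)) return COND_NAMED_REF;
      return cond[1] == '\0' ? COND_RECURSE_ANY : COND_RECURSE_NUMBER;
    }
  }
  return name_exists(cond, names, count) ? COND_NAMED_REF : COND_OTHER;
}
```

With no group named R1, "R1" is classified as a numbered recursion test. Adding a group named R1 flips it to a named reference condition, mirroring the (?<R1>) example above.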
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="subdefine"></a>
|
.\" HTML <a name="subdefine"></a>
|
||||||
.SS "Defining subpatterns for use by reference only"
|
.SS "Defining subpatterns for use by reference only"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
If the condition is the string (DEFINE), and there is no subpattern with the
|
If the condition is the string (DEFINE), the condition is always false, even if
|
||||||
name DEFINE, the condition is always false. In this case, there may be only one
|
there is a group with the name DEFINE. In this case, there may be only one
|
||||||
alternative in the subpattern. It is always skipped if control reaches this
|
alternative in the subpattern. It is always skipped if control reaches this
|
||||||
point in the pattern; the idea of DEFINE is that it can be used to define
|
point in the pattern; the idea of DEFINE is that it can be used to define
|
||||||
subroutines that can be referenced from elsewhere. (The use of
|
subroutines that can be referenced from elsewhere. (The use of
|
||||||
|
@ -2994,12 +3032,20 @@ depending on whether or not a name is present.
|
||||||
By default, for compatibility with Perl, a name is any sequence of characters
|
By default, for compatibility with Perl, a name is any sequence of characters
|
||||||
that does not include a closing parenthesis. The name is not processed in
|
that does not include a closing parenthesis. The name is not processed in
|
||||||
any way, and it is not possible to include a closing parenthesis in the name.
|
any way, and it is not possible to include a closing parenthesis in the name.
|
||||||
However, if the PCRE2_ALT_VERBNAMES option is set, normal backslash processing
|
This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
|
||||||
is applied to verb names and only an unescaped closing parenthesis terminates
|
is no longer Perl-compatible.
|
||||||
the name. A closing parenthesis can be included in a name either as \e) or
|
.P
|
||||||
between \eQ and \eE. If the PCRE2_EXTENDED option is set, unescaped whitespace
|
When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
|
||||||
in verb names is skipped and #-comments are recognized, exactly as in the rest
|
and only an unescaped closing parenthesis terminates the name. However, the
|
||||||
of the pattern.
|
only backslash items that are permitted are \eQ, \eE, and sequences such as
|
||||||
|
\ex{100} that define character code points. Character type escapes such as \ed
|
||||||
|
are faulted.
|
||||||
|
.P
|
||||||
|
A closing parenthesis can be included in a name either as \e) or between \eQ
|
||||||
|
and \eE. In addition to backslash processing, if the PCRE2_EXTENDED option is
|
||||||
|
also set, unescaped whitespace in verb names is skipped, and #-comments are
|
||||||
|
recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not
|
||||||
|
affect verb names unless PCRE2_ALT_VERBNAMES is also set.
|
||||||
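The PCRE2_ALT_VERBNAMES rules can be illustrated with a simplified reader for a verb name. The function read_alt_verbname is hypothetical and deliberately incomplete. It handles:

- \) and other escaped non-alphanumerics;
- \Q...\E quoting;
- PCRE2_EXTENDED white space and #-comment skipping.

It rejects code point escapes such as \x{100}, which PCRE2 does accept. Its notion of white space is C's isspace(), which may not match PCRE2's exactly.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not PCRE2's code. p points just after the ':' of a
   verb such as (*MARK:. The processed name is copied to out. Returns the
   number of input characters consumed (including the closing parenthesis)
   or -1 on error. */
static long read_alt_verbname(const char *p, int extended,
                              char *out, size_t outsize)
{
  const char *start = p;
  size_t n = 0;
  int in_quote = 0;                      /* inside \Q...\E */

  assert(out != NULL && outsize > 0);
  for (;;) {
    char c = *p++;
    if (c == '\0') return -1;            /* no closing parenthesis */
    if (in_quote) {
      if (c == '\\' && *p == 'E') {      /* \E ends the quoted run */
        in_quote = 0;
        p++;
        continue;
      }
    } else if (c == ')') {
      break;                             /* unescaped ) ends the name */
    } else if (c == '\\') {
      char e = *p++;
      if (e == 'Q') { in_quote = 1; continue; }
      if (e == 'E') continue;            /* a stray \E is ignored */
      if (e == '\0' || isalnum((unsigned char)e))
        return -1;                       /* \d faulted; \x{...} not handled */
      c = e;                             /* \) etc. give a literal character */
    } else if (extended && isspace((unsigned char)c)) {
      continue;                          /* unescaped white space skipped */
    } else if (extended && c == '#') {
      while (*p != '\0' && *p != '\n') p++;  /* #-comment up to newline */
      continue;
    }
    if (n + 1 >= outsize) return -1;     /* leave room for the terminator */
    out[n++] = c;
  }
  out[n] = '\0';
  return (long)(p - start);
}
```

For example, the name text a\)b) yields the name "a)b". With extended set, "a b # note\n c)" yields "abc".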
.P
|
.P
|
||||||
The maximum length of a name is 255 in the 8-bit library and 65535 in the
|
The maximum length of a name is 255 in the 8-bit library and 65535 in the
|
||||||
16-bit and 32-bit libraries. If the name is empty, that is, if the closing
|
16-bit and 32-bit libraries. If the name is empty, that is, if the closing
|
||||||
|
@ -3429,6 +3475,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 20 June 2016
|
Last updated: 30 September 2016
|
||||||
Copyright (c) 1997-2016 University of Cambridge.
|
Copyright (c) 1997-2016 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "16 October 2015" "PCRE2 10.21"
|
.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -473,6 +473,9 @@ Each top-level branch of a look behind must be of a fixed length.
|
||||||
\en reference by number (can be ambiguous)
|
\en reference by number (can be ambiguous)
|
||||||
\egn reference by number
|
\egn reference by number
|
||||||
\eg{n} reference by number
|
\eg{n} reference by number
|
||||||
|
\eg+n relative reference by number (PCRE2 extension)
|
||||||
|
\eg-n relative reference by number
|
||||||
|
\eg{+n} relative reference by number (PCRE2 extension)
|
||||||
\eg{-n} relative reference by number
|
\eg{-n} relative reference by number
|
||||||
\ek<name> reference by name (Perl)
|
\ek<name> reference by name (Perl)
|
||||||
\ek'name' reference by name (Perl)
|
\ek'name' reference by name (Perl)
|
||||||
|
@ -511,13 +514,17 @@ Each top-level branch of a look behind must be of a fixed length.
|
||||||
(?(-n) relative reference condition
|
(?(-n) relative reference condition
|
||||||
(?(<name>) named reference condition (Perl)
|
(?(<name>) named reference condition (Perl)
|
||||||
(?('name') named reference condition (Perl)
|
(?('name') named reference condition (Perl)
|
||||||
(?(name) named reference condition (PCRE2)
|
(?(name) named reference condition (PCRE2, deprecated)
|
||||||
(?(R) overall recursion condition
|
(?(R) overall recursion condition
|
||||||
(?(Rn) specific group recursion condition
|
(?(Rn) specific numbered group recursion condition
|
||||||
(?(R&name) specific recursion condition
|
(?(R&name) specific named group recursion condition
|
||||||
(?(DEFINE) define subpattern for reference
|
(?(DEFINE) define subpattern for reference
|
||||||
(?(VERSION[>]=n.m) test PCRE2 version
|
(?(VERSION[>]=n.m) test PCRE2 version
|
||||||
(?(assert) assertion condition
|
(?(assert) assertion condition
|
||||||
|
.sp
|
||||||
|
Note the ambiguity of (?(R) and (?(Rn), which might be named reference
|
||||||
|
conditions or recursion tests. Such a condition is interpreted as a reference
|
||||||
|
condition if the relevant named group exists.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "BACKTRACKING CONTROL"
|
.SH "BACKTRACKING CONTROL"
|
||||||
|
@ -577,6 +584,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 16 October 2015
|
Last updated: 28 September 2016
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2016 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
24
perltest.sh
|
@@ -1,14 +1,17 @@
|
||||||
#! /bin/sh
|
#! /bin/sh
|
||||||
|
|
||||||
# Script for testing regular expressions with perl to check that PCRE2 handles
|
# Script for testing regular expressions with perl to check that PCRE2 handles
|
||||||
# them the same. The Perl code has to have "use utf8" and "require Encode" at
|
# them the same. If the first argument to this script is "-w", Perl is also
|
||||||
# the start when running UTF-8 tests, but *not* for non-utf8 tests. (The
|
# called with "-w", which turns on its warning mode.
|
||||||
# "require" would actually be OK for non-utf8-tests, but is not always
|
#
|
||||||
# installed, so this way the script will always run for these tests.)
|
# The Perl code has to have "use utf8" and "require Encode" at the start when
|
||||||
|
# running UTF-8 tests, but *not* for non-utf8 tests. (The "require" would
|
||||||
|
# actually be OK for non-utf8-tests, but is not always installed, so this way
|
||||||
|
# the script will always run for these tests.)
|
||||||
#
|
#
|
||||||
# The desired effect is achieved by making this a shell script that passes the
|
# The desired effect is achieved by making this a shell script that passes the
|
||||||
# Perl script to Perl through a pipe. If the first argument is "-utf8", a
|
# Perl script to Perl through a pipe. If the first argument (possibly after
|
||||||
# suitable prefix is set up.
|
# removing "-w") is "-utf8", a suitable prefix is set up.
|
||||||
#
|
#
|
||||||
# The remaining arguments, if any, are passed to Perl. They are an input file
|
# The remaining arguments, if any, are passed to Perl. They are an input file
|
||||||
# and an output file. If there is one argument, the output is written to
|
# and an output file. If there is one argument, the output is written to
|
||||||
|
@@ -17,7 +20,14 @@
|
||||||
# of the contorted piping input.)
|
# of the contorted piping input.)
|
||||||
|
|
||||||
perl=perl
|
perl=perl
|
||||||
|
perlarg=''
|
||||||
prefix=''
|
prefix=''
|
||||||
|
|
||||||
|
if [ $# -gt 0 -a "$1" = "-w" ] ; then
|
||||||
|
perlarg="-w"
|
||||||
|
shift
|
||||||
|
fi
|
||||||
|
|
||||||
if [ $# -gt 0 -a "$1" = "-utf8" ] ; then
|
if [ $# -gt 0 -a "$1" = "-utf8" ] ; then
|
||||||
prefix="use utf8; require Encode;"
|
prefix="use utf8; require Encode;"
|
||||||
shift
|
shift
|
||||||
|
@@ -292,6 +302,6 @@ for (;;)
|
||||||
# printf $outfile "\n";
|
# printf $outfile "\n";
|
||||||
|
|
||||||
PERLEND
|
PERLEND
|
||||||
) | $perl - $@
|
) | $perl $perlarg - $@
|
||||||
|
|
||||||
# End
|
# End
|
||||||
|
|
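After this change the script accepts an optional "-w" first and an optional "-utf8" after it; a "-w" given after "-utf8" is not recognized and stays among the remaining arguments. The same logic, rendered in C purely for illustration (perltest_opts and parse_perltest_args() are hypothetical names, not part of the PCRE2 sources):

```c
#include <string.h>

typedef struct {
  const char *perlarg;   /* "-w" or "" */
  const char *prefix;    /* Perl prefix for UTF-8 tests, or "" */
  int first_rest;        /* index of the first argument passed on to Perl */
} perltest_opts;

static perltest_opts parse_perltest_args(int argc, char **argv)
{
  perltest_opts o = { "", "", 1 };       /* argv[0] is the script name */

  /* "-w" is only recognized as the very first argument. */
  if (o.first_rest < argc && strcmp(argv[o.first_rest], "-w") == 0)
    {
    o.perlarg = "-w";
    o.first_rest++;
    }

  /* "-utf8" is checked after any "-w" has been removed. */
  if (o.first_rest < argc && strcmp(argv[o.first_rest], "-utf8") == 0)
    {
    o.prefix = "use utf8; require Encode;";
    o.first_rest++;
    }
  return o;
}
```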
11753
src/pcre2_compile.c
File diff suppressed because it is too large
|
@@ -91,13 +91,13 @@ static const unsigned char compile_error_texts[] =
|
||||||
"failed to allocate heap memory\0"
|
"failed to allocate heap memory\0"
|
||||||
"unmatched closing parenthesis\0"
|
"unmatched closing parenthesis\0"
|
||||||
"internal error: code overflow\0"
|
"internal error: code overflow\0"
|
||||||
"letter or underscore expected after (?< or (?'\0"
|
"missing closing parenthesis for condition\0"
|
||||||
/* 25 */
|
/* 25 */
|
||||||
"lookbehind assertion is not fixed length\0"
|
"lookbehind assertion is not fixed length\0"
|
||||||
"malformed number or name after (?(\0"
|
"a relative value of zero is not allowed\0"
|
||||||
"conditional group contains more than two branches\0"
|
"conditional group contains more than two branches\0"
|
||||||
"assertion expected after (?( or (?(?C)\0"
|
"assertion expected after (?( or (?(?C)\0"
|
||||||
"(?R or (?[+-]digits must be followed by )\0"
|
"digit expected after (?+ or (?-\0"
|
||||||
/* 30 */
|
/* 30 */
|
||||||
"unknown POSIX class name\0"
|
"unknown POSIX class name\0"
|
||||||
"internal error in pcre2_study(): should not occur\0"
|
"internal error in pcre2_study(): should not occur\0"
|
||||||
|
@@ -105,7 +105,7 @@ static const unsigned char compile_error_texts[] =
|
||||||
"parentheses are too deeply nested (stack check)\0"
|
"parentheses are too deeply nested (stack check)\0"
|
||||||
"character code point value in \\x{} or \\o{} is too large\0"
|
"character code point value in \\x{} or \\o{} is too large\0"
|
||||||
/* 35 */
|
/* 35 */
|
||||||
"invalid condition (?(0)\0"
|
"lookbehind is too complicated\0"
|
||||||
"\\C is not allowed in a lookbehind assertion in UTF-" XSTRING(PCRE2_CODE_UNIT_WIDTH) " mode\0"
|
"\\C is not allowed in a lookbehind assertion in UTF-" XSTRING(PCRE2_CODE_UNIT_WIDTH) " mode\0"
|
||||||
"PCRE does not support \\L, \\l, \\N{name}, \\U, or \\u\0"
|
"PCRE does not support \\L, \\l, \\N{name}, \\U, or \\u\0"
|
||||||
"number after (?C is greater than 255\0"
|
"number after (?C is greater than 255\0"
|
||||||
|
@@ -132,13 +132,13 @@ static const unsigned char compile_error_texts[] =
|
||||||
"missing opening brace after \\o\0"
|
"missing opening brace after \\o\0"
|
||||||
"internal error: unknown newline setting\0"
|
"internal error: unknown newline setting\0"
|
||||||
"\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
|
"\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
|
||||||
"a numbered reference must not be zero\0"
|
"(?R (recursive pattern call) must be followed by a closing parenthesis\0"
|
||||||
"an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0"
|
"an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0"
|
||||||
/* 60 */
|
/* 60 */
|
||||||
"(*VERB) not recognized or malformed\0"
|
"(*VERB) not recognized or malformed\0"
|
||||||
"number is too big\0"
|
"group number is too big\0"
|
||||||
"subpattern name expected\0"
|
"subpattern name expected\0"
|
||||||
"digit expected after (?+\0"
|
"SPARE ERROR\0"
|
||||||
"non-octal character in \\o{} (closing brace missing?)\0"
|
"non-octal character in \\o{} (closing brace missing?)\0"
|
||||||
/* 65 */
|
/* 65 */
|
||||||
"different names for subpatterns of the same number are not allowed\0"
|
"different names for subpatterns of the same number are not allowed\0"
|
||||||
|
@@ -151,9 +151,9 @@ static const unsigned char compile_error_texts[] =
|
||||||
#endif
|
#endif
|
||||||
"\\k is not followed by a braced, angle-bracketed, or quoted name\0"
|
"\\k is not followed by a braced, angle-bracketed, or quoted name\0"
|
||||||
/* 70 */
|
/* 70 */
|
||||||
"internal error: unknown opcode in find_fixedlength()\0"
|
"internal error: unknown meta code in check_lookbehinds()\0"
|
||||||
"\\N is not supported in a class\0"
|
"\\N is not supported in a class\0"
|
||||||
"SPARE ERROR\0"
|
"callout string is too long\0"
|
||||||
"disallowed Unicode code point (>= 0xd800 && <= 0xdfff)\0"
|
"disallowed Unicode code point (>= 0xd800 && <= 0xdfff)\0"
|
||||||
"using UTF is disabled by the application\0"
|
"using UTF is disabled by the application\0"
|
||||||
/* 75 */
|
/* 75 */
|
||||||
|
@@ -161,7 +161,7 @@ static const unsigned char compile_error_texts[] =
|
||||||
"name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
|
"name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
|
||||||
"character code point value in \\u.... sequence is too large\0"
|
"character code point value in \\u.... sequence is too large\0"
|
||||||
"digits missing in \\x{} or \\o{}\0"
|
"digits missing in \\x{} or \\o{}\0"
|
||||||
"syntax error in (?(VERSION condition\0"
|
"syntax error or number too big in (?(VERSION condition\0"
|
||||||
/* 80 */
|
/* 80 */
|
||||||
"internal error: unknown opcode in auto_possessify()\0"
|
"internal error: unknown opcode in auto_possessify()\0"
|
||||||
"missing terminating delimiter for callout with string argument\0"
|
"missing terminating delimiter for callout with string argument\0"
|
||||||
|
@@ -173,6 +173,8 @@ static const unsigned char compile_error_texts[] =
|
||||||
"regular expression is too complicated\0"
|
"regular expression is too complicated\0"
|
||||||
"lookbehind assertion is too long\0"
|
"lookbehind assertion is too long\0"
|
||||||
"pattern string is longer than the limit set by the application\0"
|
"pattern string is longer than the limit set by the application\0"
|
||||||
|
"internal error: unknown code in parsed pattern\0"
|
||||||
|
/* 90 */
|
||||||
;
|
;
|
||||||
|
|
||||||
/* Match-time and UTF error texts are in the same format. */
|
/* Match-time and UTF error texts are in the same format. */
|
||||||
|
|
|
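Messages in this table are located by position: each entry ends in \0, and the /* 25 */-style comments mark every fifth entry. That is why a retired message is replaced by "SPARE ERROR" rather than deleted, since removing it would renumber every later message. A minimal sketch of positional lookup in a table of the same shape; demo_error_texts and find_text() are hypothetical and not PCRE2's own lookup code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature table: each message ends in "\0"; the compiler adds
   one more zero after the last, which marks the end of the list. */
static const unsigned char demo_error_texts[] =
  "no error\0"
  "missing closing parenthesis\0"
  "SPARE ERROR\0"
  "callout string is too long\0";

/* Return message n (0-based), or NULL if n is out of range. */
static const char *find_text(const unsigned char *table, size_t table_size, int n)
{
  const char *p = (const char *)table;
  const char *end = p + table_size;

  if (n < 0) return NULL;
  while (n-- > 0)
    {
    if (p >= end || *p == 0) return NULL;   /* fewer messages than requested */
    p += strlen(p) + 1;                     /* skip message and its zero */
    }
  return (p < end && *p != 0)? p : NULL;
}
```

Here message 3 is still "callout string is too long" only because slot 2 is kept as a placeholder.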
@@ -1298,23 +1298,16 @@ mode rather than an escape sequence. It is also used for [^] in JavaScript
|
||||||
compatibility mode, and for \C in non-utf mode. In non-DOTALL mode, "." behaves
|
compatibility mode, and for \C in non-utf mode. In non-DOTALL mode, "." behaves
|
||||||
like \N.
|
like \N.
|
||||||
|
|
||||||
The special values ESC_DU, ESC_du, etc. are used instead of ESC_D, ESC_d, etc.
|
|
||||||
when PCRE2_UCP is set and replacement of \d etc by \p sequences is required.
|
|
||||||
They must be contiguous, and remain in order so that the replacements can be
|
|
||||||
looked up from a table.
|
|
||||||
|
|
||||||
Negative numbers are used to encode a backreference (\1, \2, \3, etc.) in
|
Negative numbers are used to encode a backreference (\1, \2, \3, etc.) in
|
||||||
check_escape(). There are two tests in the code for an escape
|
check_escape(). There are tests in the code for an escape greater than ESC_b
|
||||||
greater than ESC_b and less than ESC_Z to detect the types that may be
|
and less than ESC_Z to detect the types that may be repeated. These are the
|
||||||
repeated. These are the types that consume characters. If any new escapes are
|
types that consume characters. If any new escapes are put in between that don't
|
||||||
put in between that don't consume a character, that code will have to change.
|
consume a character, that code will have to change. */
|
||||||
*/
|
|
||||||
|
|
||||||
enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
|
enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
|
||||||
ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
|
ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
|
||||||
ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z,
|
ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z,
|
||||||
ESC_E, ESC_Q, ESC_g, ESC_k,
|
ESC_E, ESC_Q, ESC_g, ESC_k };
|
||||||
ESC_DU, ESC_du, ESC_SU, ESC_su, ESC_WU, ESC_wu };
|
|
||||||
|
|
||||||
|
|
||||||
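Because the repeat test depends only on enum order, it reduces to a range check. A sketch using a copy of the new enum; is_backref(), backref_number(), and escape_can_repeat() are hypothetical helpers that simply restate the comment (negative values encode back references; values strictly between ESC_b and ESC_Z consume a character):

```c
#include <stdbool.h>

/* Copy of the new escape enum from the diff. */
enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
       ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
       ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z,
       ESC_E, ESC_Q, ESC_g, ESC_k };

/* Negative values encode back references: -n stands for \n. */
static bool is_backref(int escape) { return escape < 0; }
static int backref_number(int escape) { return -escape; }

/* Escapes strictly between ESC_b and ESC_Z consume a character and so may
   be repeated; assertions such as \A, \b and \Z fall outside the range. */
static bool escape_can_repeat(int escape)
{
  return escape > ESC_b && escape < ESC_Z;
}
```

Inserting a non-consuming escape between ESC_b and ESC_Z would silently make it repeatable, which is the hazard the comment warns about.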
/********************** Opcode definitions ******************/
|
/********************** Opcode definitions ******************/
|
||||||
|
@@ -1380,7 +1373,8 @@ enum {
|
||||||
OP_CIRC, /* 27 Start of line - not multiline */
|
OP_CIRC, /* 27 Start of line - not multiline */
|
||||||
OP_CIRCM, /* 28 Start of line - multiline */
|
OP_CIRCM, /* 28 Start of line - multiline */
|
||||||
|
|
||||||
/* Single characters; caseful must precede the caseless ones */
|
/* Single characters; caseful must precede the caseless ones, and these
|
||||||
|
must remain in this order, and adjacent. */
|
||||||
|
|
||||||
OP_CHAR, /* 29 Match one character, casefully */
|
OP_CHAR, /* 29 Match one character, casefully */
|
||||||
OP_CHARI, /* 30 Match one character, caselessly */
|
OP_CHARI, /* 30 Match one character, caselessly */
|
||||||
|
|
|
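The new comment requires OP_CHAR and OP_CHARI to stay adjacent, caseful first. One common reason for such a rule is that code derives the caseless opcode by adding an offset to the caseful one. The sketch below illustrates that idea with a hypothetical char_opcode() helper and a compile-time guard; it is not taken from pcre2_compile.c. The values 29 and 30 come from the opcode comments above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the opcode comments above; char_opcode() is hypothetical. */
enum { OP_CHAR = 29, OP_CHARI = 30 };

/* Compile-time guard: fails if the caseless opcode ever stops directly
   following the caseful one. */
_Static_assert(OP_CHARI == OP_CHAR + 1, "OP_CHARI must directly follow OP_CHAR");

/* Select the single-character opcode by offset, which is only valid while
   the ordering rule holds. */
static uint8_t char_opcode(bool caseless)
{
  return (uint8_t)(OP_CHAR + (caseless? 1 : 0));
}
```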
@@ -648,18 +648,24 @@ typedef struct pcre2_real_match_data {
|
||||||
|
|
||||||
#ifndef PCRE2_PCRE2TEST
|
#ifndef PCRE2_PCRE2TEST
|
||||||
|
|
||||||
/* Structure for checking for mutual recursion when scanning compiled code. */
|
/* Structures for checking for mutual recursion when scanning compiled or
|
||||||
|
parsed code. */
|
||||||
|
|
||||||
typedef struct recurse_check {
|
typedef struct recurse_check {
|
||||||
struct recurse_check *prev;
|
struct recurse_check *prev;
|
||||||
PCRE2_SPTR group;
|
PCRE2_SPTR group;
|
||||||
} recurse_check;
|
} recurse_check;
|
||||||
|
|
||||||
|
typedef struct parsed_recurse_check {
|
||||||
|
struct parsed_recurse_check *prev;
|
||||||
|
uint32_t *groupptr;
|
||||||
|
} parsed_recurse_check;
|
||||||
|
|
||||||
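The recurse_check and parsed_recurse_check chains work as linked lists of stack-allocated nodes: before descending into a called group, the scanner pushes a node recording that group, and finding a group already on the chain signals mutual recursion. A minimal sketch of that technique, reusing the parsed_recurse_check layout from the diff; the toy call graph (group g calls group calls[g], or -1 for none) and both functions are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as parsed_recurse_check in the diff. */
typedef struct parsed_recurse_check {
  struct parsed_recurse_check *prev;
  uint32_t *groupptr;
} parsed_recurse_check;

/* Is this group already being scanned further up the call chain? */
static bool already_scanning(const parsed_recurse_check *chain,
                             const uint32_t *group)
{
  for (; chain != NULL; chain = chain->prev)
    if (chain->groupptr == group) return true;
  return false;
}

/* Hypothetical toy scanner: returns false if following the calls leads back
   to a group that is already on the chain. */
static bool scan_group(uint32_t *groups, const int *calls, int g,
                       parsed_recurse_check *chain)
{
  parsed_recurse_check this_level;          /* node lives on this stack frame */

  if (already_scanning(chain, &groups[g])) return false;
  if (calls[g] < 0) return true;

  this_level.prev = chain;                  /* push before descending */
  this_level.groupptr = &groups[g];
  return scan_group(groups, calls, calls[g], &this_level);
}
```

No heap allocation is needed: each node is popped automatically when its stack frame returns.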
/* Structure for building a cache when filling in recursion offsets. */
|
/* Structure for building a cache when filling in recursion offsets. */
|
||||||
|
|
||||||
typedef struct recurse_cache {
|
typedef struct recurse_cache {
|
||||||
PCRE2_SPTR group;
|
PCRE2_SPTR group;
|
||||||
int recno;
|
int groupnumber;
|
||||||
} recurse_cache;
|
} recurse_cache;
|
||||||
|
|
||||||
/* Structure for maintaining a chain of pointers to the currently incomplete
|
/* Structure for maintaining a chain of pointers to the currently incomplete
|
||||||
|
@@ -693,9 +699,10 @@ typedef struct compile_block {
|
||||||
PCRE2_SPTR start_code; /* The start of the compiled code */
|
PCRE2_SPTR start_code; /* The start of the compiled code */
|
||||||
PCRE2_SPTR start_pattern; /* The start of the pattern */
|
PCRE2_SPTR start_pattern; /* The start of the pattern */
|
||||||
PCRE2_SPTR end_pattern; /* The end of the pattern */
|
PCRE2_SPTR end_pattern; /* The end of the pattern */
|
||||||
PCRE2_SPTR nestptr[2]; /* Pointer(s) saved for string substitution */
|
|
||||||
PCRE2_UCHAR *name_table; /* The name/number table */
|
PCRE2_UCHAR *name_table; /* The name/number table */
|
||||||
size_t workspace_size; /* Size of workspace */
|
PCRE2_SIZE workspace_size; /* Size of workspace */
|
||||||
|
PCRE2_SIZE small_ref_offset[10]; /* Offsets for \1 to \9 */
|
||||||
|
PCRE2_SIZE erroroffset; /* Offset of error in pattern */
|
||||||
uint16_t names_found; /* Number of entries so far */
|
uint16_t names_found; /* Number of entries so far */
|
||||||
uint16_t name_entry_size; /* Size of each entry */
|
uint16_t name_entry_size; /* Size of each entry */
|
||||||
open_capitem *open_caps; /* Chain of open capture items */
|
open_capitem *open_caps; /* Chain of open capture items */
|
||||||
|
@@ -703,8 +710,9 @@ typedef struct compile_block {
|
||||||
uint32_t named_group_list_size; /* Number of entries in the list */
|
uint32_t named_group_list_size; /* Number of entries in the list */
|
||||||
uint32_t external_options; /* External (initial) options */
|
uint32_t external_options; /* External (initial) options */
|
||||||
uint32_t external_flags; /* External flag bits to be set */
|
uint32_t external_flags; /* External flag bits to be set */
|
||||||
uint32_t bracount; /* Count of capturing parens as we compile */
|
uint32_t bracount; /* Count of capturing parentheses */
|
||||||
uint32_t final_bracount; /* Saved value after first pass */
|
uint32_t lastcapture; /* Last capture encountered */
|
||||||
|
uint32_t *parsed_pattern; /* Parsed pattern buffer */
|
||||||
uint32_t *groupinfo; /* Group info vector */
|
uint32_t *groupinfo; /* Group info vector */
|
||||||
uint32_t top_backref; /* Maximum back reference */
|
uint32_t top_backref; /* Maximum back reference */
|
||||||
uint32_t backref_map; /* Bitmap of low back refs */
|
uint32_t backref_map; /* Bitmap of low back refs */
|
||||||
|
@@ -718,9 +726,7 @@ typedef struct compile_block {
|
||||||
BOOL had_accept; /* (*ACCEPT) encountered */
|
BOOL had_accept; /* (*ACCEPT) encountered */
|
||||||
BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */
|
BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */
|
||||||
BOOL had_recurse; /* Had a recursion or subroutine call */
|
BOOL had_recurse; /* Had a recursion or subroutine call */
|
||||||
BOOL check_lookbehind; /* Lookbehinds need later checking */
|
|
||||||
BOOL dupnames; /* Duplicate names exist */
|
BOOL dupnames; /* Duplicate names exist */
|
||||||
BOOL iscondassert; /* Next assert is a condition */
|
|
||||||
} compile_block;
|
} compile_block;
|
||||||
|
|
||||||
/* Structure for keeping the properties of the in-memory stack used
|
/* Structure for keeping the properties of the in-memory stack used
|
||||||
|
|
|
@@ -114,7 +114,7 @@ for (; ptr < ptrend; ptr++)
|
||||||
else if (*ptr == CHAR_BACKSLASH)
|
else if (*ptr == CHAR_BACKSLASH)
|
||||||
{
|
{
|
||||||
int erc;
|
int erc;
|
||||||
int errorcode = 0;
|
int errorcode;
|
||||||
uint32_t ch;
|
uint32_t ch;
|
||||||
|
|
||||||
if (ptr < ptrend - 1) switch (ptr[1])
|
if (ptr < ptrend - 1) switch (ptr[1])
|
||||||
|
@@ -127,8 +127,10 @@ for (; ptr < ptrend; ptr++)
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ptr += 1; /* Must point after \ */
|
||||||
erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
|
erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
|
||||||
code->overall_options, FALSE, NULL);
|
code->overall_options, FALSE, NULL);
|
||||||
|
ptr -= 1; /* Back to last code unit of escape */
|
||||||
if (errorcode != 0)
|
if (errorcode != 0)
|
||||||
{
|
{
|
||||||
rc = errorcode;
|
rc = errorcode;
|
||||||
|
@@ -698,7 +700,7 @@ do
|
||||||
else if ((suboptions & PCRE2_SUBSTITUTE_EXTENDED) != 0 &&
|
else if ((suboptions & PCRE2_SUBSTITUTE_EXTENDED) != 0 &&
|
||||||
*ptr == CHAR_BACKSLASH)
|
*ptr == CHAR_BACKSLASH)
|
||||||
{
|
{
|
||||||
int errorcode = 0;
|
int errorcode;
|
||||||
|
|
||||||
if (ptr < repend - 1) switch (ptr[1])
|
if (ptr < repend - 1) switch (ptr[1])
|
||||||
{
|
{
|
||||||
|
@@ -728,10 +730,10 @@ do
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ptr++; /* Point after \ */
|
||||||
rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
|
rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
|
||||||
code->overall_options, FALSE, NULL);
|
code->overall_options, FALSE, NULL);
|
||||||
if (errorcode != 0) goto BADESCAPE;
|
if (errorcode != 0) goto BADESCAPE;
|
||||||
ptr++;
|
|
||||||
|
|
||||||
switch(rc)
|
switch(rc)
|
||||||
{
|
{
|
||||||
|
|
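Both substitute hunks reflect a changed calling convention: PRIV(check_escape) is now called with the pointer just past the backslash and, judging by the adjustments around the calls, returns with it just past the escape. Inside a for loop that increments ptr itself, the caller therefore steps back one code unit afterwards. A toy reader with the same convention; toy_escape() and count_chars() are hypothetical and handle only a few escapes:

```c
#include <stddef.h>

/* On entry *pp points just after the backslash; on success it is left just
   after the escape. Returns 0, or -1 if the string ends after the backslash. */
static int toy_escape(const char **pp, const char *end, unsigned int *chptr)
{
  const char *p = *pp;
  if (p >= end) return -1;
  switch (*p)
    {
    case 'n': *chptr = '\n'; break;
    case 't': *chptr = '\t'; break;
    default:  *chptr = (unsigned char)*p; break;   /* e.g. \\ is a backslash */
    }
  *pp = p + 1;
  return 0;
}

/* Count characters, treating each escape as one, in a loop that advances
   ptr itself (as in the first substitute hunk). */
static int count_chars(const char *s, size_t len)
{
  const char *ptr, *end = s + len;
  int n = 0;
  for (ptr = s; ptr < end; ptr++)
    {
    if (*ptr == '\\')
      {
      unsigned int ch;
      ptr += 1;                                   /* must point after \ */
      if (toy_escape(&ptr, end, &ch) != 0) return -1;
      ptr -= 1;                                   /* back to last unit of escape */
      }
    n++;
    }
  return n;
}
```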
182
src/pcre2test.c
|
@@ -2808,7 +2808,7 @@ return 0;
|
||||||
|
|
||||||
/* In UTF mode the input is always interpreted as a string of UTF-8 bytes using
|
/* In UTF mode the input is always interpreted as a string of UTF-8 bytes using
|
||||||
the original UTF-8 definition of RFC 2279, which allows for up to 6 bytes, and
|
the original UTF-8 definition of RFC 2279, which allows for up to 6 bytes, and
|
||||||
code values from 0 to 0x7fffffff. However, values greater than the later UTF
|
code values from 0 to 0x7fffffff. However, values greater than the later UTF
|
||||||
limit of 0x10ffff cause an error.
|
limit of 0x10ffff cause an error.
|
||||||
|
|
||||||
In non-UTF mode the input is interpreted as UTF-8 if the utf8_input modifier
|
In non-UTF mode the input is interpreted as UTF-8 if the utf8_input modifier
|
||||||
|
@@ -2867,7 +2867,7 @@ if (!utf && (pat_patctl.control & CTL_UTF8_INPUT) == 0)
|
||||||
|
|
||||||
else while (len > 0)
|
else while (len > 0)
|
||||||
{
|
{
|
||||||
int chlen;
|
int chlen;
|
||||||
uint32_t c;
|
uint32_t c;
|
||||||
uint32_t topbit = 0;
|
uint32_t topbit = 0;
|
||||||
if (!utf && *p == 0xff && len > 1)
|
if (!utf && *p == 0xff && len > 1)
|
||||||
|
@@ -2875,7 +2875,7 @@ else while (len > 0)
|
||||||
topbit = 0x80000000u;
|
topbit = 0x80000000u;
|
||||||
p++;
|
p++;
|
||||||
len--;
|
len--;
|
||||||
}
|
}
|
||||||
chlen = utf82ord(p, &c);
|
chlen = utf82ord(p, &c);
|
||||||
if (chlen <= 0) return -1;
|
if (chlen <= 0) return -1;
|
||||||
if (utf && c > 0x10ffff) return -2;
|
if (utf && c > 0x10ffff) return -2;
|
||||||
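The input reader accepts the original RFC 2279 form of UTF-8 (up to six bytes, values up to 0x7fffffff), and in non-UTF mode a leading 0xff byte sets the top bit of the following character. A self-contained sketch of both steps; decode_rfc2279() and read_char() are hypothetical stand-ins for utf82ord() and its caller, and the return codes (-1 for malformed input, -2 for a value above 0x10ffff in UTF mode) mirror the two return statements shown above:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one RFC 2279 sequence (1 to 6 bytes). Returns bytes used, or 0. */
static int decode_rfc2279(const uint8_t *p, size_t avail, uint32_t *cptr)
{
  uint32_t c;
  int extra, i;

  if (avail == 0) return 0;
  c = p[0];
  if (c < 0x80) { *cptr = c; return 1; }
  else if ((c & 0xe0) == 0xc0) { extra = 1; c &= 0x1f; }
  else if ((c & 0xf0) == 0xe0) { extra = 2; c &= 0x0f; }
  else if ((c & 0xf8) == 0xf0) { extra = 3; c &= 0x07; }
  else if ((c & 0xfc) == 0xf8) { extra = 4; c &= 0x03; }
  else if ((c & 0xfe) == 0xfc) { extra = 5; c &= 0x01; }
  else return 0;                            /* continuation byte, 0xfe or 0xff */

  if ((size_t)extra >= avail) return 0;     /* truncated sequence */
  for (i = 1; i <= extra; i++)
    {
    if ((p[i] & 0xc0) != 0x80) return 0;    /* not a continuation byte */
    c = (c << 6) | (p[i] & 0x3f);
    }
  *cptr = c;
  return extra + 1;
}

/* Read one character; in non-UTF mode a 0xff prefix sets bit 31. */
static int read_char(const uint8_t *p, size_t len, int utf, uint32_t *cptr)
{
  uint32_t topbit = 0;
  int used = 0, chlen;

  if (!utf && len > 1 && p[0] == 0xff)
    {
    topbit = 0x80000000u;
    p++;
    len--;
    used = 1;
    }
  chlen = decode_rfc2279(p, len, cptr);
  if (chlen <= 0) return -1;
  if (utf && *cptr > 0x10ffff) return -2;
  *cptr |= topbit;
  return used + chlen;
}
```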
|
@@ -4494,6 +4494,7 @@ unsigned int delimiter = *p++;
|
||||||
int errorcode;
|
int errorcode;
|
||||||
void *use_pat_context;
|
void *use_pat_context;
|
||||||
PCRE2_SIZE patlen;
|
PCRE2_SIZE patlen;
|
||||||
|
PCRE2_SIZE valgrind_access_length;
|
||||||
PCRE2_SIZE erroroffset;
|
PCRE2_SIZE erroroffset;
|
||||||
|
|
||||||
/* Initialize the context and pattern/data controls for this test from the
|
/* Initialize the context and pattern/data controls for this test from the
|
||||||
|
@@ -4537,7 +4538,7 @@ patlen = p - buffer - 2;
|
||||||
if (!decode_modifiers(p, CTX_PAT, &pat_patctl, NULL)) return PR_SKIP;
|
if (!decode_modifiers(p, CTX_PAT, &pat_patctl, NULL)) return PR_SKIP;
|
||||||
utf = (pat_patctl.options & PCRE2_UTF) != 0;
|
utf = (pat_patctl.options & PCRE2_UTF) != 0;
|
||||||
|
|
||||||
/* The utf8_input modifier is not allowed in 8-bit mode, and is mutually
|
/* The utf8_input modifier is not allowed in 8-bit mode, and is mutually
|
||||||
exclusive with the utf modifier. */
|
exclusive with the utf modifier. */
|
||||||
|
|
||||||
if ((pat_patctl.control & CTL_UTF8_INPUT) != 0)
|
if ((pat_patctl.control & CTL_UTF8_INPUT) != 0)
|
||||||
|
@@ -4550,8 +4551,8 @@ if ((pat_patctl.control & CTL_UTF8_INPUT) != 0)
|
||||||
if (utf)
|
if (utf)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "** The utf and utf8_input modifiers are mutually exclusive\n");
|
fprintf(outfile, "** The utf and utf8_input modifiers are mutually exclusive\n");
|
||||||
return PR_SKIP;
|
return PR_SKIP;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Check for mutually exclusive modifiers. At present, these are all in the
|
/* Check for mutually exclusive modifiers. At present, these are all in the
|
||||||
|
@@ -4949,11 +4950,43 @@ switch(errorcode)
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* The pattern is now in pbuffer[8|16|32], with the length in patlen. By
|
/* The pattern is now in pbuffer[8|16|32], with the length in code units in
|
||||||
default, however, we pass a zero-terminated pattern. The length is passed only
|
patlen. By default, however, we pass a zero-terminated pattern. The length is
|
||||||
if we had a hex pattern. */
|
passed only if we had a hex pattern. When valgrind is supported, arrange for
|
||||||
|
the unused part of the buffer to be marked as no access. */
|
||||||
|
|
||||||
if ((pat_patctl.control & CTL_HEXPAT) == 0) patlen = PCRE2_ZERO_TERMINATED;
|
valgrind_access_length = patlen;
|
||||||
|
if ((pat_patctl.control & CTL_HEXPAT) == 0)
|
||||||
|
{
|
||||||
|
patlen = PCRE2_ZERO_TERMINATED;
|
||||||
|
valgrind_access_length += 1; /* For the terminating zero */
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifdef SUPPORT_VALGRIND
|
||||||
|
#ifdef SUPPORT_PCRE2_8
|
||||||
|
if (test_mode == PCRE8_MODE && pbuffer8 != NULL)
|
||||||
|
{
|
||||||
|
VALGRIND_MAKE_MEM_NOACCESS(pbuffer8 + valgrind_access_length,
|
||||||
|
pbuffer8_size - valgrind_access_length);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_16
|
||||||
|
if (test_mode == PCRE16_MODE && pbuffer16 != NULL)
|
||||||
|
{
|
||||||
|
VALGRIND_MAKE_MEM_NOACCESS(pbuffer16 + valgrind_access_length,
|
||||||
|
pbuffer16_size - valgrind_access_length*sizeof(uint16_t));
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_32
|
||||||
|
if (test_mode == PCRE32_MODE && pbuffer32 != NULL)
|
||||||
|
{
|
||||||
|
VALGRIND_MAKE_MEM_NOACCESS(pbuffer32 + valgrind_access_length,
|
||||||
|
pbuffer32_size - valgrind_access_length*sizeof(uint32_t));
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
#else /* Valgrind not supported */
|
||||||
|
(void)valgrind_access_length; /* Avoid compiler warning */
|
||||||
|
#endif
|
||||||
|
|
||||||
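The pattern buffers are larger than the pattern, so the new code hides the unused tail from Valgrind with VALGRIND_MAKE_MEM_NOACCESS(address, length), where length is a byte count. The arithmetic can be sketched as a pure function; noaccess_range() is hypothetical, and it assumes, as the 16-bit and 32-bit calls imply, that the buffer sizes are byte counts while patlen counts code units:

```c
#include <stddef.h>

typedef struct { size_t offset; size_t length; } byte_range;

/* Bytes from offset to the end of the buffer would be marked no-access.
   unit_size is 1, 2 or 4; a non-hex pattern also keeps its terminating zero. */
static byte_range noaccess_range(size_t buffer_size, size_t unit_size,
                                 size_t patlen, int hex_pattern)
{
  byte_range r;
  size_t access_units = patlen;
  if (!hex_pattern) access_units += 1;   /* for the terminating zero */
  r.offset = access_units * unit_size;
  r.length = buffer_size - r.offset;
  return r;
}
```

For a 10-unit, zero-terminated pattern in a 64-byte 16-bit buffer, the first 22 bytes stay accessible and the remaining 42 are marked. After compilation the diff makes the 16-bit and 32-bit buffers wholly undefined again, but in 8-bit mode only the tail, because callouts may still read the pattern there.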
/* If #newline_default has been used and the library was not compiled with an
|
/* If #newline_default has been used and the library was not compiled with an
|
||||||
appropriate default newline setting, local_newline_default will be non-zero. We
|
appropriate default newline setting, local_newline_default will be non-zero. We
|
||||||
|
@@ -4996,6 +5029,65 @@ if (timeit > 0)
|
||||||
PCRE2_COMPILE(compiled_code, pbuffer, patlen, pat_patctl.options|forbid_utf,
|
PCRE2_COMPILE(compiled_code, pbuffer, patlen, pat_patctl.options|forbid_utf,
|
||||||
&errorcode, &erroroffset, use_pat_context);
|
&errorcode, &erroroffset, use_pat_context);
|
||||||
|
|
||||||
|
/* Call the JIT compiler if requested. When timing, we must free and recompile
|
||||||
|
the pattern each time because that is the only way to free the JIT compiled
|
||||||
|
code. We know that compilation will always succeed. */
|
||||||
|
|
||||||
|
if (TEST(compiled_code, !=, NULL) && pat_patctl.jit != 0)
|
||||||
|
{
|
||||||
|
if (timeit > 0)
|
||||||
|
{
|
||||||
|
register int i;
|
||||||
|
clock_t time_taken = 0;
|
||||||
|
for (i = 0; i < timeit; i++)
|
||||||
|
{
|
||||||
|
clock_t start_time;
|
||||||
|
SUB1(pcre2_code_free, compiled_code);
|
||||||
|
PCRE2_COMPILE(compiled_code, pbuffer, patlen,
|
||||||
|
pat_patctl.options|forbid_utf, &errorcode, &erroroffset,
|
||||||
|
use_pat_context);
|
||||||
|
start_time = clock();
|
||||||
|
PCRE2_JIT_COMPILE(jitrc,compiled_code, pat_patctl.jit);
|
||||||
|
time_taken += clock() - start_time;
|
||||||
|
}
|
||||||
|
total_jit_compile_time += time_taken;
|
||||||
|
fprintf(outfile, "JIT compile %.4f milliseconds\n",
|
||||||
|
(((double)time_taken * 1000.0) / (double)timeit) /
|
||||||
|
(double)CLOCKS_PER_SEC);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
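The relocated JIT block times only the JIT step: each iteration frees and recompiles the pattern (the only way to discard JIT code, per the comment), then measures PCRE2_JIT_COMPILE alone. A generic sketch of that pattern; time_step() and average_ms() are hypothetical, and the averaging formula is the one used in the fprintf() call:

```c
#include <time.h>

/* Average milliseconds per iteration, as in the "JIT compile" output line. */
static double average_ms(clock_t total_ticks, int iterations)
{
  return (((double)total_ticks * 1000.0) / (double)iterations) /
         (double)CLOCKS_PER_SEC;
}

/* setup() is repeated but not timed (in pcre2test, freeing and recompiling
   the pattern); only measured() contributes to the total. */
typedef void (*step_fn)(void *ctx);

static double time_step(step_fn setup, step_fn measured, void *ctx,
                        int iterations)
{
  clock_t total = 0;
  int i;

  for (i = 0; i < iterations; i++)
    {
    clock_t start;
    setup(ctx);
    start = clock();
    measured(ctx);
    total += clock() - start;
    }
  return average_ms(total, iterations);
}
```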
|
/* If valgrind is supported, mark the pbuffer as accessible again. The 16-bit
|
||||||
|
and 32-bit buffers can be marked completely undefined, but we must leave the
|
||||||
|
pattern in the 8-bit buffer defined because it may be read from a callout
|
||||||
|
during matching. */
|
||||||
|
|
||||||
|
#ifdef SUPPORT_VALGRIND
|
||||||
|
#ifdef SUPPORT_PCRE2_8
|
||||||
|
if (test_mode == PCRE8_MODE)
|
||||||
|
{
|
||||||
|
VALGRIND_MAKE_MEM_UNDEFINED(pbuffer8 + valgrind_access_length,
|
||||||
|
pbuffer8_size - valgrind_access_length);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_16
|
||||||
|
if (test_mode == PCRE16_MODE)
|
||||||
|
{
|
||||||
|
VALGRIND_MAKE_MEM_UNDEFINED(pbuffer16, pbuffer16_size);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
#ifdef SUPPORT_PCRE2_32
|
||||||
|
if (test_mode == PCRE32_MODE)
|
||||||
|
{
|
||||||
|
VALGRIND_MAKE_MEM_UNDEFINED(pbuffer32, pbuffer32_size);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
#endif
|
||||||
|
|
||||||
/* Compilation failed; go back for another re, skipping to blank line
|
/* Compilation failed; go back for another re, skipping to blank line
|
||||||
if non-interactive. */
|
if non-interactive. */
|
||||||
|
|
||||||
|
@@ -5029,38 +5121,6 @@ if (forbid_utf != 0)
|
||||||
if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
|
if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
|
||||||
return PR_ABEND;
|
return PR_ABEND;
|
||||||
|
|
||||||
/* Call the JIT compiler if requested. When timing, we must free and recompile
|
|
||||||
the pattern each time because that is the only way to free the JIT compiled
|
|
||||||
code. We know that compilation will always succeed. */
|
|
||||||
|
|
||||||
if (pat_patctl.jit != 0)
|
|
||||||
{
|
|
||||||
if (timeit > 0)
|
|
||||||
{
|
|
||||||
register int i;
|
|
||||||
clock_t time_taken = 0;
|
|
||||||
for (i = 0; i < timeit; i++)
|
|
||||||
{
|
|
||||||
clock_t start_time;
|
|
||||||
SUB1(pcre2_code_free, compiled_code);
|
|
||||||
PCRE2_COMPILE(compiled_code, pbuffer, patlen,
|
|
||||||
pat_patctl.options|forbid_utf, &errorcode, &erroroffset,
|
|
||||||
use_pat_context);
|
|
||||||
start_time = clock();
|
|
||||||
PCRE2_JIT_COMPILE(jitrc,compiled_code, pat_patctl.jit);
|
|
||||||
time_taken += clock() - start_time;
|
|
||||||
}
|
|
||||||
total_jit_compile_time += time_taken;
|
|
||||||
fprintf(outfile, "JIT compile %.4f milliseconds\n",
|
|
||||||
(((double)time_taken * 1000.0) / (double)timeit) /
|
|
||||||
(double)CLOCKS_PER_SEC);
|
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/* If an explicit newline modifier was given, set the information flag in the
|
/* If an explicit newline modifier was given, set the information flag in the
|
||||||
pattern so that it is preserved over push/pop. */
|
pattern so that it is preserved over push/pop. */
|
||||||
|
|
||||||
|
@@ -5300,10 +5360,10 @@ if (post_start > 0)
|
||||||
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
|
for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
|
||||||
fprintf(outfile, " ");
|
fprintf(outfile, " ");
|
||||||
|
|
||||||
fprintf(outfile, "%.*s",
|
if (cb->next_item_length != 0)
|
||||||
(int)((cb->next_item_length == 0)? 1 : cb->next_item_length),
|
fprintf(outfile, "%.*s", (int)(cb->next_item_length),
|
||||||
pbuffer8 + cb->pattern_position);
|
pbuffer8 + cb->pattern_position);
|
||||||
|
|
||||||
fprintf(outfile, "\n");
|
fprintf(outfile, "\n");
|
||||||
first_callout = FALSE;
|
first_callout = FALSE;
|
||||||
|
|
||||||
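With the callout display change, a zero next_item_length now prints nothing, where the old code printed one character. A sketch of the new rule that writes to a buffer instead of outfile; format_next_item() is hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* Copy next_item_length characters starting at pattern_position, or produce
   an empty string when the length is zero. Returns the characters written. */
static int format_next_item(char *out, size_t outsize, const char *pattern,
                            size_t pattern_position, size_t next_item_length)
{
  if (next_item_length == 0)
    {
    if (outsize > 0) out[0] = 0;
    return 0;
    }
  return snprintf(out, outsize, "%.*s", (int)next_item_length,
                  pattern + pattern_position);
}
```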
|
@@ -5740,18 +5800,18 @@ while ((c = *p++) != 0)
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Handle a non-escaped character. In non-UTF 32-bit mode with utf8_input
|
/* Handle a non-escaped character. In non-UTF 32-bit mode with utf8_input
|
||||||
set, do the fudge for setting the top bit. */
|
set, do the fudge for setting the top bit. */
|
||||||
|
|
||||||
if (c != '\\')
|
if (c != '\\')
|
||||||
{
|
{
|
||||||
uint32_t topbit = 0;
|
uint32_t topbit = 0;
|
||||||
if (test_mode == PCRE32_MODE && c == 0xff && *p != 0)
|
if (test_mode == PCRE32_MODE && c == 0xff && *p != 0)
|
||||||
{
|
{
|
||||||
topbit = 0x80000000;
|
topbit = 0x80000000;
|
||||||
c = *p++;
|
c = *p++;
|
||||||
}
|
}
|
||||||
if ((utf || (pat_patctl.control & CTL_UTF8_INPUT) != 0) &&
|
if ((utf || (pat_patctl.control & CTL_UTF8_INPUT) != 0) &&
|
||||||
HASUTF8EXTRALEN(c)) { GETUTF8INC(c, p); }
|
HASUTF8EXTRALEN(c)) { GETUTF8INC(c, p); }
|
||||||
c |= topbit;
|
c |= topbit;
|
||||||
}
|
}
|
||||||
|
@@ -6405,7 +6465,7 @@ else for (gmatched = 0;; gmatched++)
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Otherwise just run a single match, setting up a callout if required (the
|
/* Otherwise just run a single match, setting up a callout if required (the
|
||||||
default). */
|
default). There is a copy of the pattern in pbuffer8 for use by callouts. */
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
|
@@ -7583,6 +7643,10 @@ if (argc > 1 && strcmp(argv[op], "-") != 0)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT)
|
||||||
|
if (INTERACTIVE(infile)) using_history();
|
||||||
|
#endif
|
||||||
|
|
||||||
if (argc > 2)
|
if (argc > 2)
|
||||||
{
|
{
|
||||||
outfile = fopen(argv[op+1], OUTPUT_MODE);
|
outfile = fopen(argv[op+1], OUTPUT_MODE);
|
||||||
|
@@ -7621,8 +7685,7 @@ while (notdone)
|
||||||
p = buffer;
|
p = buffer;
|
||||||
|
|
||||||
/* If we have a pattern set up for testing, or we are skipping after a
|
/* If we have a pattern set up for testing, or we are skipping after a
|
||||||
compile failure, a blank line terminates this test; otherwise process the
|
compile failure, a blank line terminates this test. */
|
||||||
line as a data line. */
|
|
||||||
|
|
||||||
if (expectdata || skipping)
|
if (expectdata || skipping)
|
||||||
{
|
{
|
||||||
|
@@ -7645,14 +7708,21 @@ while (notdone)
|
||||||
skipping = FALSE;
|
skipping = FALSE;
|
||||||
setlocale(LC_CTYPE, "C");
|
setlocale(LC_CTYPE, "C");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Otherwise, if we are not skipping, and the line is not a data comment
|
||||||
|
line starting with "\=", process a data line. */
|
||||||
|
|
||||||
else if (!skipping && !(p[0] == '\\' && p[1] == '=' && isspace(p[2])))
|
else if (!skipping && !(p[0] == '\\' && p[1] == '=' && isspace(p[2])))
|
||||||
|
{
|
||||||
rc = process_data();
|
rc = process_data();
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
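A data line is skipped as a comment only when it starts with a backslash, an equals sign, and white space, as in the "\= Expect no match" lines of testinput1. The same test as a standalone predicate (is_data_comment() is a hypothetical name; the cast avoids passing a negative char to isspace()):

```c
#include <ctype.h>
#include <stdbool.h>

/* True for a pcre2test data comment line: "\=" followed by white space. */
static bool is_data_comment(const char *p)
{
  return p[0] == '\\' && p[1] == '=' && isspace((unsigned char)p[2]);
}
```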
/* We do not have a pattern set up for testing. Lines starting with # are
|
/* We do not have a pattern set up for testing. Lines starting with # are
|
||||||
either comments or special commands. Blank lines are ignored. Otherwise, the
|
either comments or special commands. Blank lines are ignored. Otherwise, the
|
||||||
line must start with a valid delimiter. It is then processed as a pattern
|
line must start with a valid delimiter. It is then processed as a pattern
|
||||||
line. */
|
line. A copy of the pattern is left in pbuffer8 for use by callouts. Under
|
||||||
|
valgrind, make the unused part of the buffer undefined, to catch overruns. */
|
||||||
|
|
||||||
else if (*p == '#')
|
else if (*p == '#')
|
||||||
{
|
{
|
||||||
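The revised comments in the hunk above describe how pcre2test decides what to do with each input line. Suppose a pattern is set up, or the program is skipping after a compile failure. In that case a blank line ends the test, and a line starting with "\=" followed by white space is a data comment. Any other line is passed to `process_data()`, unless the program is skipping. When no pattern is set up, '#' lines are comments or commands, blank lines are ignored, and anything else must be a delimited pattern.

The sketch below restates that decision as a standalone C function. It is not pcre2test's code: the enum, `is_blank()` and `classify_line()` are hypothetical names. The diff does not show the exact blank-line test, so treating "only white space" as blank is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical line categories; pcre2test itself uses no such enum. */
typedef enum {
  LINE_END_OF_TEST,        /* blank line while testing or skipping */
  LINE_DATA,               /* subject line for the current pattern */
  LINE_DATA_COMMENT,       /* "\=" plus white space, e.g. "\= Expect no match" */
  LINE_SKIPPED,            /* ignored after a compile failure */
  LINE_COMMENT_OR_COMMAND, /* '#' line with no pattern set up */
  LINE_IGNORED_BLANK,      /* blank line with no pattern set up */
  LINE_PATTERN             /* anything else: must start with a delimiter */
} line_kind;

/* Assumption: a line is "blank" if it contains only white space. */
static bool is_blank(const char *p)
{
  while (isspace((unsigned char)*p)) p++;
  return *p == '\0';
}

static line_kind classify_line(const char *p, bool expectdata, bool skipping)
{
  if (expectdata || skipping)
    {
    if (is_blank(p)) return LINE_END_OF_TEST;
    if (skipping) return LINE_SKIPPED;
    /* Same test as in the diff: backslash, '=', then white space. */
    if (p[0] == '\\' && p[1] == '=' && isspace((unsigned char)p[2]))
      return LINE_DATA_COMMENT;
    return LINE_DATA;
    }
  if (*p == '#') return LINE_COMMENT_OR_COMMAND;
  if (is_blank(p)) return LINE_IGNORED_BLANK;
  return LINE_PATTERN;
}
```

The input lines include their trailing newline, so a bare "\=" line still passes the `isspace(p[2])` check.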
|
@ -7713,6 +7783,10 @@ if (showtotaltimes)
|
||||||
|
|
||||||
EXIT:
|
EXIT:
|
||||||
|
|
||||||
|
#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT)
|
||||||
|
if (infile != NULL && INTERACTIVE(infile)) clear_history();
|
||||||
|
#endif
|
||||||
|
|
||||||
if (infile != NULL && infile != stdin) fclose(infile);
|
if (infile != NULL && infile != stdin) fclose(infile);
|
||||||
if (outfile != NULL && outfile != stdout) fclose(outfile);
|
if (outfile != NULL && outfile != stdout) fclose(outfile);
|
||||||
|
|
||||||
|
|
|
@ -5792,4 +5792,18 @@ name)/mark
|
||||||
aaaccccaaa
|
aaaccccaaa
|
||||||
bccccb
|
bccccb
|
||||||
|
|
||||||
|
# /x does not apply to MARK labels
|
||||||
|
|
||||||
|
/x (*MARK:ab cd # comment
|
||||||
|
ef) x/x,mark
|
||||||
|
axxz
|
||||||
|
|
||||||
|
/(?<=a(B){0}c)X/
|
||||||
|
acX
|
||||||
|
|
||||||
|
/(?<DEFINE>b)(?(DEFINE)(a+))(?&DEFINE)/
|
||||||
|
bbbb
|
||||||
|
\= Expect no match
|
||||||
|
baaab
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -79,7 +79,7 @@
|
||||||
/((?2))((?1))/
|
/((?2))((?1))/
|
||||||
abc
|
abc
|
||||||
|
|
||||||
/((?(R2)a+|(?1)b))/
|
/((?(R2)a+|(?1)b))()/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
|
|
||||||
/(?(R)a*(?1)|((?R))b)/
|
/(?(R)a*(?1)|((?R))b)/
|
||||||
|
|
|
@ -177,7 +177,7 @@
|
||||||
/((?2))((?1))/
|
/((?2))((?1))/
|
||||||
abc
|
abc
|
||||||
|
|
||||||
/((?(R2)a+|(?1)b))/
|
/((?(R2)a+|(?1)b))()/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
|
|
||||||
/(?(R)a*(?1)|((?R))b)/
|
/(?(R)a*(?1)|((?R))b)/
|
||||||
|
|
|
@ -189,9 +189,9 @@
|
||||||
the barfoo
|
the barfoo
|
||||||
and cattlefoo
|
and cattlefoo
|
||||||
|
|
||||||
/(?<=a+)b/
|
/abc(?<=a+)b/
|
||||||
|
|
||||||
/(?<=aaa|b{0,3})b/
|
/12345(?<=aaa|b{0,3})b/
|
||||||
|
|
||||||
/(?<!(foo)a\1)bar/
|
/(?<!(foo)a\1)bar/
|
||||||
|
|
||||||
|
@ -4518,6 +4518,18 @@
|
||||||
\ B)x/x,alt_verbnames,mark
|
\ B)x/x,alt_verbnames,mark
|
||||||
x
|
x
|
||||||
|
|
||||||
|
/(*: A \ and #comment
|
||||||
|
\ B)x/alt_verbnames,mark
|
||||||
|
x
|
||||||
|
|
||||||
|
/(*: A \ and #comment
|
||||||
|
\ B)x/x,mark
|
||||||
|
x
|
||||||
|
|
||||||
|
/(*: A \ and #comment
|
||||||
|
\ B)x/mark
|
||||||
|
x
|
||||||
|
|
||||||
/(*:A
|
/(*:A
|
||||||
B)x/alt_verbnames,mark
|
B)x/alt_verbnames,mark
|
||||||
x
|
x
|
||||||
|
@ -4819,4 +4831,61 @@ a)"xI
|
||||||
|
|
||||||
/\[AB]{6000000000000000000000}/expand
|
/\[AB]{6000000000000000000000}/expand
|
||||||
|
|
||||||
|
# Hex uses pattern length, not zero-terminated. This tests for overrunning
|
||||||
|
# the given length of a pattern.
|
||||||
|
|
||||||
|
/'(*U'/hex
|
||||||
|
|
||||||
|
/'(*'/hex
|
||||||
|
|
||||||
|
/'('/hex
|
||||||
|
|
||||||
|
//hex
|
||||||
|
|
||||||
|
# These tests are here because Perl never allows a back reference in a
|
||||||
|
# lookbehind. PCRE2 supports some limited cases.
|
||||||
|
|
||||||
|
/([ab])...(?<=\1)z/
|
||||||
|
a11az
|
||||||
|
b11bz
|
||||||
|
\= Expect no match
|
||||||
|
b11az
|
||||||
|
|
||||||
|
/(?|([ab]))...(?<=\1)z/
|
||||||
|
|
||||||
|
/([ab])(\1)...(?<=\2)z/
|
||||||
|
aa11az
|
||||||
|
|
||||||
|
/(a\2)(b\1)(?<=\2)/
|
||||||
|
|
||||||
|
/(?<A>[ab])...(?<=\k'A')z/
|
||||||
|
a11az
|
||||||
|
b11bz
|
||||||
|
\= Expect no match
|
||||||
|
b11az
|
||||||
|
|
||||||
|
/(?<A>[ab])...(?<=\k'A')(?<A>)z/dupnames
|
||||||
|
|
||||||
|
# Perl does not support \g+n
|
||||||
|
|
||||||
|
/((\g+1X)?([ab]))+/
|
||||||
|
aaXbbXa
|
||||||
|
|
||||||
|
/ab(?C1)c/auto_callout
|
||||||
|
abc
|
||||||
|
|
||||||
|
/'ab(?C1)c'/hex,auto_callout
|
||||||
|
abc
|
||||||
|
|
||||||
|
# Perl accepts these, but gives a warning. We can't warn, so give an error.
|
||||||
|
|
||||||
|
/[a-[:digit:]]+/
|
||||||
|
a-a9-a
|
||||||
|
|
||||||
|
/[A-[:digit:]]+/
|
||||||
|
A-A9-A
|
||||||
|
|
||||||
|
/[a-\d]+/
|
||||||
|
a-a9-a
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
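The new /hex tests in this hunk ('(*U', '(*', '(' and an empty pattern) supply a pattern whose length is given explicitly, with no terminating zero. Any code that looks ahead, for example while recognizing a start-of-pattern item such as (*UTF), must therefore check the remaining length before every read. Testing for a zero byte is not enough.

A minimal sketch of such a length-bounded check follows. It is illustrative only: `starts_with_verb()` is a hypothetical helper, not PCRE2's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the pattern begin with "(*NAME)"? Only the
   first `length` code units are examined; the buffer need not be
   zero-terminated, so no byte at or beyond `length` is ever read. */
static bool starts_with_verb(const unsigned char *pattern, size_t length,
                             const char *name)
{
  size_t n = strlen(name);

  /* "(*" + name + ")" must all lie within the given length. */
  if (length < n + 3) return false;
  if (pattern[0] != '(' || pattern[1] != '*') return false;
  if (memcmp(pattern + 2, name, n) != 0) return false;
  return pattern[n + 2] == ')';
}
```

With a three-unit pattern "(*U", the length check rejects the input before any byte past the end is touched. This is the kind of overrun the hex tests are designed to catch.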
|
|
|
@ -1723,5 +1723,14 @@
|
||||||
\x{1d7cf}
|
\x{1d7cf}
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
\x{10000}
|
\x{10000}
|
||||||
|
|
||||||
|
# Hex uses pattern length, not zero-terminated. This tests for overrunning
|
||||||
|
# the given length of a pattern.
|
||||||
|
|
||||||
|
/'(*UTF)'/hex
|
||||||
|
|
||||||
|
/a(?<=A\XB)/utf
|
||||||
|
|
||||||
|
/ab(?<=A\RB)/utf
|
||||||
|
|
||||||
# End of testinput5
|
# End of testinput5
|
||||||
|
|
|
@ -4635,7 +4635,7 @@
|
||||||
/((?(R)a+|(?1)b))/
|
/((?(R)a+|(?1)b))/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
|
|
||||||
/((?(R2)a+|(?1)b))/
|
/((?(R2)a+|(?1)b))()/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
|
|
||||||
/(?(R)a*(?1)|((?R))b)/
|
/(?(R)a*(?1)|((?R))b)/
|
||||||
|
|
|
@ -161,18 +161,14 @@
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
abcd
|
abcd
|
||||||
|
|
|
@ -9257,4 +9257,24 @@ No match
|
||||||
1: b
|
1: b
|
||||||
2: cccc
|
2: cccc
|
||||||
|
|
||||||
|
# /x does not apply to MARK labels
|
||||||
|
|
||||||
|
/x (*MARK:ab cd # comment
|
||||||
|
ef) x/x,mark
|
||||||
|
axxz
|
||||||
|
0: xx
|
||||||
|
MK: ab cd # comment\x0aef
|
||||||
|
|
||||||
|
/(?<=a(B){0}c)X/
|
||||||
|
acX
|
||||||
|
0: X
|
||||||
|
|
||||||
|
/(?<DEFINE>b)(?(DEFINE)(a+))(?&DEFINE)/
|
||||||
|
bbbb
|
||||||
|
0: bb
|
||||||
|
1: b
|
||||||
|
\= Expect no match
|
||||||
|
baaab
|
||||||
|
No match
|
||||||
|
|
||||||
# End of testinput1
|
# End of testinput1
|
||||||
|
|
|
@ -557,7 +557,7 @@ Subject length lower bound = 1
|
||||||
0: \x{11234}
|
0: \x{11234}
|
||||||
|
|
||||||
/(*UTF-32)\x{11234}/
|
/(*UTF-32)\x{11234}/
|
||||||
Failed: error 134 at offset 17: character code point value in \x{} or \o{} is too large
|
Failed: error 160 at offset 5: (*VERB) not recognized or malformed
|
||||||
abcd\x{11234}pqr
|
abcd\x{11234}pqr
|
||||||
|
|
||||||
/(*UTF-32)\x{112}/
|
/(*UTF-32)\x{112}/
|
||||||
|
|
|
@ -188,7 +188,7 @@ Failed: error -53: recursion limit exceeded
|
||||||
abc
|
abc
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
/((?(R2)a+|(?1)b))/
|
/((?(R2)a+|(?1)b))()/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
Failed: error -52: nested recursion at the same subject position
|
Failed: error -52: nested recursion at the same subject position
|
||||||
|
|
||||||
|
|
|
@ -335,7 +335,7 @@ Failed: error -47: match limit exceeded
|
||||||
abc
|
abc
|
||||||
Failed: error -46: JIT stack limit reached
|
Failed: error -46: JIT stack limit reached
|
||||||
|
|
||||||
/((?(R2)a+|(?1)b))/
|
/((?(R2)a+|(?1)b))()/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
Failed: error -46: JIT stack limit reached
|
Failed: error -46: JIT stack limit reached
|
||||||
|
|
||||||
|
|
|
@ -139,7 +139,7 @@ No match: POSIX code 17: match failed
|
||||||
0+ issippi
|
0+ issippi
|
||||||
|
|
||||||
/abc/\
|
/abc/\
|
||||||
Failed: POSIX code 9: bad escape sequence at offset 3
|
Failed: POSIX code 9: bad escape sequence at offset 4
|
||||||
|
|
||||||
"(?(?C)"
|
"(?(?C)"
|
||||||
Failed: POSIX code 11: unbalanced () at offset 6
|
Failed: POSIX code 11: unbalanced () at offset 6
|
||||||
|
|
|
@ -76,7 +76,7 @@
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
||||||
/ab\Cde/never_backslash_c
|
/ab\Cde/never_backslash_c
|
||||||
Failed: error 183 at offset 3: using \C is disabled by the application
|
Failed: error 183 at offset 4: using \C is disabled by the application
|
||||||
|
|
||||||
/ab\Cde/info
|
/ab\Cde/info
|
||||||
Capturing subpattern count = 0
|
Capturing subpattern count = 0
|
||||||
|
|
|
@ -17,7 +17,7 @@ Subject length lower bound = 0
|
||||||
# 16-bit modes, but not in 32-bit mode.
|
# 16-bit modes, but not in 32-bit mode.
|
||||||
|
|
||||||
/(?<=ab\Cde)X/utf
|
/(?<=ab\Cde)X/utf
|
||||||
Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion in UTF-16 mode
|
Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-16 mode
|
||||||
ab!deXYZ
|
ab!deXYZ
|
||||||
|
|
||||||
# Autopossessification tests
|
# Autopossessification tests
|
||||||
|
|
|
@ -17,7 +17,7 @@ Subject length lower bound = 0
|
||||||
# 16-bit modes, but not in 32-bit mode.
|
# 16-bit modes, but not in 32-bit mode.
|
||||||
|
|
||||||
/(?<=ab\Cde)X/utf
|
/(?<=ab\Cde)X/utf
|
||||||
Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion in UTF-8 mode
|
Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-8 mode
|
||||||
ab!deXYZ
|
ab!deXYZ
|
||||||
|
|
||||||
# Autopossessification tests
|
# Autopossessification tests
|
||||||
|
|
|
@ -3,6 +3,6 @@
|
||||||
# correct error message.
|
# correct error message.
|
||||||
|
|
||||||
/a\Cb/
|
/a\Cb/
|
||||||
Failed: error 185 at offset 2: using \C is disabled in this PCRE2 library
|
Failed: error 185 at offset 3: using \C is disabled in this PCRE2 library
|
||||||
|
|
||||||
# End of testinput23
|
# End of testinput23
|
||||||
|
|
|
@ -1746,7 +1746,7 @@ No match
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
||||||
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
|
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
|
||||||
Failed: error 173 at offset 5: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
|
Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
|
||||||
|
|
||||||
/^a+[a\x{200}]/B,utf
|
/^a+[a\x{200}]/B,utf
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
@ -3997,7 +3997,7 @@ Failed: error 122 at offset 1227: unmatched closing parenthesis
|
||||||
/$(&.+[\p{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(?(R)){0,6}?|){12\x8a\X*?\x8a\x0b\xd1^9\3*+(\xc1,\k'P'\xb4)\xcc(z\z(?JJ)(?'X'8};(\x0b\xd1^9\?'3*+(\xc1.]k+\x0b'Pm'\xb4\xcc4'\xd1'(?'X'))?-%--\x95$9*\4'|\xd1(''%\x95*$9)#(?'R')3\x07?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/
|
/$(&.+[\p{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(?(R)){0,6}?|){12\x8a\X*?\x8a\x0b\xd1^9\3*+(\xc1,\k'P'\xb4)\xcc(z\z(?JJ)(?'X'8};(\x0b\xd1^9\?'3*+(\xc1.]k+\x0b'Pm'\xb4\xcc4'\xd1'(?'X'))?-%--\x95$9*\4'|\xd1(''%\x95*$9)#(?'R')3\x07?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/
|
||||||
|
|
||||||
"(*UTF)(*UCP)(.UTF).+X(\V+;\^(\D|)!999}(?(?C{7(?C')\H*\S*/^\x5\xa\\xd3\x85n?(;\D*(?m).[^mH+((*UCP)(*U:F)})(?!^)(?'"
|
"(*UTF)(*UCP)(.UTF).+X(\V+;\^(\D|)!999}(?(?C{7(?C')\H*\S*/^\x5\xa\\xd3\x85n?(;\D*(?m).[^mH+((*UCP)(*U:F)})(?!^)(?'"
|
||||||
Failed: error 124 at offset 113: letter or underscore expected after (?< or (?'
|
Failed: error 162 at offset 113: subpattern name expected
|
||||||
|
|
||||||
/[\pS#moq]/
|
/[\pS#moq]/
|
||||||
=
|
=
|
||||||
|
@ -4159,5 +4159,16 @@ No match
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
\x{10000}
|
\x{10000}
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
# Hex uses pattern length, not zero-terminated. This tests for overrunning
|
||||||
|
# the given length of a pattern.
|
||||||
|
|
||||||
|
/'(*UTF)'/hex
|
||||||
|
|
||||||
|
/a(?<=A\XB)/utf
|
||||||
|
Failed: error 125 at offset 1: lookbehind assertion is not fixed length
|
||||||
|
|
||||||
|
/ab(?<=A\RB)/utf
|
||||||
|
Failed: error 125 at offset 2: lookbehind assertion is not fixed length
|
||||||
|
|
||||||
# End of testinput5
|
# End of testinput5
|
||||||
|
|
|
@ -713,7 +713,7 @@ No match
|
||||||
/(ab|cd){3,4}/auto_callout
|
/(ab|cd){3,4}/auto_callout
|
||||||
ababab
|
ababab
|
||||||
--->ababab
|
--->ababab
|
||||||
+0 ^ (ab|cd){3,4}
|
+0 ^ (
|
||||||
+1 ^ a
|
+1 ^ a
|
||||||
+4 ^ c
|
+4 ^ c
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
|
@ -732,7 +732,7 @@ No match
|
||||||
0: ababab
|
0: ababab
|
||||||
abcdabcd
|
abcdabcd
|
||||||
--->abcdabcd
|
--->abcdabcd
|
||||||
+0 ^ (ab|cd){3,4}
|
+0 ^ (
|
||||||
+1 ^ a
|
+1 ^ a
|
||||||
+4 ^ c
|
+4 ^ c
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
|
@ -740,7 +740,7 @@ No match
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ )
|
+6 ^ ^ ){3,4}
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+2 ^ ^ b
|
+2 ^ ^ b
|
||||||
|
@ -749,13 +749,13 @@ No match
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ )
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
0: abcdabcd
|
0: abcdabcd
|
||||||
1: abcdab
|
1: abcdab
|
||||||
abcdcdcdcdcd
|
abcdcdcdcdcd
|
||||||
--->abcdcdcdcdcd
|
--->abcdcdcdcdcd
|
||||||
+0 ^ (ab|cd){3,4}
|
+0 ^ (
|
||||||
+1 ^ a
|
+1 ^ a
|
||||||
+4 ^ c
|
+4 ^ c
|
||||||
+2 ^^ b
|
+2 ^^ b
|
||||||
|
@ -763,16 +763,16 @@ No match
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ )
|
+6 ^ ^ ){3,4}
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ )
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
+1 ^ ^ a
|
+1 ^ ^ a
|
||||||
+4 ^ ^ c
|
+4 ^ ^ c
|
||||||
+5 ^ ^ d
|
+5 ^ ^ d
|
||||||
+6 ^ ^ )
|
+6 ^ ^ ){3,4}
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
0: abcdcdcd
|
0: abcdcdcd
|
||||||
1: abcdcd
|
1: abcdcd
|
||||||
|
@ -6712,26 +6712,26 @@ No match
|
||||||
--->"ab"
|
--->"ab"
|
||||||
+0 ^ ^
|
+0 ^ ^
|
||||||
+1 ^ "
|
+1 ^ "
|
||||||
+2 ^^ ((?(?=[a])[^"])|b)*
|
+2 ^^ (
|
||||||
+21 ^^ "
|
+21 ^^ "
|
||||||
+3 ^^ (?(?=[a])[^"])
|
+3 ^^ (?
|
||||||
+18 ^^ b
|
+18 ^^ b
|
||||||
+5 ^^ (?=[a])
|
+5 ^^ (?=
|
||||||
+8 ^ [a]
|
+8 ^ [a]
|
||||||
+11 ^^ )
|
+11 ^^ )
|
||||||
+12 ^^ [^"]
|
+12 ^^ [^"]
|
||||||
+16 ^ ^ )
|
+16 ^ ^ )
|
||||||
+17 ^ ^ |
|
+17 ^ ^ |
|
||||||
+21 ^ ^ "
|
+21 ^ ^ "
|
||||||
+3 ^ ^ (?(?=[a])[^"])
|
+3 ^ ^ (?
|
||||||
+18 ^ ^ b
|
+18 ^ ^ b
|
||||||
+5 ^ ^ (?=[a])
|
+5 ^ ^ (?=
|
||||||
+8 ^ [a]
|
+8 ^ [a]
|
||||||
+19 ^ ^ )
|
+19 ^ ^ )*
|
||||||
+21 ^ ^ "
|
+21 ^ ^ "
|
||||||
+3 ^ ^ (?(?=[a])[^"])
|
+3 ^ ^ (?
|
||||||
+18 ^ ^ b
|
+18 ^ ^ b
|
||||||
+5 ^ ^ (?=[a])
|
+5 ^ ^ (?=
|
||||||
+8 ^ [a]
|
+8 ^ [a]
|
||||||
+17 ^ ^ |
|
+17 ^ ^ |
|
||||||
+22 ^ ^ $
|
+22 ^ ^ $
|
||||||
|
@ -7154,7 +7154,7 @@ Failed: error -52: nested recursion at the same subject position
|
||||||
aaaabcde
|
aaaabcde
|
||||||
0: aaaab
|
0: aaaab
|
||||||
|
|
||||||
/((?(R2)a+|(?1)b))/
|
/((?(R2)a+|(?1)b))()/
|
||||||
aaaabcde
|
aaaabcde
|
||||||
Failed: error -40: backreference condition or recursion test is not supported for DFA matching
|
Failed: error -40: backreference condition or recursion test is not supported for DFA matching
|
||||||
|
|
||||||
|
@ -7548,7 +7548,7 @@ Callout (10): {AB} last capture = 0
|
||||||
Bra
|
Bra
|
||||||
^
|
^
|
||||||
Cond
|
Cond
|
||||||
Callout 25 9 7
|
Callout 25 9 3
|
||||||
Assert
|
Assert
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
|
@ -7561,11 +7561,11 @@ Callout (10): {AB} last capture = 0
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
abcdefg
|
abcdefg
|
||||||
--->abcdefg
|
--->abcdefg
|
||||||
25 ^ (?=abc)
|
25 ^ (?=
|
||||||
0: abcd
|
0: abcd
|
||||||
xyz123
|
xyz123
|
||||||
--->xyz123
|
--->xyz123
|
||||||
25 ^ (?=abc)
|
25 ^ (?=
|
||||||
0: xyz
|
0: xyz
|
||||||
|
|
||||||
/^(?(?C$abc$)(?=abc)abcd|xyz)/B
|
/^(?(?C$abc$)(?=abc)abcd|xyz)/B
|
||||||
|
@ -7573,7 +7573,7 @@ Callout (10): {AB} last capture = 0
|
||||||
Bra
|
Bra
|
||||||
^
|
^
|
||||||
Cond
|
Cond
|
||||||
CalloutStr $abc$ 7 12 7
|
CalloutStr $abc$ 7 12 3
|
||||||
Assert
|
Assert
|
||||||
abc
|
abc
|
||||||
Ket
|
Ket
|
||||||
|
@ -7587,12 +7587,12 @@ Callout (10): {AB} last capture = 0
|
||||||
abcdefg
|
abcdefg
|
||||||
Callout (7): $abc$
|
Callout (7): $abc$
|
||||||
--->abcdefg
|
--->abcdefg
|
||||||
^ (?=abc)
|
^ (?=
|
||||||
0: abcd
|
0: abcd
|
||||||
xyz123
|
xyz123
|
||||||
Callout (7): $abc$
|
Callout (7): $abc$
|
||||||
--->xyz123
|
--->xyz123
|
||||||
^ (?=abc)
|
^ (?=
|
||||||
0: xyz
|
0: xyz
|
||||||
|
|
||||||
/^ab(?C'first')cd(?C"second")ef/
|
/^ab(?C'first')cd(?C"second")ef/
|
||||||
|
@ -7609,13 +7609,13 @@ Callout (20): "second"
|
||||||
aaaXY
|
aaaXY
|
||||||
Callout (8): `code`
|
Callout (8): `code`
|
||||||
--->aaaXY
|
--->aaaXY
|
||||||
^^ )
|
^^ ){3}
|
||||||
Callout (8): `code`
|
Callout (8): `code`
|
||||||
--->aaaXY
|
--->aaaXY
|
||||||
^ ^ )
|
^ ^ ){3}
|
||||||
Callout (8): `code`
|
Callout (8): `code`
|
||||||
--->aaaXY
|
--->aaaXY
|
||||||
^ ^ )
|
^ ^ ){3}
|
||||||
0: aaaX
|
0: aaaX
|
||||||
|
|
||||||
# Binary zero in callout string
|
# Binary zero in callout string
|
||||||
|
|
|
@ -854,23 +854,17 @@ Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too de
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
Failed: error 186 at offset 594: regular expression is too complicated
|
Failed: error 186 at offset 5813: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
Failed: error 186 at offset 594: regular expression is too complicated
|
Failed: error 186 at offset 5820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
Failed: error 186 at offset 594: regular expression is too complicated
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 594: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 594: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
@ -1031,6 +1025,5 @@ Subject length lower bound = 0
|
||||||
Failed: error 114 at offset 509: missing closing parenthesis
|
Failed: error 114 at offset 509: missing closing parenthesis
|
||||||
|
|
||||||
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
|
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
|
||||||
Failed: error 186 at offset 490: regular expression is too complicated
|
|
||||||
|
|
||||||
# End of testinput8
|
# End of testinput8
|
||||||
|
|
|
@ -853,20 +853,15 @@ Memory allocation (code space): 18
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 1147: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 1147: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@ -853,20 +853,15 @@ Memory allocation (code space): 18
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 1147: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 1147: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@ -853,20 +853,17 @@ Memory allocation (code space): 28
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
Failed: error 186 at offset 5813: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
Failed: error 186 at offset 5820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 979: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 979: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@ -853,20 +853,17 @@ Memory allocation (code space): 28
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
Failed: error 186 at offset 5813: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
Failed: error 186 at offset 5820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 979: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 979: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@ -853,20 +853,17 @@ Memory allocation (code space): 28
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
Failed: error 186 at offset 5813: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
Failed: error 186 at offset 5820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 979: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 979: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
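The new error offsets in this hunk can be cross-checked against the length of the expanded test pattern. The following Python sketch assumes that pcre2test's `expand` modifier replaces each `\[text]{n}` item with n copies of `text`, which is the only form these tests use. `expand` and `nest_pattern` are hypothetical helpers written for illustration; they are not part of PCRE2 or pcre2test.

```python
import re

# Assumed syntax: \[text]{n} expands to n copies of "text".
# Only the simple, non-nested form used by these tests is handled.
EXPAND_ITEM = re.compile(r"\\\[(.*?)\]\{(\d+)\}")


def expand(pattern: str) -> str:
    """Replace every \\[text]{n} item with text repeated n times."""
    return EXPAND_ITEM.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


def nest_pattern(n: int) -> str:
    """Rebuild the workspace-overflow test pattern with n copies of "[bar](".

    Hypothetical helper: it only reproduces the pattern text from the tests,
    e.g. n=792 gives the parens_nest_limit=1000 case.
    """
    return "(?'ABC'\\[[bar](]{%d}*THEN:\\[A]{255}\\[)]{%d}" % (n, n + 1)


if __name__ == "__main__":
    for n in (792, 793, 1793):
        expanded = expand(nest_pattern(n))
        # "(?'ABC'" plus n groups opened by "[bar](", all closed at the end.
        assert expanded.count("(") == expanded.count(")") == n + 1
        print(n, len(expanded))
```

Run as a script, it prints the expanded lengths 5813, 5820 and 12820 for n = 792, 793 and 1793. These are the same numbers as the error 186 offsets in the updated output above, which suggests that the error is now reported at the end of the pattern rather than part-way through it.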
@@ -854,22 +854,16 @@ Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too de
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
Failed: error 186 at offset 637: regular expression is too complicated
|
Failed: error 186 at offset 5820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
Failed: error 186 at offset 637: regular expression is too complicated
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 637: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 637: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@@ -853,21 +853,15 @@ Memory allocation (code space): 12
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
Failed: error 186 at offset 936: regular expression is too complicated
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 936: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 936: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@@ -853,19 +853,15 @@ Memory allocation (code space): 14
|
||||||
|
|
||||||
# Use "expand" to create some very long patterns with nested parentheses, in
|
# Use "expand" to create some very long patterns with nested parentheses, in
|
||||||
# order to test workspace overflow. Again, this varies with code unit width,
|
# order to test workspace overflow. Again, this varies with code unit width,
|
||||||
# and even with it fails in two modes, the error offset differs. It also varies
|
# and even when it fails in two modes, the error offset differs. It also varies
|
||||||
# with link size - hence multiple tests with different values.
|
# with link size - hence multiple tests with different values.
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
|
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
|
||||||
|
Failed: error 186 at offset 12820: regular expression is too complicated
|
||||||
/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
|
|
||||||
|
|
||||||
/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
|
|
||||||
Failed: error 186 at offset 1224: regular expression is too complicated
|
|
||||||
|
|
||||||
/(?(1)(?1)){8,}+()/debug
|
/(?(1)(?1)){8,}+()/debug
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
|
@@ -307,14 +307,14 @@ Subject length lower bound = 1
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|
||||||
/\777/I
|
/\777/I
|
||||||
Failed: error 151 at offset 3: octal value is greater than \377 in 8-bit non-UTF-8 mode
|
Failed: error 151 at offset 4: octal value is greater than \377 in 8-bit non-UTF-8 mode
|
||||||
|
|
||||||
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
|
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
|
||||||
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
|
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
|
||||||
XX
|
XX
|
||||||
|
|
||||||
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames
|
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames
|
||||||
Failed: error 176 at offset 258: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
|
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
|
||||||
XX
|
XX
|
||||||
|
|
||||||
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
|
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
|
||||||
|
@@ -328,10 +328,10 @@ MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789AB
|
||||||
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE
|
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE
|
||||||
|
|
||||||
/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
||||||
Failed: error 177 at offset 5: character code point value in \u.... sequence is too large
|
Failed: error 177 at offset 6: character code point value in \u.... sequence is too large
|
||||||
|
|
||||||
/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
|
||||||
Failed: error 177 at offset 6: character code point value in \u.... sequence is too large
|
Failed: error 177 at offset 7: character code point value in \u.... sequence is too large
|
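The offset changes in this hunk (3 to 4 for `\777`, 5 to 6 for `\u0100`, and 6 to 7 for `[\u0100-\u0200]`) are consistent with the new offsets pointing just past the offending escape sequence. The Python sketch below checks that reading. It assumes the maximum code point in 8-bit non-UTF-8 mode is 0xFF, as implied by the "greater than \377" wording of error 151 (octal 377 is 255); `escape_end` is a hypothetical helper, not a PCRE2 function.

```python
# Largest code point in 8-bit non-UTF-8 mode (octal \377).
LIMIT_8BIT = 0o377


def escape_end(pattern: str, escape: str) -> int:
    """Offset of the first character after the first `escape` in `pattern`."""
    return pattern.index(escape) + len(escape)


if __name__ == "__main__":
    # (pattern, offending escape, numeric value of that escape)
    cases = [
        ("\\777", "\\777", int("777", 8)),
        ("\\u0100", "\\u0100", int("0100", 16)),
        ("[\\u0100-\\u0200]", "\\u0100", int("0100", 16)),
    ]
    for pattern, escape, value in cases:
        # Each value exceeds the 8-bit limit, so compilation must fail.
        assert value > LIMIT_8BIT
        print(pattern, value, escape_end(pattern, escape))
```

It prints the offsets 4, 6 and 7, matching the updated error lines, for code point values 511, 256 and 256 respectively.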
||||||
|
|
||||||
/[^\x00-a]{12,}[^b-\xff]*/B
|
/[^\x00-a]{12,}[^b-\xff]*/B
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
|