Add subject_literal and allow jitstack in pcre2test pattern modifiers, and add
another big pattern test.
Philip.Hazel 2017-06-12 17:48:03 +00:00
parent 1381c3fe28
commit 6e30ed1b40
6 changed files with 452 additions and 32 deletions


@ -184,6 +184,9 @@ starting offset greater than zero.
39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
pattern lines.
Version 10.23 14-February-2017
------------------------------


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "06 June 2017" "PCRE 10.30"
.TH PCRE2TEST 1 "12 June 2017" "PCRE 10.30"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -69,10 +69,10 @@ want that action.
The input is processed using C's string functions, so must not
contain binary zeros, even though in Unix-like environments, \fBfgets()\fP
treats any bytes other than newline as data characters. An error is generated
if a binary zero is encountered. Subject lines are processed for backslash
escapes, which makes it possible to include any data value in strings that are
passed to the library for matching. For patterns, there is a facility for
specifying some or all of the 8-bit input characters as hexadecimal pairs,
if a binary zero is encountered. By default subject lines are processed for
backslash escapes, which makes it possible to include any data value in strings
that are passed to the library for matching. For patterns, there is a facility
for specifying some or all of the 8-bit input characters as hexadecimal pairs,
which makes it possible to include binary zeros.
.
.
@ -442,8 +442,9 @@ A pattern can be followed by a modifier list (details below).
.sp
Before each subject line is passed to \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP, leading and trailing white space is removed, and the
line is scanned for backslash escapes. The following provide a means of
encoding non-printing characters in a visible way:
line is scanned for backslash escapes, unless the \fBsubject_literal\fP
modifier was set for the pattern. The following provide a means of encoding
non-printing characters in a visible way:
.sp
\ea alarm (BEL, \ex07)
\eb backspace (\ex08)
@ -505,6 +506,11 @@ character. A backslash followed by anything else causes an error. However, if
the very last character in the line is a backslash (and there is no modifier
list), it is ignored. This gives a way of passing an empty line as data, since
a real empty line terminates the data input.
.P
If the \fBsubject_literal\fP modifier is set for a pattern, all subject lines
that follow are treated as literals, with no special treatment of backslashes.
No replication is possible, and any subject modifiers must be set as defaults
by a \fB#subject\fP command.
.
.
.SH "PATTERN MODIFIERS"
@ -602,6 +608,7 @@ heavily used in the test files.
push push compiled pattern onto the stack
pushcopy push a copy onto the stack
stackguard=<number> test the stackguard feature
subject_literal treat all subject lines as literal
tables=[0|1|2] select internal tables
use_length do not zero-terminate the pattern
utf8_input treat input as UTF-8
@ -967,17 +974,18 @@ are mutually exclusive.
.SS "Setting certain match controls"
.rs
.sp
The following modifiers are really subject modifiers, and are described below.
However, they may be included in a pattern's modifier list, in which case they
are applied to every subject line that is processed with that pattern. They may
not appear in \fB#pattern\fP commands. These modifiers do not affect the
compilation process.
The following modifiers are really subject modifiers, and are described under
"Subject Modifiers" below. However, they may be included in a pattern's
modifier list, in which case they are applied to every subject line that is
processed with that pattern. They may not appear in \fB#pattern\fP commands.
These modifiers do not affect the compilation process.
.sp
aftertext show text after match
allaftertext show text after captures
allcaptures show all captures
allusedtext show all consulted text
/g global global matching
jitstack=<n> set size of JIT stack
mark show mark values
replace=<string> specify a replacement string
startchar show starting character when relevant
@ -990,6 +998,15 @@ These modifiers may not appear in a \fB#pattern\fP command. If you want them as
defaults, set them in a \fB#subject\fP command.
.
.
.SS "Specifying literal subject lines"
.rs
.sp
If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
lines that it matches are taken as literal strings, with no interpretation of
backslashes. It is not possible to set subject modifiers on such lines, but any
that are set as defaults by a \fB#subject\fP command are recognized.
.
.
.SS "Saving a compiled pattern"
.rs
.sp
@ -1321,9 +1338,11 @@ matching provokes an error return ("bad option value") from
.sp
The \fBjitstack\fP modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if JIT
optimization is not being used. The value is a number of kilobytes. Providing a
stack that is larger than the default 32K is necessary only for very
complicated patterns.
optimization is not being used. The value is a number of kilobytes. Setting
zero reverts to the default of 32K. Providing a stack that is larger than the
default is necessary only for very complicated patterns. If \fBjitstack\fP is
set non-zero on a subject line it overrides any value that was set on the
pattern.
.
.
.SS "Setting heap, match, and depth limits"
@ -1815,6 +1834,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 06 June 2017
Last updated: 12 June 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi
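
The jitstack precedence described in the revised man page text (a non-zero subject-line value overrides the pattern's value, and zero falls back to the 32K default) can be sketched in C roughly as follows. The names `effective_jitstack_kb` and `DEFAULT_JITSTACK_KB` are hypothetical and chosen only for illustration; they are not identifiers from pcre2test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the man page gives 32K as the default JIT stack. */
#define DEFAULT_JITSTACK_KB 32u

/* Hypothetical helper: choose the JIT stack size (in kilobytes) for a match.
   A non-zero subject-line value overrides the pattern's value; zero at both
   levels means "use the default". */
uint32_t effective_jitstack_kb(uint32_t pattern_kb, uint32_t subject_kb)
{
  uint32_t kb = (subject_kb != 0) ? subject_kb : pattern_kb;
  return (kb != 0) ? kb : DEFAULT_JITSTACK_KB;
}
```

Under these assumptions, a pattern with `jitstack=256` and a subject line with no jitstack modifier would use 256K, while a subject line with `jitstack=64` would use 64K for that line only.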


@ -42,13 +42,16 @@ fi
# aftertext interpreted as "print $' afterwards"
# afteralltext ignored
# dupnames ignored (Perl always allows)
# jitstack ignored
# mark ignored
# no_auto_possess ignored
# no_start_optimize ignored
# subject_literal does not process subjects for escapes
# ucp sets Perl's /u modifier
# utf invoke UTF-8 functionality
#
# The data lines must not have any pcre2test modifiers. They are processed as
# The data lines must not have any pcre2test modifiers. Unless
# "subject_literal" is on the pattern, data lines are processed as
# Perl double-quoted strings, so if they contain " $ or @ characters, these
# have to be escaped. For this reason, all such characters in the
# Perl-compatible testinput1 and testinput4 files are escaped so that they can
@ -138,16 +141,20 @@ for (;;)
chomp($pattern);
$pattern =~ s/\s+$//;
# Split the pattern from the modifiers and adjust them as necessary.
$pattern =~ /^\s*((.).*\2)(.*)$/s;
$pat = $1;
$mod = $3;
# The private "aftertext" modifier means "print $' afterwards".
$showrest = ($mod =~ s/aftertext,?//);
# The "subject_literal" modifier disables escapes in subjects.
$subject_literal = ($mod =~ s/subject_literal,?//);
# "allaftertext" is used by pcre2test to print remainders after captures
@ -161,6 +168,10 @@ for (;;)
$mod =~ s/dupnames,?//;
# Remove "jitstack".
$mod =~ s/jitstack=\d+,?//;
# Remove "mark" (asks pcre2test to check MARK data)
$mod =~ s/mark,?//;
@ -222,7 +233,14 @@ for (;;)
last if ($_ eq "");
next if $_ =~ /^\\=(?:\s|$)/; # Comment line
$x = eval "\"$_\""; # To get escapes processed
if ($subject_literal)
{
$x = $_;
}
else
{
$x = eval "\"$_\""; # To get escapes processed
}
# Empty array for holding results, ensure $REGERROR and $REGMARK are
# unset, then do the matching.


@ -479,6 +479,7 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000002u
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
#define CTL2_SUBJECT_LITERAL 0x00000010u
#define CTL_NL_SET 0x40000000u /* Informational */
#define CTL_BSR_SET 0x80000000u /* Informational */
@ -518,6 +519,7 @@ typedef struct patctl { /* Structure for pattern modifiers. */
uint32_t options; /* Must be in same position as datctl */
uint32_t control; /* Must be in same position as datctl */
uint32_t control2; /* Must be in same position as datctl */
uint32_t jitstack; /* Must be in same position as datctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t jit;
uint32_t stackguard_test;
@ -537,6 +539,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
uint32_t options; /* Must be in same position as patctl */
uint32_t control; /* Must be in same position as patctl */
uint32_t control2; /* Must be in same position as patctl */
uint32_t jitstack; /* Must be in same position as patctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t startend[2];
uint32_t cerror[2];
@ -544,7 +547,6 @@ typedef struct datctl { /* Structure for data line modifiers. */
int32_t callout_data;
int32_t copy_numbers[MAXCPYGET];
int32_t get_numbers[MAXCPYGET];
uint32_t jitstack;
uint32_t oveccount;
uint32_t offset;
uint8_t copy_names[LENCPYGET];
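
The "Must be in same position" comments suggest that the leading fields of the two structures form a shared prefix, which is presumably why jitstack had to move up in datctl once it became settable on patterns. A minimal sketch of how such a layout assumption could be checked at compile time is shown below. The structure names, the trailing fields, and `REPLACE_SIZE_SKETCH` are simplified stand-ins invented for this example, not the real pcre2test definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACE_SIZE_SKETCH 100  /* hypothetical; not the real REPLACE_MODSIZE */

typedef struct pat_ctl_sketch {  /* simplified stand-in for patctl */
  uint32_t options;
  uint32_t control;
  uint32_t control2;
  uint32_t jitstack;
  uint8_t  replacement[REPLACE_SIZE_SKETCH];
  uint32_t jit;                  /* pattern-only field */
} pat_ctl_sketch;

typedef struct dat_ctl_sketch {  /* simplified stand-in for datctl */
  uint32_t options;
  uint32_t control;
  uint32_t control2;
  uint32_t jitstack;
  uint8_t  replacement[REPLACE_SIZE_SKETCH];
  uint32_t offset;               /* data-only field */
} dat_ctl_sketch;

/* Fail the build if a shared field sits at different offsets. */
#define SAME_OFFSET(field) \
  static_assert(offsetof(pat_ctl_sketch, field) == \
                offsetof(dat_ctl_sketch, field), \
                #field " must be at the same offset in both structures")

SAME_OFFSET(options);
SAME_OFFSET(control);
SAME_OFFSET(control2);
SAME_OFFSET(jitstack);
SAME_OFFSET(replacement);

/* Hypothetical accessor: the one offset is valid for either structure. */
size_t shared_jitstack_offset(void)
{
  return offsetof(pat_ctl_sketch, jitstack);
}
```

With checks like these, reordering a shared field in only one structure would be caught by the compiler rather than at run time.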
@ -630,7 +632,7 @@ static modstruct modlist[] = {
{ "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
{ "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
{ "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
{ "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) },
{ "jitstack", MOD_PNDP, MOD_INT, 0, PO(jitstack) },
{ "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
{ "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
{ "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
@ -674,6 +676,7 @@ static modstruct modlist[] = {
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
{ "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
{ "subject_literal", MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL, PO(control2) },
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
@ -3477,7 +3480,8 @@ switch (m->which)
case MOD_PND: /* Ditto, but not default pattern */
case MOD_PNDP: /* Ditto, allowed for Perl test */
if (dctl != NULL) field = dctl;
else if (pctl != NULL && (m->which == MOD_PD || ctx != CTX_DEFPAT))
else if (pctl != NULL && (m->which == MOD_PD || m->which == MOD_PDP ||
ctx != CTX_DEFPAT))
field = pctl;
break;
}
@ -6216,6 +6220,7 @@ uint8_t *p, *pp, *start_rep;
size_t needlen;
void *use_dat_context;
BOOL utf;
BOOL subject_literal;
#ifdef SUPPORT_PCRE2_8
uint8_t *q8 = NULL;
@ -6227,6 +6232,8 @@ uint16_t *q16 = NULL;
uint32_t *q32 = NULL;
#endif
subject_literal = (pat_patctl.control2 & CTL2_SUBJECT_LITERAL) != 0;
/* Copy the default context and data control blocks to the active ones. Then
copy from the pattern the controls that can be set in either the pattern or the
data. This allows them to be overridden in the data line. We do not do this for
@ -6238,6 +6245,7 @@ memcpy(&dat_datctl, &def_datctl, sizeof(datctl));
dat_datctl.control |= (pat_patctl.control & CTL_ALLPD);
dat_datctl.control2 |= (pat_patctl.control2 & CTL2_ALLPD);
strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement);
if (dat_datctl.jitstack == 0) dat_datctl.jitstack = pat_patctl.jitstack;
/* Initialize for scanning the data line. */
@ -6373,7 +6381,7 @@ while ((c = *p++) != 0)
/* Handle a non-escaped character. In non-UTF 32-bit mode with utf8_input
set, do the fudge for setting the top bit. */
if (c != '\\')
if (c != '\\' || subject_literal)
{
uint32_t topbit = 0;
if (test_mode == PCRE32_MODE && c == 0xff && *p != 0)
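
The effect of the changed test `c != '\\' || subject_literal` can be illustrated with a much reduced sketch. The function `copy_subject_sketch` below is hypothetical: it handles only three escapes and, unlike pcre2test, rejects a trailing lone backslash, so it shows the control flow rather than the real escape processing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copy a subject line into out[], interpreting a small
   subset of backslash escapes (\n, \t, \\) unless literal is true. Returns
   the number of bytes written (excluding the terminating zero), or -1 for an
   unsupported escape or when out[] is too small. */
long copy_subject_sketch(const char *in, char *out, size_t outsize,
  bool literal)
{
  size_t n = 0;
  assert(in != NULL && out != NULL);
  while (*in != 0)
    {
    char c = *in++;
    if (c == '\\' && !literal)
      {
      switch (*in++)
        {
        case 'n':  c = '\n'; break;
        case 't':  c = '\t'; break;
        case '\\': c = '\\'; break;
        default:   return -1;  /* also reached for a trailing backslash */
        }
      }
    if (n + 1 >= outsize) return -1;  /* keep room for the terminator */
    out[n++] = c;
    }
  if (outsize == 0) return -1;
  out[n] = 0;
  return (long)n;
}
```

In this sketch, a literal subject passes backslashes through unchanged, which is what lets the new test in testinput1 use regular expressions full of backslashes as subject lines.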

184
testdata/testinput1 vendored

@ -5924,9 +5924,9 @@ ef) x/x,mark
# addresses in various formats. It's a heavy test for named subpatterns. In the
# <atext> group, slash is coded as \x{2f} so that this pattern can also be
# processed by perltest.sh, which does not cater for an escaped delimiter
# within the pattern. All $ and @ characters in subject strings are escaped so
# that Perl doesn't interpret them as variable insertions and " characters must
# also be escaped for Perl.
# within the pattern. $ within the pattern must also be escaped. All $ and @
# characters in subject strings are escaped so that Perl doesn't interpret them
# as variable insertions and " characters must also be escaped for Perl.
# This set of subpatterns is more or less a direct transliteration of the BNF
# definitions in RFC2822, without any of the obsolete features. The addition of
@ -5937,7 +5937,7 @@ ef) x/x,mark
/(?ix)(?(DEFINE)
(?<addr_spec> (?&local_part) \@ (?&domain) )
(?<angle_addr> (?&CFWS)?+ < (?&addr_spec) > (?&CFWS)?+ )
(?<atext> [a-z\d!#$%&'*+-\x{2f}=?^_`{|}~] )
(?<atext> [a-z\d!#\$%&'*+-\x{2f}=?^_`{|}~] )
(?<atom> (?&CFWS)?+ (?&atext)+ (?&CFWS)?+ )
(?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
(?<ctext> [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ ()\\] )
@ -5981,4 +5981,180 @@ ef) x/x,mark
# --------------------------------------------------------------------------
# This pattern uses named groups to match default PCRE2 patterns. It's another
# heavy test for named subpatterns. Once again, code slash as \x{2f} and escape
# $ even in classes so that this works with pcre2test.
/(?sx)(?(DEFINE)
(?<assertion> (?&simple_assertion) | (?&lookaround) )
(?<atomic_group> \( \? > (?&regex) \) )
(?<back_reference> \\ \d+ |
\\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) |
\\k <(?&groupname)> |
\\k '(?&groupname)' |
\\k \{ (?&groupname) \} |
\( \? P= (?&groupname) \) )
(?<branch> (?:(?&assertion) |
(?&callout) |
(?&comment) |
(?&option_setting) |
(?&qualified_item) |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&special_escape) |
(?&verb)
)* )
(?<callout> \(\?C (?: \d+ |
(?: (?<D>["'`^%\#\$])
(?: \k'D'\k'D' | (?!\k'D') . )* \k'D' |
\{ (?: \}\} | [^}]*+ )* \} )
)? \) )
(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )?
(?&regex) \) )
(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )
(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )
(?<class_item> (?: \[ : (?:
alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print|
punct|space|upper|word|xdigit
) : \] |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&escaped_character) |
(?&character_type) |
[^]] ) )
(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )
(?<condition> (?: \( [+-]? \d+ \) |
\( < (?&groupname) > \) |
\( ' (?&groupname) ' \) |
\( R \d* \) |
\( R & (?&groupname) \) |
\( (?&groupname) \) |
\( DEFINE \) |
\( VERSION >?=\d+(?:\.\d\d?)? \) |
(?&callout)?+ (?&comment)* (?&lookaround) ) )
(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )
(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex)
\k'delimiter' .* )
(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} |
x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} |
[aefnrt] | c[[:print:]] |
[^[:alnum:]] ) )
(?<group> (?&capturing_group) | (?&non_capturing_group) |
(?&resetting_group) | (?&atomic_group) |
(?&conditional_group) )
(?<groupname> [a-zA-Z_]\w* )
(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )
(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )
(?<non_capturing_group> \(\? [iJmnsUx-]* : (?&regex) \) )
(?<option_setting> \(\? [iJmnsUx-]* \) )
(?<qualified_item> (?:\. |
(?&lookaround) |
(?&back_reference) |
(?&character_class) |
(?&character_type) |
(?&escaped_character) |
(?&group) |
(?&subroutine_call) |
(?&literal_character) |
(?&quoted_string)
) (?&comment)? (?&qualifier)? )
(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )
(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) )
(?<quoted_string_empty> \\Q\\E )
(?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )
(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )
(?<resetting_group> \( \? \| (?&regex) \) )
(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )
(?<special_escape> \\K )
(?<start_item> \( \* (?:
ANY |
ANYCRLF |
BSR_ANYCRLF |
BSR_UNICODE |
CR |
CRLF |
LF |
LIMIT_MATCH=\d+ |
LIMIT_DEPTH=\d+ |
LIMIT_HEAP=\d+ |
NOTEMPTY |
NOTEMPTY_ATSTART |
NO_AUTO_POSSESS |
NO_DOTSTAR_ANCHOR |
NO_JIT |
NO_START_OPT |
NUL |
UTF |
UCP ) \) )
(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) |
\(\? (?: & | P> ) (?&groupname) \) |
\\g < (?&groupname) > |
\\g ' (?&groupname) ' |
\\g < [+-]? \d+ > |
\\g ' [+-]? \d+ ) )
(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT |
(?:MARK)?:(?&verbname) |
(?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )
(?<verbname> [^)]+ )
) # End DEFINE
# Kick it all off...
^(?&delimited_regex)$/subject_literal,jitstack=256
/^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/
/(cat(a(ract|tonic)|erpillar)) \1()2(3)/
/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
/^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/
/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
/^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /
/(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
/\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
/^(\w++|\s++)*$/
/a+b?(*THEN)c+(*FAIL)/
/(A (A|B(*ACCEPT)|C) D)(E)/x
/^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i
/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B
/(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info
/(?sx)(?(DEFINE)(?<assertion> (?&simple_assertion) | (?&lookaround) )(?<atomic_group> \( \? > (?&regex) \) )(?<back_reference> \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(?<branch> (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?&quoted_string) | (?&quoted_string_empty) | (?&special_escape) | (?&verb) )* )(?<callout> \(\?C (?: \d+ | (?: (?<D>["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?&regex) \) )(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(?<class_item> (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?&quoted_string) | (?&quoted_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )(?<condition> (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex) \k'delimiter' .* )(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(?<group> (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(?<groupname> [a-zA-Z_]\w* )(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )(?<non_capturing_group> \(\? 
[iJmnsUx-]* : (?&regex) \) )(?<option_setting> \(\? [iJmnsUx-]* \) )(?<qualified_item> (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?&quoted_string) ) (?&comment)? (?&qualifier)? )(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (?<quoted_string_empty> \\Q\\E ) (?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )(?<resetting_group> \( \? \| (?&regex) \) )(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(?<special_escape> \\K )(?<start_item> \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(?<verbname> [^)]+ ))^(?&delimited_regex)$/
\= Expect no match
/((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/
/(?:(?(2y)a|b)(X))+/
/a(*MARK)b/
/a(*CR)b/
/(?P<abn>(?P=abn)(?<badstufxxx)/
# --------------------------------------------------------------------------
# End of testinput1

204
testdata/testoutput1 vendored

@ -9496,9 +9496,9 @@ No match
# addresses in various formats. It's a heavy test for named subpatterns. In the
# <atext> group, slash is coded as \x{2f} so that this pattern can also be
# processed by perltest.sh, which does not cater for an escaped delimiter
# within the pattern. All $ and @ characters in subject strings are escaped so
# that Perl doesn't interpret them as variable insertions and " characters must
# also be escaped for Perl.
# within the pattern. $ within the pattern must also be escaped. All $ and @
# characters in subject strings are escaped so that Perl doesn't interpret them
# as variable insertions and " characters must also be escaped for Perl.
# This set of subpatterns is more or less a direct transliteration of the BNF
# definitions in RFC2822, without any of the obsolete features. The addition of
@ -9509,7 +9509,7 @@ No match
/(?ix)(?(DEFINE)
(?<addr_spec> (?&local_part) \@ (?&domain) )
(?<angle_addr> (?&CFWS)?+ < (?&addr_spec) > (?&CFWS)?+ )
(?<atext> [a-z\d!#$%&'*+-\x{2f}=?^_`{|}~] )
(?<atext> [a-z\d!#\$%&'*+-\x{2f}=?^_`{|}~] )
(?<atom> (?&CFWS)?+ (?&atext)+ (?&CFWS)?+ )
(?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
(?<ctext> [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ ()\\] )
@ -9564,4 +9564,200 @@ No match
# --------------------------------------------------------------------------
# This pattern uses named groups to match default PCRE2 patterns. It's another
# heavy test for named subpatterns. Once again, code slash as \x{2f} and escape
# $ even in classes so that this works with pcre2test.
/(?sx)(?(DEFINE)
(?<assertion> (?&simple_assertion) | (?&lookaround) )
(?<atomic_group> \( \? > (?&regex) \) )
(?<back_reference> \\ \d+ |
\\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) |
\\k <(?&groupname)> |
\\k '(?&groupname)' |
\\k \{ (?&groupname) \} |
\( \? P= (?&groupname) \) )
(?<branch> (?:(?&assertion) |
(?&callout) |
(?&comment) |
(?&option_setting) |
(?&qualified_item) |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&special_escape) |
(?&verb)
)* )
(?<callout> \(\?C (?: \d+ |
(?: (?<D>["'`^%\#\$])
(?: \k'D'\k'D' | (?!\k'D') . )* \k'D' |
\{ (?: \}\} | [^}]*+ )* \} )
)? \) )
(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )?
(?&regex) \) )
(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )
(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )
(?<class_item> (?: \[ : (?:
alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print|
punct|space|upper|word|xdigit
) : \] |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&escaped_character) |
(?&character_type) |
[^]] ) )
(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )
(?<condition> (?: \( [+-]? \d+ \) |
\( < (?&groupname) > \) |
\( ' (?&groupname) ' \) |
\( R \d* \) |
\( R & (?&groupname) \) |
\( (?&groupname) \) |
\( DEFINE \) |
\( VERSION >?=\d+(?:\.\d\d?)? \) |
(?&callout)?+ (?&comment)* (?&lookaround) ) )
(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )
(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex)
\k'delimiter' .* )
(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} |
x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} |
[aefnrt] | c[[:print:]] |
[^[:alnum:]] ) )
(?<group> (?&capturing_group) | (?&non_capturing_group) |
(?&resetting_group) | (?&atomic_group) |
(?&conditional_group) )
(?<groupname> [a-zA-Z_]\w* )
(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )
(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )
(?<non_capturing_group> \(\? [iJmnsUx-]* : (?&regex) \) )
(?<option_setting> \(\? [iJmnsUx-]* \) )
(?<qualified_item> (?:\. |
(?&lookaround) |
(?&back_reference) |
(?&character_class) |
(?&character_type) |
(?&escaped_character) |
(?&group) |
(?&subroutine_call) |
(?&literal_character) |
(?&quoted_string)
) (?&comment)? (?&qualifier)? )
(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )
(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) )
(?<quoted_string_empty> \\Q\\E )
(?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )
(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )
(?<resetting_group> \( \? \| (?&regex) \) )
(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )
(?<special_escape> \\K )
(?<start_item> \( \* (?:
ANY |
ANYCRLF |
BSR_ANYCRLF |
BSR_UNICODE |
CR |
CRLF |
LF |
LIMIT_MATCH=\d+ |
LIMIT_DEPTH=\d+ |
LIMIT_HEAP=\d+ |
NOTEMPTY |
NOTEMPTY_ATSTART |
NO_AUTO_POSSESS |
NO_DOTSTAR_ANCHOR |
NO_JIT |
NO_START_OPT |
NUL |
UTF |
UCP ) \) )
(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) |
\(\? (?: & | P> ) (?&groupname) \) |
\\g < (?&groupname) > |
\\g ' (?&groupname) ' |
\\g < [+-]? \d+ > |
\\g ' [+-]? \d+ ) )
(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT |
(?:MARK)?:(?&verbname) |
(?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )
(?<verbname> [^)]+ )
) # End DEFINE
# Kick it all off...
^(?&delimited_regex)$/subject_literal,jitstack=256
/^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/
0: /^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/
/(cat(a(ract|tonic)|erpillar)) \1()2(3)/
0: /(cat(a(ract|tonic)|erpillar)) \1()2(3)/
/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
0: /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
/^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/
0: /^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/
/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
0: /<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
/^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /
0: /^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /
/(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
0: /(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
/\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
0: /\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
/^(\w++|\s++)*$/
0: /^(\w++|\s++)*$/
/a+b?(*THEN)c+(*FAIL)/
0: /a+b?(*THEN)c+(*FAIL)/
/(A (A|B(*ACCEPT)|C) D)(E)/x
0: /(A (A|B(*ACCEPT)|C) D)(E)/x
/^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i
0: /^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i
/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B
0: /A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B
/(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info
0: /(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info
/(?sx)(?(DEFINE)(?<assertion> (?&simple_assertion) | (?&lookaround) )(?<atomic_group> \( \? > (?&regex) \) )(?<back_reference> \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(?<branch> (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?&quoted_string) | (?&quoted_string_empty) | (?&special_escape) | (?&verb) )* )(?<callout> \(\?C (?: \d+ | (?: (?<D>["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?&regex) \) )(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(?<class_item> (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?&quoted_string) | (?&quoted_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )(?<condition> (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex) \k'delimiter' .* )(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(?<group> (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(?<groupname> [a-zA-Z_]\w* )(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )(?<non_capturing_group> \(\? 
[iJmnsUx-]* : (?&regex) \) )(?<option_setting> \(\? [iJmnsUx-]* \) )(?<qualified_item> (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?&quoted_string) ) (?&comment)? (?&qualifier)? )(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (?<quoted_string_empty> \\Q\\E ) (?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )(?<resetting_group> \( \? \| (?&regex) \) )(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(?<special_escape> \\K )(?<start_item> \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(?<verbname> [^)]+ ))^(?&delimited_regex)$/
0: /(?sx)(?(DEFINE)(?<assertion> (?&simple_assertion) | (?&lookaround) )(?<atomic_group> \( \? > (?&regex) \) )(?<back_reference> \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(?<branch> (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?&quoted_string) | (?&quoted_string_empty) | (?&special_escape) | (?&verb) )* )(?<callout> \(\?C (?: \d+ | (?: (?<D>["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?&regex) \) )(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(?<class_item> (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?&quoted_string) | (?&quoted_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )(?<condition> (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex) \k'delimiter' .* )(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(?<group> (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(?<groupname> [a-zA-Z_]\w* )(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )(?<non_capturing_group> \(\? 
[iJmnsUx-]* : (?&regex) \) )(?<option_setting> \(\? [iJmnsUx-]* \) )(?<qualified_item> (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?&quoted_string) ) (?&comment)? (?&qualifier)? )(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (?<quoted_string_empty> \\Q\\E ) (?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )(?<resetting_group> \( \? \| (?&regex) \) )(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(?<special_escape> \\K )(?<start_item> \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(?<verbname> [^)]+ ))^(?&delimited_regex)$/
\= Expect no match
/((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/
No match
/(?:(?(2y)a|b)(X))+/
No match
/a(*MARK)b/
No match
/a(*CR)b/
No match
/(?P<abn>(?P=abn)(?<badstufxxx)/
No match
# --------------------------------------------------------------------------
# End of testinput1