Add subject_literal and allow jitstack in pcre2test pattern modifiers, and add another big pattern test.
Philip.Hazel 2017-06-12 17:48:03 +00:00
parent 1381c3fe28
commit 6e30ed1b40
6 changed files with 452 additions and 32 deletions


@@ -184,6 +184,9 @@ starting offset greater than zero.
 39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
+40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
+pattern lines.
 Version 10.23 14-February-2017
 ------------------------------


@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "06 June 2017" "PCRE 10.30"
+.TH PCRE2TEST 1 "12 June 2017" "PCRE 10.30"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -69,10 +69,10 @@ want that action.
 The input is processed using C's string functions, so must not
 contain binary zeros, even though in Unix-like environments, \fBfgets()\fP
 treats any bytes other than newline as data characters. An error is generated
-if a binary zero is encountered. Subject lines are processed for backslash
-escapes, which makes it possible to include any data value in strings that are
-passed to the library for matching. For patterns, there is a facility for
-specifying some or all of the 8-bit input characters as hexadecimal pairs,
+if a binary zero is encountered. By default subject lines are processed for
+backslash escapes, which makes it possible to include any data value in strings
+that are passed to the library for matching. For patterns, there is a facility
+for specifying some or all of the 8-bit input characters as hexadecimal pairs,
 which makes it possible to include binary zeros.
 .
 .
@@ -442,8 +442,9 @@ A pattern can be followed by a modifier list (details below).
 .sp
 Before each subject line is passed to \fBpcre2_match()\fP or
 \fBpcre2_dfa_match()\fP, leading and trailing white space is removed, and the
-line is scanned for backslash escapes. The following provide a means of
-encoding non-printing characters in a visible way:
+line is scanned for backslash escapes, unless the \fBsubject_literal\fP
+modifier was set for the pattern. The following provide a means of encoding
+non-printing characters in a visible way:
 .sp
 \ea alarm (BEL, \ex07)
 \eb backspace (\ex08)
@@ -505,6 +506,11 @@ character. A backslash followed by anything else causes an error. However, if
 the very last character in the line is a backslash (and there is no modifier
 list), it is ignored. This gives a way of passing an empty line as data, since
 a real empty line terminates the data input.
+.P
+If the \fBsubject_literal\fP modifier is set for a pattern, all subject lines
+that follow are treated as literals, with no special treatment of backslashes.
+No replication is possible, and any subject modifiers must be set as defaults
+by a \fB#subject\fP command.
 .
 .
 .SH "PATTERN MODIFIERS"
@@ -602,6 +608,7 @@ heavily used in the test files.
 push push compiled pattern onto the stack
 pushcopy push a copy onto the stack
 stackguard=<number> test the stackguard feature
+subject_literal treat all subject lines as literal
 tables=[0|1|2] select internal tables
 use_length do not zero-terminate the pattern
 utf8_input treat input as UTF-8
@@ -967,17 +974,18 @@ are mutually exclusive.
 .SS "Setting certain match controls"
 .rs
 .sp
-The following modifiers are really subject modifiers, and are described below.
-However, they may be included in a pattern's modifier list, in which case they
-are applied to every subject line that is processed with that pattern. They may
-not appear in \fB#pattern\fP commands. These modifiers do not affect the
-compilation process.
+The following modifiers are really subject modifiers, and are described under
+"Subject Modifiers" below. However, they may be included in a pattern's
+modifier list, in which case they are applied to every subject line that is
+processed with that pattern. They may not appear in \fB#pattern\fP commands.
+These modifiers do not affect the compilation process.
 .sp
 aftertext show text after match
 allaftertext show text after captures
 allcaptures show all captures
 allusedtext show all consulted text
 /g global global matching
+jitstack=<n> set size of JIT stack
 mark show mark values
 replace=<string> specify a replacement string
 startchar show starting character when relevant
@@ -990,6 +998,15 @@ These modifiers may not appear in a \fB#pattern\fP command. If you want them as
 defaults, set them in a \fB#subject\fP command.
 .
 .
+.SS "Specifying literal subject lines"
+.rs
+.sp
+If the \fBsubject_literal\fP modifier is present on a pattern, all the subject
+lines that it matches are taken as literal strings, with no interpretation of
+backslashes. It is not possible to set subject modifiers on such lines, but any
+that are set as defaults by a \fB#subject\fP command are recognized.
+.
+.
 .SS "Saving a compiled pattern"
 .rs
 .sp
@@ -1321,9 +1338,11 @@ matching provokes an error return ("bad option value") from
 .sp
 The \fBjitstack\fP modifier provides a way of setting the maximum stack size
 that is used by the just-in-time optimization code. It is ignored if JIT
-optimization is not being used. The value is a number of kilobytes. Providing a
-stack that is larger than the default 32K is necessary only for very
-complicated patterns.
+optimization is not being used. The value is a number of kilobytes. Setting
+zero reverts to the default of 32K. Providing a stack that is larger than the
+default is necessary only for very complicated patterns. If \fBjitstack\fP is
+set non-zero on a subject line it overrides any value that was set on the
+pattern.
 .
 .
 .SS "Setting heap, match, and depth limits"
@@ -1815,6 +1834,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 06 June 2017
+Last updated: 12 June 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi


@@ -42,13 +42,16 @@ fi
 # aftertext interpreted as "print $' afterwards"
 # afteralltext ignored
 # dupnames ignored (Perl always allows)
+# jitstack ignored
 # mark ignored
 # no_auto_possess ignored
 # no_start_optimize ignored
+# subject_literal does not process subjects for escapes
 # ucp sets Perl's /u modifier
 # utf invoke UTF-8 functionality
 #
-# The data lines must not have any pcre2test modifiers. They are processed as
+# The data lines must not have any pcre2test modifiers. Unless
+# "subject_literal" is on the pattern, data lines are processed as
 # Perl double-quoted strings, so if they contain " $ or @ characters, these
 # have to be escaped. For this reason, all such characters in the
 # Perl-compatible testinput1 and testinput4 files are escaped so that they can
@@ -138,16 +141,20 @@ for (;;)
 chomp($pattern);
 $pattern =~ s/\s+$//;
 # Split the pattern from the modifiers and adjust them as necessary.
 $pattern =~ /^\s*((.).*\2)(.*)$/s;
 $pat = $1;
 $mod = $3;
 # The private "aftertext" modifier means "print $' afterwards".
 $showrest = ($mod =~ s/aftertext,?//);
+# The "subject_literal" modifier disables escapes in subjects.
+$subject_literal = ($mod =~ s/subject_literal,?//);
 # "allaftertext" is used by pcre2test to print remainders after captures
@@ -161,6 +168,10 @@ for (;;)
 $mod =~ s/dupnames,?//;
+# Remove "jitstack".
+$mod =~ s/jitstack=\d+,?//;
 # Remove "mark" (asks pcre2test to check MARK data) */
 $mod =~ s/mark,?//;
@@ -222,7 +233,14 @@ for (;;)
 last if ($_ eq "");
 next if $_ =~ /^\\=(?:\s|$)/; # Comment line
-$x = eval "\"$_\""; # To get escapes processed
+if ($subject_literal)
+{
+$x = $_;
+}
+else
+{
+$x = eval "\"$_\""; # To get escapes processed
+}
 # Empty array for holding results, ensure $REGERROR and $REGMARK are
 # unset, then do the matching.


@@ -479,6 +479,7 @@ so many of them that they are split into two fields. */
 #define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000002u
 #define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u
 #define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
+#define CTL2_SUBJECT_LITERAL 0x00000010u
 #define CTL_NL_SET 0x40000000u /* Informational */
 #define CTL_BSR_SET 0x80000000u /* Informational */
@@ -518,6 +519,7 @@ typedef struct patctl { /* Structure for pattern modifiers. */
 uint32_t options; /* Must be in same position as datctl */
 uint32_t control; /* Must be in same position as datctl */
 uint32_t control2; /* Must be in same position as datctl */
+uint32_t jitstack; /* Must be in same position as datctl */
 uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
 uint32_t jit;
 uint32_t stackguard_test;
@@ -537,6 +539,7 @@ typedef struct datctl { /* Structure for data line modifiers. */
 uint32_t options; /* Must be in same position as patctl */
 uint32_t control; /* Must be in same position as patctl */
 uint32_t control2; /* Must be in same position as patctl */
+uint32_t jitstack; /* Must be in same position as patctl */
 uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
 uint32_t startend[2];
 uint32_t cerror[2];
@@ -544,7 +547,6 @@ typedef struct datctl { /* Structure for data line modifiers. */
 int32_t callout_data;
 int32_t copy_numbers[MAXCPYGET];
 int32_t get_numbers[MAXCPYGET];
-uint32_t jitstack;
 uint32_t oveccount;
 uint32_t offset;
 uint8_t copy_names[LENCPYGET];
@@ -630,7 +632,7 @@ static modstruct modlist[] = {
 { "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
 { "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
 { "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
-{ "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) },
+{ "jitstack", MOD_PNDP, MOD_INT, 0, PO(jitstack) },
 { "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
 { "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
 { "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
@@ -674,6 +676,7 @@ static modstruct modlist[] = {
 { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
 { "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
 { "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
+{ "subject_literal", MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL, PO(control2) },
 { "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
 { "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
 { "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
@@ -3477,7 +3480,8 @@ switch (m->which)
 case MOD_PND: /* Ditto, but not default pattern */
 case MOD_PNDP: /* Ditto, allowed for Perl test */
 if (dctl != NULL) field = dctl;
-else if (pctl != NULL && (m->which == MOD_PD || ctx != CTX_DEFPAT))
+else if (pctl != NULL && (m->which == MOD_PD || m->which == MOD_PDP ||
+ctx != CTX_DEFPAT))
 field = pctl;
 break;
 }
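The change above widens the test that decides whether a pattern-or-data modifier may be stored in the pattern block while a default (#pattern) line is being parsed. A minimal sketch of the resulting rule follows. It assumes simplified, hypothetical enums (`mod_which`, `mod_context`) and a hypothetical `select_field()` that returns NULL where pcre2test would presumably reject the modifier; it also assumes the case labels above the visible MOD_PND and MOD_PNDP ones include MOD_PD and MOD_PDP.

```c
#include <stddef.h>

/* Hypothetical, simplified mirrors of the names used in the hunk. */
typedef enum { MOD_PD, MOD_PDP, MOD_PND, MOD_PNDP } mod_which;
typedef enum { CTX_PAT, CTX_DEFPAT } mod_context;

/* Choose the control block a pattern-or-data modifier should update.
   dctl is non-NULL while a subject line is parsed, pctl while a pattern
   (or #pattern default) line is parsed. NULL means "not allowed here". */
static void *select_field(mod_which which, mod_context ctx,
                          void *pctl, void *dctl)
{
  if (dctl != NULL) return dctl;
  if (pctl != NULL &&
      (which == MOD_PD || which == MOD_PDP || ctx != CTX_DEFPAT))
    return pctl;
  return NULL;
}
```

Without the `which == MOD_PDP` term, a MOD_PDP modifier on a #pattern line would fall through to NULL in this sketch, exactly like a MOD_PND one.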
@@ -6216,6 +6220,7 @@ uint8_t *p, *pp, *start_rep;
 size_t needlen;
 void *use_dat_context;
 BOOL utf;
+BOOL subject_literal;
 #ifdef SUPPORT_PCRE2_8
 uint8_t *q8 = NULL;
@@ -6227,6 +6232,8 @@ uint16_t *q16 = NULL;
 uint32_t *q32 = NULL;
 #endif
+subject_literal = (pat_patctl.control2 & CTL2_SUBJECT_LITERAL) != 0;
 /* Copy the default context and data control blocks to the active ones. Then
 copy from the pattern the controls that can be set in either the pattern or the
 data. This allows them to be overridden in the data line. We do not do this for
@@ -6238,6 +6245,7 @@ memcpy(&dat_datctl, &def_datctl, sizeof(datctl));
 dat_datctl.control |= (pat_patctl.control & CTL_ALLPD);
 dat_datctl.control2 |= (pat_patctl.control2 & CTL2_ALLPD);
 strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement);
+if (dat_datctl.jitstack == 0) dat_datctl.jitstack = pat_patctl.jitstack;
 /* Initialize for scanning the data line. */
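The added line gives the subject line priority: a jitstack value already present in the data control block is kept, and only a zero (unset) value inherits the pattern's setting. This matches the man page statement that a non-zero subject-line jitstack overrides the pattern. Below is a sketch of the full resolution, assuming a hypothetical helper `effective_jitstack_kb()` and treating a final value of zero as the 32K default that the documentation describes.

```c
#include <stdint.h>

/* Default size quoted in the man page text; this constant is an
   illustration, not a pcre2test definition. */
#define DEFAULT_JIT_STACK_KB 32u

/* Hypothetical helper: resolve the JIT stack size (in kilobytes) used for
   one subject line. A non-zero subject-line value wins; zero inherits the
   pattern's value, as in "if (dat_datctl.jitstack == 0) ...". If both are
   zero, the documented 32K default applies. */
static uint32_t effective_jitstack_kb(uint32_t pattern_kb, uint32_t subject_kb)
{
  uint32_t kb = (subject_kb != 0) ? subject_kb : pattern_kb;
  return (kb != 0) ? kb : DEFAULT_JIT_STACK_KB;
}
```

For example, the new testinput1 pattern sets jitstack=256 on its pattern line, so under this rule a subject line that does not set its own value would resolve to 256.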
@@ -6373,7 +6381,7 @@ while ((c = *p++) != 0)
 /* Handle a non-escaped character. In non-UTF 32-bit mode with utf8_input
 set, do the fudge for setting the top bit. */
-if (c != '\\')
+if (c != '\\' || subject_literal)
 {
 uint32_t topbit = 0;
 if (test_mode == PCRE32_MODE && c == 0xff && *p != 0)
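With the modified test, a backslash in a subject line is handled like any other non-escaped character when subject_literal is set. Below is a heavily simplified sketch of the two paths. `decode_subject()` and `SUBJECT_LITERAL_BIT` are hypothetical (the bit value only mirrors `CTL2_SUBJECT_LITERAL`), and only `\n`, `\t` and `\\` are decoded, whereas pcre2test recognizes many more escapes. The trailing-backslash handling follows the man page rule that a backslash as the very last character of a line is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag mirroring the value of CTL2_SUBJECT_LITERAL. */
#define SUBJECT_LITERAL_BIT 0x00000010u

/* Simplified subject-line decoder (not pcre2test's code). Copies `in` to
   `out`, whose capacity outlen includes the terminating NUL.
   - Literal mode: every byte, backslashes included, is copied unchanged.
   - Escape mode: only \n, \t and \\ are decoded; a backslash that is the
     last character of the line is ignored; any other escape is an error.
   Returns the decoded length, or -1 on error or insufficient space. */
static long decode_subject(const char *in, char *out, size_t outlen,
                           uint32_t control2)
{
  bool literal = (control2 & SUBJECT_LITERAL_BIT) != 0;
  size_t n = 0;
  char c;

  if (outlen == 0) return -1;
  while ((c = *in++) != 0)
    {
    if (c == '\\' && !literal)
      {
      char e = *in;
      if (e == 0) break;              /* trailing backslash is ignored */
      in++;
      switch (e)
        {
        case 'n':  c = '\n'; break;
        case 't':  c = '\t'; break;
        case '\\': c = '\\'; break;
        default:   return -1;         /* not handled in this sketch */
        }
      }
    if (n + 1 >= outlen) return -1;   /* keep room for the NUL */
    out[n++] = c;
    }
  out[n] = 0;
  return (long)n;
}
```

The literal path corresponds to the `|| subject_literal` test in the hunk: the byte is stored exactly as read, so a subject such as `a\tb` stays four characters long.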

testdata/testinput1

@@ -5924,9 +5924,9 @@ ef) x/x,mark
 # addresses in various formats. It's a heavy test for named subpatterns. In the
 # <atext> group, slash is coded as \x{2f} so that this pattern can also be
 # processed by perltest.sh, which does not cater for an escaped delimiter
-# within the pattern. All $ and @ characters in subject strings are escaped so
-# that Perl doesn't interpret them as variable insertions and " characters must
-# also be escaped for Perl.
+# within the pattern. $ within the pattern must also be escaped. All $ and @
+# characters in subject strings are escaped so that Perl doesn't interpret them
+# as variable insertions and " characters must also be escaped for Perl.
 # This set of subpatterns is more or less a direct transliteration of the BNF
 # definitions in RFC2822, without any of the obsolete features. The addition of
@@ -5937,7 +5937,7 @@ ef) x/x,mark
 /(?ix)(?(DEFINE)
 (?<addr_spec> (?&local_part) \@ (?&domain) )
 (?<angle_addr> (?&CFWS)?+ < (?&addr_spec) > (?&CFWS)?+ )
-(?<atext> [a-z\d!#$%&'*+-\x{2f}=?^_`{|}~] )
+(?<atext> [a-z\d!#\$%&'*+-\x{2f}=?^_`{|}~] )
 (?<atom> (?&CFWS)?+ (?&atext)+ (?&CFWS)?+ )
 (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
 (?<ctext> [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ ()\\] )
@@ -5981,4 +5981,180 @@ ef) x/x,mark
 # --------------------------------------------------------------------------
# This pattern uses named groups to match default PCRE2 patterns. It's another
# heavy test for named subpatterns. Once again, code slash as \x{2f} and escape
# $ even in classes so that this works with pcre2test.
/(?sx)(?(DEFINE)
(?<assertion> (?&simple_assertion) | (?&lookaround) )
(?<atomic_group> \( \? > (?&regex) \) )
(?<back_reference> \\ \d+ |
\\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) |
\\k <(?&groupname)> |
\\k '(?&groupname)' |
\\k \{ (?&groupname) \} |
\( \? P= (?&groupname) \) )
(?<branch> (?:(?&assertion) |
(?&callout) |
(?&comment) |
(?&option_setting) |
(?&qualified_item) |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&special_escape) |
(?&verb)
)* )
(?<callout> \(\?C (?: \d+ |
(?: (?<D>["'`^%\#\$])
(?: \k'D'\k'D' | (?!\k'D') . )* \k'D' |
\{ (?: \}\} | [^}]*+ )* \} )
)? \) )
(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )?
(?&regex) \) )
(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )
(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )
(?<class_item> (?: \[ : (?:
alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print|
punct|space|upper|word|xdigit
) : \] |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&escaped_character) |
(?&character_type) |
[^]] ) )
(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )
(?<condition> (?: \( [+-]? \d+ \) |
\( < (?&groupname) > \) |
\( ' (?&groupname) ' \) |
\( R \d* \) |
\( R & (?&groupname) \) |
\( (?&groupname) \) |
\( DEFINE \) |
\( VERSION >?=\d+(?:\.\d\d?)? \) |
(?&callout)?+ (?&comment)* (?&lookaround) ) )
(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )
(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex)
\k'delimiter' .* )
(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} |
x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} |
[aefnrt] | c[[:print:]] |
[^[:alnum:]] ) )
(?<group> (?&capturing_group) | (?&non_capturing_group) |
(?&resetting_group) | (?&atomic_group) |
(?&conditional_group) )
(?<groupname> [a-zA-Z_]\w* )
(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )
(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )
(?<non_capturing_group> \(\? [iJmnsUx-]* : (?&regex) \) )
(?<option_setting> \(\? [iJmnsUx-]* \) )
(?<qualified_item> (?:\. |
(?&lookaround) |
(?&back_reference) |
(?&character_class) |
(?&character_type) |
(?&escaped_character) |
(?&group) |
(?&subroutine_call) |
(?&literal_character) |
(?&quoted_string)
) (?&comment)? (?&qualifier)? )
(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )
(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) )
(?<quoted_string_empty> \\Q\\E )
(?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )
(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )
(?<resetting_group> \( \? \| (?&regex) \) )
(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )
(?<special_escape> \\K )
(?<start_item> \( \* (?:
ANY |
ANYCRLF |
BSR_ANYCRLF |
BSR_UNICODE |
CR |
CRLF |
LF |
LIMIT_MATCH=\d+ |
LIMIT_DEPTH=\d+ |
LIMIT_HEAP=\d+ |
NOTEMPTY |
NOTEMPTY_ATSTART |
NO_AUTO_POSSESS |
NO_DOTSTAR_ANCHOR |
NO_JIT |
NO_START_OPT |
NUL |
UTF |
UCP ) \) )
(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) |
\(\? (?: & | P> ) (?&groupname) \) |
\\g < (?&groupname) > |
\\g ' (?&groupname) ' |
\\g < [+-]? \d+ > |
\\g ' [+-]? \d+ ) )
(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT |
(?:MARK)?:(?&verbname) |
(?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )
(?<verbname> [^)]+ )
) # End DEFINE
# Kick it all off...
^(?&delimited_regex)$/subject_literal,jitstack=256
/^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/
/(cat(a(ract|tonic)|erpillar)) \1()2(3)/
/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
/^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/
/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
/^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /
/(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
/\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
/^(\w++|\s++)*$/
/a+b?(*THEN)c+(*FAIL)/
/(A (A|B(*ACCEPT)|C) D)(E)/x
/^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i
/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B
/(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info
/(?sx)(?(DEFINE)(?<assertion> (?&simple_assertion) | (?&lookaround) )(?<atomic_group> \( \? > (?&regex) \) )(?<back_reference> \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(?<branch> (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?&quoted_string) | (?&quoted_string_empty) | (?&special_escape) | (?&verb) )* )(?<callout> \(\?C (?: \d+ | (?: (?<D>["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?&regex) \) )(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(?<class_item> (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?&quoted_string) | (?&quoted_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )(?<condition> (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex) \k'delimiter' .* )(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(?<group> (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(?<groupname> [a-zA-Z_]\w* )(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )(?<non_capturing_group> \(\? 
[iJmnsUx-]* : (?&regex) \) )(?<option_setting> \(\? [iJmnsUx-]* \) )(?<qualified_item> (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?&quoted_string) ) (?&comment)? (?&qualifier)? )(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (?<quoted_string_empty> \\Q\\E ) (?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )(?<resetting_group> \( \? \| (?&regex) \) )(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(?<special_escape> \\K )(?<start_item> \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(?<verbname> [^)]+ ))^(?&delimited_regex)$/
\= Expect no match
/((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/
/(?:(?(2y)a|b)(X))+/
/a(*MARK)b/
/a(*CR)b/
/(?P<abn>(?P=abn)(?<badstufxxx)/
# --------------------------------------------------------------------------
 # End of testinput1

testdata/testoutput1

@@ -9496,9 +9496,9 @@ No match
 # addresses in various formats. It's a heavy test for named subpatterns. In the
 # <atext> group, slash is coded as \x{2f} so that this pattern can also be
 # processed by perltest.sh, which does not cater for an escaped delimiter
-# within the pattern. All $ and @ characters in subject strings are escaped so
-# that Perl doesn't interpret them as variable insertions and " characters must
-# also be escaped for Perl.
+# within the pattern. $ within the pattern must also be escaped. All $ and @
+# characters in subject strings are escaped so that Perl doesn't interpret them
+# as variable insertions and " characters must also be escaped for Perl.
 # This set of subpatterns is more or less a direct transliteration of the BNF
 # definitions in RFC2822, without any of the obsolete features. The addition of
@@ -9509,7 +9509,7 @@ No match
 /(?ix)(?(DEFINE)
 (?<addr_spec> (?&local_part) \@ (?&domain) )
 (?<angle_addr> (?&CFWS)?+ < (?&addr_spec) > (?&CFWS)?+ )
-(?<atext> [a-z\d!#$%&'*+-\x{2f}=?^_`{|}~] )
+(?<atext> [a-z\d!#\$%&'*+-\x{2f}=?^_`{|}~] )
 (?<atom> (?&CFWS)?+ (?&atext)+ (?&CFWS)?+ )
 (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
 (?<ctext> [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ ()\\] )
@@ -9564,4 +9564,200 @@ No match
 # --------------------------------------------------------------------------
# This pattern uses named groups to match default PCRE2 patterns. It's another
# heavy test for named subpatterns. Once again, code slash as \x{2f} and escape
# $ even in classes so that this works with pcre2test.
/(?sx)(?(DEFINE)
(?<assertion> (?&simple_assertion) | (?&lookaround) )
(?<atomic_group> \( \? > (?&regex) \) )
(?<back_reference> \\ \d+ |
\\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) |
\\k <(?&groupname)> |
\\k '(?&groupname)' |
\\k \{ (?&groupname) \} |
\( \? P= (?&groupname) \) )
(?<branch> (?:(?&assertion) |
(?&callout) |
(?&comment) |
(?&option_setting) |
(?&qualified_item) |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&special_escape) |
(?&verb)
)* )
(?<callout> \(\?C (?: \d+ |
(?: (?<D>["'`^%\#\$])
(?: \k'D'\k'D' | (?!\k'D') . )* \k'D' |
\{ (?: \}\} | [^}]*+ )* \} )
)? \) )
(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )?
(?&regex) \) )
(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )
(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )
(?<class_item> (?: \[ : (?:
alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print|
punct|space|upper|word|xdigit
) : \] |
(?&quoted_string) |
(?&quoted_string_empty) |
(?&escaped_character) |
(?&character_type) |
[^]] ) )
(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )
(?<condition> (?: \( [+-]? \d+ \) |
\( < (?&groupname) > \) |
\( ' (?&groupname) ' \) |
\( R \d* \) |
\( R & (?&groupname) \) |
\( (?&groupname) \) |
\( DEFINE \) |
\( VERSION >?=\d+(?:\.\d\d?)? \) |
(?&callout)?+ (?&comment)* (?&lookaround) ) )
(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )
(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex)
\k'delimiter' .* )
(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} |
x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} |
[aefnrt] | c[[:print:]] |
[^[:alnum:]] ) )
(?<group> (?&capturing_group) | (?&non_capturing_group) |
(?&resetting_group) | (?&atomic_group) |
(?&conditional_group) )
(?<groupname> [a-zA-Z_]\w* )
(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )
(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )
(?<non_capturing_group> \(\? [iJmnsUx-]* : (?&regex) \) )
(?<option_setting> \(\? [iJmnsUx-]* \) )
(?<qualified_item> (?:\. |
(?&lookaround) |
(?&back_reference) |
(?&character_class) |
(?&character_type) |
(?&escaped_character) |
(?&group) |
(?&subroutine_call) |
(?&literal_character) |
(?&quoted_string)
) (?&comment)? (?&qualifier)? )
(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )
(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) )
(?<quoted_string_empty> \\Q\\E )
(?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )
(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )
(?<resetting_group> \( \? \| (?&regex) \) )
(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )
(?<special_escape> \\K )
(?<start_item> \( \* (?:
ANY |
ANYCRLF |
BSR_ANYCRLF |
BSR_UNICODE |
CR |
CRLF |
LF |
LIMIT_MATCH=\d+ |
LIMIT_DEPTH=\d+ |
LIMIT_HEAP=\d+ |
NOTEMPTY |
NOTEMPTY_ATSTART |
NO_AUTO_POSSESS |
NO_DOTSTAR_ANCHOR |
NO_JIT |
NO_START_OPT |
NUL |
UTF |
UCP ) \) )
(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) |
\(\? (?: & | P> ) (?&groupname) \) |
\\g < (?&groupname) > |
\\g ' (?&groupname) ' |
\\g < [+-]? \d+ > |
\\g ' [+-]? \d+ ) )
(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT |
(?:MARK)?:(?&verbname) |
(?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )
(?<verbname> [^)]+ )
) # End DEFINE
# Kick it all off...
^(?&delimited_regex)$/subject_literal,jitstack=256
/^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/
0: /^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/
/(cat(a(ract|tonic)|erpillar)) \1()2(3)/
0: /(cat(a(ract|tonic)|erpillar)) \1()2(3)/
/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
0: /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
/^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/
0: /^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/
/<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
0: /<tr([\w\W\s\d][^<>]{0,})><TD([\w\W\s\d][^<>]{0,})>([\d]{0,}\.)(.*)((<BR>([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><TD([\w\W\s\d][^<>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is
/^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /
0: /^(?(DEFINE) (?<A> a) (?<B> b) ) (?&A) (?&B) /
/(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
0: /(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/
/\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
0: /\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?<byte>2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/
/^(\w++|\s++)*$/
0: /^(\w++|\s++)*$/
/a+b?(*THEN)c+(*FAIL)/
0: /a+b?(*THEN)c+(*FAIL)/
/(A (A|B(*ACCEPT)|C) D)(E)/x
0: /(A (A|B(*ACCEPT)|C) D)(E)/x
/^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i
0: /^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i
/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B
0: /A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B
/(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info
0: /(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info
/(?sx)(?(DEFINE)(?<assertion> (?&simple_assertion) | (?&lookaround) )(?<atomic_group> \( \? > (?&regex) \) )(?<back_reference> \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(?<branch> (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?&quoted_string) | (?&quoted_string_empty) | (?&special_escape) | (?&verb) )* )(?<callout> \(\?C (?: \d+ | (?: (?<D>["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?&regex) \) )(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(?<class_item> (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?&quoted_string) | (?&quoted_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )(?<condition> (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex) \k'delimiter' .* )(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(?<group> (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(?<groupname> [a-zA-Z_]\w* )(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )(?<non_capturing_group> \(\? 
[iJmnsUx-]* : (?&regex) \) )(?<option_setting> \(\? [iJmnsUx-]* \) )(?<qualified_item> (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?&quoted_string) ) (?&comment)? (?&qualifier)? )(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (?<quoted_string_empty> \\Q\\E ) (?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )(?<resetting_group> \( \? \| (?&regex) \) )(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(?<special_escape> \\K )(?<start_item> \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(?<verbname> [^)]+ ))^(?&delimited_regex)$/
0: /(?sx)(?(DEFINE)(?<assertion> (?&simple_assertion) | (?&lookaround) )(?<atomic_group> \( \? > (?&regex) \) )(?<back_reference> \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(?<branch> (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?&quoted_string) | (?&quoted_string_empty) | (?&special_escape) | (?&verb) )* )(?<callout> \(\?C (?: \d+ | (?: (?<D>["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(?<capturing_group> \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?&regex) \) )(?<character_class> \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(?<character_type> (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(?<class_item> (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?&quoted_string) | (?&quoted_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(?<comment> \(\?\# [^)]* \) | (?&quoted_string_empty) | \\E )(?<condition> (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(?<conditional_group> \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(?<delimited_regex> (?<delimiter> [-\x{2f}!"'`=_:;,%&@~]) (?&regex) \k'delimiter' .* )(?<escaped_character> \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(?<group> (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(?<groupname> [a-zA-Z_]\w* )(?<literal_character> (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(?<lookaround> \(\? (?: = | ! | <= | <! ) (?&regex) \) )(?<non_capturing_group> \(\? 
[iJmnsUx-]* : (?&regex) \) )(?<option_setting> \(\? [iJmnsUx-]* \) )(?<qualified_item> (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?&quoted_string) ) (?&comment)? (?&qualifier)? )(?<qualifier> (?: [?*+] | (?&range_qualifier) ) [+?]? )(?<quoted_string> (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (?<quoted_string_empty> \\Q\\E ) (?<range_qualifier> \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(?<regex> (?&start_item)* (?&branch) (?: \| (?&branch) )* )(?<resetting_group> \( \? \| (?&regex) \) )(?<simple_assertion> \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(?<special_escape> \\K )(?<start_item> \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(?<subroutine_call> (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(?<verb> \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(?<verbname> [^)]+ ))^(?&delimited_regex)$/
\= Expect no match
/((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/
No match
/(?:(?(2y)a|b)(X))+/
No match
/a(*MARK)b/
No match
/a(*CR)b/
No match
/(?P<abn>(?P=abn)(?<badstufxxx)/
No match
# --------------------------------------------------------------------------
# End of testinput1