Implement PCRE2_EXTENDED_MORE and friends.

Philip.Hazel 2017-04-18 12:32:52 +00:00
parent b9f95b5f63
commit 3dca43fdff
23 changed files with 2084 additions and 1916 deletions


@ -149,6 +149,8 @@ tests to improve coverage.
for checking at compile time that tables are the right size.
(e) Add missing "fall through" comment.
29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.
Version 10.23 14-February-2017
------------------------------


@ -1377,6 +1377,13 @@ sequence at the start of the pattern, as described in the section entitled
<a href="pcre2pattern.html#newlines">"Newline conventions"</a>
in the <b>pcre2pattern</b> documentation. A default is defined when PCRE2 is
built.
<pre>
PCRE2_EXTENDED_MORE
</pre>
This option has the effect of PCRE2_EXTENDED, but, in addition, space and
horizontal tab characters are also ignored inside a character class.
PCRE2_EXTENDED_MORE is equivalent to the /xx option of Perl 5.26, and it can be
changed within a pattern by a (?xx) option setting.
<pre>
PCRE2_FIRSTLINE
</pre>
@ -3344,7 +3351,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 14 April 2017
Last updated: 17 April 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
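
The PCRE2_EXTENDED_MORE description in the hunk above says that space and horizontal tab are ignored inside a character class. A minimal sketch of that rule follows; it is not the library's code. The names `DEMO_EXTENDED_MORE` and `demo_class_body` are hypothetical, the bit value is copied from the pcre2.h hunk of this commit, and escapes such as a backslash followed by a space are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PCRE2_EXTENDED_MORE (value as in this commit). */
#define DEMO_EXTENDED_MORE 0x01000000u

/* Copy the body of a character class (the text between '[' and ']') into
   out, dropping space and tab when extended-more mode is on. Only these two
   characters are skipped; other white space is kept. The caller must supply
   a buffer at least as long as body plus a terminating zero. Returns the
   number of characters written, not counting the terminator. */
size_t demo_class_body(const char *body, uint32_t options, char *out)
{
size_t n = 0;
assert(body != NULL && out != NULL);
for (; *body != '\0'; body++)
  {
  if ((options & DEMO_EXTENDED_MORE) != 0 && (*body == ' ' || *body == '\t'))
    continue;
  out[n++] = *body;
  }
out[n] = '\0';
return n;
}
```

With the option set, the body of `[ a b c ]` reduces to `abc`, which agrees with the `[a-c]` shown in the test output of this commit; without it, the spaces remain class members.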


@ -200,16 +200,13 @@ different way and is not Perl-compatible.
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
the start of a pattern that set overall options that cannot be changed within
the pattern.
<br>
<br>
18. The following new Perl 5.26 constructs are not yet supported in PCRE2:
<br>
<br>
(a) The Perl /a modifier restricts /d numbers to pure ascii, the new /aa
modifier restricts /i case-insensitive matching to pure ascii also, ignoring
unicode rules. This separation cannot be represented with PCRE2_UTF.
<br>
<br>
</P>
<P>
18. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
modifier restricts /i case-insensitive matching to pure ascii, ignoring Unicode
rules. This separation cannot be represented with PCRE2_UCP.
</P>
<P>
19. Perl has different limits than PCRE2. See the
<a href="pcre2limit.html"><b>pcre2limit</b></a>
documentation for details. Perl went with 5.10 from recursion to iteration
@ -232,7 +229,7 @@ Cambridge, England.
REVISION
</b><br>
<P>
Last updated: 17 April 2017
Last updated: 18 April 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>


@ -1544,20 +1544,25 @@ alternative in the subpattern.
</P>
<br><a name="SEC13" href="#TOC1">INTERNAL OPTION SETTING</a><br>
<P>
The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
PCRE2_EXTENDED options (which are Perl-compatible) can be changed from within
the pattern by a sequence of Perl option letters enclosed between "(?" and ")".
The option letters are
The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL,
PCRE2_EXTENDED, and PCRE2_EXTENDED_MORE options (which are Perl-compatible) can
be changed from within the pattern by a sequence of Perl option letters
enclosed between "(?" and ")". The option letters are
<pre>
i for PCRE2_CASELESS
m for PCRE2_MULTILINE
s for PCRE2_DOTALL
x for PCRE2_EXTENDED
xx for PCRE2_EXTENDED_MORE
</pre>
For example, (?im) sets caseless, multiline matching. It is also possible to
unset these options by preceding the letter with a hyphen, and a combined
setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS and
PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
unset these options by preceding the letter with a hyphen. The two "extended"
options are not independent; unsetting either one cancels the effects of both
of them.
</P>
<P>
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
permitted. If a letter appears both before and after the hyphen, the option is
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
effect.
@ -3438,7 +3443,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 11 April 2017
Last updated: 18 April 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
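
The option-letter rules in the INTERNAL OPTION SETTING hunk above (a doubled x selects extended-more, a hyphen switches to unsetting, and unsetting either "extended" option cancels both) can be sketched as a small function. This is an illustration modelled on the pcre2_compile.c changes in this commit, not the library's parser: `demo_apply_setting` and the `DEMO_` constants are hypothetical names, the bit values other than extended-more are assumptions, and the J and U letters and error reporting for malformed settings are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PCRE2 option bits. */
#define DEMO_CASELESS      0x00000008u
#define DEMO_DOTALL        0x00000020u
#define DEMO_EXTENDED      0x00000080u
#define DEMO_MULTILINE     0x00000400u
#define DEMO_EXTENDED_MORE 0x01000000u

/* Apply the letters of an option setting such as "im-sx" (the text between
   "(?" and ")") to options and return the result. On an unknown letter,
   *ok is set to 0 and options is returned unchanged. */
uint32_t demo_apply_setting(uint32_t options, const char *letters, int *ok)
{
uint32_t set = 0, unset = 0;
uint32_t *optset = &set;
assert(letters != NULL && ok != NULL);
*ok = 1;
for (; *letters != '\0'; letters++)
  {
  switch (*letters)
    {
    case '-': optset = &unset; break;
    case 'i': *optset |= DEMO_CASELESS; break;
    case 'm': *optset |= DEMO_MULTILINE; break;
    case 's': *optset |= DEMO_DOTALL; break;
    case 'x':   /* A second x upgrades extended to extended-more. */
    *optset |= ((*optset & DEMO_EXTENDED) != 0)?
      DEMO_EXTENDED_MORE : DEMO_EXTENDED;
    break;
    default: *ok = 0; return options;
    }
  }
options = (options | set) & ~unset;
/* Unsetting extended also gets rid of extended-more. */
if ((options & DEMO_EXTENDED) == 0) options &= ~DEMO_EXTENDED_MORE;
return options;
}
```

Because unsetting is applied after setting, a letter that appears on both sides of the hyphen ends up unset, as the documentation states.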


@ -432,7 +432,8 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?m) multiline
(?s) single line (dotall)
(?U) default ungreedy (lazy)
(?x) extended (ignore white space)
(?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
</pre>
The following are recognized only at the very start of a pattern or after one
@ -596,7 +597,7 @@ Cambridge, England.
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 31 March 2017
Last updated: 18 April 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>


@ -560,9 +560,11 @@ Setting compilation options
</b><br>
<P>
The following modifiers set options for <b>pcre2_compile()</b>. The most common
ones have single-letter abbreviations. See
ones have single-letter abbreviations, with special handling for /x (to make
it like Perl). If a second x is present, PCRE2_EXTENDED is converted into
PCRE2_EXTENDED_MORE. A third appearance adds PCRE2_EXTENDED as well. See
<a href="pcre2api.html"><b>pcre2api</b></a>
for a description of their effects.
for a description of the effects of these options.
<pre>
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
alt_bsux set PCRE2_ALT_BSUX
@ -576,6 +578,7 @@ for a description of their effects.
dupnames set PCRE2_DUPNAMES
endanchored set PCRE2_ENDANCHORED
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
/m multiline set PCRE2_MULTILINE
@ -1807,7 +1810,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 11 April 2017
Last updated: 17 April 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>


@ -1396,6 +1396,13 @@ COMPILING A PATTERN
tion entitled "Newline conventions" in the pcre2pattern documentation.
A default is defined when PCRE2 is built.
PCRE2_EXTENDED_MORE
This option has the effect of PCRE2_EXTENDED, but, in addition, space
and horizontal tab characters are also ignored inside a character
class. PCRE2_EXTENDED_MORE is equivalent to the /xx option of Perl 5.26,
and it can be changed within a pattern by a (?xx) option setting.
PCRE2_FIRSTLINE
If this option is set, an unanchored pattern is required to match
@ -3258,7 +3265,7 @@ AUTHOR
REVISION
Last updated: 14 April 2017
Last updated: 17 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@ -4363,13 +4370,10 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
at the start of a pattern that set overall options that cannot be
changed within the pattern.
18. The following new Perl 5.26 constructs are not yet supported in
PCRE2:
(a) The Perl /a modifier restricts /d numbers to pure ascii, the new
/aa modifier restricts /i case-insensitive matching to pure ascii also,
ignoring unicode rules. This separation cannot be represented with
PCRE2_UTF.
18. The Perl /a modifier restricts /d numbers to pure ascii, and the
/aa modifier restricts /i case-insensitive matching to pure ascii,
ignoring Unicode rules. This separation cannot be represented with
PCRE2_UCP.
19. Perl has different limits than PCRE2. See the pcre2limit documenta-
tion for details. Perl went with 5.10 from recursion to iteration keep-
@ -4388,7 +4392,7 @@ AUTHOR
REVISION
Last updated: 17 April 2017
Last updated: 18 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@ -6798,20 +6802,24 @@ VERTICAL BAR
INTERNAL OPTION SETTING
The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
PCRE2_EXTENDED options (which are Perl-compatible) can be changed from
within the pattern by a sequence of Perl option letters enclosed
between "(?" and ")". The option letters are
The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL,
PCRE2_EXTENDED, and PCRE2_EXTENDED_MORE options (which are Perl-compat-
ible) can be changed from within the pattern by a sequence of Perl
option letters enclosed between "(?" and ")". The option letters are
i for PCRE2_CASELESS
m for PCRE2_MULTILINE
s for PCRE2_DOTALL
x for PCRE2_EXTENDED
xx for PCRE2_EXTENDED_MORE
For example, (?im) sets caseless, multiline matching. It is also possi-
ble to unset these options by preceding the letter with a hyphen, and a
combined setting and unsetting such as (?im-sx), which sets PCRE2_CASE-
LESS and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and
ble to unset these options by preceding the letter with a hyphen. The
two "extended" options are not independent; unsetting either one can-
cels the effects of both of them.
A combined setting and unsetting such as (?im-sx), which sets
PCRE2_CASELESS and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and
PCRE2_EXTENDED, is also permitted. If a letter appears both before and
after the hyphen, the option is unset. An empty options setting "(?)"
is allowed. Needless to say, it has no effect.
@ -8590,7 +8598,7 @@ AUTHOR
REVISION
Last updated: 11 April 2017
Last updated: 18 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------
@ -9643,7 +9651,8 @@ OPTION SETTING
(?m) multiline
(?s) single line (dotall)
(?U) default ungreedy (lazy)
(?x) extended (ignore white space)
(?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
The following are recognized only at the very start of a pattern or
@ -9807,7 +9816,7 @@ AUTHOR
REVISION
Last updated: 31 March 2017
Last updated: 18 April 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2API 3 "14 April 2017" "PCRE2 10.30"
.TH PCRE2API 3 "17 April 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -1347,6 +1347,13 @@ sequence at the start of the pattern, as described in the section entitled
.\"
in the \fBpcre2pattern\fP documentation. A default is defined when PCRE2 is
built.
.sp
PCRE2_EXTENDED_MORE
.sp
This option has the effect of PCRE2_EXTENDED, but, in addition, space and
horizontal tab characters are also ignored inside a character class.
PCRE2_EXTENDED_MORE is equivalent to the /xx option of Perl 5.26, and it can be
changed within a pattern by a (?xx) option setting.
.sp
PCRE2_FIRSTLINE
.sp
@ -3395,6 +3402,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 14 April 2017
Last updated: 17 April 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2COMPAT 3 "17 April 2017" "PCRE2 10.30"
.TH PCRE2COMPAT 3 "18 April 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -165,13 +165,11 @@ different way and is not Perl-compatible.
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
the start of a pattern that set overall options that cannot be changed within
the pattern.
.sp
18. The following new Perl 5.26 constructs are not yet supported in PCRE2:
.sp
(a) The Perl /a modifier restricts /d numbers to pure ascii, the new /aa
modifier restricts /i case-insensitive matching to pure ascii also, ignoring
unicode rules. This separation cannot be represented with PCRE2_UTF.
.sp
.P
18. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
modifier restricts /i case-insensitive matching to pure ascii, ignoring Unicode
rules. This separation cannot be represented with PCRE2_UCP.
.P
19. Perl has different limits than PCRE2. See the
.\" HREF
\fBpcre2limit\fP
@ -196,6 +194,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 17 April 2017
Last updated: 18 April 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "11 April 2017" "PCRE2 10.30"
.TH PCRE2PATTERN 3 "18 April 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -1542,20 +1542,24 @@ alternative in the subpattern.
.SH "INTERNAL OPTION SETTING"
.rs
.sp
The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and
PCRE2_EXTENDED options (which are Perl-compatible) can be changed from within
the pattern by a sequence of Perl option letters enclosed between "(?" and ")".
The option letters are
The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL,
PCRE2_EXTENDED, and PCRE2_EXTENDED_MORE options (which are Perl-compatible) can
be changed from within the pattern by a sequence of Perl option letters
enclosed between "(?" and ")". The option letters are
.sp
i for PCRE2_CASELESS
m for PCRE2_MULTILINE
s for PCRE2_DOTALL
x for PCRE2_EXTENDED
xx for PCRE2_EXTENDED_MORE
.sp
For example, (?im) sets caseless, multiline matching. It is also possible to
unset these options by preceding the letter with a hyphen, and a combined
setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS and
PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
unset these options by preceding the letter with a hyphen. The two "extended"
options are not independent; unsetting either one cancels the effects of both
of them.
.P
A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
permitted. If a letter appears both before and after the hyphen, the option is
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
effect.
@ -3469,6 +3473,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 11 April 2017
Last updated: 18 April 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2SYNTAX 3 "31 March 2017" "PCRE2 10.30"
.TH PCRE2SYNTAX 3 "18 April 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -409,7 +409,8 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
(?m) multiline
(?s) single line (dotall)
(?U) default ungreedy (lazy)
(?x) extended (ignore white space)
(?x) extended: ignore white space except in classes
(?xx) as (?x) but also ignore space and tab in classes
(?-...) unset option(s)
.sp
The following are recognized only at the very start of a pattern or after one
@ -585,6 +586,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 31 March 2017
Last updated: 18 April 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "11 April 2017" "PCRE 10.30"
.TH PCRE2TEST 1 "17 April 2017" "PCRE 10.30"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -520,11 +520,13 @@ by a previous \fB#pattern\fP command.
.rs
.sp
The following modifiers set options for \fBpcre2_compile()\fP. The most common
ones have single-letter abbreviations. See
ones have single-letter abbreviations, with special handling for /x (to make
it like Perl). If a second x is present, PCRE2_EXTENDED is converted into
PCRE2_EXTENDED_MORE. A third appearance adds PCRE2_EXTENDED as well. See
.\" HREF
\fBpcre2api\fP
.\"
for a description of their effects.
for a description of the effects of these options.
.sp
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
alt_bsux set PCRE2_ALT_BSUX
@ -538,6 +540,7 @@ for a description of their effects.
dupnames set PCRE2_DUPNAMES
endanchored set PCRE2_ENDANCHORED
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
/m multiline set PCRE2_MULTILINE
@ -1783,6 +1786,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 11 April 2017
Last updated: 17 April 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi


@ -504,8 +504,11 @@ PATTERN MODIFIERS
Setting compilation options
The following modifiers set options for pcre2_compile(). The most com-
mon ones have single-letter abbreviations. See pcre2api for a descrip-
tion of their effects.
mon ones have single-letter abbreviations, with special handling for /x
(to make it like Perl). If a second x is present, PCRE2_EXTENDED is
converted into PCRE2_EXTENDED_MORE. A third appearance adds
PCRE2_EXTENDED as well. See pcre2api for a description of the effects
of these options.
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
alt_bsux set PCRE2_ALT_BSUX
@ -519,6 +522,7 @@ PATTERN MODIFIERS
dupnames set PCRE2_DUPNAMES
endanchored set PCRE2_ENDANCHORED
/x extended set PCRE2_EXTENDED
/xx extended_more set PCRE2_EXTENDED_MORE
firstline set PCRE2_FIRSTLINE
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
/m multiline set PCRE2_MULTILINE
@ -1640,5 +1644,5 @@ AUTHOR
REVISION
Last updated: 11 April 2017
Last updated: 17 April 2017
Copyright (c) 1997-2017 University of Cambridge.


@ -137,6 +137,7 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_ALT_CIRCUMFLEX 0x00200000u /* J M D */
#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */
#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */
#define PCRE2_EXTENDED_MORE 0x01000000u /* C */
/* These are for pcre2_jit_compile(). */


@ -137,6 +137,7 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_ALT_CIRCUMFLEX 0x00200000u /* J M D */
#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */
#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */
#define PCRE2_EXTENDED_MORE 0x01000000u /* C */
/* These are for pcre2_jit_compile(). */
@ -268,6 +269,7 @@ numbers must not be changed. */
#define PCRE2_ERROR_BADSUBSPATTERN (-60)
#define PCRE2_ERROR_TOOMANYREPLACE (-61)
#define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
#define PCRE2_ERROR_HEAPLIMIT (-63)
/* Request types for pcre2_pattern_info() */
@ -297,6 +299,7 @@ numbers must not be changed. */
#define PCRE2_INFO_SIZE 22
#define PCRE2_INFO_HASBACKSLASHC 23
#define PCRE2_INFO_FRAMESIZE 24
#define PCRE2_INFO_HEAPLIMIT 25
/* Request types for pcre2_config(). */
@ -313,6 +316,7 @@ numbers must not be changed. */
#define PCRE2_CONFIG_UNICODE 9
#define PCRE2_CONFIG_UNICODE_VERSION 10
#define PCRE2_CONFIG_VERSION 11
#define PCRE2_CONFIG_HEAPLIMIT 12
/* Types for code units in patterns and subject strings. */
@ -452,6 +456,8 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
int (*)(pcre2_callout_block *, void *), void *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_heap_limit(pcre2_match_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_match_limit(pcre2_match_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
@ -676,6 +682,7 @@ pcre2_compile are called by application code. */
#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_)
#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_)
#define pcre2_set_depth_limit PCRE2_SUFFIX(pcre2_set_depth_limit_)
#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_)
#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_)
#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_)
#define pcre2_set_newline PCRE2_SUFFIX(pcre2_set_newline_)


@ -137,6 +137,7 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_ALT_CIRCUMFLEX 0x00200000u /* J M D */
#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */
#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */
#define PCRE2_EXTENDED_MORE 0x01000000u /* C */
/* These are for pcre2_jit_compile(). */


@ -160,7 +160,7 @@ the length of compiled items varies with this.
In the real compile phase, this workspace is not currently used. */
#define COMPILE_WORK_SIZE (2048*LINK_SIZE) /* Size in code units */
#define COMPILE_WORK_SIZE (3000*LINK_SIZE) /* Size in code units */
#define C16_WORK_SIZE \
((COMPILE_WORK_SIZE * sizeof(PCRE2_UCHAR))/sizeof(uint16_t))
@ -695,7 +695,8 @@ static int posix_substitutes[] = {
#define PUBLIC_COMPILE_OPTIONS \
(PCRE2_ANCHORED|PCRE2_ALLOW_EMPTY_CLASS|PCRE2_ALT_BSUX|PCRE2_ALT_CIRCUMFLEX| \
PCRE2_ALT_VERBNAMES|PCRE2_AUTO_CALLOUT|PCRE2_CASELESS|PCRE2_DOLLAR_ENDONLY| \
PCRE2_DOTALL|PCRE2_DUPNAMES|PCRE2_ENDANCHORED|PCRE2_EXTENDED|PCRE2_FIRSTLINE| \
PCRE2_DOTALL|PCRE2_DUPNAMES|PCRE2_ENDANCHORED|PCRE2_EXTENDED| \
PCRE2_EXTENDED_MORE|PCRE2_FIRSTLINE| \
PCRE2_MATCH_UNSET_BACKREF|PCRE2_MULTILINE|PCRE2_NEVER_BACKSLASH_C| \
PCRE2_NEVER_UCP|PCRE2_NEVER_UTF|PCRE2_NO_AUTO_CAPTURE| \
PCRE2_NO_AUTO_POSSESS|PCRE2_NO_DOTSTAR_ANCHOR|PCRE2_NO_START_OPTIMIZE| \
@ -2226,12 +2227,17 @@ typedef struct nest_save {
uint16_t reset_group;
uint16_t max_group;
uint16_t flags;
uint32_t options;
} nest_save;
#define NSF_RESET 0x0001u
#define NSF_EXTENDED 0x0002u
#define NSF_DUPNAMES 0x0004u
#define NSF_CONDASSERT 0x0008u
#define NSF_CONDASSERT 0x0002u
/* These options (changeable within the pattern) are tracked during parsing.
The rest are put into META_OPTIONS items and used when compiling. */
#define PARSE_TRACKED_OPTIONS \
(PCRE2_EXTENDED|PCRE2_EXTENDED_MORE|PCRE2_DUPNAMES)
/* States used for analyzing ranges in character classes. The two OK values
must be last. */
@ -2292,6 +2298,10 @@ creating a nest_save that spans the end of the workspace. */
end_nests = (nest_save *)((char *)end_nests -
((cb->workspace_size * sizeof(PCRE2_UCHAR)) % sizeof(nest_save)));
/* PCRE2_EXTENDED_MORE implies PCRE2_EXTENDED */
if ((options & PCRE2_EXTENDED_MORE) != 0) options |= PCRE2_EXTENDED;
/* Now scan the pattern */
*has_lookbehind = FALSE;
@ -2907,7 +2917,8 @@ while (ptr < ptrend)
/* Process a regular character class. If the first character is '^', set
the negation flag. If the first few characters (either before or after ^)
are \Q\E or \E we skip them too. This makes for compatibility with Perl. */
are \Q\E or \E or space or tab in extended-more mode, we skip them too.
This makes for compatibility with Perl. */
negate_class = FALSE;
while (ptr < ptrend)
@ -2922,6 +2933,9 @@ while (ptr < ptrend)
else
break;
}
else if ((options & PCRE2_EXTENDED_MORE) != 0 &&
(c == CHAR_SPACE || c == CHAR_HT)) /* Note: just these two */
continue;
else if (!negate_class && c == CHAR_CIRCUMFLEX_ACCENT)
negate_class = TRUE;
else break;
@ -2969,6 +2983,12 @@ while (ptr < ptrend)
goto CLASS_LITERAL;
}
/* Skip over space and tab (only) in extended-more mode. */
if ((options & PCRE2_EXTENDED_MORE) != 0 &&
(c == CHAR_SPACE || c == CHAR_HT))
goto CLASS_CONTINUE;
/* Handle POSIX class names. Perl allows a negation extension of the
form [:^name:]. A square bracket that doesn't match the syntax is
treated as a literal. We also recognize the POSIX constructions
@ -3387,8 +3407,7 @@ while (ptr < ptrend)
}
top_nest->nest_depth = nest_depth;
top_nest->flags = 0;
if ((options & PCRE2_EXTENDED) != 0) top_nest->flags |= NSF_EXTENDED;
if ((options & PCRE2_DUPNAMES) != 0) top_nest->flags |= NSF_DUPNAMES;
top_nest->options = options & PARSE_TRACKED_OPTIONS;
/* Start of non-capturing group that resets the capture count for each
branch. */
@ -3403,9 +3422,9 @@ while (ptr < ptrend)
ptr++;
}
/* Scan for options imsxJU. We need to keep track of (?x) and (?J) for
use while scanning. The other options are used during the compiling
phases. */
/* Scan for options imsxJU. Some of them are tracked during parsing (see
PARSE_TRACKED_OPTIONS) as they are local to groups. Others are not needed
till compile time. */
else
{
@ -3429,9 +3448,15 @@ while (ptr < ptrend)
case CHAR_i: *optset |= PCRE2_CASELESS; break;
case CHAR_m: *optset |= PCRE2_MULTILINE; break;
case CHAR_s: *optset |= PCRE2_DOTALL; break;
case CHAR_x: *optset |= PCRE2_EXTENDED; break;
case CHAR_U: *optset |= PCRE2_UNGREEDY; break;
/* If x appears twice it sets the extended extended option. */
case CHAR_x:
*optset |= ((*optset & PCRE2_EXTENDED) != 0)?
PCRE2_EXTENDED_MORE : PCRE2_EXTENDED;
break;
default:
errorcode = ERR11;
ptr--; /* Correct the offset */
@ -3440,6 +3465,10 @@ while (ptr < ptrend)
}
options = (options | set) & (~unset);
/* Unsetting extended should also get rid of extended-more. */
if ((options & PCRE2_EXTENDED) == 0) options &= ~PCRE2_EXTENDED_MORE;
/* If the options ended with ')' this is not the start of a nested
group with option changes, so the options change at this level.
In this case, if the previous level set up a nest block, discard the
@ -3916,8 +3945,7 @@ while (ptr < ptrend)
}
top_nest->nest_depth = nest_depth;
top_nest->flags = NSF_CONDASSERT;
if ((options & PCRE2_EXTENDED) != 0) top_nest->flags |= NSF_EXTENDED;
if ((options & PCRE2_DUPNAMES) != 0) top_nest->flags |= NSF_DUPNAMES;
top_nest->options = options & PARSE_TRACKED_OPTIONS;
}
break;
@ -4038,20 +4066,17 @@ while (ptr < ptrend)
break;
/* End of group; reset the capture count to the maximum if we are in a (?|
group and/or reset the extended and dupnames options. Disallow quantifier
for a condition that is an assertion. */
group and/or reset the options that are tracked during parsing. Disallow
quantifier for a condition that is an assertion. */
case CHAR_RIGHT_PARENTHESIS:
okquantifier = TRUE;
if (top_nest != NULL && top_nest->nest_depth == nest_depth)
{
options = (options & ~PARSE_TRACKED_OPTIONS) | top_nest->options;
if ((top_nest->flags & NSF_RESET) != 0 &&
top_nest->max_group > cb->bracount)
cb->bracount = top_nest->max_group;
if ((top_nest->flags & NSF_EXTENDED) != 0) options |= PCRE2_EXTENDED;
else options &= ~PCRE2_EXTENDED;
if ((top_nest->flags & NSF_DUPNAMES) != 0) options |= PCRE2_DUPNAMES;
else options &= ~PCRE2_DUPNAMES;
if ((top_nest->flags & NSF_CONDASSERT) != 0)
okquantifier = FALSE;
if (top_nest == (nest_save *)(cb->start_workspace)) top_nest = NULL;
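
The hunks above replace the per-option NSF_EXTENDED and NSF_DUPNAMES flags with a saved `options` word masked by PARSE_TRACKED_OPTIONS. A minimal sketch of that save-and-restore pattern, under the assumption that only the tracked bits are group-local during parsing, is given below. The `demo_` names are hypothetical and the bit values other than extended-more are assumptions; the real nest_save structure carries more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PCRE2 option bits. */
#define DEMO_CASELESS      0x00000008u
#define DEMO_DUPNAMES      0x00000040u
#define DEMO_EXTENDED      0x00000080u
#define DEMO_EXTENDED_MORE 0x01000000u

/* Options that are tracked during parsing because they are local to groups. */
#define DEMO_TRACKED (DEMO_EXTENDED|DEMO_EXTENDED_MORE|DEMO_DUPNAMES)

typedef struct demo_nest {
  uint32_t options;   /* Tracked option bits in force when the group opened */
} demo_nest;

/* On entering a group, remember the tracked bits that are currently set. */
void demo_enter_group(demo_nest *nest, uint32_t options)
{
assert(nest != NULL);
nest->options = options & DEMO_TRACKED;
}

/* On leaving the group, replace the tracked bits with the saved ones and
   leave every other bit as it is. */
uint32_t demo_leave_group(const demo_nest *nest, uint32_t options)
{
assert(nest != NULL);
return (options & ~DEMO_TRACKED) | nest->options;
}
```

A single mask-and-or restores all tracked options at once, so adding a further tracked option needs only a change to the mask rather than another flag and another pair of tests.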


@ -580,6 +580,7 @@ static modstruct modlist[] = {
{ "endanchored", MOD_PD, MOD_OPT, PCRE2_ENDANCHORED, PD(options) },
{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
@ -3464,6 +3465,16 @@ for (;;)
field = check_modifier(modlist + index, ctx, pctl, dctl, *p);
if (field == NULL) return FALSE;
/* /x is a special case; a second appearance changes PCRE2_EXTENDED to
PCRE2_EXTENDED_MORE. */
if (cc == 'x' && (*((uint32_t *)field) & PCRE2_EXTENDED) != 0)
{
*((uint32_t *)field) &= ~PCRE2_EXTENDED;
*((uint32_t *)field) |= PCRE2_EXTENDED_MORE;
}
else
*((uint32_t *)field) |= modlist[index].value;
}
@ -3842,7 +3853,7 @@ static void
show_compile_options(uint32_t options, const char *before, const char *after)
{
if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((options & PCRE2_ALT_BSUX) != 0)? " alt_bsux" : "",
((options & PCRE2_ALT_CIRCUMFLEX) != 0)? " alt_circumflex" : "",
@ -3856,6 +3867,7 @@ else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%
((options & PCRE2_DUPNAMES) != 0)? " dupnames" : "",
((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "",
((options & PCRE2_EXTENDED) != 0)? " extended" : "",
((options & PCRE2_EXTENDED_MORE) != 0)? " extended_more" : "",
((options & PCRE2_FIRSTLINE) != 0)? " firstline" : "",
((options & PCRE2_MATCH_UNSET_BACKREF) != 0)? " match_unset_backref" : "",
((options & PCRE2_MULTILINE) != 0)? " multiline" : "",
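
The special handling of the /x modifier in the pcre2test hunk above (a second x converts PCRE2_EXTENDED into PCRE2_EXTENDED_MORE, and a third adds PCRE2_EXTENDED back) can be sketched in isolation as follows. `demo_slash_modifiers` and the `DEMO_` constants are hypothetical names, the value used for the extended bit is an assumption, and every letter other than x is simply ignored here, which the real modifier decoder does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PCRE2 option bits. */
#define DEMO_EXTENDED      0x00000080u
#define DEMO_EXTENDED_MORE 0x01000000u

/* Decode the x letters in a run of single-letter modifiers such as "BxxI".
   Each x toggles in the way the pcre2test change does: if extended is
   already set it is swapped for extended-more, otherwise it is added. */
uint32_t demo_slash_modifiers(const char *letters)
{
uint32_t options = 0;
assert(letters != NULL);
for (; *letters != '\0'; letters++)
  {
  if (*letters != 'x') continue;   /* Other letters are outside this sketch */
  if ((options & DEMO_EXTENDED) != 0)
    {
    options &= ~DEMO_EXTENDED;
    options |= DEMO_EXTENDED_MORE;
    }
  else options |= DEMO_EXTENDED;
  }
return options;
}
```

This matches the test output in this commit, where /xx reports "extended_more" and /xxx reports "extended extended_more". Leaving only the extended-more bit set for /xx is harmless because pcre2_compile() treats that option as implying PCRE2_EXTENDED.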

testdata/testinput2

@ -5245,4 +5245,18 @@ a)"xI
# ----------------------------------------------------------------------
/[a b c]/BxxI
/[a b c]/BxxxI
/[a b c]/B,extended_more
/[ a b c ]/B,extended_more
/[a b](?xx: [ 12 ] (?-xx:[ 34 ]) )y z/B
# Unsetting /x also unsets /xx
/[a b](?xx: [ 12 ] (?-x:[ 34 ]) )y z/B
# End of testinput2

testdata/testoutput2

@ -15873,6 +15873,78 @@ Failed: error -37: callout error code
# ----------------------------------------------------------------------
/[a b c]/BxxI
------------------------------------------------------------------
Bra
[a-c]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: extended_more
Starting code units: a b c
Subject length lower bound = 1
/[a b c]/BxxxI
------------------------------------------------------------------
Bra
[a-c]
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: extended extended_more
Starting code units: a b c
Subject length lower bound = 1
/[a b c]/B,extended_more
------------------------------------------------------------------
Bra
[a-c]
Ket
End
------------------------------------------------------------------
/[ a b c ]/B,extended_more
------------------------------------------------------------------
Bra
[a-c]
Ket
End
------------------------------------------------------------------
/[a b](?xx: [ 12 ] (?-xx:[ 34 ]) )y z/B
------------------------------------------------------------------
Bra
[ ab]
Bra
[12]
Bra
[ 34]
Ket
Ket
y z
Ket
End
------------------------------------------------------------------
# Unsetting /x also unsets /xx
/[a b](?xx: [ 12 ] (?-x:[ 34 ]) )y z/B
------------------------------------------------------------------
Bra
[ ab]
Bra
[12]
Bra
[ 34]
Ket
Ket
y z
Ket
End
------------------------------------------------------------------
# End of testinput2
Error -64: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data


@ -846,7 +846,7 @@ Memory allocation (code space): 14
/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
/parens_nest_limit=1000,-fullbincode
Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too deeply nested
Failed: error 184 at offset 1504: (?| and/or (?J: or (?x: parentheses are too deeply nested
# Use "expand" to create some very long patterns with nested parentheses, in
# order to test workspace overflow. Again, this varies with code unit width,
@ -854,10 +854,8 @@ Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too de
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated

@ -853,10 +853,8 @@ Memory allocation (code space): 28
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated

@ -846,7 +846,7 @@ Memory allocation (code space): 10
/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
/parens_nest_limit=1000,-fullbincode
Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too deeply nested
Failed: error 184 at offset 1504: (?| and/or (?J: or (?x: parentheses are too deeply nested
# Use "expand" to create some very long patterns with nested parentheses, in
# order to test workspace overflow. Again, this varies with code unit width,
@ -856,7 +856,6 @@ Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too de
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated