Implement Perl's /n option, which is the same as PCRE2_NO_AUTO_CAPTURE.

Philip.Hazel 2017-04-18 16:21:50 +00:00
parent 584f35c059
commit 369d82e03a
9 changed files with 54 additions and 28 deletions

@@ -151,6 +151,8 @@ tests to improve coverage.
 29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.
+30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
 Version 10.23 14-February-2017
 ------------------------------

@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "17 April 2017" "PCRE2 10.30"
+.TH PCRE2API 3 "18 April 2017" "PCRE2 10.30"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1426,8 +1426,8 @@ PCRE2_NEVER_UTF causes an error.
 If this option is set, it disables the use of numbered capturing parentheses in
 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
 were followed by ?: but named parentheses can still be used for capturing (and
-they acquire numbers in the usual way). There is no equivalent of this option
-in Perl. Note that, if this option is set, references to capturing groups (back
+they acquire numbers in the usual way). This is the same as Perl's /n option.
+Note that, when this option is set, references to capturing groups (back
 references or recursion/subroutine calls) may only refer to named groups,
 though the reference can be by name or by number.
 .sp
@@ -3402,6 +3402,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 17 April 2017
+Last updated: 18 April 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi

@@ -1543,12 +1543,13 @@ alternative in the subpattern.
 .rs
 .sp
 The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL,
-PCRE2_EXTENDED, and PCRE2_EXTENDED_MORE options (which are Perl-compatible) can
-be changed from within the pattern by a sequence of Perl option letters
-enclosed between "(?" and ")". The option letters are
+PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE options (which
+are Perl-compatible) can be changed from within the pattern by a sequence of
+Perl option letters enclosed between "(?" and ")". The option letters are
 .sp
 i for PCRE2_CASELESS
 m for PCRE2_MULTILINE
+n for PCRE2_NO_AUTO_CAPTURE
 s for PCRE2_DOTALL
 x for PCRE2_EXTENDED
 xx for PCRE2_EXTENDED_MORE

@@ -407,6 +407,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 (?i) caseless
 (?J) allow duplicate names
 (?m) multiline
+(?n) no auto capture
 (?s) single line (dotall)
 (?U) default ungreedy (lazy)
 (?x) extended: ignore white space except in classes

@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "17 April 2017" "PCRE 10.30"
+.TH PCRE2TEST 1 "18 April 2017" "PCRE 10.30"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -519,10 +519,11 @@ by a previous \fB#pattern\fP command.
 .SS "Setting compilation options"
 .rs
 .sp
-The following modifiers set options for \fBpcre2_compile()\fP. The most common
-ones have single-letter abbreviations, with special handling for /x (to make
-it like Perl). If a second x is present, PCRE2_EXTENDED is converted into
-PCRE2_EXTENDED_MORE. A third appearance adds PCRE2_EXTENDED as well. See
+The following modifiers set options for \fBpcre2_compile()\fP. There are some
+single-letter abbreviations that are the same as Perl options. There is special
+handling for /x: if a second x is present, PCRE2_EXTENDED is converted into
+PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EXTENDED as well,
+though this makes no difference to the way \fBpcre2_compile()\fP behaves. See
 .\" HREF
 \fBpcre2api\fP
 .\"
@@ -547,7 +548,7 @@ for a description of the effects of these options.
 never_backslash_c set PCRE2_NEVER_BACKSLASH_C
 never_ucp set PCRE2_NEVER_UCP
 never_utf set PCRE2_NEVER_UTF
-no_auto_capture set PCRE2_NO_AUTO_CAPTURE
+/n no_auto_capture set PCRE2_NO_AUTO_CAPTURE
 no_auto_possess set PCRE2_NO_AUTO_POSSESS
 no_dotstar_anchor set PCRE2_NO_DOTSTAR_ANCHOR
 no_start_optimize set PCRE2_NO_START_OPTIMIZE
@@ -570,7 +571,8 @@ being passed to library functions.
 .rs
 .sp
 The following modifiers affect the compilation process or request information
-about the pattern:
+about the pattern. There are single-letter abbreviations for some that are
+heavily used in the test files.
 .sp
 bsr=[anycrlf|unicode] specify \eR handling
 /B bincode show binary code without lengths
@@ -1786,6 +1788,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 17 April 2017
+Last updated: 18 April 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi

@@ -2233,11 +2233,11 @@ typedef struct nest_save {
 #define NSF_RESET 0x0001u
 #define NSF_CONDASSERT 0x0002u
-/* These options (changeable within the pattern) are tracked during parsing.
-The rest are put into META_OPTIONS items and used when compiling. */
+/* Of the options that are changeable within the pattern, these are tracked
+during parsing. The rest are used from META_OPTIONS items when compiling. */
 #define PARSE_TRACKED_OPTIONS \
-(PCRE2_EXTENDED|PCRE2_EXTENDED_MORE|PCRE2_DUPNAMES)
+(PCRE2_DUPNAMES|PCRE2_EXTENDED|PCRE2_EXTENDED_MORE|PCRE2_NO_AUTO_CAPTURE)
 /* States used for analyzing ranges in character classes. The two OK values
 must be last. */
@@ -3422,9 +3422,7 @@ while (ptr < ptrend)
 ptr++;
 }
-/* Scan for options imsxJU. Some of them are tracked during parsing (see
-PARSE_TRACKED_OPTIONS) as they are local to groups. Others are not needed
-till compile time. */
+/* Scan for options imnsxJU to be set or unset. */
 else
 {
@@ -3447,6 +3445,7 @@ while (ptr < ptrend)
 case CHAR_i: *optset |= PCRE2_CASELESS; break;
 case CHAR_m: *optset |= PCRE2_MULTILINE; break;
+case CHAR_n: *optset |= PCRE2_NO_AUTO_CAPTURE; break;
 case CHAR_s: *optset |= PCRE2_DOTALL; break;
 case CHAR_U: *optset |= PCRE2_UNGREEDY; break;

@@ -720,13 +720,14 @@ typedef struct c1modstruct {
 } c1modstruct;
 static c1modstruct c1modlist[] = {
 { "bincode", 'B', -1 },
 { "info", 'I', -1 },
 { "global", 'g', -1 },
 { "caseless", 'i', -1 },
 { "multiline", 'm', -1 },
-{ "dotall", 's', -1 },
-{ "extended", 'x', -1 }
+{ "no_auto_capture", 'n', -1 },
+{ "dotall", 's', -1 },
+{ "extended", 'x', -1 }
 };
 #define C1MODLISTCOUNT sizeof(c1modlist)/sizeof(c1modstruct)

testdata/testinput2

@@ -5259,4 +5259,6 @@ a)"xI
 /[a b](?xx: [ 12 ] (?-x:[ 34 ]) )y z/B
+/(a)(?-n:(b))(c)/nB
 # End of testinput2

testdata/testoutput2

@@ -15945,6 +15945,24 @@ Subject length lower bound = 1
 End
 ------------------------------------------------------------------
+/(a)(?-n:(b))(c)/nB
+------------------------------------------------------------------
+Bra
+Bra
+a
+Ket
+Bra
+CBra 1
+b
+Ket
+Ket
+Bra
+c
+Ket
+Ket
+End
+------------------------------------------------------------------
 # End of testinput2
 Error -64: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data
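
The new test case relies on the n option being local to a group: with /n set for the whole pattern, (?-n: switches automatic capturing back on only inside that group. The following is a minimal, self-contained sketch of that scoping rule. It is a toy model and not PCRE2's parser. The function name count_captures and the MAX_DEPTH limit are hypothetical, and the model handles only plain groups, (?<name> groups, option settings such as (?n) and (?-n:, and backslash escapes. It ignores character classes, comments and other syntax.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64   /* hypothetical nesting limit for this sketch */

/* Toy model: count the groups that would receive a capture number.
   no_auto_capture is the initial state of the n option (non-zero = set).
   Returns -1 if groups nest deeper than MAX_DEPTH. */
int count_captures(const char *pat, int no_auto_capture)
{
    int saved[MAX_DEPTH];          /* n state to restore at each ')' */
    int depth = 0;
    int nac = no_auto_capture != 0;
    int count = 0;
    size_t i = 0;

    assert(pat != NULL);

    while (pat[i] != '\0') {
        char c = pat[i];

        if (c == '\\' && pat[i + 1] != '\0') { i += 2; continue; }
        if (c == ')') {
            if (depth > 0) nac = saved[--depth];   /* leave group scope */
            i++;
            continue;
        }
        if (c != '(') { i++; continue; }
        if (depth >= MAX_DEPTH) return -1;

        if (pat[i + 1] != '?') {
            /* Plain "(": captures only when n is not in force. */
            if (!nac) count++;
            saved[depth++] = nac;
            i++;
            continue;
        }
        if (pat[i + 2] == '<' && pat[i + 3] != '=' && pat[i + 3] != '!') {
            /* Named group: still captures when n is set. */
            count++;
            saved[depth++] = nac;
            i += 3;
            continue;
        }

        /* Scan option letters; only n is modelled here. */
        size_t j = i + 2;
        int unset = 0;
        int newnac = nac;
        while (pat[j] == '-' || (pat[j] >= 'a' && pat[j] <= 'z') ||
               (pat[j] >= 'A' && pat[j] <= 'Z')) {
            if (pat[j] == '-') unset = 1;
            else if (pat[j] == 'n') newnac = !unset;
            j++;
        }
        if (pat[j] == ')') {
            /* "(?n)" form: applies to the rest of the enclosing group. */
            nac = newnac;
            i = j + 1;
            continue;
        }
        saved[depth++] = nac;      /* opens a non-capturing group */
        if (pat[j] == ':') { nac = newnac; i = j + 1; }
        else i = j;                /* lookaround etc.: n unchanged */
    }
    return count;
}
```

In this model, count_captures("(a)(?-n:(b))(c)", 1) returns 1, which agrees with the single CBra 1 around b in the expected output above, while the same pattern with the option initially unset gives 3.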