Lock out \K in lookaround assertions by default, but provide an option to
re-enable the old behaviour, just in case.
Philip Hazel 2021-08-30 16:57:44 +01:00
parent eea410b33a
commit 21c26698b3
18 changed files with 121 additions and 73 deletions

@@ -47,6 +47,10 @@ mode in the interpreters. Instead of just remembering whether one case matched
 or not, it remembers the position of a previous match so as to avoid
 unnecessary repeated searching.
+6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default.
+However, just in case anybody was relying on the old behaviour, there is an
+option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour.
 Version 10.37 26-May-2021
 -------------------------

@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "28 August 2021" "PCRE2 10.38"
+.TH PCRE2API 3 "30 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1875,6 +1875,13 @@ characters with code points greater than 127.
 .sp
 The option bits that can be set in a compile context by calling the
 \fBpcre2_set_compile_extra_options()\fP function are as follows:
+.sp
+PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
+.sp
+Since release 10.38 PCRE2 has forbidden the use of \eK within lookaround
+assertions, following Perl's lead. This option is provided to re-enable the
+previous behaviour (act in positive lookarounds, ignore in negative ones) in
+case anybody is relying on it.
 .sp
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 .sp
@@ -4009,6 +4016,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 August 2021
+Last updated: 30 August 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "06 October 2020" "PCRE2 10.36"
+.TH PCRE2COMPAT 3 "30 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@@ -133,8 +133,10 @@ in the release at the time of writing (5.32), \ep{Lu} and \ep{Ll} match all
 letters, regardless of case, when case independence is specified.
 .P
 16. From release 5.32.0, Perl locks out the use of \eK in lookaround
-assertions. In PCRE2, \eK is acted on when it occurs in positive assertions,
-but is ignored in negative assertions.
+assertions. From release 10.38 PCRE2 does the same by default. However, there
+is an option for re-enabling the previous behaviour. When this option is set,
+\eK is acted on when it occurs in positive assertions, but is ignored in
+negative assertions.
 .P
 17. PCRE2 provides some extensions to the Perl regular expression facilities.
 Perl 5.10 included new features that were not in earlier versions of Perl, some
@@ -203,7 +205,7 @@ fall into any stack-overflow limit. PCRE2 made a similar change at release
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@@ -212,6 +214,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 06 October 2020
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 30 August 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "06 October 2020" "PCRE2 10.35"
+.TH PCRE2PATTERN 3 "3o0 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -1168,9 +1168,11 @@ For example, when the pattern
 .sp
 matches "foobar", the first substring is still set to "foo".
 .P
-Perl used to document that the use of \eK within lookaround assertions is "not
-well defined", but from version 5.32.0 Perl does not support this usage at all.
-In PCRE2, \eK is acted upon when it occurs inside positive assertions, but is
+From version 5.32.0 Perl forbids the use of \eK in lookaround assertions. From
+release 10.38 PCRE2 also forbids this by default. However, the
+PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option can be used when calling
+\fBpcre2_compile()\fP to re-enable the previous behaviour. When this option is
+set, \eK is acted upon when it occurs inside positive assertions, but is
 ignored in negative assertions. Note that when a pattern such as (?=ab\eK)
 matches, the reported start of the match can be greater than the end of the
 match. Using \eK in a lookbehind assertion at the start of a pattern can also
@@ -3889,7 +3891,7 @@ there is a backtrack at the outer level.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@@ -3898,6 +3900,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 06 October 2020
-Copyright (c) 1997-2020 University of Cambridge.
+Last updated: 30 August 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "28 December 2019" "PCRE2 10.35"
+.TH PCRE2SYNTAX 3 "30 August 2021" "PCRE2 10.38"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -401,6 +401,9 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 .sp
 \eK set reported start of match
 .sp
+From release 10.38 \eK is not permitted by default in lookaround assertions,
+for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
+option is set, the previous behaviour is re-enabled. When this option is set,
 \eK is honoured in positive assertions, but ignored in negative ones.
 .
 .
@@ -667,7 +670,7 @@ delimiter }. To encode the ending delimiter within the string, double it.
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@@ -676,6 +679,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 December 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 30 August 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "28 August 2021" "PCRE 10.38"
+.TH PCRE2TEST 1 "30 August 2021" "PCRE 10.38"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -27,12 +27,7 @@ each match attempt. Modifiers on external or internal command lines, the
 patterns, and the subject lines specify PCRE2 function options, control how the
 subject is processed, and what output is produced.
 .P
-As the original fairly simple PCRE library evolved, it acquired many different
-features, and as a result, the original \fBpcretest\fP program ended up with a
-lot of options in a messy, arcane syntax for testing all the features. The
-move to the new PCRE2 API provided an opportunity to re-implement the test
-program as \fBpcre2test\fP, with a cleaner modifier syntax. Nevertheless, there
-are still many obscure modifiers, some of which are specifically designed for
+There are many obscure modifiers, some of which are specifically designed for
 use in conjunction with the test script and data files that are distributed as
 part of PCRE2. All the modifiers are documented here, some without much
 justification, but many of them are unlikely to be of use except when testing
@@ -61,10 +56,10 @@ names used in the libraries have a suffix _8, _16, or _32, as appropriate.
 .rs
 .sp
 Input to \fBpcre2test\fP is processed line by line, either by calling the C
-library's \fBfgets()\fP function, or via the \fBlibreadline\fP library. In some
-Windows environments character 26 (hex 1A) causes an immediate end of file, and
-no further data is read, so this character should be avoided unless you really
-want that action.
+library's \fBfgets()\fP function, or via the \fBlibreadline\fP or \fBlibedit\fP
+library. In some Windows environments character 26 (hex 1A) causes an immediate
+end of file, and no further data is read, so this character should be avoided
+unless you really want that action.
 .P
 The input is processed using using C's string functions, so must not
 contain binary zeros, even though in Unix-like environments, \fBfgets()\fP
@@ -472,11 +467,11 @@ A pattern can be followed by a modifier list (details below).
 .SH "SUBJECT LINE SYNTAX"
 .rs
 .sp
-Before each subject line is passed to \fBpcre2_match()\fP or
-\fBpcre2_dfa_match()\fP, leading and trailing white space is removed, and the
-line is scanned for backslash escapes, unless the \fBsubject_literal\fP
-modifier was set for the pattern. The following provide a means of encoding
-non-printing characters in a visible way:
+Before each subject line is passed to \fBpcre2_match()\fP,
+\fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP, leading and trailing white
+space is removed, and the line is scanned for backslash escapes, unless the
+\fBsubject_literal\fP modifier was set for the pattern. The following provide a
+means of encoding non-printing characters in a visible way:
 .sp
 \ea alarm (BEL, \ex07)
 \eb backspace (\ex08)
@@ -572,6 +567,7 @@ way \fBpcre2_compile()\fP behaves. See
 for a description of the effects of these options.
 .sp
 allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
+allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
 allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 alt_bsux set PCRE2_ALT_BSUX
 alt_circumflex set PCRE2_ALT_CIRCUMFLEX
@@ -2107,6 +2103,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 August 2021
+Last updated: 30 August 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi

@@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, second API, to be
 #included by applications that call PCRE2 functions.
-Copyright (c) 2016-2020 University of Cambridge
+Copyright (c) 2016-2021 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -152,6 +152,7 @@ D is inspected during pcre2_dfa_match() execution
 #define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */
 #define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */
 #define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */
+#define PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK 0x00000040u /* C */
 /* These are for pcre2_jit_compile(). */
@@ -311,6 +312,7 @@ pcre2_pattern_convert(). */
 #define PCRE2_ERROR_SCRIPT_RUN_NOT_AVAILABLE 196
 #define PCRE2_ERROR_TOO_MANY_CAPTURES 197
 #define PCRE2_ERROR_CONDITION_ATOMIC_ASSERTION_EXPECTED 198
+#define PCRE2_ERROR_BACKSLASH_K_IN_LOOKAROUND 199
 /* "Expected" matching error codes: no match and partial match. */

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 Written by Philip Hazel
 Original API code Copyright (c) 1997-2012 University of Cambridge
-New API code Copyright (c) 2016-2020 University of Cambridge
+New API code Copyright (c) 2016-2021 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -782,12 +782,15 @@ are allowed. */
 #define PUBLIC_COMPILE_EXTRA_OPTIONS \
 (PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS| \
 PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES|PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL| \
-PCRE2_EXTRA_ESCAPED_CR_IS_LF|PCRE2_EXTRA_ALT_BSUX)
+PCRE2_EXTRA_ESCAPED_CR_IS_LF|PCRE2_EXTRA_ALT_BSUX| \
+PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)
 /* Compile time error code numbers. They are given names so that they can more
 easily be tracked. When a new number is added, the tables called eint1 and
 eint2 in pcre2posix.c may need to be updated, and a new error text must be
-added to compile_error_texts in pcre2_error.c. */
+added to compile_error_texts in pcre2_error.c. Also, the error codes in
+pcre2.h.in must be updated - their values are exactly 100 greater than these
+values. */
 enum { ERR0 = COMPILE_ERROR_BASE,
 ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9, ERR10,
@@ -799,7 +802,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
 ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
 ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
 ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
-ERR91, ERR92, ERR93, ERR94, ERR95, ERR96, ERR97, ERR98 };
+ERR91, ERR92, ERR93, ERR94, ERR95, ERR96, ERR97, ERR98, ERR99 };
 /* This is a table of start-of-pattern options such as (*UTF) and settings such
 as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@@ -7799,6 +7802,16 @@ for (;; pptr++)
 }
 #endif
+/* \K is forbidden in lookarounds since 10.38 because that's what Perl has
+done. However, there's an option, in case anyone was relying on it. */
+if (cb->assert_depth > 0 && meta_arg == ESC_K &&
+(cb->cx->extra_options & PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK) == 0)
+{
+*errorcodeptr = ERR99;
+return 0;
+}
 /* For the rest (including \X when Unicode is supported - if not it's
 faulted at parse time), the OP value is the escape value when PCRE2_UCP is
 not set; if it is set, these escapes do not show up here because they are

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 Written by Philip Hazel
 Original API code Copyright (c) 1997-2012 University of Cambridge
-New API code Copyright (c) 2016-2019 University of Cambridge
+New API code Copyright (c) 2016-2021 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -186,6 +186,7 @@ static const unsigned char compile_error_texts[] =
 "script runs require Unicode support, which this version of PCRE2 does not have\0"
 "too many capturing groups (maximum 65535)\0"
 "atomic assertion expected after (?( or (?(?C)\0"
+"\\K is not allowed in lookarounds (but see PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)\0"
 ;
 /* Match-time and UTF error texts are in the same format. */

@@ -217,9 +217,12 @@ pcre2_match_data_create_from_pattern() above. */
 if (rc == 0)
 printf("ovector was not big enough for all the captured substrings\n");
-/* We must guard against patterns such as /(?=.\K)/ that use \K in an assertion
-to set the start of a match later than its end. In this demonstration program,
-we just detect this case and give up. */
+/* Since release 10.38 PCRE2 has locked out the use of \K in lookaround
+assertions. However, there is an option to re-enable the old behaviour. If that
+is set, it is possible to run patterns such as /(?=.\K)/ that use \K in an
+assertion to set the start of a match later than its end. In this demonstration
+program, we show how to detect this case, but it shouldn't arise because the
+option is never set. */
 if (ovector[0] > ovector[1])
 {

@@ -148,6 +148,7 @@ static const int eint2[] = {
 37, REG_EESCAPE, /* PCRE2 does not support \L, \l, \N{name}, \U, or \u */
 56, REG_INVARG, /* internal error: unknown newline setting */
 92, REG_INVARG, /* invalid option bits with PCRE2_LITERAL */
+99, REG_EESCAPE /* \K in lookaround */
 };
 /* Table of texts corresponding to POSIX error codes */

@@ -11,7 +11,7 @@ hacked-up (non-) design had also run out of steam.
 Written by Philip Hazel
 Original code Copyright (c) 1997-2012 University of Cambridge
-Rewritten code Copyright (c) 2016-2020 University of Cambridge
+Rewritten code Copyright (c) 2016-2021 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -622,6 +622,7 @@ static modstruct modlist[] = {
 { "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) },
 { "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) },
 { "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) },
+{ "allow_lookaround_bsk", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK, CO(extra_options) },
 { "allow_surrogate_escapes", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES, CO(extra_options) },
 { "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) },
 { "allvector", MOD_PND, MOD_CTL, CTL2_ALLVECTOR, PO(control2) },

@@ -110,9 +110,6 @@
 //posix_nosub
 \=offset=70000
-/(?=(a\K))/
-a
 /^d(e)$/posix
 acdef\=posix_startend=2:4
 acde\=posix_startend=2

testdata/testinput2

@@ -3932,7 +3932,7 @@
 /[a[:<:]] should give error/
-/(?=ab\K)/aftertext
+/(?=ab\K)/aftertext,allow_lookaround_bsk
 abcd\=startchar
 /abcd/newline=lf,firstline
@@ -4185,7 +4185,7 @@
 /(a)(b)|(c)/
 XcX\=ovector=2,get=1,get=2,get=3,get=4,getall
-/x(?=ab\K)/
+/x(?=ab\K)/allow_lookaround_bsk
 xab\=get=0
 xab\=copy=0
 xab\=getall
@@ -4345,10 +4345,10 @@
 # Perl loops on this (PCRE2 used to!)
-/(?<=\Ka)/g,aftertext
+/(?<=\Ka)/g,aftertext,allow_lookaround_bsk
 aaaaa
-/(?<=\Ka)/altglobal,aftertext
+/(?<=\Ka)/altglobal,aftertext,allow_lookaround_bsk
 aaaaa
 /((?2){73}(?2))((?1))/info
@@ -4659,10 +4659,10 @@ B)x/alt_verbnames,mark
 /(?<!a{65535})x/I
-/(?=a\K)/replace=z
+/(?=a\K)/replace=z,allow_lookaround_bsk
 BaCaD
-/(?<=\K.)/g,replace=-
+/(?<=\K.)/g,replace=-,allow_lookaround_bsk
 ab
 /(?'abcdefghijklmnopqrstuvwxyzABCDEFG'toolong)/
@@ -5877,18 +5877,27 @@ a)"xI
 /(?(VERSION=0.0/
-# Perl has made \K in lookarounds an error. At the moment PCRE2 still accepts.
+# Perl has made \K in lookarounds an error. PCRE2 now rejects as well, unless
+# explicitly authorized.
 /(?=a\Kb)ab/
+/(?=a\Kb)ab/allow_lookaround_bsk
 ab
 /(?!a\Kb)ac/
+/(?!a\Kb)ac/allow_lookaround_bsk
 ac
 /^abc(?<=b\Kc)d/
+/^abc(?<=b\Kc)d/allow_lookaround_bsk
 abcd
 /^abc(?<!b\Kq)d/
+/^abc(?<!b\Kq)d/,allow_lookaround_bsk
 abcd
 # ---------

testdata/testinput5

@@ -1654,10 +1654,10 @@
 /[A-`]/iB,utf
 abcdefghijklmno
-/(?<=\K\x{17f})/g,utf,aftertext
+/(?<=\K\x{17f})/g,utf,aftertext,allow_lookaround_bsk
 \x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
-/(?<=\K\x{17f})/altglobal,utf,aftertext
+/(?<=\K\x{17f})/altglobal,utf,aftertext,allow_lookaround_bsk
 \x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
 "\xa\xf<(.\pZ*\P{Xwd}+^\xa8\3'3yq.::?(?J:()\xd1+!~:3'(8?:)':(?'d'(?'d'^u]!.+.+\\A\Ah(n+?9){7}+\K;(?'X'u'(?'c'(?'z'(?<y>\xb::\xf0'|\xd3(\xae?'w(z\x8?P>l)\x8?P>a)'\H\R\xd1+!!~:3'(?:h$N{26875}\W+?\\=D{2}\x89(?i:Uy0\N({2\xa(\v\x85*){y*\A(()\p{L}+?\P{^Xan}'+?\xff\+pS\?|).{;y*\A(()\p{L}+?\8}\d?1(|)(/1){7}.+[Lp{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(\xbf(R))\x8a\X*?\x8a\xb\xd1^9\3*+(\xc1,\k'R'\xb4)\xcc(z\z(?J)(?'X'\x1b(\xb\xd1^9\?'3*+P{^Xan}+?\xff\+(\xc1.]k+\xb'Pm'\xb4)\xcc4f\xa7'\xd1V(?i:U,{2,2})'(?'X'))?-%--\x95$9*\4'|\xd1(\x9c''%\x94$9)#(?'R')3\x7?('P\xed7'\xa8\xb1^u\xeaw\1\0\0\(|(?1){7}.+[\p{Me}].\s\xdcC*^\x14?(?(<y>))(?<!^)$C((;*?(R*?))+(?(R)\x8a\X*?\x8a\xb\xd1^9\3*+|(\xc1,\k'R'\xb4)\xcc! z)\z(?JJ)(?'X';(\xb\xd1^9\?'3*+(\xc1.]k+\xb'Pm'\xb4))':(?'d')(?'RD'(d')|)|$)'|(?<x>\g{d});\g{x}\x11\g{d}\x81\|$((?'X'\'X'(?'W''\x92()'9'\x83*))\xba*\!?^ <){)':;\xcc4'\xd1'(?'X'28))?-%--\x95$9*\4'|\xd1((''e\x94*$9:)*#(?'R')3)\x7?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+0!~:(?)'d'E:yD!\s(?'R'\x1e;\x10:U))|'\x9g!\xb0*){)\\x16:;()\x1e\x10\x87*:(?<y>)\xd1+!~:(?)'}'\d'E:yD!\s(?'R'\x1e;\x10:U))|'))|)g!\xb0*R+9{29+)#(?'P'})*?pS\{3,}\x85,{0,}l{*UTF)(\xe{7}){3722,{9,}d{2,?|))|{)\(A?&d}}{\xa,}2}){3,}7,l{)22}(,}l:7{2,4}}29\x19+)#?'P'})*v?))\x5"

@@ -169,12 +169,6 @@ Failed: POSIX code 4: ? * + invalid at offset 1000001
 ** Ignored with POSIX interface: offset
 Matched with REG_NOSUB
-/(?=(a\K))/
-a
-Start of matched string is beyond its end - displaying from end to start.
-0: a
-1: a
 /^d(e)$/posix
 acdef\=posix_startend=2:4
 0: de

testdata/testoutput2

@@ -13355,7 +13355,7 @@ No match
 /[a[:<:]] should give error/
 Failed: error 130 at offset 4: unknown POSIX class name
-/(?=ab\K)/aftertext
+/(?=ab\K)/aftertext,allow_lookaround_bsk
 abcd\=startchar
 Start of matched string is beyond its end - displaying from end to start.
 0: ab
@@ -13783,7 +13783,7 @@ Get substring 4 failed (-49): unknown substring
 0L c
 1L
-/x(?=ab\K)/
+/x(?=ab\K)/allow_lookaround_bsk
 xab\=get=0
 Start of matched string is beyond its end - displaying from end to start.
 0: ab
@@ -14281,7 +14281,7 @@ Failed: error 125 at offset 1: lookbehind assertion is not fixed length
 # Perl loops on this (PCRE2 used to!)
-/(?<=\Ka)/g,aftertext
+/(?<=\Ka)/g,aftertext,allow_lookaround_bsk
 aaaaa
 0: a
 0+ aaaa
@@ -14294,7 +14294,7 @@ Failed: error 125 at offset 1: lookbehind assertion is not fixed length
 0: a
 0+
-/(?<=\Ka)/altglobal,aftertext
+/(?<=\Ka)/altglobal,aftertext,allow_lookaround_bsk
 aaaaa
 0: a
 0+ aaaa
@@ -14911,11 +14911,11 @@ Max lookbehind = 65535
 First code unit = 'x'
 Subject length lower bound = 1
-/(?=a\K)/replace=z
+/(?=a\K)/replace=z,allow_lookaround_bsk
 BaCaD
 Failed: error -60: match with end before start or start moved backwards is not supported
-/(?<=\K.)/g,replace=-
+/(?<=\K.)/g,replace=-,allow_lookaround_bsk
 ab
 Failed: error -60: match with end before start or start moved backwards is not supported
@@ -17641,21 +17641,34 @@ MK: >\x00<
 /(?(VERSION=0.0/
 Failed: error 179 at offset 14: syntax error or number too big in (?(VERSION condition
-# Perl has made \K in lookarounds an error. At the moment PCRE2 still accepts.
+# Perl has made \K in lookarounds an error. PCRE2 now rejects as well, unless
+# explicitly authorized.
 /(?=a\Kb)ab/
+Failed: error 199 at offset 10: \K is not allowed in lookarounds (but see PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)
+/(?=a\Kb)ab/allow_lookaround_bsk
 ab
 0: b
 /(?!a\Kb)ac/
+Failed: error 199 at offset 10: \K is not allowed in lookarounds (but see PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)
+/(?!a\Kb)ac/allow_lookaround_bsk
 ac
 0: ac
 /^abc(?<=b\Kc)d/
+Failed: error 199 at offset 14: \K is not allowed in lookarounds (but see PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)
+/^abc(?<=b\Kc)d/allow_lookaround_bsk
 abcd
 0: cd
 /^abc(?<!b\Kq)d/
+Failed: error 199 at offset 14: \K is not allowed in lookarounds (but see PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)
+/^abc(?<!b\Kq)d/,allow_lookaround_bsk
 abcd
 0: abcd

@@ -3958,7 +3958,7 @@ Subject length lower bound = 1
 abcdefghijklmno
 0: a
-/(?<=\K\x{17f})/g,utf,aftertext
+/(?<=\K\x{17f})/g,utf,aftertext,allow_lookaround_bsk
 \x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
 0: \x{17f}
 0+ \x{17f}\x{17f}\x{17f}\x{17f}
@@ -3971,7 +3971,7 @@ Subject length lower bound = 1
 0: \x{17f}
 0+
-/(?<=\K\x{17f})/altglobal,utf,aftertext
+/(?<=\K\x{17f})/altglobal,utf,aftertext,allow_lookaround_bsk
 \x{17f}\x{17f}\x{17f}\x{17f}\x{17f}
 0: \x{17f}
 0+ \x{17f}\x{17f}\x{17f}\x{17f}