Implement PCRE2_SUBSTITUTE_UNSET_EMPTY.
parent 38caadff03
commit 2f684a60ed
@@ -380,6 +380,9 @@ changed when the effects of those options were all moved to compile time.
 PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
 was found by the LLVM fuzzer.
 
+110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
+possible to test it.
+
 
 Version 10.20 30-June-2015
 --------------------------
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "03 December 2015" "PCRE2 10.21"
+.TH PCRE2API 3 "04 December 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -2734,19 +2734,26 @@ simultaneous substitutions, as this \fBpcre2test\fP example shows:
 apple lemon
 2: pear orange
 .sp
-There is an additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes the
-function to iterate over the subject string, replacing every matching
-substring. If this is not set, only the first matching substring is replaced.
-If any matched substring has zero length, after the substitution has happened,
-an attempt to find a non-empty match at the same position is performed. If this
-is not successful, the current position is advanced by one character except
-when CRLF is a valid newline sequence and the next two characters are CR, LF.
-In this case, the current position is advanced by two characters.
+Three additional options are available:
 .P
-A second additional option, PCRE2_SUBSTITUTE_EXTENDED, causes extra processing
-to be applied to the replacement string. Without this option, only the dollar
-character is special, and only the group insertion forms listed above are
-valid. When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
+PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
+replacing every matching substring. If this is not set, only the first matching
+substring is replaced. If any matched substring has zero length, after the
+substitution has happened, an attempt to find a non-empty match at the same
+position is performed. If this is not successful, the current position is
+advanced by one character except when CRLF is a valid newline sequence and the
+next two characters are CR, LF. In this case, the current position is advanced
+by two characters.
+.P
+PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups to be treated as
+empty strings when inserted as described above. If this option is not set, an
+attempt to insert an unset group causes the PCRE2_ERROR_UNSET error. This
+option does not influence the extended substitution syntax described below.
+.P
+PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
+replacement string. Without this option, only the dollar character is special,
+and only the group insertion forms listed above are valid. When
+PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
 .P
 Firstly, backslash in a replacement string is interpreted as an escape
 character. The usual forms such as \en or \ex{ddd} can be used to specify
@@ -2792,16 +2799,22 @@ string remains in force afterwards, as shown in this \fBpcre2test\fP example:
 somebody
 1: HELLO
 .sp
+The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
+substitutions.
+.P
 If successful, the function returns the number of replacements that were made.
 This may be zero if no matches were found, and is never greater than 1 unless
 PCRE2_SUBSTITUTE_GLOBAL is set.
 .P
 In the event of an error, a negative error code is returned. Except for
 PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
-are passed straight back. PCRE2_ERROR_NOMEMORY is returned if the output buffer
-is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax
-errors in the replacement string, with more particular errors being
-PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
+are passed straight back. PCRE2_ERROR_NOSUBSTRING is returned for a
+non-existent substring insertion, and PCRE2_ERROR_UNSET is returned for an
+unset substring insertion when the simple (non-extended) syntax is used and
+PCRE2_SUBSTITUTE_UNSET_EMPTY is not set. PCRE2_ERROR_NOMEMORY is returned if
+the output buffer is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for
+miscellaneous syntax errors in the replacement string, with more particular
+errors being PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
 PCRE2_ERROR_REPMISSING_BRACE (closing curly bracket not found),
 PCRE2_BADSUBSTITUTION (syntax error in extended group substitution), and
 PCRE2_BADSUBPATTERN (the pattern match ended before it started). As for all
@@ -3100,6 +3113,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 December 2015
+Last updated: 04 December 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "21 November 2015" "PCRE 10.21"
+.TH PCRE2TEST 1 "04 December 2015" "PCRE 10.21"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -854,14 +854,16 @@ are applied to every subject line that is processed with that pattern. They may
 not appear in \fB#pattern\fP commands. These modifiers do not affect the
 compilation process.
 .sp
-aftertext show text after match
-allaftertext show text after captures
-allcaptures show all captures
-allusedtext show all consulted text
-/g global global matching
-mark show mark values
-replace=<string> specify a replacement string
-startchar show starting character when relevant
+aftertext show text after match
+allaftertext show text after captures
+allcaptures show all captures
+allusedtext show all consulted text
+/g global global matching
+mark show mark values
+replace=<string> specify a replacement string
+startchar show starting character when relevant
+substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
+substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
 .sp
 These modifiers may not appear in a \fB#pattern\fP command. If you want them as
 defaults, set them in a \fB#subject\fP command.
@@ -960,6 +962,8 @@ pattern.
 replace=<string> specify a replacement string
 startchar show startchar when relevant
 startoffset=<n> same as offset=<n>
+substitute_extedded use PCRE2_SUBSTITUTE_EXTENDED
+substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
 zero_terminate pass the subject as zero-terminated
 .sp
 The effects of these modifiers are described in the following sections.
@@ -1104,9 +1108,13 @@ individual code units are copied directly. This provides a means of passing an
 invalid UTF-8 string for testing purposes.
 .P
 If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
-\fBpcre2_substitute()\fP. After a successful substitution, the modified string
-is output, preceded by the number of replacements. This may be zero if there
-were no matches. Here is a simple example of a substitution test:
+\fBpcre2_substitute()\fP. The \fBsubstitute_extended\fP and
+\fBsubstitute_unset_empty\fP modifiers set PCRE2_SUBSTITUTE_EXTENDED and
+PCRE2_SUBSTITUTE_UNSET_EMPTY, respectively.
+.P
+After a successful substitution, the modified string is output, preceded by the
+number of replacements. This may be zero if there were no matches. Here is a
+simple example of a substitution test:
 .sp
 /abc/replace=xxx
 =abc=abc=
@@ -1610,6 +1618,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 November 2015
+Last updated: 04 December 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi
@@ -148,8 +148,9 @@ sanity checks). */
 
 /* These are additional options for pcre2_substitute(). */
 
-#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
 
 /* Newline and \R settings, for use in compile contexts. The newline values
 must be kept in step with values set in config.h and both sets must all be
@@ -148,8 +148,9 @@ sanity checks). */
 
 /* These are additional options for pcre2_substitute(). */
 
-#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
 
 /* Newline and \R settings, for use in compile contexts. The newline values
 must be kept in step with values set in config.h and both sets must all be
@@ -197,6 +197,7 @@ BOOL match_data_created = FALSE;
 BOOL global = FALSE;
 BOOL extended = FALSE;
 BOOL literal = FALSE;
+BOOL uempty = FALSE; /* Unset/unknown groups => empty string */
 #ifdef SUPPORT_UNICODE
 BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
 #endif
@@ -262,6 +263,12 @@ if ((options & PCRE2_SUBSTITUTE_EXTENDED) != 0)
 extended = TRUE;
 }
 
+if ((options & PCRE2_SUBSTITUTE_UNSET_EMPTY) != 0)
+{
+options &= ~PCRE2_SUBSTITUTE_UNSET_EMPTY;
+uempty = TRUE;
+}
+
 /* Copy up to the start offset */
 
 if (start_offset > buff_length) goto NOROOM;
@@ -471,7 +478,6 @@ do
 
 if (inparens)
 {
-
 if (extended && !star && ptr < repend - 2 && next == CHAR_COLON)
 {
 special = *(++ptr);
@@ -562,8 +568,20 @@ do
 if (group < 0) group = GET2(first, 0);
 }
 
+/* We now have a group that is identified by number. Find the length of
+the captured string. If a group in a non-special substitution is unset
+when PCRE2_SUBSTITUTE_UNSET_EMPTY is set, substitute nothing. */
+
 rc = pcre2_substring_length_bynumber(match_data, group, &sublength);
-if (rc < 0 && (special == 0 || rc != PCRE2_ERROR_UNSET)) goto PTREXIT;
+if (rc < 0)
+{
+if (rc != PCRE2_ERROR_UNSET) goto PTREXIT; /* Non-unset errors */
+if (special == 0) /* Plain substitution */
+{
+if (uempty) continue; /* Treat as empty */
+goto PTREXIT; /* Else error */
+}
+}
 
 /* If special is '+' we have a 'set' and possibly an 'unset' text,
 both of which are reprocessed when used. If special is '-' we have a
|
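Note on the hunk above: when the length lookup for a group fails, the new branch distinguishes three cases (a non-unset error, an unset group in an extended form, and an unset group in a plain insertion). The following is a minimal, self-contained sketch of that decision, not the code from the patched file. The names classify_group and unset_action are hypothetical; the two error values are the ones printed in the test output hunks of this commit (-55 "requested value is not set", -49 "unknown substring").

```c
#include <assert.h>

/* Error values as printed in this commit's test output. */
#define ERR_NOSUBSTRING (-49)   /* "unknown substring" */
#define ERR_UNSET       (-55)   /* "requested value is not set" */

/* Hypothetical outcome names, used only in this sketch. */
typedef enum {
  INSERT_TEXT,        /* group is set: copy its text */
  INSERT_NOTHING,     /* unset group treated as an empty string */
  USE_EXTENDED_FORM,  /* unset group handled by the ${n:+..} / ${n:-..} logic */
  FAIL                /* give up and return the error */
} unset_action;

/* rc      : result of the group length lookup (negative on error)
   special : 0 for a plain $n insertion, '+' or '-' for an extended form
   uempty  : nonzero when PCRE2_SUBSTITUTE_UNSET_EMPTY was passed */
static unset_action classify_group(int rc, int special, int uempty)
{
  if (rc >= 0) return INSERT_TEXT;
  if (rc != ERR_UNSET) return FAIL;          /* non-unset errors always fail */
  if (special != 0) return USE_EXTENDED_FORM;
  return uempty ? INSERT_NOTHING : FAIL;     /* plain substitution */
}
```

For example, classify_group(-55, 0, 1) gives INSERT_NOTHING, which corresponds to the `continue` in the patch, while classify_group(-55, 0, 0) gives FAIL, which corresponds to `goto PTREXIT`.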
239
src/pcre2test.c
@@ -385,33 +385,34 @@ enum { MOD_CTC, /* Applies to a compile context */
 /* Control bits. Some apply to compiling, some to matching, but some can be set
 either on a pattern or a data line, so they must all be distinct. */
 
-#define CTL_AFTERTEXT 0x00000001u
-#define CTL_ALLAFTERTEXT 0x00000002u
-#define CTL_ALLCAPTURES 0x00000004u
-#define CTL_ALLUSEDTEXT 0x00000008u
-#define CTL_ALTGLOBAL 0x00000010u
-#define CTL_BINCODE 0x00000020u
-#define CTL_CALLOUT_CAPTURE 0x00000040u
-#define CTL_CALLOUT_INFO 0x00000080u
-#define CTL_CALLOUT_NONE 0x00000100u
-#define CTL_DFA 0x00000200u
-#define CTL_EXPAND 0x00000400u
-#define CTL_FINDLIMITS 0x00000800u
-#define CTL_FULLBINCODE 0x00001000u
-#define CTL_GETALL 0x00002000u
-#define CTL_GLOBAL 0x00004000u
-#define CTL_HEXPAT 0x00008000u
-#define CTL_INFO 0x00010000u
-#define CTL_JITFAST 0x00020000u
-#define CTL_JITVERIFY 0x00040000u
-#define CTL_MARK 0x00080000u
-#define CTL_MEMORY 0x00100000u
-#define CTL_NULLCONTEXT 0x00200000u
-#define CTL_POSIX 0x00400000u
-#define CTL_PUSH 0x00800000u
-#define CTL_STARTCHAR 0x01000000u
-#define CTL_SUBSTITUTE_EXTENDED 0x02000000u
-#define CTL_ZERO_TERMINATE 0x04000000u
+#define CTL_AFTERTEXT 0x00000001u
+#define CTL_ALLAFTERTEXT 0x00000002u
+#define CTL_ALLCAPTURES 0x00000004u
+#define CTL_ALLUSEDTEXT 0x00000008u
+#define CTL_ALTGLOBAL 0x00000010u
+#define CTL_BINCODE 0x00000020u
+#define CTL_CALLOUT_CAPTURE 0x00000040u
+#define CTL_CALLOUT_INFO 0x00000080u
+#define CTL_CALLOUT_NONE 0x00000100u
+#define CTL_DFA 0x00000200u
+#define CTL_EXPAND 0x00000400u
+#define CTL_FINDLIMITS 0x00000800u
+#define CTL_FULLBINCODE 0x00001000u
+#define CTL_GETALL 0x00002000u
+#define CTL_GLOBAL 0x00004000u
+#define CTL_HEXPAT 0x00008000u
+#define CTL_INFO 0x00010000u
+#define CTL_JITFAST 0x00020000u
+#define CTL_JITVERIFY 0x00040000u
+#define CTL_MARK 0x00080000u
+#define CTL_MEMORY 0x00100000u
+#define CTL_NULLCONTEXT 0x00200000u
+#define CTL_POSIX 0x00400000u
+#define CTL_PUSH 0x00800000u
+#define CTL_STARTCHAR 0x01000000u
+#define CTL_SUBSTITUTE_EXTENDED 0x02000000u
+#define CTL_SUBSTITUTE_UNSET_EMPTY 0x04000000u
+#define CTL_ZERO_TERMINATE 0x08000000u
 
 #define CTL_BSR_SET 0x80000000u /* This is informational */
 #define CTL_NL_SET 0x40000000u /* This is informational */
@@ -431,7 +432,9 @@ data line. */
 CTL_GLOBAL|\
 CTL_MARK|\
 CTL_MEMORY|\
-CTL_STARTCHAR)
+CTL_STARTCHAR|\
+CTL_SUBSTITUTE_EXTENDED|\
+CTL_SUBSTITUTE_UNSET_EMPTY)
 
 /* Structures for holding modifier information for patterns and subject strings
 (data). Fields containing modifiers that can be set either for a pattern or a
@@ -495,91 +498,92 @@ typedef struct modstruct {
 } modstruct;
 
 static modstruct modlist[] = {
-{ "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) },
-{ "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) },
-{ "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) },
-{ "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) },
-{ "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) },
-{ "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) },
-{ "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) },
-{ "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) },
-{ "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) },
-{ "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) },
-{ "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) },
-{ "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) },
-{ "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) },
-{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
-{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
-{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
-{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
-{ "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
-{ "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
-{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
-{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
-{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
-{ "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) },
-{ "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) },
-{ "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) },
-{ "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) },
-{ "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) },
-{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
-{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
-{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
-{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
-{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
-{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
-{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
-{ "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) },
-{ "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) },
-{ "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
-{ "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
-{ "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
-{ "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) },
-{ "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
-{ "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
-{ "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
-{ "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) },
-{ "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
-{ "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) },
-{ "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) },
-{ "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) },
-{ "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) },
-{ "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) },
-{ "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) },
-{ "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) },
-{ "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) },
-{ "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) },
-{ "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) },
-{ "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) },
-{ "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) },
-{ "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) },
-{ "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) },
-{ "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) },
-{ "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) },
-{ "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) },
-{ "offset", MOD_DAT, MOD_INT, 0, DO(offset) },
-{ "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)},
-{ "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) },
-{ "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) },
-{ "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
-{ "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
-{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
-{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
-{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
-{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
-{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
-{ "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) },
-{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
-{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
-{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
-{ "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
-{ "substitute_extended", MOD_PAT, MOD_CTL, CTL_SUBSTITUTE_EXTENDED, PO(control) },
-{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
-{ "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
-{ "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) },
-{ "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) },
-{ "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) },
-{ "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) }
+{ "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) },
+{ "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) },
+{ "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) },
+{ "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) },
+{ "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) },
+{ "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) },
+{ "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) },
+{ "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) },
+{ "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) },
+{ "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) },
+{ "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) },
+{ "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) },
+{ "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) },
+{ "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
+{ "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
+{ "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
+{ "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
+{ "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
+{ "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
+{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
+{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
+{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
+{ "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) },
+{ "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) },
+{ "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) },
+{ "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) },
+{ "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) },
+{ "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
+{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
+{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
+{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
+{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
+{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
+{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
+{ "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) },
+{ "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) },
+{ "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
+{ "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
+{ "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
+{ "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) },
+{ "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
+{ "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
+{ "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
+{ "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) },
+{ "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
+{ "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) },
+{ "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) },
+{ "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) },
+{ "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) },
+{ "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) },
+{ "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) },
+{ "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) },
+{ "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) },
+{ "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) },
+{ "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) },
+{ "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) },
+{ "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) },
+{ "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) },
+{ "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) },
+{ "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) },
+{ "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) },
+{ "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) },
+{ "offset", MOD_DAT, MOD_INT, 0, DO(offset) },
+{ "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)},
+{ "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) },
+{ "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) },
+{ "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
+{ "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
+{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
+{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
+{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
+{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
+{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
+{ "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) },
+{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
+{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
+{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
+{ "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
+{ "substitute_extended", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_EXTENDED, PO(control) },
+{ "substitute_unset_empty", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_UNSET_EMPTY, PO(control) },
+{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
+{ "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
+{ "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) },
+{ "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) },
+{ "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) },
+{ "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) }
 };
 
 #define MODLISTCOUNT sizeof(modlist)/sizeof(modstruct)
@@ -3519,7 +3523,7 @@ Returns: nothing
 static void
 show_controls(uint32_t controls, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
 before,
 ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
 ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -3549,6 +3553,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
 ((controls & CTL_PUSH) != 0)? " push" : "",
 ((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
 ((controls & CTL_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
+((controls & CTL_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
 ((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
 }
 
@@ -5873,8 +5878,10 @@ if (dat_datctl.replacement[0] != 0)
 
 xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
 PCRE2_SUBSTITUTE_GLOBAL) |
-(((pat_patctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
-PCRE2_SUBSTITUTE_EXTENDED);
+(((dat_datctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
+PCRE2_SUBSTITUTE_EXTENDED) |
+(((dat_datctl.control & CTL_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
+PCRE2_SUBSTITUTE_UNSET_EMPTY);
 
 SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
 pr = dat_datctl.replacement;
|
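Note on the xoptions expression above: it maps pcre2test control bits to pcre2_substitute() option bits, and after this change all three bits are read from the data-line controls (dat_datctl). The following is a minimal sketch of the same mapping as a standalone function, assuming the bit values shown in the hunks of this commit. The helper name substitute_options is hypothetical and is not part of pcre2test.

```c
#include <assert.h>
#include <stdint.h>

/* pcre2test control bits, values as defined in this commit. */
#define CTL_GLOBAL                  0x00004000u
#define CTL_SUBSTITUTE_EXTENDED     0x02000000u
#define CTL_SUBSTITUTE_UNSET_EMPTY  0x04000000u

/* pcre2_substitute() option bits, values as defined in this commit. */
#define PCRE2_SUBSTITUTE_GLOBAL       0x00000100u
#define PCRE2_SUBSTITUTE_EXTENDED     0x00000200u
#define PCRE2_SUBSTITUTE_UNSET_EMPTY  0x00000400u

/* Translate the substitution-related control bits into library options.
   Any other control bit is ignored. */
static uint32_t substitute_options(uint32_t control)
{
  uint32_t xoptions = 0;
  if ((control & CTL_GLOBAL) != 0)
    xoptions |= PCRE2_SUBSTITUTE_GLOBAL;
  if ((control & CTL_SUBSTITUTE_EXTENDED) != 0)
    xoptions |= PCRE2_SUBSTITUTE_EXTENDED;
  if ((control & CTL_SUBSTITUTE_UNSET_EMPTY) != 0)
    xoptions |= PCRE2_SUBSTITUTE_UNSET_EMPTY;
  return xoptions;
}
```

With these values, a data line carrying all three modifiers yields 0x00000700u, and a control word with only unrelated bits set yields 0.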
@@ -4576,6 +4576,9 @@ B)x/alt_verbnames,mark
 /(abcd)/replace=${1:+xy\kz},substitute_extended
 abcd
 
+/(abcd)/
+abcd\=replace=${1:+xy\kz},substitute_extended
+
 /abcd/substitute_extended,replace=>$1<
 abcd
 
@@ -4737,4 +4740,20 @@ a)"xI
 
 /(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize
 
+/a|(b)c/replace=>$1<,substitute_unset_empty
+cat
+xbcom
+
+/a|(b)c/
+cat\=replace=>$1<
+cat\=replace=>$1<,substitute_unset_empty
+xbcom\=replace=>$1<,substitute_unset_empty
+
+/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
+cat
+xbcom
+
+/a|(b)c/replace=>$2<,substitute_unset_empty
+cat
+
 # End of testinput2
@@ -14648,6 +14648,10 @@ Failed: error -58 at offset 7 in replacement: expected closing curly bracket in
 abcd
 Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string
 
+/(abcd)/
+abcd\=replace=${1:+xy\kz},substitute_extended
+Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string
+
 /abcd/substitute_extended,replace=>$1<
 abcd
 Failed: error -49 at offset 3 in replacement: unknown substring
@@ -15057,4 +15061,28 @@ Subject length lower bound = 0
 /(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize
 Failed: error 124 at offset 49: letter or underscore expected after (?< or (?'
 
+/a|(b)c/replace=>$1<,substitute_unset_empty
+cat
+1: c><t
+xbcom
+1: x>b<om
+
+/a|(b)c/
+cat\=replace=>$1<
+Failed: error -55 at offset 3 in replacement: requested value is not set
+cat\=replace=>$1<,substitute_unset_empty
+1: c><t
+xbcom\=replace=>$1<,substitute_unset_empty
+1: x>b<om
+
+/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
+cat
+1: c><t
+xbcom
+1: x>b<om
+
+/a|(b)c/replace=>$2<,substitute_unset_empty
+cat
+Failed: error -49 at offset 3 in replacement: unknown substring
+
 # End of testinput2
|
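Note on the expected output above: the three outcomes for the replacement `>$1<` (empty insertion, error -55, error -49) can be reproduced with a toy expander that handles only single-digit `$n` references. This is an illustrative sketch and not the PCRE2 implementation. The function expand_simple, its parameters, and the ERR_NOROOM value are hypothetical; -55 and -49 are the values printed in the test output of this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ERR_NOSUBSTRING (-49)  /* "unknown substring" in the test output */
#define ERR_UNSET       (-55)  /* "requested value is not set" */
#define ERR_NOROOM      (-1)   /* hypothetical: output buffer too small */

/* Expand single-digit $n references in rep into out.
   groups[i] is the text of capture group i, or NULL when the group exists
   in the pattern but did not take part in the match. ngroups includes
   group 0. Returns 0 on success or a negative error value. */
static int expand_simple(const char *rep, const char *const *groups,
                         int ngroups, int unset_empty,
                         char *out, size_t outsize)
{
  size_t used = 0;
  for (; *rep != '\0'; rep++)
    {
    const char *piece = rep;   /* default: copy one literal character */
    size_t len = 1;
    if (rep[0] == '$' && rep[1] >= '0' && rep[1] <= '9')
      {
      int n = rep[1] - '0';
      rep++;                                 /* step onto the digit */
      if (n >= ngroups) return ERR_NOSUBSTRING;
      if (groups[n] == NULL)
        {
        if (!unset_empty) return ERR_UNSET;
        continue;                            /* unset group: insert nothing */
        }
      piece = groups[n];
      len = strlen(piece);
      }
    if (used + len + 1 > outsize) return ERR_NOROOM;
    memcpy(out + used, piece, len);
    used += len;
    }
  if (outsize == 0) return ERR_NOROOM;
  out[used] = '\0';
  return 0;
}
```

For the subject `cat`, the pattern /a|(b)c/ matches `a` and leaves group 1 unset, so `>$1<` expands to `><` when unset_empty is nonzero; spliced back into the subject, that is the `c><t` line in the expected output. For `xbcom`, group 1 is `b` and the expansion is `>b<`.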