diff --git a/ChangeLog b/ChangeLog index 6ecc00e..02333f9 100644 --- a/ChangeLog +++ b/ChangeLog @@ -380,6 +380,9 @@ changed when the effects of those options were all moved to compile time. PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug was found by the LLVM fuzzer. +110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it +possible to test it. + Version 10.20 30-June-2015 -------------------------- diff --git a/doc/pcre2api.3 b/doc/pcre2api.3 index afd3779..2ec6f67 100644 --- a/doc/pcre2api.3 +++ b/doc/pcre2api.3 @@ -1,4 +1,4 @@ -.TH PCRE2API 3 "03 December 2015" "PCRE2 10.21" +.TH PCRE2API 3 "04 December 2015" "PCRE2 10.21" .SH NAME PCRE2 - Perl-compatible regular expressions (revised API) .sp @@ -2734,19 +2734,26 @@ simultaneous substitutions, as this \fBpcre2test\fP example shows: apple lemon 2: pear orange .sp -There is an additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes the -function to iterate over the subject string, replacing every matching -substring. If this is not set, only the first matching substring is replaced. -If any matched substring has zero length, after the substitution has happened, -an attempt to find a non-empty match at the same position is performed. If this -is not successful, the current position is advanced by one character except -when CRLF is a valid newline sequence and the next two characters are CR, LF. -In this case, the current position is advanced by two characters. +Three additional options are available: .P -A second additional option, PCRE2_SUBSTITUTE_EXTENDED, causes extra processing -to be applied to the replacement string. Without this option, only the dollar -character is special, and only the group insertion forms listed above are -valid. When PCRE2_SUBSTITUTE_EXTENDED is set, two things change: +PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string, +replacing every matching substring. 
If this is not set, only the first matching +substring is replaced. If any matched substring has zero length, after the +substitution has happened, an attempt to find a non-empty match at the same +position is performed. If this is not successful, the current position is +advanced by one character except when CRLF is a valid newline sequence and the +next two characters are CR, LF. In this case, the current position is advanced +by two characters. +.P +PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups to be treated as +empty strings when inserted as described above. If this option is not set, an +attempt to insert an unset group causes the PCRE2_ERROR_UNSET error. This +option does not influence the extended substitution syntax described below. +.P +PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the +replacement string. Without this option, only the dollar character is special, +and only the group insertion forms listed above are valid. When +PCRE2_SUBSTITUTE_EXTENDED is set, two things change: .P Firstly, backslash in a replacement string is interpreted as an escape character. The usual forms such as \en or \ex{ddd} can be used to specify @@ -2792,16 +2799,22 @@ string remains in force afterwards, as shown in this \fBpcre2test\fP example: somebody 1: HELLO .sp +The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended +substitutions. +.P If successful, the function returns the number of replacements that were made. This may be zero if no matches were found, and is never greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set. .P In the event of an error, a negative error code is returned. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP -are passed straight back. PCRE2_ERROR_NOMEMORY is returned if the output buffer -is not big enough. 
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax -errors in the replacement string, with more particular errors being -PCRE2_ERROR_BADREPESCAPE (invalid escape sequence), +are passed straight back. PCRE2_ERROR_NOSUBSTRING is returned for a +non-existent substring insertion, and PCRE2_ERROR_UNSET is returned for an +unset substring insertion when the simple (non-extended) syntax is used and +PCRE2_SUBSTITUTE_UNSET_EMPTY is not set. PCRE2_ERROR_NOMEMORY is returned if +the output buffer is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for +miscellaneous syntax errors in the replacement string, with more particular +errors being PCRE2_ERROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSING_BRACE (closing curly bracket not found), PCRE2_BADSUBSTITUTION (syntax error in extended group substitution), and PCRE2_BADSUBPATTERN (the pattern match ended before it started). As for all @@ -3100,6 +3113,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 03 December 2015 +Last updated: 04 December 2015 Copyright (c) 1997-2015 University of Cambridge. .fi diff --git a/doc/pcre2test.1 b/doc/pcre2test.1 index cfdec81..2315ddb 100644 --- a/doc/pcre2test.1 +++ b/doc/pcre2test.1 @@ -1,4 +1,4 @@ -.TH PCRE2TEST 1 "21 November 2015" "PCRE 10.21" +.TH PCRE2TEST 1 "04 December 2015" "PCRE 10.21" .SH NAME pcre2test - a program for testing Perl-compatible regular expressions. .SH SYNOPSIS @@ -854,14 +854,16 @@ are applied to every subject line that is processed with that pattern. They may not appear in \fB#pattern\fP commands. These modifiers do not affect the compilation process. 
.sp - aftertext show text after match - allaftertext show text after captures - allcaptures show all captures - allusedtext show all consulted text - /g global global matching - mark show mark values - replace= specify a replacement string - startchar show starting character when relevant + aftertext show text after match + allaftertext show text after captures + allcaptures show all captures + allusedtext show all consulted text + /g global global matching + mark show mark values + replace= specify a replacement string + startchar show starting character when relevant + substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY .sp These modifiers may not appear in a \fB#pattern\fP command. If you want them as defaults, set them in a \fB#subject\fP command. @@ -960,6 +962,8 @@ pattern. replace= specify a replacement string startchar show startchar when relevant startoffset= same as offset= + substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY zero_terminate pass the subject as zero-terminated .sp The effects of these modifiers are described in the following sections. @@ -1104,9 +1108,13 @@ individual code units are copied directly. This provides a means of passing an invalid UTF-8 string for testing purposes. .P If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to -\fBpcre2_substitute()\fP. After a successful substitution, the modified string -is output, preceded by the number of replacements. This may be zero if there -were no matches. Here is a simple example of a substitution test: +\fBpcre2_substitute()\fP. The \fBsubstitute_extended\fP and +\fBsubstitute_unset_empty\fP modifiers set PCRE2_SUBSTITUTE_EXTENDED and +PCRE2_SUBSTITUTE_UNSET_EMPTY, respectively. +.P +After a successful substitution, the modified string is output, preceded by the +number of replacements. This may be zero if there were no matches.
Here is a +simple example of a substitution test: .sp /abc/replace=xxx =abc=abc= @@ -1610,6 +1618,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 21 November 2015 +Last updated: 04 December 2015 Copyright (c) 1997-2015 University of Cambridge. .fi diff --git a/src/pcre2.h b/src/pcre2.h index e3caa46..ecdec00 100644 --- a/src/pcre2.h +++ b/src/pcre2.h @@ -148,8 +148,9 @@ sanity checks). */ /* These are additional options for pcre2_substitute(). */ -#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u -#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u +#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u +#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u +#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* Newline and \R settings, for use in compile contexts. The newline values must be kept in step with values set in config.h and both sets must all be diff --git a/src/pcre2.h.in b/src/pcre2.h.in index 31490bf..538a493 100644 --- a/src/pcre2.h.in +++ b/src/pcre2.h.in @@ -148,8 +148,9 @@ sanity checks). */ /* These are additional options for pcre2_substitute(). */ -#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u -#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u +#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u +#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u +#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* Newline and \R settings, for use in compile contexts. 
The newline values must be kept in step with values set in config.h and both sets must all be diff --git a/src/pcre2_substitute.c b/src/pcre2_substitute.c index 94a329e..76dd37b 100644 --- a/src/pcre2_substitute.c +++ b/src/pcre2_substitute.c @@ -197,6 +197,7 @@ BOOL match_data_created = FALSE; BOOL global = FALSE; BOOL extended = FALSE; BOOL literal = FALSE; +BOOL uempty = FALSE; /* Unset/unknown groups => empty string */ #ifdef SUPPORT_UNICODE BOOL utf = (code->overall_options & PCRE2_UTF) != 0; #endif @@ -262,6 +263,12 @@ if ((options & PCRE2_SUBSTITUTE_EXTENDED) != 0) extended = TRUE; } +if ((options & PCRE2_SUBSTITUTE_UNSET_EMPTY) != 0) + { + options &= ~PCRE2_SUBSTITUTE_UNSET_EMPTY; + uempty = TRUE; + } + /* Copy up to the start offset */ if (start_offset > buff_length) goto NOROOM; @@ -471,7 +478,6 @@ do if (inparens) { - if (extended && !star && ptr < repend - 2 && next == CHAR_COLON) { special = *(++ptr); @@ -562,8 +568,20 @@ do if (group < 0) group = GET2(first, 0); } + /* We now have a group that is identified by number. Find the length of + the captured string. If a group in a non-special substitution is unset + when PCRE2_SUBSTITUTE_UNSET_EMPTY is set, substitute nothing. */ + rc = pcre2_substring_length_bynumber(match_data, group, &sublength); - if (rc < 0 && (special == 0 || rc != PCRE2_ERROR_UNSET)) goto PTREXIT; + if (rc < 0) + { + if (rc != PCRE2_ERROR_UNSET) goto PTREXIT; /* Non-unset errors */ + if (special == 0) /* Plain substitution */ + { + if (uempty) continue; /* Treat as empty */ + goto PTREXIT; /* Else error */ + } + } /* If special is '+' we have a 'set' and possibly an 'unset' text, both of which are reprocessed when used. If special is '-' we have a diff --git a/src/pcre2test.c b/src/pcre2test.c index 8ab7b56..984c35e 100644 --- a/src/pcre2test.c +++ b/src/pcre2test.c @@ -385,33 +385,34 @@ enum { MOD_CTC, /* Applies to a compile context */ /* Control bits. 
Some apply to compiling, some to matching, but some can be set either on a pattern or a data line, so they must all be distinct. */ -#define CTL_AFTERTEXT 0x00000001u -#define CTL_ALLAFTERTEXT 0x00000002u -#define CTL_ALLCAPTURES 0x00000004u -#define CTL_ALLUSEDTEXT 0x00000008u -#define CTL_ALTGLOBAL 0x00000010u -#define CTL_BINCODE 0x00000020u -#define CTL_CALLOUT_CAPTURE 0x00000040u -#define CTL_CALLOUT_INFO 0x00000080u -#define CTL_CALLOUT_NONE 0x00000100u -#define CTL_DFA 0x00000200u -#define CTL_EXPAND 0x00000400u -#define CTL_FINDLIMITS 0x00000800u -#define CTL_FULLBINCODE 0x00001000u -#define CTL_GETALL 0x00002000u -#define CTL_GLOBAL 0x00004000u -#define CTL_HEXPAT 0x00008000u -#define CTL_INFO 0x00010000u -#define CTL_JITFAST 0x00020000u -#define CTL_JITVERIFY 0x00040000u -#define CTL_MARK 0x00080000u -#define CTL_MEMORY 0x00100000u -#define CTL_NULLCONTEXT 0x00200000u -#define CTL_POSIX 0x00400000u -#define CTL_PUSH 0x00800000u -#define CTL_STARTCHAR 0x01000000u -#define CTL_SUBSTITUTE_EXTENDED 0x02000000u -#define CTL_ZERO_TERMINATE 0x04000000u +#define CTL_AFTERTEXT 0x00000001u +#define CTL_ALLAFTERTEXT 0x00000002u +#define CTL_ALLCAPTURES 0x00000004u +#define CTL_ALLUSEDTEXT 0x00000008u +#define CTL_ALTGLOBAL 0x00000010u +#define CTL_BINCODE 0x00000020u +#define CTL_CALLOUT_CAPTURE 0x00000040u +#define CTL_CALLOUT_INFO 0x00000080u +#define CTL_CALLOUT_NONE 0x00000100u +#define CTL_DFA 0x00000200u +#define CTL_EXPAND 0x00000400u +#define CTL_FINDLIMITS 0x00000800u +#define CTL_FULLBINCODE 0x00001000u +#define CTL_GETALL 0x00002000u +#define CTL_GLOBAL 0x00004000u +#define CTL_HEXPAT 0x00008000u +#define CTL_INFO 0x00010000u +#define CTL_JITFAST 0x00020000u +#define CTL_JITVERIFY 0x00040000u +#define CTL_MARK 0x00080000u +#define CTL_MEMORY 0x00100000u +#define CTL_NULLCONTEXT 0x00200000u +#define CTL_POSIX 0x00400000u +#define CTL_PUSH 0x00800000u +#define CTL_STARTCHAR 0x01000000u +#define CTL_SUBSTITUTE_EXTENDED 0x02000000u +#define 
CTL_SUBSTITUTE_UNSET_EMPTY 0x04000000u +#define CTL_ZERO_TERMINATE 0x08000000u #define CTL_BSR_SET 0x80000000u /* This is informational */ #define CTL_NL_SET 0x40000000u /* This is informational */ @@ -431,7 +432,9 @@ data line. */ CTL_GLOBAL|\ CTL_MARK|\ CTL_MEMORY|\ - CTL_STARTCHAR) + CTL_STARTCHAR|\ + CTL_SUBSTITUTE_EXTENDED|\ + CTL_SUBSTITUTE_UNSET_EMPTY) /* Structures for holding modifier information for patterns and subject strings (data). Fields containing modifiers that can be set either for a pattern or a @@ -495,91 +498,92 @@ typedef struct modstruct { } modstruct; static modstruct modlist[] = { - { "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) }, - { "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) }, - { "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) }, - { "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) }, - { "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) }, - { "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) }, - { "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) }, - { "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) }, - { "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) }, - { "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) }, - { "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) }, - { "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) }, - { "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) }, - { "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) }, - { "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) }, - { "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) }, - { "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) }, - { "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) }, - { "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) }, - { "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) }, - { "debug", MOD_PAT, 
MOD_CTL, CTL_DEBUG, PO(control) }, - { "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) }, - { "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) }, - { "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) }, - { "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) }, - { "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) }, - { "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) }, - { "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) }, - { "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) }, - { "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) }, - { "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) }, - { "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) }, - { "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) }, - { "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) }, - { "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) }, - { "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) }, - { "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) }, - { "jit", MOD_PAT, MOD_IND, 7, PO(jit) }, - { "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) }, - { "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) }, - { "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) }, - { "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) }, - { "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) }, - { "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) }, - { "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) }, - { "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) }, - { "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) }, - { "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) }, - { "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) }, - { "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) }, - { "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) }, - { "newline", MOD_CTC, MOD_NL, 0, 
CO(newline_convention) }, - { "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) }, - { "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) }, - { "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) }, - { "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) }, - { "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) }, - { "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) }, - { "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) }, - { "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) }, - { "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) }, - { "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) }, - { "offset", MOD_DAT, MOD_INT, 0, DO(offset) }, - { "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)}, - { "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) }, - { "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) }, - { "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) }, - { "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) }, - { "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) }, - { "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) }, - { "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) }, - { "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) }, - { "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) }, - { "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) }, - { "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) }, - { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) }, - { "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) }, - { "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) }, - { "substitute_extended", MOD_PAT, MOD_CTL, CTL_SUBSTITUTE_EXTENDED, PO(control) }, - { "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) }, - { "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) }, - { "ungreedy", MOD_PAT, MOD_OPT, 
PCRE2_UNGREEDY, PO(options) }, - { "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) }, - { "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) }, - { "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) } + { "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) }, + { "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) }, + { "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) }, + { "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) }, + { "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) }, + { "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) }, + { "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) }, + { "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) }, + { "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) }, + { "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) }, + { "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) }, + { "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) }, + { "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) }, + { "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) }, + { "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) }, + { "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) }, + { "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) }, + { "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) }, + { "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) }, + { "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) }, + { "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) }, + { "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) }, + { "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) }, + { "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) }, + { "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) }, + { "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) }, + { 
"dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) }, + { "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) }, + { "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) }, + { "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) }, + { "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) }, + { "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) }, + { "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) }, + { "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) }, + { "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) }, + { "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) }, + { "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) }, + { "jit", MOD_PAT, MOD_IND, 7, PO(jit) }, + { "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) }, + { "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) }, + { "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) }, + { "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) }, + { "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) }, + { "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) }, + { "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) }, + { "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) }, + { "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) }, + { "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) }, + { "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) }, + { "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) }, + { "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) }, + { "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) }, + { "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) }, + { "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) }, + { "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) }, + { "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) }, + { "no_utf_check", MOD_PD, 
MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) }, + { "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) }, + { "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) }, + { "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) }, + { "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) }, + { "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) }, + { "offset", MOD_DAT, MOD_INT, 0, DO(offset) }, + { "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)}, + { "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) }, + { "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) }, + { "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) }, + { "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) }, + { "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) }, + { "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) }, + { "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) }, + { "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) }, + { "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) }, + { "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) }, + { "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) }, + { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) }, + { "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) }, + { "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) }, + { "substitute_extended", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_EXTENDED, PO(control) }, + { "substitute_unset_empty", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_UNSET_EMPTY, PO(control) }, + { "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) }, + { "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) }, + { "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) }, + { "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) }, + { "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) }, + { "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) } }; #define MODLISTCOUNT 
sizeof(modlist)/sizeof(modstruct) @@ -3519,7 +3523,7 @@ Returns: nothing static void show_controls(uint32_t controls, const char *before) { -fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s", +fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s", before, ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "", ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "", @@ -3549,6 +3553,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s", ((controls & CTL_PUSH) != 0)? " push" : "", ((controls & CTL_STARTCHAR) != 0)? " startchar" : "", ((controls & CTL_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "", + ((controls & CTL_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "", ((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : ""); } @@ -3746,8 +3751,8 @@ if ((pat_patctl.control & CTL_INFO) != 0) const uint8_t *start_bits; BOOL match_limit_set, recursion_limit_set; uint32_t backrefmax, bsr_convention, capture_count, first_ctype, first_cunit, - hasbackslashc, hascrorlf, jchanged, last_ctype, last_cunit, match_empty, - match_limit, minlength, nameentrysize, namecount, newline_convention, + hasbackslashc, hascrorlf, jchanged, last_ctype, last_cunit, match_empty, + match_limit, minlength, nameentrysize, namecount, newline_convention, recursion_limit; /* These info requests may return PCRE2_ERROR_UNSET. */ @@ -5873,8 +5878,10 @@ if (dat_datctl.replacement[0] != 0) xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 : PCRE2_SUBSTITUTE_GLOBAL) | - (((pat_patctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 : - PCRE2_SUBSTITUTE_EXTENDED); + (((dat_datctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 : + PCRE2_SUBSTITUTE_EXTENDED) | + (((dat_datctl.control & CTL_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 : + PCRE2_SUBSTITUTE_UNSET_EMPTY); SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. 
*/ pr = dat_datctl.replacement; diff --git a/testdata/testinput2 b/testdata/testinput2 index 519a779..cbb2dcb 100644 --- a/testdata/testinput2 +++ b/testdata/testinput2 @@ -4576,6 +4576,9 @@ B)x/alt_verbnames,mark /(abcd)/replace=${1:+xy\kz},substitute_extended abcd +/(abcd)/ + abcd\=replace=${1:+xy\kz},substitute_extended + /abcd/substitute_extended,replace=>$1< abcd @@ -4737,4 +4740,20 @@ a)"xI /(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize +/a|(b)c/replace=>$1<,substitute_unset_empty + cat + xbcom + +/a|(b)c/ + cat\=replace=>$1< + cat\=replace=>$1<,substitute_unset_empty + xbcom\=replace=>$1<,substitute_unset_empty + +/a|(?'X'b)c/replace=>$X<,substitute_unset_empty + cat + xbcom + +/a|(b)c/replace=>$2<,substitute_unset_empty + cat + # End of testinput2 diff --git a/testdata/testoutput2 b/testdata/testoutput2 index 0c03433..023843b 100644 --- a/testdata/testoutput2 +++ b/testdata/testoutput2 @@ -14648,6 +14648,10 @@ Failed: error -58 at offset 7 in replacement: expected closing curly bracket in abcd Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string +/(abcd)/ + abcd\=replace=${1:+xy\kz},substitute_extended +Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string + /abcd/substitute_extended,replace=>$1< abcd Failed: error -49 at offset 3 in replacement: unknown substring @@ -15057,4 +15061,28 @@ Subject length lower bound = 0 /(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize Failed: error 124 at offset 49: letter or underscore expected after (?< or (?' 
+/a|(b)c/replace=>$1<,substitute_unset_empty + cat + 1: c><t + xbcom + 1: x>b<om + +/a|(b)c/ + cat\=replace=>$1< +Failed: error -55 at offset 3 in replacement: requested value is not set + cat\=replace=>$1<,substitute_unset_empty + 1: c><t + xbcom\=replace=>$1<,substitute_unset_empty + 1: x>b<om + +/a|(?'X'b)c/replace=>$X<,substitute_unset_empty + cat + 1: c><t + xbcom + 1: x>b<om + +/a|(b)c/replace=>$2<,substitute_unset_empty + cat +Failed: error -49 at offset 3 in replacement: unknown substring + # End of testinput2
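
Note on the src/pcre2_substitute.c hunk: the new branch after pcre2_substring_length_bynumber() can be read as a small decision table. The sketch below is a standalone model of that branch, not code from the patch. The function name, the enum and the ERR_ macros are hypothetical and exist only for illustration; the two numeric values are the error numbers printed in the test output of this patch (-55 "requested value is not set", -49 "unknown substring").

```c
#include <assert.h>

/* Error numbers as printed in the test output of this patch. */
#define ERR_UNSET       (-55)   /* "requested value is not set" */
#define ERR_NOSUBSTRING (-49)   /* "unknown substring" */

/* Hypothetical outcome names. The patched code expresses these outcomes
   with "continue", "goto PTREXIT", or by falling through. */
typedef enum {
  ACT_GROUP_SET,       /* length lookup succeeded: carry on as before */
  ACT_INSERT_EMPTY,    /* plain $n, group unset, option set: insert nothing */
  ACT_EXTENDED_UNSET,  /* ${n:+..} or ${n:-..}: extended logic handles unset */
  ACT_FAIL             /* give up and return rc to the caller */
} unset_action;

/* Model of the branch: rc is the result of the length lookup, special is 0
   for a plain insertion or the extended-form character, and uempty is
   non-zero when PCRE2_SUBSTITUTE_UNSET_EMPTY was passed. */
unset_action classify_group(int rc, int special, int uempty)
{
  if (rc >= 0) return ACT_GROUP_SET;
  if (rc != ERR_UNSET) return ACT_FAIL;            /* e.g. unknown group */
  if (special != 0) return ACT_EXTENDED_UNSET;     /* option has no effect */
  return uempty ? ACT_INSERT_EMPTY : ACT_FAIL;
}

/* Mirrors the new test cases for /a|(b)c/ with subject "cat". */
void classify_group_self_check(void)
{
  assert(classify_group(ERR_UNSET, 0, 1) == ACT_INSERT_EMPTY);  /* $1, option set */
  assert(classify_group(ERR_UNSET, 0, 0) == ACT_FAIL);          /* $1, error -55 */
  assert(classify_group(ERR_NOSUBSTRING, 0, 1) == ACT_FAIL);    /* $2, error -49 */
}
```

Under this reading, the option only changes the outcome for an unset group in a plain insertion. A reference to a group that does not exist still fails with -49, and the extended forms behave the same with or without the option, which matches the wording added to doc/pcre2api.3.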