Implement PCRE2_SUBSTITUTE_EXTENDED.

2015-10-07 17:32:48 +00:00 · 6ae5c36e83
commit 6ae5c36e83
parent f64749b40a
16 changed files with 838 additions and 165 deletions
--- a/2
+++ b/2
@@ -192,6 +192,8 @@ pcre2test (and perltest.sh) input.
 54. Add the null_context modifier to pcre2test so that calling pcre2_compile() 
 and the matching functions with NULL contexts can be tested.

+55. Implemented PCRE2_SUBSTITUTE_EXTENDED.
+

 Version 10.20 30-June-2015
 --------------------------
--- a/doc/pcre2_substitute.3
+++ b/doc/pcre2_substitute.3
@@ -1,4 +1,4 @@
-.TH PCRE2_SUBSTITUTE 3 "11 November 2014" "PCRE2 10.00"
+.TH PCRE2_SUBSTITUTE 3 "06 October 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -47,20 +47,22 @@ units, not characters, as is the contents of the variable pointed at by
 \fIoutlengthptr\fP, which is updated to the actual length of the new string.
 The options are:
 .sp
-  PCRE2_ANCHORED          Match only at the first position
-  PCRE2_NOTBOL            Subject string is not the beginning of a line
-  PCRE2_NOTEOL            Subject string is not the end of a line
-  PCRE2_NOTEMPTY          An empty string is not a valid match
-  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject
-                           is not a valid match
-  PCRE2_NO_UTF_CHECK      Do not check the subject or replacement for
-                           UTF validity (only relevant if PCRE2_UTF
-                           was set at compile time)
-  PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
+  PCRE2_ANCHORED             Match only at the first position
+  PCRE2_NOTBOL               Subject is not the beginning of a line
+  PCRE2_NOTEOL               Subject is not the end of a line
+  PCRE2_NOTEMPTY             An empty string is not a valid match
+  PCRE2_NOTEMPTY_ATSTART     An empty string at the start of the
+                              subject is not a valid match
+  PCRE2_NO_UTF_CHECK         Do not check the subject or replacement
+                              for UTF validity (only relevant if
+                              PCRE2_UTF was set at compile time)
+  PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
+  PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
 .sp
 The function returns the number of substitutions, which may be zero if there
 were no matches. The result can be greater than one only when
-PCRE2_SUBSTITUTE_GLOBAL is set.
+PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
+is returned.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "22 September 2015" "PCRE2 10.21"
+.TH PCRE2API 3 "07 October 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1170,7 +1170,7 @@ built.
 .sp
 If this option is set, an unanchored pattern is required to match before or at
 the first newline in the subject string, though the matched text may continue
-over the newline. See also PCRE2_USE_OFFSET_LIMIT, which provides a more 
+over the newline. See also PCRE2_USE_OFFSET_LIMIT, which provides a more
 general limiting facility.
 .sp
  PCRE2_MATCH_UNSET_BACKREF
@@ -1367,8 +1367,8 @@ with Perl. It can also be set by a (?U) option setting within the pattern.
 .sp
  PCRE2_USE_OFFSET_LIMIT
 .sp
-This option must be set for \fBpcre2_compile()\fP if 
-\fBpcre2_set_offset_limit()\fP is going to be used to set a non-default offset 
+This option must be set for \fBpcre2_compile()\fP if
+\fBpcre2_set_offset_limit()\fP is going to be used to set a non-default offset
 limit in a match context for matches that use this pattern. An error is
 generated if an offset limit is set without this option. For more details, see
 the description of \fBpcre2_set_offset_limit()\fP in the
@@ -2657,40 +2657,16 @@ same number causes an error at compile time.
 .B int pcre2_substitute(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP,
 .B "  PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP,"
 .B "  uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP,"
-.B "  pcre2_match_context *\fImcontext\fP, PCRE2_SPTR \fIreplacementzfP,"
+.B "  pcre2_match_context *\fImcontext\fP, PCRE2_SPTR \fIreplacement\fP,"
 .B "  PCRE2_SIZE \fIrlength\fP, PCRE2_UCHAR *\fIoutputbuffer\zfP,"
 .B "  PCRE2_SIZE *\fIoutlengthptr\fP);"
 .fi
+.P
 This function calls \fBpcre2_match()\fP and then makes a copy of the subject
 string in \fIoutputbuffer\fP, replacing the part that was matched with the
 \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This can
 be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
 .P
-In the replacement string, which is interpreted as a UTF string in UTF mode,
-and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
-dollar character is an escape character that can specify the insertion of
-characters from capturing groups or (*MARK) items in the pattern. The following
-forms are recognized:
-.sp
-  $$                  insert a dollar character
-  $<n> or ${<n>}      insert the contents of group <n>
-  $*MARK or ${*MARK}  insert the name of the last (*MARK) encountered 
-.sp
-Either a group number or a group name can be given for <n>. Curly brackets are
-required only if the following character would be interpreted as part of the
-number or name. The number may be zero to include the entire matched string.
-For example, if the pattern a(b)c is matched with "=abc=" and the replacement
-string "+$1$0$1+", the result is "=+babcb+=". Group insertion is done by
-calling \fBpcre2_copy_byname()\fP or \fBpcre2_copy_bynumber()\fP as
-appropriate.
-.P
-The facility for inserting a (*MARK) name can be used to perform simple 
-simultaneous substitutions, as this \fBpcre2test\fP example shows:
-.sp
-  /(*:pear)apple|(*:orange)lemon/g,replace=${*MARK}
-      apple lemon
-   2: pear orange
-.P
 The first seven arguments of \fBpcre2_substitute()\fP are the same as for
 \fBpcre2_match()\fP, except that the partial matching options are not
 permitted, and \fImatch_data\fP may be passed as NULL, in which case a match
@@ -2698,23 +2674,104 @@ data block is obtained and freed within this function, using memory management
 functions from the match context, if provided, or else those that were used to
 allocate memory for the compiled code.
 .P
-There is one additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes the
+The \fIoutlengthptr\fP argument must point to a variable that contains the
+length, in code units, of the output buffer. If the function is successful,
+the value is updated to contain the length of the new string, excluding the
+trailing zero that is automatically added. If the function is not successful,
+the value is set to PCRE2_UNSET for general errors (such as output buffer too
+small). For syntax errors in the replacement string, the value is set to the
+offset in the replacement string where the error was detected.
+.P
+In the replacement string, which is interpreted as a UTF string in UTF mode,
+and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
+dollar character is an escape character that can specify the insertion of
+characters from capturing groups or (*MARK) items in the pattern. The following
+forms are always recognized:
+.sp
+  $$                  insert a dollar character
+  $<n> or ${<n>}      insert the contents of group <n>
+  $*MARK or ${*MARK}  insert the name of the last (*MARK) encountered
+.sp
+Either a group number or a group name can be given for <n>. Curly brackets are
+required only if the following character would be interpreted as part of the
+number or name. The number may be zero to include the entire matched string.
+For example, if the pattern a(b)c is matched with "=abc=" and the replacement
+string "+$1$0$1+", the result is "=+babcb+=".
+.P
+The facility for inserting a (*MARK) name can be used to perform simple
+simultaneous substitutions, as this \fBpcre2test\fP example shows:
+.sp
+  /(*:pear)apple|(*:orange)lemon/g,replace=${*MARK}
+      apple lemon
+   2: pear orange
+.sp
+There is an additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes the
 function to iterate over the subject string, replacing every matching
 substring. If this is not set, only the first matching substring is replaced.
 .P
-The \fIoutlengthptr\fP argument must point to a variable that contains the
-length, in code units, of the output buffer. It is updated to contain the
-length of the new string, excluding the trailing zero that is automatically
-added.
+A second additional option, PCRE2_SUBSTITUTE_EXTENDED, causes extra processing
+to be applied to the replacement string. Without this option, only the dollar
+character is special, and only the group insertion forms listed above are
+valid. When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
 .P
-The function returns the number of replacements that were made. This may be
-zero if no matches were found, and is never greater than 1 unless
-PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
-is returned. Except for PCRE2_ERROR_NOMATCH (which is never returned), any
-errors from \fBpcre2_match()\fP or the substring copying functions are passed
-straight back. PCRE2_ERROR_BADREPLACEMENT is returned for an invalid
-replacement string (unrecognized sequence following a dollar sign), and
-PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
+Firstly, backslash in a replacement string is interpreted as an escape
+character. The usual forms such as \en or \ex{ddd} can be used to specify
+particular character codes, and backslash followed by any non-alphanumeric
+character quotes that character. Extended quoting can be coded using \eQ...\eE,
+exactly as in pattern strings.
+.P
+There are also four escape sequences for forcing the case of inserted letters.
+The insertion mechanism has three states: no case forcing, force upper case,
+and force lower case. The escape sequences change the current state: \eU and
+\eL change to upper or lower case forcing, respectively, and \eE (when not
+terminating a \eQ quoted sequence) reverts to no case forcing. The sequences
+\eu and \el force the next character (if it is a letter) to upper or lower
+case, respectively, and then the state automatically reverts to no case
+forcing. Case forcing applies to all inserted characters, including those from
+captured groups and letters within \eQ...\eE quoted sequences.
+.P
+Note that case forcing sequences such as \eU...\eE do not nest. For example,
+the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
+effect.
+.P
+The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
+flexibility to group substitution. The syntax is similar to that used by Bash:
+.sp
+  ${<n>:-<string>}
+  ${<n>:+<string1>:<string2>}
+.sp
+As before, <n> may be a group number or a name. The first form specifies a
+default value. If group <n> is set, its value is inserted; if not, <string> is
+expanded and the result inserted. The second form specifies strings that are
+expanded and inserted when group <n> is set or unset, respectively. The first
+form is just a convenient shorthand for
+.sp
+  ${<n>:+${<n>}:<string>}
+.sp
+Backslash can be used to escape colons and closing curly brackets in the
+replacement strings. A change of the case forcing state within a replacement
+string remains in force afterwards, as shown in this \fBpcre2test\fP example:
+.sp
+  /(some)?(body)/substitute_extended,replace=${1:+\eU:\eL}HeLLo
+      body
+   1: hello
+      somebody
+   1: HELLO
+.sp
+If successful, the function returns the number of replacements that were made.
+This may be zero if no matches were found, and is never greater than 1 unless
+PCRE2_SUBSTITUTE_GLOBAL is set.
+.P
+In the event of an error, a negative error code is returned. Except for
+PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
+are passed straight back. PCRE2_ERROR_NOMEMORY is returned if the output buffer
+is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax
+errors in the replacement string, with more particular errors being
+PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
+PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket not found), and
+PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group substitution). As
+for all PCRE2 errors, a text message that describes the error can be obtained
+by calling \fBpcre2_get_error_message()\fP.
 .
 .
 .SH "DUPLICATE SUBPATTERN NAMES"
@@ -3008,6 +3065,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 22 September 2015
+Last updated: 07 October 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi
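
The case-forcing rules documented in the pcre2api.3 hunk above can be modelled as a small state machine. The sketch below is only an illustration under stated assumptions: it handles ASCII input, recognises only \U, \L, \E, \u and \l, and the names force_state and model_force_case are hypothetical, not part of PCRE2. It does not reproduce the code in pcre2_substitute.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the documented case-forcing states; not PCRE2 code. */
enum force_state { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/* Expand \U, \L, \E, \u and \l in an ASCII replacement string. Writes the
result plus a terminating NUL into out and returns the output length, or
(size_t)-1 if the buffer is too small. */
static size_t model_force_case(const char *in, char *out, size_t outsize)
{
enum force_state state = FORCE_NONE;   /* Persistent state: \U, \L, \E */
enum force_state once = FORCE_NONE;    /* One-shot state: \u, \l */
size_t n = 0;

if (outsize == 0) return (size_t)-1;

for (; *in != '\0'; in++)
  {
  unsigned char c = (unsigned char)*in;

  if (c == '\\' && in[1] != '\0')
    {
    switch (in[1])
      {
      case 'U': state = FORCE_UPPER; once = FORCE_NONE; in++; continue;
      case 'L': state = FORCE_LOWER; once = FORCE_NONE; in++; continue;
      case 'E': state = FORCE_NONE;  once = FORCE_NONE; in++; continue;
      case 'u': once = FORCE_UPPER; in++; continue;
      case 'l': once = FORCE_LOWER; in++; continue;
      default: break;   /* Other escapes are outside this model */
      }
    }

  if (once != FORCE_NONE)
    {
    /* Force one character, then revert to no case forcing, as documented. */
    c = (unsigned char)((once == FORCE_UPPER) ? toupper(c) : tolower(c));
    once = FORCE_NONE;
    state = FORCE_NONE;
    }
  else if (state == FORCE_UPPER) c = (unsigned char)toupper(c);
  else if (state == FORCE_LOWER) c = (unsigned char)tolower(c);

  if (n + 1 >= outsize) return (size_t)-1;   /* Keep room for the NUL */
  out[n++] = (char)c;
  }

out[n] = '\0';
return n;
}
```

Feeding this model the documented string "\Uaa\LBB\Ecc\E" yields "AAbbcc": the second \E finds the state already reset, so it has no effect, which is the non-nesting behaviour the man page describes.
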
--- a/src/pcre2.h
+++ b/src/pcre2.h
@@ -146,9 +146,10 @@ sanity checks). */
 #define PCRE2_DFA_RESTART         0x00000040u
 #define PCRE2_DFA_SHORTEST        0x00000080u

-/* This is an additional option for pcre2_substitute(). */
+/* These are additional options for pcre2_substitute(). */

 #define PCRE2_SUBSTITUTE_GLOBAL   0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u

 /* Newline and \R settings, for use in compile contexts. The newline values
 must be kept in step with values set in config.h and both sets must all be
@@ -236,6 +237,9 @@ numbers must not be changed. */
 #define PCRE2_ERROR_UNAVAILABLE       (-54)
 #define PCRE2_ERROR_UNSET             (-55)
 #define PCRE2_ERROR_BADOFFSETLIMIT    (-56)
+#define PCRE2_ERROR_BADREPESCAPE      (-57)
+#define PCRE2_ERROR_REPMISSINGBRACE   (-58)
+#define PCRE2_ERROR_BADSUBSTITUTION   (-59)

 /* Request types for pcre2_pattern_info() */

--- a/src/pcre2.h.in
+++ b/src/pcre2.h.in
@@ -146,9 +146,10 @@ sanity checks). */
 #define PCRE2_DFA_RESTART         0x00000040u
 #define PCRE2_DFA_SHORTEST        0x00000080u

-/* This is an additional option for pcre2_substitute(). */
+/* These are additional options for pcre2_substitute(). */

 #define PCRE2_SUBSTITUTE_GLOBAL   0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u

 /* Newline and \R settings, for use in compile contexts. The newline values
 must be kept in step with values set in config.h and both sets must all be
@@ -236,6 +237,9 @@ numbers must not be changed. */
 #define PCRE2_ERROR_UNAVAILABLE       (-54)
 #define PCRE2_ERROR_UNSET             (-55)
 #define PCRE2_ERROR_BADOFFSETLIMIT    (-56)
+#define PCRE2_ERROR_BADREPESCAPE      (-57)
+#define PCRE2_ERROR_REPMISSINGBRACE   (-58)
+#define PCRE2_ERROR_BADSUBSTITUTION   (-59)

 /* Request types for pcre2_pattern_info() */

--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@@ -1612,8 +1612,15 @@ is placed in chptr. A backreference to group n is returned as negative n. On
 entry, ptr is pointing at the \. On exit, it points the final code unit of the
 escape sequence.

+This function is also called from pcre2_substitute() to handle escape sequences
+in replacement strings. In this case, the cb argument is NULL, and only
+sequences that define a data character are recognised. The isclass argument is
+not relevant, but the options argument is the final value of the compiled
+pattern's options.
+
 Arguments:
-  ptrptr         points to the pattern position pointer
+  ptrptr         points to the input position pointer
+  ptrend         points to the end of the input
  chptr          points to a returned data character
  errorcodeptr   points to the errorcode variable (containing zero)
  options        the current options bits
@@ -1626,9 +1633,9 @@ Returns:         zero => a data character
                 on error, errorcodeptr is set non-zero
 */

-static int
-check_escape(PCRE2_SPTR *ptrptr, uint32_t *chptr, int *errorcodeptr,
-  uint32_t options, BOOL isclass, compile_block *cb)
+int
+PRIV(check_escape)(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *chptr,
+  int *errorcodeptr, uint32_t options, BOOL isclass, compile_block *cb)
 {
 BOOL utf = (options & PCRE2_UTF) != 0;
 PCRE2_SPTR ptr = *ptrptr + 1;
@@ -1636,19 +1643,23 @@ register uint32_t c, cc;
 int escape = 0;
 int i;

-GETCHARINCTEST(c, ptr);         /* Get character value, increment pointer */
-ptr--;                          /* Set pointer back to the last code unit */
-
 /* If backslash is at the end of the pattern, it's an error. */

-if (c == CHAR_NULL && ptr >= cb->end_pattern) *errorcodeptr = ERR1;
+if (ptr >= ptrend) 
+  {
+  *errorcodeptr = ERR1;
+  return 0;
+  }  
+
+GETCHARINCTEST(c, ptr);         /* Get character value, increment pointer */
+ptr--;                          /* Set pointer back to the last code unit */

 /* Non-alphanumerics are literals, so we just leave the value in c. An initial
 value test saves a memory lookup for code points outside the alphanumeric
 range. Otherwise, do a table lookup. A non-zero result is something that can be
 returned immediately. Otherwise further processing is required. */

-else if (c < ESCAPES_FIRST || c > ESCAPES_LAST) {}  /* Definitely literal */
+if (c < ESCAPES_FIRST || c > ESCAPES_LAST) {}  /* Definitely literal */

 else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
  {
@@ -1660,13 +1671,24 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
    }
  }

-/* Escapes that need further processing, including those that are unknown. */
+/* Escapes that need further processing, including those that are unknown. 
+When called from pcre2_substitute(), only \c, \o, and \x are recognized (and \u 
+when BSUX is set). */

 else
  {
  PCRE2_SPTR oldptr;
  BOOL braced, negated, overflow;
  unsigned int s;
+  
+  /* Filter calls from pcre2_substitute(). */
+
+  if (cb == NULL && c != CHAR_c && c != CHAR_o && c != CHAR_x &&
+      (c != CHAR_u || (options & PCRE2_ALT_BSUX) != 0))
+    {
+    *errorcodeptr = ERR3;
+    return 0;
+    }  

  switch (c)
    {
@@ -2020,7 +2042,7 @@ else

    c = *(++ptr);
    if (c >= CHAR_a && c <= CHAR_z) c = UPPER_CASE(c);
-    if (c == CHAR_NULL && ptr >= cb->end_pattern)
+    if (c == CHAR_NULL && ptr >= ptrend)
      {
      *errorcodeptr = ERR2;
      break;
@@ -2874,7 +2896,8 @@ for (; ptr < cb->end_pattern; ptr++)
      {
      int rc;
      *errorcodeptr = 0;
-      rc = check_escape(&ptr, &x, errorcodeptr, options, FALSE, cb);
+      rc = PRIV(check_escape)(&ptr, cb->end_pattern, &x, errorcodeptr, options,
+        FALSE, cb);
      *ptrptr = ptr;   /* For possible error */
      if (*errorcodeptr != 0) return -1;
      if (rc != 0)
@@ -3048,7 +3071,8 @@ for (; ptr < cb->end_pattern; ptr++)

    case CHAR_BACKSLASH:
    errorcode = 0;
-    escape = check_escape(&ptr, &c, &errorcode, options, FALSE, cb);
+    escape = PRIV(check_escape)(&ptr, cb->end_pattern, &c, &errorcode, options,
+      FALSE, cb);
    if (errorcode != 0) goto FAILED;
    if (escape == ESC_Q) inescq = TRUE;
    break;
@@ -3132,7 +3156,8 @@ for (; ptr < cb->end_pattern; ptr++)
      else if (c == CHAR_BACKSLASH)
        {
        errorcode = 0;
-        escape = check_escape(&ptr, &c, &errorcode, options, TRUE, cb);
+        escape = PRIV(check_escape)(&ptr, cb->end_pattern, &c, &errorcode,
+          options, TRUE, cb);
        if (errorcode != 0) goto FAILED;
        if (escape == ESC_Q) inescq = TRUE;
        }
@@ -4195,7 +4220,8 @@ for (;; ptr++)

      if (c == CHAR_BACKSLASH)
        {
-        escape = check_escape(&ptr, &ec, errorcodeptr, options, TRUE, cb);
+        escape = PRIV(check_escape)(&ptr, cb->end_pattern, &ec, errorcodeptr,
+          options, TRUE, cb);
        if (*errorcodeptr != 0) goto FAILED;
        if (escape == 0)    /* Escaped single char */
          {
@@ -4405,7 +4431,8 @@ for (;; ptr++)
          if (d == CHAR_BACKSLASH)
            {
            int descape;
-            descape = check_escape(&ptr, &d, errorcodeptr, options, TRUE, cb);
+            descape = PRIV(check_escape)(&ptr, cb->end_pattern, &d,
+              errorcodeptr, options, TRUE, cb);
            if (*errorcodeptr != 0) goto FAILED;
 #ifdef EBCDIC
            range_is_literal = FALSE;
@@ -6862,7 +6889,8 @@ for (;; ptr++)

    case CHAR_BACKSLASH:
    tempptr = ptr;
-    escape = check_escape(&ptr, &ec, errorcodeptr, options, FALSE, cb);
+    escape = PRIV(check_escape)(&ptr, cb->end_pattern, &ec, errorcodeptr,
+      options, FALSE, cb);
    if (*errorcodeptr != 0) goto FAILED;

    if (escape == 0)                  /* The escape coded a single character */
--- a/src/pcre2_error.c
+++ b/src/pcre2_error.c
@@ -238,9 +238,12 @@ static const char match_error_texts[] =
  "nested recursion at the same subject position\0"
  "recursion limit exceeded\0"
  "requested value is not available\0"
-  /* 55 */ 
+  /* 55 */
  "requested value is not set\0"
-  "offset limit set without PCRE2_USE_OFFSET_LIMIT\0" 
+  "offset limit set without PCRE2_USE_OFFSET_LIMIT\0"
+  "bad escape sequence in replacement string\0"
+  "expected closing curly bracket in replacement string\0"
+  "bad substitution in replacement string\0" 
  ;


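The three new texts extend a table in which messages are concatenated and separated by NUL characters, so a message is found by position rather than by array index. The sketch below shows one plausible way to walk such a table. The names model_texts and model_nth_text are hypothetical, the table holds only the three strings added in this hunk, and how the library itself maps an error code to a position is an assumption not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table only: the three texts added by this commit, in order.
The final explicit \0 plus the implicit string terminator leave an empty
string at the end, which marks the end of the table. */
static const char model_texts[] =
  "bad escape sequence in replacement string\0"
  "expected closing curly bracket in replacement string\0"
  "bad substitution in replacement string\0";

/* Return the n-th (zero-based) message in a NUL-separated table that ends
with an empty string, or NULL when n is out of range. */
static const char *model_nth_text(const char *table, unsigned int n)
{
for (; n > 0; n--)
  {
  if (*table == '\0') return NULL;     /* Ran off the end of the table */
  table += strlen(table) + 1;          /* Skip one message and its NUL */
  }
return (*table == '\0') ? NULL : table;
}
```

Keeping the texts in one string avoids a separate array of pointers, at the cost of a linear scan on each lookup. The scan only runs when an error is being reported, so its cost rarely matters.
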
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@@ -1886,6 +1886,7 @@ not referenced from pcre2test, and must not be defined when no code unit width
 is available. */

 #define _pcre2_auto_possessify       PCRE2_SUFFIX(_pcre2_auto_possessify_)
+#define _pcre2_check_escape          PCRE2_SUFFIX(_pcre2_check_escape_)
 #define _pcre2_find_bracket          PCRE2_SUFFIX(_pcre2_find_bracket_)
 #define _pcre2_is_newline            PCRE2_SUFFIX(_pcre2_is_newline_)
 #define _pcre2_jit_free_rodata       PCRE2_SUFFIX(_pcre2_jit_free_rodata_)
@@ -1907,6 +1908,8 @@ is available. */

 extern int          _pcre2_auto_possessify(PCRE2_UCHAR *, BOOL,
                      const compile_block *);
+extern int          _pcre2_check_escape(PCRE2_SPTR *, PCRE2_SPTR, uint32_t *,
+                      int *, uint32_t, BOOL, compile_block *);
 extern PCRE2_SPTR   _pcre2_find_bracket(PCRE2_SPTR, BOOL, int);
 extern BOOL         _pcre2_is_newline(PCRE2_SPTR, uint32_t, PCRE2_SPTR,
                      uint32_t *, BOOL);
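
The extended group syntax ${<n>:+<string1>:<string2>} requires locating the first unescaped colon or closing curly bracket that is not inside a nested ${...} item. The sketch below is a deliberately simplified model of that scan, not the committed find_text_end(): it treats a backslash as escaping exactly one following character and ignores \Q...\E quoting and escape validation. The name model_text_end and its colon_ends flag (the inverse of the real function's "last" argument) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified scanner. Returns the index of the first unescaped
terminator at nesting level zero, or -1 if none is found. When colon_ends is
zero, only '}' terminates the text. */
static long model_text_end(const char *s, size_t len, int colon_ends)
{
unsigned int nest = 0;
size_t i;

for (i = 0; i < len; i++)
  {
  char c = s[i];

  if (c == '\\')
    {
    i++;                       /* Skip the escaped character */
    continue;
    }

  if (c == '$' && i + 1 < len && s[i + 1] == '{')
    {
    nest++;                    /* Entering a nested ${ item */
    i++;
    continue;
    }

  if (c == '}')
    {
    if (nest == 0) return (long)i;
    nest--;                    /* Leaving a nested ${ item */
    continue;
    }

  if (c == ':' && colon_ends && nest == 0) return (long)i;
  }

return -1;                     /* Terminator not found */
}
```

For the text "${2:-x}y:z}" with colon_ends set, the model skips the colon inside the nested item and stops at index 8, the colon that separates the two strings.
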
--- a/src/pcre2_substitute.c
+++ b/src/pcre2_substitute.c
@@ -45,6 +45,115 @@ POSSIBILITY OF SUCH DAMAGE.

 #include "pcre2_internal.h"

+#define PTR_STACK_SIZE 20
+
+
+/*************************************************
+*           Find end of substitute text          *
+*************************************************/
+
+/* In extended mode, we recognize ${name:+set text:unset text} and similar
+constructions. This requires the identification of unescaped : and }
+characters. This function scans for such. It must deal with nested ${
+constructions. The pointer to the text is updated, either to the required end 
+character, or to where an error was detected.
+
+Arguments:
+  code      points to the compiled expression (for options)
+  ptrptr    points to the pointer to the start of the text (updated)
+  ptrend    end of the whole string
+  last      TRUE if the last expected string (only } recognized)
+
+Returns:    0 on success
+            negative error code on failure
+*/
+
+static int
+find_text_end(const pcre2_code *code, PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend,
+  BOOL last)
+{
+int rc = 0;
+uint32_t nestlevel = 0;
+BOOL literal = FALSE;
+PCRE2_SPTR ptr = *ptrptr;
+
+for (; ptr < ptrend; ptr++)
+  {
+  if (literal)
+    {
+    if (ptr[0] == CHAR_BACKSLASH && ptr < ptrend - 1 && ptr[1] == CHAR_E)
+      {
+      literal = FALSE;
+      ptr += 1;
+      }
+    }
+
+  else if (*ptr == CHAR_RIGHT_CURLY_BRACKET)
+    {
+    if (nestlevel == 0) goto EXIT;
+    nestlevel--;
+    }
+
+  else if (*ptr == CHAR_COLON && !last && nestlevel == 0) goto EXIT;
+
+  else if (*ptr == CHAR_DOLLAR_SIGN)
+    {
+    if (ptr < ptrend - 1 && ptr[1] == CHAR_LEFT_CURLY_BRACKET)
+      {
+      nestlevel++;
+      ptr += 1;
+      }
+    }
+
+  else if (*ptr == CHAR_BACKSLASH)
+    {
+    int erc; 
+    int errorcode = 0;
+    uint32_t ch;
+
+    if (ptr < ptrend - 1) switch (ptr[1])
+      {
+      case CHAR_L:
+      case CHAR_l:
+      case CHAR_U:
+      case CHAR_u:
+      ptr += 1;
+      continue;
+      }
+
+    erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
+      code->overall_options, FALSE, NULL);
+    if (errorcode != 0)
+      {
+      rc = errorcode;
+      goto EXIT;
+      }
+
+    switch(erc)
+      {
+      case 0:      /* Data character */
+      case ESC_E:  /* Isolated \E is ignored */
+      break;
+
+      case ESC_Q:
+      literal = TRUE;
+      break;
+
+      default:
+      rc = PCRE2_ERROR_BADREPESCAPE;
+      goto EXIT;
+      }
+    }
+  }
+
+rc = PCRE2_ERROR_REPMISSINGBRACE;   /* Terminator not found */
+
+EXIT:
+*ptrptr = ptr;
+return rc;
+}
+
+

 /*************************************************
 *              Match and substitute              *
@@ -80,13 +189,23 @@ pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
 {
 int rc;
 int subs;
+int forcecase = 0;
+int forcecasereset = 0;
 uint32_t ovector_count;
 uint32_t goptions = 0;
 BOOL match_data_created = FALSE;
 BOOL global = FALSE;
-PCRE2_SIZE buff_offset, lengthleft, fraglength;
+BOOL extended = FALSE;
+BOOL literal = FALSE;
+BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
+PCRE2_SPTR ptr;
+PCRE2_SPTR repend;
+PCRE2_SIZE buff_offset, buff_length, lengthleft, fraglength;
 PCRE2_SIZE *ovector;

+buff_length = *blength;
+*blength = PCRE2_UNSET;
+
 /* Partial matching is not valid. */

 if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
@@ -109,8 +228,7 @@ ovector_count = pcre2_get_ovector_count(match_data);
 /* Check UTF replacement string if necessary. */

 #ifdef SUPPORT_UNICODE
-if ((code->overall_options & PCRE2_UTF) != 0 &&
-    (options & PCRE2_NO_UTF_CHECK) == 0)
+if (utf && (options & PCRE2_NO_UTF_CHECK) == 0)
  {
  rc = PRIV(valid_utf)(replacement, rlength, &(match_data->rightchar));
  if (rc != 0)
@@ -121,8 +239,8 @@ if ((code->overall_options & PCRE2_UTF) != 0 &&
  }
 #endif  /* SUPPORT_UNICODE */

-/* Notice the global option and remove it from the options that are passed to
-pcre2_match(). */
+/* Notice the global and extended options and remove them from the options that
+are passed to pcre2_match(). */

 if ((options & PCRE2_SUBSTITUTE_GLOBAL) != 0)
  {
@@ -130,24 +248,32 @@ if ((options & PCRE2_SUBSTITUTE_GLOBAL) != 0)
  global = TRUE;
  }

-/* Find lengths of zero-terminated strings. */
+if ((options & PCRE2_SUBSTITUTE_EXTENDED) != 0)
+  {
+  options &= ~PCRE2_SUBSTITUTE_EXTENDED;
+  extended = TRUE;
+  }
+
+/* Find lengths of zero-terminated strings and the end of the replacement. */

 if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
 if (rlength == PCRE2_ZERO_TERMINATED) rlength = PRIV(strlen)(replacement);
+repend = replacement + rlength;

 /* Copy up to the start offset */

-if (start_offset > *blength) goto NOROOM;
+if (start_offset > buff_length) goto NOROOM;
 memcpy(buffer, subject, start_offset * (PCRE2_CODE_UNIT_WIDTH/8));
 buff_offset = start_offset;
-lengthleft = *blength - start_offset;
+lengthleft = buff_length - start_offset;

 /* Loop for global substituting. */

 subs = 0;
 do
  {
-  PCRE2_SIZE i;
+  PCRE2_SPTR ptrstack[PTR_STACK_SIZE];
+  uint32_t ptrstackptr = 0;

  rc = pcre2_match(code, subject, length, start_offset, options|goptions,
    match_data, mcontext);
@@ -199,19 +325,56 @@ do
  buff_offset += fraglength;
  lengthleft -= fraglength;

-  for (i = 0; i < rlength; i++)
+  /* Process the replacement string. Literal mode is set by \Q, but only in
+  extended mode when backslashes are being interpreted. In extended mode we
+  must handle nested substrings that are to be reprocessed. */
+
+  ptr = replacement;
+  for (;;)
    {
-    if (replacement[i] == CHAR_DOLLAR_SIGN)
+    uint32_t ch;
+
+    /* If at the end of a nested substring, pop the stack. */
+
+    if (ptr >= repend)
+      {
+      if (ptrstackptr <= 0) break;
+      repend = ptrstack[--ptrstackptr];
+      ptr = ptrstack[--ptrstackptr];
+      continue;
+      }
+
+    /* Handle the next character */
+
+    if (literal)
+      {
+      if (ptr[0] == CHAR_BACKSLASH && ptr < repend - 1 && ptr[1] == CHAR_E)
+        {
+        literal = FALSE;
+        ptr += 2;
+        continue;
+        }
+      goto LOADLITERAL;
+      }
+
+    /* Not in literal mode. */
+
+    if (*ptr == CHAR_DOLLAR_SIGN)
      {
      int group, n;
+      uint32_t special = 0;
      BOOL inparens;
      BOOL star;
      PCRE2_SIZE sublength;
+      PCRE2_SPTR text1_start = NULL;
+      PCRE2_SPTR text1_end = NULL;
+      PCRE2_SPTR text2_start = NULL;
+      PCRE2_SPTR text2_end = NULL;
      PCRE2_UCHAR next;
      PCRE2_UCHAR name[33];

-      if (++i == rlength) goto BAD;
-      if ((next = replacement[i]) == CHAR_DOLLAR_SIGN) goto LITERAL;
+      if (++ptr >= repend) goto BAD;
+      if ((next = *ptr) == CHAR_DOLLAR_SIGN) goto LOADLITERAL;

      group = -1;
      n = 0;
@@ -220,24 +383,24 @@ do

      if (next == CHAR_LEFT_CURLY_BRACKET)
        {
-        if (++i == rlength) goto BAD;
-        next = replacement[i];
+        if (++ptr >= repend) goto BAD;
+        next = *ptr;
        inparens = TRUE;
        }

      if (next == CHAR_ASTERISK)
        {
-        if (++i == rlength) goto BAD;
-        next = replacement[i];
+        if (++ptr >= repend) goto BAD;
+        next = *ptr;
        star = TRUE;
        }

      if (!star && next >= CHAR_0 && next <= CHAR_9)
        {
        group = next - CHAR_0;
-        while (++i < rlength)
+        while (++ptr < repend)
          {
-          next = replacement[i];
+          next = *ptr;
          if (next < CHAR_0 || next > CHAR_9) break;
          group = group * 10 + next - CHAR_0;
          }
@@ -249,18 +412,53 @@ do
          {
          name[n++] = next;
          if (n > 32) goto BAD;
-          if (i == rlength) break;
-          next = replacement[++i];
+          if (ptr >= repend) break;
+          next = *(++ptr);
          }
        if (n == 0) goto BAD;
        name[n] = 0;
        }

+      /* In extended mode we recognize ${name:+set text:unset text} and
+      ${name:-default text}. */
+
      if (inparens)
        {
-        if (i == rlength || next != CHAR_RIGHT_CURLY_BRACKET) goto BAD;
+        
+        if (extended && !star && ptr < repend - 2 && next == CHAR_COLON)
+          {
+          special = *(++ptr);
+          if (special != CHAR_PLUS && special != CHAR_MINUS)
+            {
+            rc = PCRE2_ERROR_BADSUBSTITUTION;
+            goto PTREXIT;
+            }
+
+          text1_start = ++ptr;
+          rc = find_text_end(code, &ptr, repend, special == CHAR_MINUS);
+          if (rc != 0) goto PTREXIT;
+          text1_end = ptr;
+
+          if (special == CHAR_PLUS && *ptr == CHAR_COLON)
+            {
+            text2_start = ++ptr;
+            rc = find_text_end(code, &ptr, repend, TRUE);
+            if (rc != 0) goto PTREXIT;
+            text2_end = ptr;
+            }
+          }
+
+        else
+          {
+          if (ptr >= repend || *ptr != CHAR_RIGHT_CURLY_BRACKET)
+            {
+            rc = PCRE2_ERROR_REPMISSINGBRACE;
+            goto PTREXIT;
+            }
+          }
+
+        ptr++;
        }
-      else i--;   /* Last code unit of name/number */

      /* Have found a syntactically correct group number or name, or
      *name. Only *MARK is currently recognized. */
@@ -282,31 +480,242 @@ do
        else goto BAD;
        }

-      /* Substitute the contents of a group. */
+      /* Substitute the contents of a group. We don't use substring_copy
+      functions any more, in order to support case forcing. */

      else
        {
-        sublength = lengthleft;
-        if (group < 0)
-          rc = pcre2_substring_copy_byname(match_data, name,
-            buffer + buff_offset, &sublength);
-        else
-          rc = pcre2_substring_copy_bynumber(match_data, group,
-            buffer + buff_offset, &sublength);
-        if (rc < 0) goto EXIT;
+        PCRE2_SPTR subptr, subptrend;
+        
+        /* Find a number for a named group. In case there are duplicate names, 
+        search for the first one that is set. */

-        buff_offset += sublength;
-        lengthleft -= sublength;
+        if (group < 0)
+          {
+          PCRE2_SPTR first, last, entry;
+          rc = pcre2_substring_nametable_scan(code, name, &first, &last);
+          if (rc < 0) goto PTREXIT;
+          for (entry = first; entry <= last; entry += rc)
+            {
+            uint32_t ng = GET2(entry, 0);
+            if (ng < ovector_count)
+              {
+              if (group < 0) group = ng;          /* First in ovector */
+              if (ovector[ng*2] != PCRE2_UNSET) 
+                {
+                group = ng;                       /* First that is set */
+                break;
+                } 
+              }
+            }
+
+          /* If group is still negative, no group with this name is in the
+          ovector. Use the first entry so that the lookup below gives the error. */
+
+          if (group < 0) group = GET2(first, 0);
+          }
+
+        rc = pcre2_substring_length_bynumber(match_data, group, &sublength);
+        if (rc < 0 && (special == 0 || rc != PCRE2_ERROR_UNSET)) goto PTREXIT;
+
+        /* If special is '+' we have a 'set' and possibly an 'unset' text,
+        both of which are reprocessed when used. If special is '-' we have a
+        default text for when the group is unset; it must be reprocessed. */
+
+        if (special != 0)
+          {
+          if (special == CHAR_MINUS)
+            {
+            if (rc == 0) goto LITERAL_SUBSTITUTE;
+            text2_start = text1_start;
+            text2_end = text1_end;
+            }
+
+          if (ptrstackptr >= PTR_STACK_SIZE) goto BAD;
+          ptrstack[ptrstackptr++] = ptr;
+          ptrstack[ptrstackptr++] = repend;
+
+          if (rc == 0)
+            {
+            ptr = text1_start;
+            repend = text1_end;
+            }
+          else
+            {
+            ptr = text2_start;
+            repend = text2_end;
+            }
+          continue;
+          }
+
+        /* Otherwise we have a literal substitution of a group's contents. */
+
+        LITERAL_SUBSTITUTE:
+        subptr = subject + ovector[group*2];
+        subptrend = subject + ovector[group*2 + 1];
+
+        /* Substitute a literal string, possibly forcing alphabetic case. */
+
+        while (subptr < subptrend)
+          {
+          GETCHARINCTEST(ch, subptr);
+          if (forcecase != 0)
+            {
+#ifdef SUPPORT_UNICODE
+            if (utf)
+              {
+              uint32_t type = UCD_CHARTYPE(ch);
+              if (PRIV(ucp_gentype)[type] == ucp_L &&
+                  type != ((forcecase > 0)? ucp_Lu : ucp_Ll))
+                ch = UCD_OTHERCASE(ch);
+              }
+            else
+#endif
+              {
+              if (((code->tables + cbits_offset +
+                  ((forcecase > 0)? cbit_upper:cbit_lower)
+                  )[ch/8] & (1 << (ch%8))) == 0)
+                ch = (code->tables + fcc_offset)[ch];
+              }
+            forcecase = forcecasereset;
+            }
+
+#ifdef SUPPORT_UNICODE
+          if (utf)
+            {
+            unsigned int chlen;
+#if PCRE2_CODE_UNIT_WIDTH == 8
+            if (lengthleft < 6) goto NOROOM;
+#elif PCRE2_CODE_UNIT_WIDTH == 16
+            if (lengthleft < 2) goto NOROOM;
+#else
+            if (lengthleft < 1) goto NOROOM;
+#endif
+            chlen = PRIV(ord2utf)(ch, buffer + buff_offset);
+            buff_offset += chlen;
+            lengthleft -= chlen;
+            }
+          else
+#endif
+            {
+            if (lengthleft-- < 1) goto NOROOM;
+            buffer[buff_offset++] = ch;
+            }
+          }
        }
      }

-   /* Handle a literal code unit */
+    /* Handle an escape sequence in extended mode. We can use check_escape()
+    to process \Q, \E, \c, \o, \x and \ followed by non-alphanumerics, but
+    the case-forcing escapes are not supported in pcre2_compile() so must be
+    recognized here. */

-   else
+    else if (extended && *ptr == CHAR_BACKSLASH)
      {
+      int errorcode = 0;
+
+      if (ptr < repend - 1) switch (ptr[1])
+        {
+        case CHAR_L:
+        forcecase = forcecasereset = -1;
+        ptr += 2;
+        continue;
+
+        case CHAR_l:
+        forcecase = -1;
+        forcecasereset = 0;
+        ptr += 2;
+        continue;
+
+        case CHAR_U:
+        forcecase = forcecasereset = 1;
+        ptr += 2;
+        continue;
+
+        case CHAR_u:
+        forcecase = 1;
+        forcecasereset = 0;
+        ptr += 2;
+        continue;
+
+        default:
+        break;
+        }
+
+      rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
+        code->overall_options, FALSE, NULL);
+      if (errorcode != 0) goto BADESCAPE;
+      ptr++;
+
+      switch(rc)
+        {
+        case ESC_E:
+        forcecase = forcecasereset = 0;
+        continue;
+
+        case ESC_Q:
+        literal = TRUE;
+        continue;
+
+        case 0:      /* Data character */
+        goto LITERAL;
+
+        default:
+        goto BADESCAPE;
+        }
+      }
+
+    /* Handle a literal code unit */
+
+    else
+      {
+      LOADLITERAL:
+      GETCHARINCTEST(ch, ptr);    /* Get character value, increment pointer */
+
      LITERAL:
-      if (lengthleft-- < 1) goto NOROOM;
-      buffer[buff_offset++] = replacement[i];
+      if (forcecase != 0)
+        {
+#ifdef SUPPORT_UNICODE
+        if (utf)
+          {
+          uint32_t type = UCD_CHARTYPE(ch);
+          if (PRIV(ucp_gentype)[type] == ucp_L &&
+              type != ((forcecase > 0)? ucp_Lu : ucp_Ll))
+            ch = UCD_OTHERCASE(ch);
+          }
+        else
+#endif
+          {
+          if (((code->tables + cbits_offset +
+              ((forcecase > 0)? cbit_upper:cbit_lower)
+              )[ch/8] & (1 << (ch%8))) == 0)
+            ch = (code->tables + fcc_offset)[ch];
+          }
+
+        forcecase = forcecasereset;
+        }
+
+#ifdef SUPPORT_UNICODE
+      if (utf)
+        {
+        unsigned int chlen;
+#if PCRE2_CODE_UNIT_WIDTH == 8
+        if (lengthleft < 6) goto NOROOM;
+#elif PCRE2_CODE_UNIT_WIDTH == 16
+        if (lengthleft < 2) goto NOROOM;
+#else
+        if (lengthleft < 1) goto NOROOM;
+#endif
+        chlen = PRIV(ord2utf)(ch, buffer + buff_offset);
+        buff_offset += chlen;
+        lengthleft -= chlen;
+        }
+      else
+#endif
+        {
+        if (lengthleft-- < 1) goto NOROOM;
+        buffer[buff_offset++] = ch;
+        }
      }
    }

@ -341,6 +750,13 @@ goto EXIT;

 BAD:
 rc = PCRE2_ERROR_BADREPLACEMENT;
+goto PTREXIT;
+
+BADESCAPE:
+rc = PCRE2_ERROR_BADREPESCAPE;
+
+PTREXIT:
+*blength = (PCRE2_SIZE)(ptr - replacement);
 goto EXIT;
 }
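
Note on the case-forcing state used in the hunks above: `forcecase` holds the direction applied to the next character (positive for upper, negative for lower, zero for none) and `forcecasereset` holds the value it returns to after each character. That is how `\u` and `\l` affect one character while `\U` and `\L` persist until `\E`. The following is a minimal ASCII-only sketch of that idea, not the library code: `emit()` and `expand_case()` are hypothetical names, only single-digit `$n` references are handled, and the caller is assumed to pass a large enough output buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2: append one character to out,
   applying the current case-forcing state (ASCII only). */
static void emit(char ch, int *forcecase, int forcecasereset,
                 char *out, size_t *len)
{
  if (*forcecase > 0) ch = (char)toupper((unsigned char)ch);
  else if (*forcecase < 0) ch = (char)tolower((unsigned char)ch);
  if (*forcecase != 0) *forcecase = forcecasereset;
  out[(*len)++] = ch;
}

/* Hypothetical expander: handles \u \l \U \L \E and single-digit $n.
   groups[n] must be a valid string for every n that rep refers to. */
static void expand_case(const char *rep, const char *const *groups, char *out)
{
  int forcecase = 0;        /* >0 force upper, <0 force lower, 0 off */
  int forcecasereset = 0;   /* Value restored after each character */
  size_t len = 0;
  const char *p;

  assert(rep != NULL && groups != NULL && out != NULL);

  for (p = rep; *p != '\0'; p++)
    {
    if (*p == '\\' && p[1] != '\0')
      {
      switch (*++p)
        {
        case 'U': forcecase = forcecasereset = 1; continue;
        case 'L': forcecase = forcecasereset = -1; continue;
        case 'u': forcecase = 1; forcecasereset = 0; continue;
        case 'l': forcecase = -1; forcecasereset = 0; continue;
        case 'E': forcecase = forcecasereset = 0; continue;
        default: break;     /* Other escaped characters are literal here */
        }
      emit(*p, &forcecase, forcecasereset, out, &len);
      }
    else if (*p == '$' && isdigit((unsigned char)p[1]))
      {
      const char *g = groups[*++p - '0'];
      for (; *g != '\0'; g++) emit(*g, &forcecase, forcecasereset, out, &len);
      }
    else emit(*p, &forcecase, forcecasereset, out, &len);
    }
  out[len] = '\0';
}
```

Because `\l` and `\u` set `forcecasereset` to zero, a single-character escape that follows `\U` or `\L` also ends the persistent mode; the `(aa)(BB)` test added to testinput2 in this commit exercises that behaviour.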

--- a/src/pcre2test.c
+++ b/src/pcre2test.c
@ -182,13 +182,13 @@ void vms_setsymbol( char *, char *, int );
 #define LOCALESIZE 32           /* Size of locale name */
 #define LOOPREPEAT 500000       /* Default loop count for timing */
 #define PATSTACKSIZE 20         /* Pattern stack for save/restore testing */
-#define REPLACE_MODSIZE 96      /* Field for reading 8-bit replacement */
+#define REPLACE_MODSIZE 100     /* Field for reading 8-bit replacement */
 #define VERSION_SIZE 64         /* Size of buffer for the version strings */

 /* Make sure the buffer into which replacement strings are copied is big enough
 to hold them as 32-bit code units. */

-#define REPLACE_BUFFSIZE (4*REPLACE_MODSIZE)
+#define REPLACE_BUFFSIZE 1024   /* This is a byte value */

 /* Execution modes */

@ -385,31 +385,32 @@ enum { MOD_CTC,    /* Applies to a compile context */
 /* Control bits. Some apply to compiling, some to matching, but some can be set
 either on a pattern or a data line, so they must all be distinct. */

-#define CTL_AFTERTEXT          0x00000001u
-#define CTL_ALLAFTERTEXT       0x00000002u
-#define CTL_ALLCAPTURES        0x00000004u
-#define CTL_ALLUSEDTEXT        0x00000008u
-#define CTL_ALTGLOBAL          0x00000010u
-#define CTL_BINCODE            0x00000020u
-#define CTL_CALLOUT_CAPTURE    0x00000040u
-#define CTL_CALLOUT_INFO       0x00000080u
-#define CTL_CALLOUT_NONE       0x00000100u
-#define CTL_DFA                0x00000200u
-#define CTL_FINDLIMITS         0x00000400u
-#define CTL_FULLBINCODE        0x00000800u
-#define CTL_GETALL             0x00001000u
-#define CTL_GLOBAL             0x00002000u
-#define CTL_HEXPAT             0x00004000u
-#define CTL_INFO               0x00008000u
-#define CTL_JITFAST            0x00010000u
-#define CTL_JITVERIFY          0x00020000u
-#define CTL_MARK               0x00040000u
-#define CTL_MEMORY             0x00080000u
-#define CTL_NULLCONTEXT        0x00100000u
-#define CTL_POSIX              0x00200000u
-#define CTL_PUSH               0x00400000u
-#define CTL_STARTCHAR          0x00800000u
-#define CTL_ZERO_TERMINATE     0x01000000u
+#define CTL_AFTERTEXT            0x00000001u
+#define CTL_ALLAFTERTEXT         0x00000002u
+#define CTL_ALLCAPTURES          0x00000004u
+#define CTL_ALLUSEDTEXT          0x00000008u
+#define CTL_ALTGLOBAL            0x00000010u
+#define CTL_BINCODE              0x00000020u
+#define CTL_CALLOUT_CAPTURE      0x00000040u
+#define CTL_CALLOUT_INFO         0x00000080u
+#define CTL_CALLOUT_NONE         0x00000100u
+#define CTL_DFA                  0x00000200u
+#define CTL_FINDLIMITS           0x00000400u
+#define CTL_FULLBINCODE          0x00000800u
+#define CTL_GETALL               0x00001000u
+#define CTL_GLOBAL               0x00002000u
+#define CTL_HEXPAT               0x00004000u
+#define CTL_INFO                 0x00008000u
+#define CTL_JITFAST              0x00010000u
+#define CTL_JITVERIFY            0x00020000u
+#define CTL_MARK                 0x00040000u
+#define CTL_MEMORY               0x00080000u
+#define CTL_NULLCONTEXT          0x00100000u
+#define CTL_POSIX                0x00200000u
+#define CTL_PUSH                 0x00400000u
+#define CTL_STARTCHAR            0x00800000u
+#define CTL_SUBSTITUTE_EXTENDED  0x01000000u
+#define CTL_ZERO_TERMINATE       0x02000000u

 #define CTL_BSR_SET          0x80000000u  /* This is informational */
 #define CTL_NL_SET           0x40000000u  /* This is informational */
@ -566,6 +567,7 @@ static modstruct modlist[] = {
  { "replace",             MOD_PND,  MOD_STR, REPLACE_MODSIZE,           PO(replacement) },
  { "stackguard",          MOD_PAT,  MOD_INT, 0,                         PO(stackguard_test) },
  { "startchar",           MOD_PND,  MOD_CTL, CTL_STARTCHAR,             PO(control) },
+  { "substitute_extended", MOD_PAT,  MOD_CTL, CTL_SUBSTITUTE_EXTENDED,   PO(control) },
  { "tables",              MOD_PAT,  MOD_INT, 0,                         PO(tables_id) },
  { "ucp",                 MOD_PATP, MOD_OPT, PCRE2_UCP,                 PO(options) },
  { "ungreedy",            MOD_PAT,  MOD_OPT, PCRE2_UNGREEDY,            PO(options) },
@ -3453,7 +3455,7 @@ Returns:      nothing
 static void
 show_controls(uint32_t controls, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
  before,
  ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
  ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -3481,6 +3483,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
  ((controls & CTL_POSIX) != 0)? " posix" : "",
  ((controls & CTL_PUSH) != 0)? " push" : "",
  ((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
+  ((controls & CTL_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
  ((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
 }

@ -5685,7 +5688,7 @@ if (dat_datctl.replacement[0] != 0)
  uint8_t *pr;
  uint8_t rbuffer[REPLACE_BUFFSIZE];
  uint8_t nbuffer[REPLACE_BUFFSIZE];
-  uint32_t goption;
+  uint32_t xoptions;
  PCRE2_SIZE rlen, nsize, erroroffset;
  BOOL badutf = FALSE;

@ -5702,8 +5705,11 @@ if (dat_datctl.replacement[0] != 0)
  if (timeitm)
    fprintf(outfile, "** Timing is not supported with replace: ignored\n");

-  goption = ((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
-    PCRE2_SUBSTITUTE_GLOBAL;
+  xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
+                PCRE2_SUBSTITUTE_GLOBAL) |
+             (((pat_patctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
+                PCRE2_SUBSTITUTE_EXTENDED);     
+ 
  SETCASTPTR(r, rbuffer);  /* Sets r8, r16, or r32, as appropriate. */
  pr = dat_datctl.replacement;

@ -5790,12 +5796,15 @@ if (dat_datctl.replacement[0] != 0)
  else
    rlen = (CASTVAR(uint8_t *, r) - rbuffer)/code_unit_size;
  PCRE2_SUBSTITUTE(rc, compiled_code, pp, ulen, dat_datctl.offset,
-    dat_datctl.options|goption, match_data, dat_context,
+    dat_datctl.options|xoptions, match_data, dat_context,
    rbuffer, rlen, nbuffer, &nsize);

  if (rc < 0)
    {
-    fprintf(outfile, "Failed: error %d: ", rc);
+    fprintf(outfile, "Failed: error %d", rc);
+    if (nsize != PCRE2_UNSET)
+      fprintf(outfile, " at offset %lu in replacement", (unsigned long)nsize);
+    fprintf(outfile, ": ");
    PCRE2_GET_ERROR_MESSAGE(nsize, rc, pbuffer);
    PCHARSV(CASTVAR(void *, pbuffer), 0, nsize, FALSE, outfile);
    }
--- a/testdata/testinput18
+++ b/testdata/testinput18
@ -92,4 +92,6 @@

 "(?(?C)"

+/abcd/substitute_extended
+
 # End of testdata/testinput18
--- a/testdata/testinput2
+++ b/testdata/testinput2
@ -4539,4 +4539,55 @@ B)x/alt_verbnames,mark
    abcd\=null_context,find_limits
    abcd\=allusedtext,startchar 

+/abcd/replace=w\rx\x82y\o{333}z(\Q12\$34$$\x34\E5$$),substitute_extended
+    abcd
+    
+/a(bc)(DE)/replace=a\u$1\U$1\E$1\l$2\L$2\Eab\Uab\LYZ\EDone,substitute_extended
+    abcDE
+ 
+/abcd/replace=xy\kz,substitute_extended
+    abcd
+
+/a(?:(b)|(c))/substitute_extended,replace=X${1:+1:-1}X${2:+2:-2}
+    ab
+    ac
+    ab\=replace=${1:+$1\:$1:$2}
+    ac\=replace=${1:+$1\:$1:$2}
+
+/a(?:(b)|(c))/substitute_extended,replace=X${1:-1:-1}X${2:-2:-2}
+    ab
+    ac
+
+/(a)/substitute_extended,replace=>${1:+\Q$1:{}$$\E+\U$1}<
+    a
+
+/X(b)Y/substitute_extended
+    XbY\=replace=x${1:+$1\U$1}y
+    XbY\=replace=\Ux${1:+$1$1}y
+
+/a/substitute_extended,replace=${*MARK:+a:b}
+    a
+
+/(abcd)/replace=${1:+xy\kz},substitute_extended
+    abcd
+
+/abcd/substitute_extended,replace=>$1<
+    abcd
+
+/abcd/substitute_extended,replace=>xxx${xyz}<<<
+    abcd
+
+/(?J)(?:(?<A>a)|(?<A>b))/replace=<$A>
+    [a]
+    [b] 
+\= Expect error     
+    (a)\=ovector=1
+
+/(a)|(b)/replace=<$1>
+\= Expect error
+    b
+
+/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
+    aaBB
+
 # End of testinput2 
--- a/testdata/testinput5
+++ b/testdata/testinput5
@ -1678,9 +1678,16 @@
 /[\pS#moq]/
    =

-# UTF tests 
-
 /(*:a\x{12345}b\t(d\)c)xxx/utf,alt_verbnames,mark
    cxxxz

+/abcd/utf,replace=x\x{824}y\o{3333}z(\Q12\$34$$\x34\E5$$),substitute_extended
+    abcd
+
+/a(\x{e0}\x{101})(\x{c0}\x{102})/utf,replace=a\u$1\U$1\E$1\l$2\L$2\Eab\U\x{e0}\x{101}\L\x{d0}\x{160}\EDone,substitute_extended
+    a\x{e0}\x{101}\x{c0}\x{102}
+
+/((?<digit>\d)|(?<letter>\p{L}))/g,substitute_extended,replace=<${digit:+digit; :not digit; }${letter:+letter:not a letter}>
+    ab12cde
+
 # End of testinput5 
--- a/testdata/testoutput18
+++ b/testdata/testoutput18
@ -135,9 +135,12 @@ No match: POSIX code 17: match failed
 0+ issippi

 /abc/\
-Failed: POSIX code 9: bad escape sequence at offset 4     
+Failed: POSIX code 9: bad escape sequence at offset 3     

 "(?(?C)"
 Failed: POSIX code 3: pattern error at offset 2     

+/abcd/substitute_extended
+** Ignored with POSIX interface: substitute_extended
+
 # End of testdata/testinput18
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@ -946,10 +946,10 @@ Failed: error 125 at offset 6: lookbehind assertion is not fixed length
 Failed: error 104 at offset 7: numbers out of order in {} quantifier

 /abc/\
-Failed: error 101 at offset 4: \ at end of pattern
+Failed: error 101 at offset 3: \ at end of pattern

 /abc/\i
-Failed: error 101 at offset 4: \ at end of pattern
+Failed: error 101 at offset 3: \ at end of pattern

 /(a)bc(d)/I
 Capturing subpattern count = 2
@ -13546,27 +13546,27 @@ Failed: error 119 at offset 3: parentheses are too deeply nested

 /abc/replace=a$++
    123abc
-Failed: error -35: invalid replacement string
+Failed: error -35 at offset 2 in replacement: invalid replacement string

 /abc/replace=a$bad
    123abc
-Failed: error -49: unknown substring
+Failed: error -49 at offset 5 in replacement: unknown substring

 /abc/replace=a${A234567890123456789_123456789012}z
    123abc
-Failed: error -49: unknown substring
+Failed: error -49 at offset 36 in replacement: unknown substring

 /abc/replace=a${A23456789012345678901234567890123}z
    123abc
-Failed: error -35: invalid replacement string
+Failed: error -35 at offset 35 in replacement: invalid replacement string

 /abc/replace=a${bcd
    123abc
-Failed: error -35: invalid replacement string
+Failed: error -58 at offset 6 in replacement: expected closing curly bracket in replacement string

 /abc/replace=a${b+d}z
    123abc
-Failed: error -35: invalid replacement string
+Failed: error -58 at offset 4 in replacement: expected closing curly bracket in replacement string

 /abc/replace=[10]XYZ
    123abc123
@ -13632,19 +13632,19 @@ Failed: error -34: bad option value
    
 /(*:pear)apple/g,replace=${*MARKING} 
    apple lemon blackberry
-Failed: error -35: invalid replacement string
+Failed: error -35 at offset 11 in replacement: invalid replacement string

 /(*:pear)apple/g,replace=${*MARK-time
    apple lemon blackberry
-Failed: error -35: invalid replacement string
+Failed: error -58 at offset 7 in replacement: expected closing curly bracket in replacement string

 /(*:pear)apple/g,replace=${*mark} 
    apple lemon blackberry
-Failed: error -35: invalid replacement string
+Failed: error -35 at offset 8 in replacement: invalid replacement string

 /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=<$*MARKET>
    apple lemon blackberry
-Failed: error -35: invalid replacement string
+Failed: error -35 at offset 9 in replacement: invalid replacement string

 /(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[22]${*MARK}
    apple lemon blackberry
@ -14669,4 +14669,76 @@ Failed: error -56: offset limit set without PCRE2_USE_OFFSET_LIMIT
    abcd\=allusedtext,startchar 
 ** Not allowed together: allusedtext startchar

+/abcd/replace=w\rx\x82y\o{333}z(\Q12\$34$$\x34\E5$$),substitute_extended
+    abcd
+ 1: w\x0dx\x82y\xdbz(12\$34$$\x345$)
+    
+/a(bc)(DE)/replace=a\u$1\U$1\E$1\l$2\L$2\Eab\Uab\LYZ\EDone,substitute_extended
+    abcDE
+ 1: aBcBCbcdEdeabAByzDone
+ 
+/abcd/replace=xy\kz,substitute_extended
+    abcd
+Failed: error -57 at offset 4 in replacement: bad escape sequence in replacement string
+
+/a(?:(b)|(c))/substitute_extended,replace=X${1:+1:-1}X${2:+2:-2}
+    ab
+ 1: X1X-2
+    ac
+ 1: X-1X2
+    ab\=replace=${1:+$1\:$1:$2}
+ 1: b:b
+    ac\=replace=${1:+$1\:$1:$2}
+ 1: c
+
+/a(?:(b)|(c))/substitute_extended,replace=X${1:-1:-1}X${2:-2:-2}
+    ab
+ 1: XbX2:-2
+    ac
+ 1: X1:-1Xc
+
+/(a)/substitute_extended,replace=>${1:+\Q$1:{}$$\E+\U$1}<
+    a
+ 1: >$1:{}$$+A<
+
+/X(b)Y/substitute_extended
+    XbY\=replace=x${1:+$1\U$1}y
+ 1: xbBY
+    XbY\=replace=\Ux${1:+$1$1}y
+ 1: XBBY
+
+/a/substitute_extended,replace=${*MARK:+a:b}
+    a
+Failed: error -58 at offset 7 in replacement: expected closing curly bracket in replacement string
+
+/(abcd)/replace=${1:+xy\kz},substitute_extended
+    abcd
+Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string
+
+/abcd/substitute_extended,replace=>$1<
+    abcd
+Failed: error -49 at offset 3 in replacement: unknown substring
+
+/abcd/substitute_extended,replace=>xxx${xyz}<<<
+    abcd
+Failed: error -49 at offset 10 in replacement: unknown substring
+
+/(?J)(?:(?<A>a)|(?<A>b))/replace=<$A>
+    [a]
+ 1: [<a>]
+    [b] 
+ 1: [<b>]
+\= Expect error     
+    (a)\=ovector=1
+Failed: error -54 at offset 3 in replacement: requested value is not available
+
+/(a)|(b)/replace=<$1>
+\= Expect error
+    b
+Failed: error -55 at offset 3 in replacement: requested value is not set
+
+/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
+    aaBB
+ 1: AAbbaa..AAbBaa
+
 # End of testinput2 
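
Note on the conditional tests above: `${n:+set:unset}` selects the first text when group n is set and the second (if present) otherwise, while `${n:-default}` inserts the group itself when it is set and the default text otherwise. For the `-` form a colon does not end the text, which is why `${2:-2:-2}` yields `2:-2`. A simplified sketch of that selection follows. It is not the library code: `expand_cond()` is a hypothetical name, only single-digit group numbers are accepted, nesting and escapes are not handled, an unset group is represented by NULL, and the caller is assumed to supply a large enough output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified expander for ${n:+set:unset} and ${n:-default}.
   Returns 0 on success or -1 on a syntax error. */
static int expand_cond(const char *rep, const char *const *groups, char *out)
{
  size_t len = 0;
  assert(rep != NULL && groups != NULL && out != NULL);

  while (*rep != '\0')
    {
    const char *text1, *text1_end, *text2 = NULL, *text2_end = NULL;
    const char *value, *close;
    char special;

    if (rep[0] != '$' || rep[1] != '{')
      {
      out[len++] = *rep++;            /* Ordinary character */
      continue;
      }

    /* Expect a digit, ':', then '+' or '-' */
    if (rep[2] < '0' || rep[2] > '9' || rep[3] != ':' ||
        (rep[4] != '+' && rep[4] != '-')) return -1;
    value = groups[rep[2] - '0'];
    special = rep[4];
    text1 = rep + 5;
    close = strchr(text1, '}');
    if (close == NULL) return -1;

    /* Only the '+' form splits at a colon; for '-' everything up to the
       closing bracket is the default text. */
    text1_end = close;
    if (special == '+')
      {
      const char *colon = memchr(text1, ':', (size_t)(close - text1));
      if (colon != NULL)
        {
        text1_end = colon;
        text2 = colon + 1;
        text2_end = close;
        }
      }

    if (special == '-')
      {
      if (value != NULL)              /* Set: insert the group itself */
        {
        text1 = value;
        text1_end = value + strlen(value);
        }
      }
    else if (value == NULL)           /* '+' and unset: use second text */
      {
      text1 = text2;
      text1_end = text2_end;
      }

    if (text1 != NULL)
      {
      memcpy(out + len, text1, (size_t)(text1_end - text1));
      len += (size_t)(text1_end - text1);
      }
    rep = close + 1;
    }

  out[len] = '\0';
  return 0;
}
```

The real implementation differs in that the chosen text is pushed back through the main loop via the pointer stack, so it may itself contain escapes and group references; the sketch copies it verbatim.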
--- a/testdata/testoutput5
+++ b/testdata/testoutput5
@ -4026,11 +4026,21 @@ No match
    =
 0: =

-# UTF tests 
-
 /(*:a\x{12345}b\t(d\)c)xxx/utf,alt_verbnames,mark
    cxxxz
 0: xxx
 MK: a\x{12345}b\x{09}(d)c

+/abcd/utf,replace=x\x{824}y\o{3333}z(\Q12\$34$$\x34\E5$$),substitute_extended
+    abcd
+ 1: x\x{824}y\x{6db}z(12\$34$$\x345$)
+
+/a(\x{e0}\x{101})(\x{c0}\x{102})/utf,replace=a\u$1\U$1\E$1\l$2\L$2\Eab\U\x{e0}\x{101}\L\x{d0}\x{160}\EDone,substitute_extended
+    a\x{e0}\x{101}\x{c0}\x{102}
+ 1: a\x{c0}\x{101}\x{c0}\x{100}\x{e0}\x{101}\x{e0}\x{102}\x{e0}\x{103}ab\x{c0}\x{100}\x{f0}\x{161}Done
+
+/((?<digit>\d)|(?<letter>\p{L}))/g,substitute_extended,replace=<${digit:+digit; :not digit; }${letter:+letter:not a letter}>
+    ab12cde
+ 7: <not digit; letter><not digit; letter><digit; not a letter><digit; not a letter><not digit; letter><not digit; letter><not digit; letter>
+
 # End of testinput5