Add support for (?^) as now supported by Perl.

2018-07-28 16:23:24 +00:00
commit 6e245572b8
parent 27337495dc
15 changed files with 2281 additions and 2162 deletions
--- a/2
+++ b/2
@ -131,6 +131,8 @@ present.
 terminated by (*ACCEPT).
 29. Add support for \N{U+dddd}, but not in EBCDIC environments.
+30. Add support for (?^) for unsetting all imnsx options.
 Version 10.31 12-February-2018
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -1466,7 +1466,8 @@ character, even if newlines are coded as CRLF. Without this option, a dot does
 not match when the current position in the subject is at a newline. This option
 is equivalent to Perl's /s option, and it can be changed within a pattern by a
 (?s) option setting. A negative class such as [^a] always matches newline
-characters, independent of the setting of this option.
+characters, and the \N escape sequence always matches a non-newline character,
+independent of the setting of PCRE2_DOTALL.
 <pre>
  PCRE2_DUPNAMES
 </pre>
@ -3634,7 +3635,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 02 July 2018
+Last updated: 27 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@ -42,13 +42,14 @@ assertion is a condition that has a matching branch (that is, the condition is
 false).
 </P>
 <P>
-4. The following Perl escape sequences are not supported: \l, \u, \L,
+4. The following Perl escape sequences are not supported: \F, \l, \L, \u, 
-\U, and \N when followed by a character name or Unicode value. (\N on its
+\U, and \N when followed by a character name. \N on its own, matching a
-own, matching a non-newline character, is supported.) In fact these are
+non-newline character, and \N{U+dd..}, matching a Unicode code point, are
+supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set,
+generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
-\U and \u are interpreted as ECMAScript interprets them.
+are interpreted as ECMAScript interprets them.
 </P>
 <P>
 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
@ -61,17 +62,22 @@ internal representation of Unicode characters, there is no need to implement
 the somewhat messy concept of surrogates."
 </P>
 <P>
-6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters
+6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
-in between are treated as literals. This is slightly different from Perl in
+in between are treated as literals. However, this is slightly different from
-that $ and @ are also handled as literals inside the quotes. In Perl, they
+Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
-cause variable interpolation (but of course PCRE2 does not have variables).
+they cause variable interpolation (but of course PCRE2 does not have
-Note the following examples:
+variables). Also, Perl does "double-quotish backslash interpolation" on any
+backslashes between \Q and \E which, its documentation says, "may lead to
+confusing results". PCRE2 treats a backslash between \Q and \E just like any
+other character. Note the following examples:
 <pre>
-    Pattern            PCRE2 matches      Perl matches
+    Pattern            PCRE2 matches     Perl matches
    \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
    \Qabc\$xyz\E       abc\$xyz          abc\$xyz
    \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+    \QA\B\E            A\B               A\B
+    \Q\\E              \                 \\E
 </pre>
 The \Q...\E sequence is recognized both inside and outside character classes.
 </P>
@ -229,9 +235,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 18 April 2017
+Last updated: 28 July 2018
 <br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -357,13 +357,18 @@ of the pattern.
 If you want to remove the special meaning from a sequence of characters, you
 can do so by putting them between \Q and \E. This is different from Perl in
 that $ and @ are handled as literals in \Q...\E sequences in PCRE2, whereas
-in Perl, $ and @ cause variable interpolation. Note the following examples:
+in Perl, $ and @ cause variable interpolation. Also, Perl does "double-quotish 
+backslash interpolation" on any backslashes between \Q and \E which, its
+documentation says, "may lead to confusing results". PCRE2 treats a backslash
+between \Q and \E just like any other character. Note the following examples:
 <pre>
  Pattern            PCRE2 matches   Perl matches
  \Qabc$xyz\E        abc$xyz        abc followed by the contents of $xyz
  \Qabc\$xyz\E       abc\$xyz       abc\$xyz
  \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
+  \QA\B\E            A\B            A\B
+  \Q\\E              \              \\E
 </pre>
 The \Q...\E sequence is recognized both inside and outside character classes.
 An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
@ -545,7 +550,7 @@ character class, these sequences have different meanings.
 Unsupported escape sequences
 </b><br>
 <P>
-In Perl, the sequences \l, \L, \u, and \U are recognized by its string
+In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
 does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
 is set, \U matches a "U" character, and \u can be used to define a character
@ -1635,21 +1640,27 @@ Perl option letters enclosed between "(?" and ")". The option letters are
  xx for PCRE2_EXTENDED_MORE
 </pre>
 For example, (?im) sets caseless, multiline matching. It is also possible to
-unset these options by preceding the letter with a hyphen. The two "extended"
+unset these options by preceding the relevant letters with a hyphen, for 
-options are not independent; unsetting either one cancels the effects of both
+example (?-im). The two "extended" options are not independent; unsetting either
-of them.
+one cancels the effects of both of them.
 </P>
 <P>
 A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
 and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
-permitted. If a letter appears both before and after the hyphen, the option is
+permitted. Only one hyphen may appear in the options string. If a letter
-unset. An empty options setting "(?)" is allowed. Needless to say, it has no
+appears both before and after the hyphen, the option is unset. An empty options
-effect.
+setting "(?)" is allowed. Needless to say, it has no effect.
 </P>
 <P>
+If the first character following (? is a circumflex, it causes all of the above 
+options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow 
+the circumflex to cause some options to be re-instated, but a hyphen may not 
+appear.
+</P>
+<P>
 The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
 the same way as the Perl-compatible options by using the characters J and U
-respectively.
+respectively. However, these are not unset by (?^).
 </P>
 <P>
 When one of these option changes occurs at top level (that is, not inside
@ -3579,7 +3590,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 27 July 2018
+Last updated: 28 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -456,7 +456,15 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
  (?x)            extended: ignore white space except in classes
  (?xx)           as (?x) but also ignore space and tab in classes
  (?-...)         unset option(s)
+  (?^)            unset imnsx options 
 </pre>
+Unsetting x or xx unsets both. Several options may be set at once, and a
+mixture of setting and unsetting such as (?i-x) is allowed, but there may be
+only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
+(?^in). An option setting may appear at the start of a non-capturing group, for 
+example (?i:...).
+</P>
+<P>
 The following are recognized only at the very start of a pattern or after one
 of the newline or \R options with similar syntax. More than one of them may
 appear. For the first three, d is a decimal number.
@ -624,7 +632,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 27 July 2018
+Last updated: 28 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1639,19 +1639,24 @@ Perl option letters enclosed between "(?" and ")". The option letters are
  xx for PCRE2_EXTENDED_MORE
 .sp
 For example, (?im) sets caseless, multiline matching. It is also possible to
-unset these options by preceding the letter with a hyphen. The two "extended"
+unset these options by preceding the relevant letters with a hyphen, for 
-options are not independent; unsetting either one cancels the effects of both
+example (?-im). The two "extended" options are not independent; unsetting either
-of them.
+one cancels the effects of both of them.
 .P
 A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS
 and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
-permitted. If a letter appears both before and after the hyphen, the option is
+permitted. Only one hyphen may appear in the options string. If a letter
-unset. An empty options setting "(?)" is allowed. Needless to say, it has no
+appears both before and after the hyphen, the option is unset. An empty options
-effect.
+setting "(?)" is allowed. Needless to say, it has no effect.
 .P
+If the first character following (? is a circumflex, it causes all of the above 
+options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow 
+the circumflex to cause some options to be re-instated, but a hyphen may not 
+appear.
+.P
 The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
 the same way as the Perl-compatible options by using the characters J and U
-respectively.
+respectively. However, these are not unset by (?^).
 .P
 When one of these option changes occurs at top level (that is, not inside
 subpattern parentheses), the change applies to the remainder of the pattern
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "27 July 2018" "PCRE2 10.32"
+.TH PCRE2SYNTAX 3 "28 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -431,7 +431,14 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
  (?x)            extended: ignore white space except in classes
  (?xx)           as (?x) but also ignore space and tab in classes
  (?-...)         unset option(s)
+  (?^)            unset imnsx options 
 .sp
+Unsetting x or xx unsets both. Several options may be set at once, and a
+mixture of setting and unsetting such as (?i-x) is allowed, but there may be
+only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
+(?^in). An option setting may appear at the start of a non-capturing group, for 
+example (?i:...).
+.P
 The following are recognized only at the very start of a pattern or after one
 of the newline or \eR options with similar syntax. More than one of them may
 appear. For the first three, d is a decimal number.
@ -612,6 +619,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 27 July 2018
+Last updated: 28 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
--- a/src/pcre2.h.in
+++ b/src/pcre2.h.in
@ -317,6 +317,7 @@ pcre2_pattern_convert(). */
 #define PCRE2_ERROR_NO_SURROGATES_IN_UTF16         191
 #define PCRE2_ERROR_BAD_LITERAL_OPTIONS            192
 #define PCRE2_ERROR_NOT_SUPPORTED_IN_EBCDIC        193
+#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS      194
 /* "Expected" matching error codes: no match and partial match. */
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@ -263,7 +263,7 @@ versions. */
 #define META_SKIP             0x802d0000u  /*         kept        */
 #define META_SKIP_ARG         0x802e0000u  /*           in        */
 #define META_THEN             0x802f0000u  /*             this    */
-#define META_THEN_ARG         0x80300000u  /*               order */  
+#define META_THEN_ARG         0x80300000u  /*               order */
 /* These must be kept in groups of adjacent 3 values, and all together. */
@ -330,7 +330,7 @@ static unsigned char meta_extra_lengths[] = {
  0,             /* META_ACCEPT */
  0,             /* META_FAIL */
  0,             /* META_COMMIT */
-  1,             /* META_COMMIT_ARG - plus the string length */ 
+  1,             /* META_COMMIT_ARG - plus the string length */
  0,             /* META_PRUNE */
  1,             /* META_PRUNE_ARG - plus the string length */
  0,             /* META_SKIP */
@ -612,7 +612,7 @@ static const int verbcount = sizeof(verbs)/sizeof(verbitem);
 /* Verb opcodes, indexed by their META code offset from META_MARK. */
 static const uint32_t verbops[] = {
-  OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE, 
+  OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
  OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };
 /* Offsets from OP_STAR for case-independent and negative repeat opcodes. */
@ -731,7 +731,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
       ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
       ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
       ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
-       ERR91, ERR92, ERR93 };
+       ERR91, ERR92, ERR93, ERR94 };
 /* This is a table of start-of-pattern options such as (*UTF) and settings such
 as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@ -1441,41 +1441,41 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
    escape = -i;                    /* Else return a special escape */
    if (cb != NULL && (escape == ESC_P || escape == ESC_p || escape == ESC_X))
      cb->external_flags |= PCRE2_HASBKPORX;   /* Note \P, \p, or \X */
- 
+
    /* Perl supports \N{name} for character names and \N{U+dddd} for numerical
    Unicode code points, as well as plain \N for "not newline". PCRE does not
-    support \N{name}. However, it does support quantification such as \N{2,3}, 
+    support \N{name}. However, it does support quantification such as \N{2,3},
    so if \N{ is not followed by U+dddd we check for a quantifier. */
    if (escape == ESC_N && ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
      {
      PCRE2_SPTR p = ptr + 1;
-      
+
-      /* \N{U+ can be handled by the \x{ code. However, this construction is 
+      /* \N{U+ can be handled by the \x{ code. However, this construction is
-      not valid in EBCDIC environments because it specifies a Unicode 
+      not valid in EBCDIC environments because it specifies a Unicode
-      character, not a codepoint in the local code. For example \N{U+0041} 
+      character, not a codepoint in the local code. For example \N{U+0041}
      must be "A" in all environments. */
-      
+
      if (ptrend - p > 1 && *p == CHAR_U && p[1] == CHAR_PLUS)
        {
 #ifdef EBCDIC
        *errorcodeptr = ERR93;
-#else        
+#else
        ptr = p + 1;
-        escape = 0;   /* Not a fancy escape after all */ 
+        escape = 0;   /* Not a fancy escape after all */
        goto COME_FROM_NU;
-#endif 
+#endif
-        }  
+        }
-        
+
-      /* Give an error if what follows is not a quantifier, but don't override 
+      /* Give an error if what follows is not a quantifier, but don't override
      an error set by the quantifier reader (e.g. number overflow). */
- 
+
      else
-        { 
+        {
        if (!read_repeat_counts(&p, ptrend, NULL, NULL, errorcodeptr) &&
             *errorcodeptr == 0)
          *errorcodeptr = ERR37;
-        }   
+        }
      }
    }
  }
@ -1762,9 +1762,9 @@ else
      {
      if (ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET)
        {
-#ifndef EBCDIC         
+#ifndef EBCDIC
-        COME_FROM_NU: 
+        COME_FROM_NU:
-#endif         
+#endif
        if (++ptr >= ptrend || *ptr == CHAR_RIGHT_CURLY_BRACKET)
          {
          *errorcodeptr = ERR78;
@ -2495,15 +2495,15 @@ while (ptr < ptrend)
        goto FAILED;
        }
      *verblengthptr = (uint32_t)verbnamelength;
-      
+
      /* If this name was on a verb such as (*ACCEPT) which does not continue,
-      a (*MARK) was generated for the name. We now add the original verb as the 
+      a (*MARK) was generated for the name. We now add the original verb as the
-      next item. */  
+      next item. */
      if (add_after_mark != 0)
        {
        *parsed_pattern++ = add_after_mark;
-        add_after_mark = 0;   
+        add_after_mark = 0;
        }
      break;
@ -3498,22 +3498,22 @@ while (ptr < ptrend)
        if (*ptr++ == CHAR_COLON)   /* Skip past : or ) */
          {
          /* Some optional arguments can be treated as a preceding (*MARK) */
- 
+
          if (verbs[i].has_arg < 0)
            {
            add_after_mark = verbs[i].meta;
-            *parsed_pattern++ = META_MARK; 
+            *parsed_pattern++ = META_MARK;
            }
-            
+
          /* The remaining verbs with arguments (except *MARK) need a different
          opcode. */
-          
+
          else
-            {  
+            {
            *parsed_pattern++ = verbs[i].meta +
              ((verbs[i].meta != META_MARK)? 0x00010000u:0);
-            }   
+            }
-            
+
          /* Set up for reading the name in the main loop. */
          verblengthptr = parsed_pattern++;
@ -3576,17 +3576,37 @@ while (ptr < ptrend)
      else
        {
+        BOOL hyphenok = TRUE;
        top_nest->reset_group = 0;
        top_nest->max_group = 0;
        set = unset = 0;
        optset = &set;
+        /* ^ at the start unsets imnsx and disables the subsequent use of - */
+        if (ptr < ptrend && *ptr == CHAR_CIRCUMFLEX_ACCENT)
+          {
+          options &= ~(PCRE2_CASELESS|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE|
+                       PCRE2_DOTALL|PCRE2_EXTENDED|PCRE2_EXTENDED_MORE);
+          hyphenok = FALSE;
+          ptr++; 
+          }
        while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS &&
                               *ptr != CHAR_COLON)
          {
          switch (*ptr++)
            {
-            case CHAR_MINUS: optset = &unset; break;
+            case CHAR_MINUS:
+            if (!hyphenok)
+              {
+              errorcode = ERR94;
+              ptr--;  /* Correct the offset */
+              goto FAILED;
+              }
+            optset = &unset;
+            hyphenok = FALSE; 
+            break;
            case CHAR_J:  /* Record that it changed in the external options */
            *optset |= PCRE2_DUPNAMES;
@ -3644,9 +3664,10 @@ while (ptr < ptrend)
          }
        else *parsed_pattern++ = META_NOCAPTURE;
-        /* If nothing changed, no need to record. */
+        /* If nothing changed, no need to record. The check of hyphenok catches 
+        the (?^) case. */
-        if (set != 0 || unset != 0)
+        if (set != 0 || unset != 0 || !hyphenok)
          {
          *parsed_pattern++ = META_OPTIONS;
          *parsed_pattern++ = options;
@ -3952,7 +3973,7 @@ while (ptr < ptrend)
          {
          if (++ptr >= ptrend || !IS_DIGIT(*ptr)) goto BAD_VERSION_CONDITION;
          minor = (*ptr++ - CHAR_0) * 10;
-          if (IS_DIGIT(*ptr)) minor += *ptr++ - CHAR_0; 
+          if (IS_DIGIT(*ptr)) minor += *ptr++ - CHAR_0;
          if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS)
            goto BAD_VERSION_CONDITION;
          }
@ -5709,7 +5730,7 @@ for (;; pptr++)
    cb->had_pruneorskip = TRUE;
    /* Fall through */
    case META_MARK:
-    case META_COMMIT_ARG: 
+    case META_COMMIT_ARG:
    VERB_ARG:
    *code++ = verbops[(meta - META_MARK) >> 16];
    /* The length is in characters. */
@ -8058,7 +8079,7 @@ for (;;)
      break;
      case OP_MARK:
-      case OP_COMMIT_ARG: 
+      case OP_COMMIT_ARG:
      case OP_PRUNE_ARG:
      case OP_SKIP_ARG:
      case OP_THEN_ARG:
@ -8367,7 +8388,7 @@ for (;; pptr++)
    break;
    case META_MARK:     /* Add the length of the name. */
-    case META_COMMIT_ARG: 
+    case META_COMMIT_ARG:
    case META_PRUNE_ARG:
    case META_SKIP_ARG:
    case META_THEN_ARG:
@ -8558,7 +8579,7 @@ for (;; pptr++)
    goto EXIT;
    case META_MARK:
-    case META_COMMIT_ARG: 
+    case META_COMMIT_ARG:
    case META_PRUNE_ARG:
    case META_SKIP_ARG:
    case META_THEN_ARG:
@ -8630,31 +8651,31 @@ for (;; pptr++)
    case META_LOOKAHEADNOT:
    pptr = parsed_skip(pptr + 1, PSKIP_KET);
    if (pptr == NULL) goto PARSED_SKIP_FAILED;
-    
+
    /* Also ignore any qualifiers that follow a lookahead assertion. */
-    
+
    switch (pptr[1])
      {
      case META_ASTERISK:
      case META_ASTERISK_PLUS:
-      case META_ASTERISK_QUERY:   
+      case META_ASTERISK_QUERY:
      case META_PLUS:
-      case META_PLUS_PLUS: 
+      case META_PLUS_PLUS:
      case META_PLUS_QUERY:
      case META_QUERY:
      case META_QUERY_PLUS:
-      case META_QUERY_QUERY:       
+      case META_QUERY_QUERY:
      pptr++;
      break;
-      
+
      case META_MINMAX:
      case META_MINMAX_PLUS:
      case META_MINMAX_QUERY:
      pptr += 3;
      break;
-      
+
      default:
-      break;      
+      break;
      }
    break;
@ -9026,7 +9047,7 @@ for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
    break;
    case META_MARK:
-    case META_COMMIT_ARG: 
+    case META_COMMIT_ARG:
    case META_PRUNE_ARG:
    case META_SKIP_ARG:
    case META_THEN_ARG:
--- a/src/pcre2_error.c
+++ b/src/pcre2_error.c
@ -179,7 +179,8 @@ static const unsigned char compile_error_texts[] =
  "internal error: bad code value in parsed_skip()\0"
  "PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0"
  "invalid option bits with PCRE2_LITERAL\0"
-  "\\N{U+dddd} is not supported in EBCDIC mode\0" 
+  "\\N{U+dddd} is not supported in EBCDIC mode\0"
+  "invalid hyphen in option setting\0"
  ;
 /* Match-time and UTF error texts are in the same format. */
--- a/testdata/testinput1
+++ b/testdata/testinput1
@ -6252,4 +6252,10 @@ ef) x/x,mark
 /(*COMMIT:]w)/
+/(?i)A(?^)B(?^x:C D)(?^i)e f/
+    aBCDE F
+\= Expect no match
+    aBCDEF
+    AbCDe f
 # End of testinput1 
--- a/testdata/testinput2
+++ b/testdata/testinput2
@ -5453,4 +5453,10 @@ a)"xI
 \= Expect no match
    axy
+/(?^x-i)AB/ 
+/(?^-i)AB/ 
+/(?x-i-i)/
 # End of testinput2
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@ -9912,4 +9912,13 @@ No match, mark = X
 /(*COMMIT:]w)/
+/(?i)A(?^)B(?^x:C D)(?^i)e f/
+    aBCDE F
+ 0: aBCDE F
+\= Expect no match
+    aBCDEF
+No match
+    AbCDe f
+No match
 # End of testinput1 
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@ -16622,6 +16622,15 @@ No match, mark = X
    axy
 No match, mark = X
+/(?^x-i)AB/ 
+Failed: error 194 at offset 4: invalid hyphen in option setting
+/(?^-i)AB/ 
+Failed: error 194 at offset 3: invalid hyphen in option setting
+/(?x-i-i)/
+Failed: error 194 at offset 5: invalid hyphen in option setting
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data
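
Note on the option-setting rules above: the following is a minimal standalone sketch of the parsing logic this change describes (a leading ^ clears imnsx and forbids a later hyphen; only one hyphen is allowed; J and U are not touched by ^). It is not the code in pcre2_compile.c. The function name apply_option_setting, the OPT_* bit values, and the "-1 on success, otherwise the offending offset" return convention are all assumptions made for illustration, and the handling of x/xx simply follows the documented rule that unsetting either one unsets both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch; NOT the real PCRE2_* constants. */
#define OPT_CASELESS        0x01u  /* i  */
#define OPT_MULTILINE       0x02u  /* m  */
#define OPT_NO_AUTO_CAPTURE 0x04u  /* n  */
#define OPT_DOTALL          0x08u  /* s  */
#define OPT_EXTENDED        0x10u  /* x  */
#define OPT_EXTENDED_MORE   0x20u  /* xx */
#define OPT_DUPNAMES        0x40u  /* J  */
#define OPT_UNGREEDY        0x80u  /* U  */
#define OPT_IMNSX (OPT_CASELESS | OPT_MULTILINE | OPT_NO_AUTO_CAPTURE | \
                   OPT_DOTALL | OPT_EXTENDED | OPT_EXTENDED_MORE)

/* Apply one option setting such as "(?im-sx)" or "(?^in)" that starts at
   pattern[0]. Returns -1 on success, otherwise the offset of the offending
   character. *options is left untouched on failure. (Assumed interface.) */
int apply_option_setting(const char *pattern, uint32_t *options)
{
  uint32_t set = 0, unset = 0, base;
  uint32_t *optset = &set;   /* letters go here until a hyphen is seen */
  int hyphenok = 1;
  size_t i = 2;

  assert(pattern != NULL && options != NULL);
  if (pattern[0] != '(' || pattern[1] != '?') return 0;
  base = *options;

  /* ^ at the start unsets imnsx and disables the subsequent use of - */
  if (pattern[i] == '^')
  {
    base &= ~OPT_IMNSX;
    hyphenok = 0;
    i++;
  }

  for (; pattern[i] != '\0' && pattern[i] != ')' && pattern[i] != ':'; i++)
  {
    switch (pattern[i])
    {
      case '-':
        if (!hyphenok) return (int)i;   /* second hyphen, or hyphen after ^ */
        optset = &unset;
        hyphenok = 0;
        break;
      case 'i': *optset |= OPT_CASELESS; break;
      case 'm': *optset |= OPT_MULTILINE; break;
      case 'n': *optset |= OPT_NO_AUTO_CAPTURE; break;
      case 's': *optset |= OPT_DOTALL; break;
      case 'J': *optset |= OPT_DUPNAMES; break;
      case 'U': *optset |= OPT_UNGREEDY; break;
      case 'x':
        *optset |= OPT_EXTENDED;
        if (pattern[i + 1] == 'x')      /* "xx" selects extended-more too */
        {
          *optset |= OPT_EXTENDED_MORE;
          i++;
        }
        break;
      default:
        return (int)i;                  /* unrecognized option letter */
    }
  }
  if (pattern[i] == '\0') return (int)i; /* missing ) or : terminator */

  /* Unsetting x or xx unsets both. */
  if ((unset & (OPT_EXTENDED | OPT_EXTENDED_MORE)) != 0)
    unset |= OPT_EXTENDED | OPT_EXTENDED_MORE;

  /* A letter on both sides of the hyphen ends up unset. */
  *options = (base | set) & ~unset;
  return -1;
}
```

Under these assumptions, the three failing patterns from testinput2, (?^x-i)AB, (?^-i)AB and (?x-i-i), return offsets 4, 3 and 5, the same offsets reported with error 194 in testoutput2. The sketch stops at ":" as well as ")" but does not model the scoping of (?^x:...) groups.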