Implement PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh..} syntax.

2019-02-12 17:50:19 +00:00 · 8c8deae8eb
parent d90de8b053
commit 8c8deae8eb
26 changed files with 1310 additions and 1112 deletions
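
The documentation changes below describe how \u is read in a pattern once PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set: four hexadecimal digits are required, or, with the extra option only, any number of hexadecimal digits in curly brackets; anything else leaves a literal "u". The following is a minimal sketch of that rule, not PCRE2's actual implementation. The names `hex_value` and `decode_bsu` are hypothetical, and treating empty braces or an overflowing value as "not recognized" is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): value of a hex digit, or -1. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical sketch of the documented \u rule.
 * `s` points just past the 'u' of "\u" in a NUL-terminated pattern.
 * `extra` is nonzero to model PCRE2_EXTRA_ALT_BSUX (braced form allowed)
 * and zero to model plain PCRE2_ALT_BSUX.
 * On success the code point is stored in *cp and the number of characters
 * consumed after the 'u' is returned. A return of 0 means the sequence was
 * not recognized, so the caller should treat the 'u' as a literal.
 */
static size_t decode_bsu(const char *s, int extra, uint32_t *cp) {
    uint32_t value = 0;
    size_t i;

    assert(s != NULL && cp != NULL);

    if (extra && s[0] == '{') {
        /* Braced form: any number of hex digits, then a closing brace. */
        for (i = 1; hex_value(s[i]) >= 0; i++) {
            /* Assumption: an overflowing value is simply not recognized. */
            if (value > 0x0FFFFFFFu) return 0;
            value = value * 16 + (uint32_t)hex_value(s[i]);
        }
        /* Assumption: empty braces or a missing '}' leave a literal "u". */
        if (i == 1 || s[i] != '}') return 0;
        *cp = value;
        return i + 1;
    }

    /* Unbraced form: exactly four hex digits are required. */
    for (i = 0; i < 4; i++) {
        int d = hex_value(s[i]);
        if (d < 0) return 0;
        value = value * 16 + (uint32_t)d;
    }
    *cp = value;
    return 4;
}
```

In real use none of this is written by the caller: per the documentation in this commit, the option is placed in a compile context with pcre2_set_compile_extra_options() and that context is passed to pcre2_compile().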
--- a/3
+++ b/3
@ -125,6 +125,9 @@ processing or a crash could result.
 names, as Perl does. There was a small bug in this new code, found by
 ClusterFuzz 12950, fixed before release.
 31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh} 
 construct.
 Version 10.32 10-September-2018
 -------------------------------
--- a/doc/html/pcre2_compile.html
+++ b/doc/html/pcre2_compile.html
@ -86,7 +86,12 @@ PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 </P>
 <P>
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
 <a href="pcre2_set_compile_extra_options.html"><b>pcre2_set_compile_extra_options</b></a>
 function.
 </P>
 <P>
 The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 </P>
 <P>
--- a/doc/html/pcre2_set_compile_extra_options.html
+++ b/doc/html/pcre2_set_compile_extra_options.html
@ -20,7 +20,7 @@ SYNOPSIS
 </P>
 <P>
 <b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
-<b>  PCRE2_SIZE <i>extra_options</i>);</b>
+<b>  uint32_t <i>extra_options</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
@ -31,6 +31,7 @@ housed in a compile context. It completely replaces all the bits. The extra
 options are:
 <pre>
  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
  PCRE2_EXTRA_ALT_BSUX                 Extended alternate \u, \U, and \x handling 
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL    Treat all invalid escapes as a literal following character
  PCRE2_EXTRA_ESCAPED_CR_IS_LF         Interpret \r as \n
  PCRE2_EXTRA_MATCH_LINE               Pattern matches whole lines
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -1298,7 +1298,7 @@ are needed. The <b>pcre2_code_copy_with_tables()</b> provides this facility.
 Copies of both the code and the tables are made, with the new code pointing to
 the new tables. The memory for the new tables is automatically freed when
 <b>pcre2_code_free()</b> is called for the new copy of the compiled code. If
-<b>pcre2_code_copy_withy_tables()</b> is called with a NULL argument, it returns
+<b>pcre2_code_copy_with_tables()</b> is called with a NULL argument, it returns
 NULL.
 </P>
 <P>
@ -1315,7 +1315,7 @@ PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
 </P>
 <P>
 The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
-settings that affect the compilation. It should be zero if no options are
+settings that affect the compilation. It should be zero if none of them are
 required. The available options are described below. Some of them (in
 particular, those that are compatible with Perl, but some others as well) can
 also be set and unset from within the pattern (see the detailed description in
@ -1330,8 +1330,9 @@ compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
 options can be set at the time of matching as well as at compile time.
 </P>
 <P>
-Other, less frequently required compile-time parameters (for example, the
+Some additional options and less frequently required compile-time parameters
-newline setting) can be provided in a compile context (as described
+(for example, the newline setting) can be provided in a compile context (as
 described
 <a href="#compilecontext">above).</a>
 </P>
 <P>
@ -1384,7 +1385,13 @@ This code fragment shows a typical straightforward call to
    &errorcode,             /* for error code */
    &erroffset,             /* for error offset */
    NULL);                  /* no compile context */
-</pre>
+
 </PRE>
 </P>
 <br><b>
 Main compile options
 </b><br>
 <P>
 The following names for option bits are defined in the <b>pcre2.h</b> header
 file:
 <pre>
@ -1424,6 +1431,14 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
 to match. By default, as in Perl, a hexadecimal number is always expected after
 \x, but it may have zero, one, or two digits (so, for example, \xz matches a
 binary zero character followed by z).
 </P>
 <P>
 ECMAscript 6 added additional functionality to \u. This can be accessed using
 the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
 <a href="#extracompileoptions">below).</a>
 Note that this alternative escape handling applies only to patterns. Neither of 
 these options affects the processing of replacement strings passed to 
 <b>pcre2_substitute()</b>.
 <pre>
  PCRE2_ALT_CIRCUMFLEX
 </pre>
@ -1830,9 +1845,8 @@ characters with code points greater than 127.
 Extra compile options
 </b><br>
 <P>
-Unlike the main compile-time options, the extra options are not saved with the
+The option bits that can be set in a compile context by calling the
-compiled pattern. The option bits that can be set in a compile context by
+<b>pcre2_set_compile_extra_options()</b> function are as follows:
 calling the <b>pcre2_set_compile_extra_options()</b> function are as follows:
 <pre>
  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 </pre>
@ -1857,6 +1871,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
 incorporated in the compiled pattern. However, they can only match subject
 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
 <pre>
  PCRE2_EXTRA_ALT_BSUX
 </pre>
 The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and \x in 
 the way that ECMAscript (aka JavaScript) does. Additional functionality was 
 defined by ECMAscript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of 
 PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} as a hexadecimal 
 character code, where hhh.. is any number of hexadecimal digits.
 <pre>
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
 </pre>
@ -3382,7 +3404,8 @@ capture groups and letters within \Q...\E quoted sequences.
 <P>
 Note that case forcing sequences such as \U...\E do not nest. For example,
 the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
-effect.
+effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do 
 not apply to replacement strings.
 </P>
 <P>
 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
@ -3784,7 +3807,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@ -47,8 +47,9 @@ non-newline character, and \N{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
+generated by default. However, if either of the PCRE2_ALT_BSUX or
-are interpreted as ECMAScript interprets them.
+PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript
 interprets them.
 </P>
 <P>
 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
@ -233,7 +234,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -399,12 +399,33 @@ environment, these escapes are as follows:
  \xhh        character with hex code hh
  \x{hhh..}   character with hex code hhh..
  \N{U+hhh..} character with Unicode hex code point hhh..
  \uhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 </pre>
-There are some legacy applications where the escape sequence \r is expected to
+By default, after \x that is not followed by {, from zero to two hexadecimal
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
+digits are read (letters can be in upper or lower case). Any number of
-pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
+hexadecimal digits may appear between \x{ and }. If a character other than a
-(carriage return) character.
+hexadecimal digit appears between \x{ and }, or if there is no terminating },
 an error occurs.
 </P>
 <P>
 Characters whose code points are less than 256 can be defined by either of the
 two syntaxes for \x or by an octal sequence. There is no difference in the way
 they are handled. For example, \xdc is exactly the same as \x{dc} or \334.
 However, using the braced versions does make such sequences easier to read.
 </P>
 <P>
 Support is available for some ECMAScript (aka JavaScript) escape sequences via
 two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \x followed
 by { is not recognized. Only if \x is followed by two hexadecimal digits is it
 recognized as a character escape. Otherwise it is interpreted as a literal "x"
 character. In this mode, support for code points greater than 256 is provided
 by \u, which must be followed by four hexadecimal digits; otherwise it is 
 interpreted as a literal "u" character.
 </P>
 <P>
 PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
 \u{hhh..} is recognized as the character specified by hexadecimal code point.
 There may be any number of hexadecimal digits. This syntax is from ECMAScript 
 6.
 </P>
 <P>
 The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
@ -414,6 +435,12 @@ Note that when \N is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
 </P>
 <P>
 There are some legacy applications where the escape sequence \r is expected to
 match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
 pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
 (carriage return) character.
 </P>
 <P>
 The precise effect of \cx on ASCII characters is as follows: if x is a lower
 case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
@ -500,28 +527,6 @@ Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
 digits are ever read.
 </P>
 <P>
 By default, after \x that is not followed by {, from zero to two hexadecimal
 digits are read (letters can be in upper or lower case). Any number of
 hexadecimal digits may appear between \x{ and }. If a character other than
 a hexadecimal digit appears between \x{ and }, or if there is no terminating
 }, an error occurs.
 </P>
 <P>
 If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
 described only when it is followed by two hexadecimal digits. Otherwise, it
 matches a literal "x" character. In this mode, support for code points greater
 than 256 is provided by \u, which must be followed by four hexadecimal digits;
 otherwise it matches a literal "u" character. This syntax makes PCRE2 behave 
 like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
 supported.
 </P>
 <P>
 Characters whose value is less than 256 can be defined by either of the two
 syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no difference in
 the way they are handled. For example, \xdc is exactly the same as \x{dc} (or
 \u00dc in PCRE2_ALT_BSUX mode).
 </P>
 <br><b>
 Constraints on character values
 </b><br>
@ -560,9 +565,10 @@ Unsupported escape sequences
 <P>
 In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
+does not support these escape sequences in patterns. However, if either of the
-is set, \U matches a "U" character, and \u can be used to define a character
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U matches a "U"
-by code point, as described above.
+character, and \u can be used to define a character by code point, as
 described above.
 </P>
 <br><b>
 Absolute and relative backreferences
@ -3721,7 +3727,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -58,7 +58,8 @@ documentation. This document contains a quick-reference summary of the syntax.
 </P>
 <br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
 <P>
-This table applies to ASCII and Unicode environments.
+This table applies to ASCII and Unicode environments. An unrecognized escape 
 sequence causes an error.
 <pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII printing character
@ -70,12 +71,25 @@ This table applies to ASCII and Unicode environments.
  \0dd       character with octal code 0dd
  \ddd       character with octal code ddd, or backreference
  \o{ddd..}  character with octal code ddd..
  \U         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
  \uhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
  \xhh       character with hex code hh
  \x{hh..}   character with hex code hh..
 </pre>
 If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
 following are also recognized:
 <pre>
  \U         the character "U"
  \uhhhh     character with hex code hhhh
  \u{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
 </pre>
 When \x is not followed by {, from zero to two hexadecimal digits are read,
 but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
 recognized as a hexadecimal escape; otherwise it matches a literal "x".
 Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits 
 or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
 matches a literal "u".
 </P>
 <P>
 Note that \0dd is always an octal code. The treatment of backslash followed by
 a non-zero digit is complicated; for details see the section
 <a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
@ -86,13 +100,6 @@ also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
 supported in EBCDIC environments. Note that \N not followed by an opening
 curly bracket has a different meaning (see below).
 </P>
 <P>
 When \x is not followed by {, from zero to two hexadecimal digits are read,
 but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
 be recognized as a hexadecimal escape; otherwise it matches a literal "x".
 Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
 it matches a literal "u".
 </P>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <P>
 <pre>
@ -660,7 +667,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@ -609,6 +609,7 @@ for a description of the effects of these options.
      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF 
  /x  extended                  set PCRE2_EXTENDED
  /xx extended_more             set PCRE2_EXTENDED_MORE
      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX 
      firstline                 set PCRE2_FIRSTLINE
      literal                   set PCRE2_LITERAL
      match_line                set PCRE2_EXTRA_MATCH_LINE
@ -2075,7 +2076,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@ -1296,7 +1296,7 @@ COMPILING A PATTERN
       Copies of both the code and the tables are  made,  with  the  new  code
       pointing  to the new tables. The memory for the new tables is automati-
       cally freed when pcre2_code_free() is called for the new  copy  of  the
-       compiled  code. If pcre2_code_copy_withy_tables() is called with a NULL
+       compiled  code.  If pcre2_code_copy_with_tables() is called with a NULL
       argument, it returns NULL.
       NOTE: When one of the matching functions is  called,  pointers  to  the
@ -1310,7 +1310,7 @@ COMPILING A PATTERN
       below.
       The options argument for pcre2_compile() contains various bit  settings
-       that  affect  the  compilation.  It  should  be  zero if no options are
+       that  affect  the  compilation.  It  should be zero if none of them are
       required. The available options are described below. Some of  them  (in
       particular,  those  that  are  compatible with Perl, but some others as
       well) can also be set and  unset  from  within  the  pattern  (see  the
@ -1322,9 +1322,9 @@ COMPILING A PATTERN
       PCRE2_NO_UTF_CHECK options can be set at the time of matching  as  well
       as at compile time.
-       Other,  less  frequently required compile-time parameters (for example,
+       Some  additional  options  and  less  frequently  required compile-time
-       the newline setting) can be provided in a compile context (as described
+       parameters (for example, the newline setting) can be provided in a com-
-       above).
+       pile context (as described above).
       If errorcode or erroroffset is NULL, pcre2_compile() returns NULL imme-
       diately. Otherwise, the variables to which these point are  set  to  an
@ -1371,6 +1371,9 @@ COMPILING A PATTERN
           &erroffset,             /* for error offset */
           NULL);                  /* no compile context */
   Main compile options
       The  following  names for option bits are defined in the pcre2.h header
       file:
@ -1409,6 +1412,12 @@ COMPILING A PATTERN
       always expected after \x, but it may have zero, one, or two digits (so,
       for example, \xz matches a binary zero character followed by z).
       ECMAscript 6 added additional functionality to \u. This can be accessed
       using  the  PCRE2_EXTRA_ALT_BSUX  extra  option  (see  "Extra   compile
       options"  below).   Note  that this alternative escape handling applies
       only to patterns. Neither of these options affects  the  processing  of
       replacement strings passed to pcre2_substitute().
         PCRE2_ALT_CIRCUMFLEX
       In  multiline  mode  (when  PCRE2_MULTILINE  is  set),  the  circumflex
@ -1804,10 +1813,8 @@ COMPILING A PATTERN
   Extra compile options
-       Unlike  the  main compile-time options, the extra options are not saved
+       The  option  bits  that  can be set in a compile context by calling the
-       with the compiled pattern. The option bits that can be set in a compile
+       pcre2_set_compile_extra_options() function are as follows:
       context  by  calling the pcre2_set_compile_extra_options() function are
       as follows:
         PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
@ -1834,6 +1841,15 @@ COMPILING A PATTERN
       only match subject characters if the matching function is  called  with
       PCRE2_NO_UTF_CHECK set.
         PCRE2_EXTRA_ALT_BSUX
       The  original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and
       \x in the way that ECMAscript (aka JavaScript) does.  Additional  func-
       tionality was defined by ECMAscript 6; setting PCRE2_EXTRA_ALT_BSUX has
       the effect of PCRE2_ALT_BSUX, but in addition it  recognizes  \u{hhh..}
       as a hexadecimal character code, where hhh.. is any number of hexadeci-
       mal digits.
         PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
       This is a dangerous option. Use with care. By default, an  unrecognized
@ -3288,7 +3304,9 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
       Note that case forcing sequences such as \U...\E do not nest. For exam-
       ple,  the  result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
-       \E has no effect.
+       \E  has  no   effect.   Note   also   that   the   PCRE2_ALT_BSUX   and
       PCRE2_EXTRA_ALT_BSUX  options  do  not  apply  to  replacement
       strings.
       The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to  add  more
       flexibility  to  capture  group  substitution. The syntax is similar to
@ -3659,7 +3677,7 @@ AUTHOR
 REVISION
-       Last updated: 04 February 2019
+       Last updated: 12 February 2019
       Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
@ -4701,8 +4719,9 @@ DIFFERENCES BETWEEN PCRE2 AND PERL
       point,  are  supported.  The  escapes that modify the case of following
       letters are implemented by Perl's general string-handling and  are  not
       part of its pattern matching engine. If any of these are encountered by
-       PCRE2, an error is generated by default. However, if the PCRE2_ALT_BSUX
+       PCRE2, an error is generated by default.  However,  if  either  of  the
-       option is set, \U and \u are interpreted as ECMAScript interprets them.
+       PCRE2_ALT_BSUX  or  PCRE2_EXTRA_ALT_BSUX  options is set, \U and \u are
       interpreted as ECMAScript interprets them.
       5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
       is built with Unicode support (the default). The properties that can be
@ -4864,7 +4883,7 @@ AUTHOR
 REVISION
-       Last updated: 03 February 2019
+       Last updated: 12 February 2019
       Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
@ -6333,12 +6352,32 @@ BACKSLASH
         \xhh        character with hex code hh
         \x{hhh..}   character with hex code hhh..
         \N{U+hhh..} character with Unicode hex code point hhh..
         \uhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
-       There  are  some  legacy  applications  where the escape sequence \r is
+       By  default, after \x that is not followed by {, from zero to two hexa-
-       expected to match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option
+       decimal digits are read (letters can be in upper or  lower  case).  Any
-       is  set,  \r  in  a  pattern is converted to \n so that it matches a LF
+       number of hexadecimal digits may appear between \x{ and }. If a charac-
-       (linefeed) instead of a CR (carriage return) character.
+       ter other than a hexadecimal digit appears between \x{  and  },  or  if
       there is no terminating }, an error occurs.
       Characters whose code points are less than 256 can be defined by either
       of the two syntaxes for \x or by an octal sequence. There is no differ-
       ence in the way they are handled. For example, \xdc is exactly the same
       as \x{dc} or \334.  However, using the braced versions does  make  such
       sequences easier to read.
       Support  is  available  for  some  ECMAScript  (aka  JavaScript) escape
       sequences via two compile-time options. If PCRE2_ALT_BSUX is  set,  the
       sequence  \x followed by { is not recognized. Only if \x is followed by
       two hexadecimal digits is it recognized as a character  escape.  Other-
       wise  it  is interpreted as a literal "x" character. In this mode, sup-
       port for code points greater than 256 is provided by \u, which must  be
       followed  by  four hexadecimal digits; otherwise it is interpreted as a
       literal "u" character.
       PCRE2_EXTRA_ALT_BSUX has the same  effect  as  PCRE2_ALT_BSUX  and,  in
       addition,  \u{hhh..}  is recognized as the character specified by hexa-
       decimal code point.  There may be any  number  of  hexadecimal  digits.
       This syntax is from ECMAScript 6.
       The  \N{U+hhh..}  escape sequence is recognized only when the PCRE2_UTF
       option is set, that is, when PCRE2 is operating in a Unicode mode. Perl
@ -6347,6 +6386,11 @@ BACKSLASH
       brace  (curly  bracket)  it has an entirely different meaning, matching
       any character that is not a newline.
       There are some legacy applications where  the  escape  sequence  \r  is
       expected to match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option
       is set, \r in a pattern is converted to \n so  that  it  matches  a  LF
       (linefeed) instead of a CR (carriage return) character.
       The  precise effect of \cx on ASCII characters is as follows: if x is a
       lower case letter, it is converted to upper case. Then  bit  6  of  the
       character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A
@ -6429,25 +6473,6 @@ BACKSLASH
       syntax must not be introduced by a leading zero, because no  more  than
       three octal digits are ever read.
       By  default, after \x that is not followed by {, from zero to two hexa-
       decimal digits are read (letters can be in upper or  lower  case).  Any
       number of hexadecimal digits may appear between \x{ and }. If a charac-
       ter other than a hexadecimal digit appears between \x{  and  },  or  if
       there is no terminating }, an error occurs.
       If  the  PCRE2_ALT_BSUX  option  is set, the interpretation of \x is as
       just described only when it is followed by two hexadecimal digits. Oth-
       erwise,  it  matches a literal "x" character. In this mode, support for
       code points greater than 256 is provided by \u, which must be  followed
       by  four hexadecimal digits; otherwise it matches a literal "u" charac-
       ter. This syntax makes PCRE2 behave like ECMAscript  (aka  JavaScript).
       Code points greater than 0xFFFF are not supported.
       Characters whose value is less than 256 can be defined by either of the
       two syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no dif-
       ference  in  the way they are handled. For example, \xdc is exactly the
       same as \x{dc} (or \u00dc in PCRE2_ALT_BSUX mode).
   Constraints on character values
       Characters  that  are  specified using octal or hexadecimal numbers are
@ -6480,9 +6505,10 @@ BACKSLASH
       In Perl, the sequences \F, \l, \L, \u, and \U  are  recognized  by  its
       string  handler and used to modify the case of following characters. By
-       default, PCRE2 does not support these escape sequences. However, if the
+       default, PCRE2 does not support these  escape  sequences  in  patterns.
-       PCRE2_ALT_BSUX option is set, \U matches a "U" character, and \u can be
+       However,  if  either  of  the  PCRE2_ALT_BSUX  or  PCRE2_EXTRA_ALT_BSUX
-       used to define a character by code point, as described above.
+       options is set, \U matches a "U" character,  and  \u  can  be  used  to
       define a character by code point, as described above.
   Absolute and relative backreferences
@ -9332,7 +9358,7 @@ AUTHOR
 REVISION
-       Last updated: 04 February 2019
+       Last updated: 12 February 2019
       Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
@ -10203,7 +10229,8 @@ QUOTING
 ESCAPED CHARACTERS
-       This table applies to ASCII and Unicode environments.
+       This  table  applies to ASCII and Unicode environments. An unrecognized
       escape sequence causes an error.
         \a         alarm, that is, the BEL character (hex 07)
         \cx        "control-x", where x is any ASCII printing character
@ -10215,12 +10242,24 @@ ESCAPED CHARACTERS
         \0dd       character with octal code 0dd
         \ddd       character with octal code ddd, or backreference
         \o{ddd..}  character with octal code ddd..
         \U         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
         \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
         \uhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
         \xhh       character with hex code hh
         \x{hh..}   character with hex code hh..
       If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
       following are also recognized:
         \U         the character "U"
         \uhhhh     character with hex code hhhh
         \u{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
       When  \x  is not followed by {, from zero to two hexadecimal digits are
       read, but in ALT_BSUX mode \x must be followed by two hexadecimal  dig-
       its  to  be  recognized as a hexadecimal escape; otherwise it matches a
       literal "x".  Likewise, if \u (in ALT_BSUX mode)  is  not  followed  by
       four  hexadecimal  digits or (in EXTRA_ALT_BSUX mode) a sequence of hex
       digits in curly brackets, it matches a literal "u".
       Note that \0dd is always an octal code. The treatment of backslash fol-
       lowed  by  a non-zero digit is complicated; for details see the section
       "Non-printing characters"  in  the  pcre2pattern  documentation,  where
@ -10229,12 +10268,6 @@ ESCAPED CHARACTERS
       EBCDIC  environments.  Note  that  \N  not followed by an opening curly
       bracket has a different meaning (see below).
       When  \x  is not followed by {, from zero to two hexadecimal digits are
       read, but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadec-
       imal  digits  to  be  recognized  as a hexadecimal escape; otherwise it
       matches a literal "x".  Likewise, if \u (in ALT_BSUX mode) is not  fol-
       lowed by four hexadecimal digits, it matches a literal "u".
 CHARACTER TYPES
@ -10670,7 +10703,7 @@ AUTHOR
 REVISION
-       Last updated: 03 February 2019
+       Last updated: 11 February 2019
       Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
--- a/doc/pcre2_compile.3
+++ b/doc/pcre2_compile.3
@ -1,4 +1,4 @@
-.TH PCRE2_COMPILE 3 "16 June 2017" "PCRE2 10.30"
+.TH PCRE2_COMPILE 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -73,7 +73,13 @@ The option bits are:
 PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 .P
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
 .\" HREF
 \fBpcre2_set_compile_extra_options\fP 
 .\"
 function.
 .P
 The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 .P
 There is a complete description of the PCRE2 native API, with more detail on
--- a/doc/pcre2_set_compile_extra_options.3
+++ b/doc/pcre2_set_compile_extra_options.3
@ -1,4 +1,4 @@
-.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "21 September 2018" "PCRE2 10.33"
+.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -8,7 +8,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .PP
 .nf
 .B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP,
-.B "  PCRE2_SIZE \fIextra_options\fP);"
+.B "  uint32_t \fIextra_options\fP);"
 .fi
 .
 .SH DESCRIPTION
@ -21,6 +21,9 @@ options are:
 .\" JOIN
  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \ex{df800} to \ex{dfff}
                                         in UTF-8 and UTF-32 modes
 .\" JOIN
  PCRE2_EXTRA_ALT_BSUX                 Extended alternate \eu, \eU, and \ex
                                         handling 
 .\" JOIN
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL    Treat all invalid escapes as
                                         a literal following character
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "04 February 2019" "PCRE2 10.33"
+.TH PCRE2API 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -1231,7 +1231,7 @@ are needed. The \fBpcre2_code_copy_with_tables()\fP provides this facility.
 Copies of both the code and the tables are made, with the new code pointing to
 the new tables. The memory for the new tables is automatically freed when
 \fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
-\fBpcre2_code_copy_withy_tables()\fP is called with a NULL argument, it returns
+\fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns
 NULL.
 .P
 NOTE: When one of the matching functions is called, pointers to the compiled
@ -1252,7 +1252,7 @@ below.
 .\"
 .P
 The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
-settings that affect the compilation. It should be zero if no options are
+settings that affect the compilation. It should be zero if none of them are
 required. The available options are described below. Some of them (in
 particular, those that are compatible with Perl, but some others as well) can
 also be set and unset from within the pattern (see the detailed description in
@ -1267,8 +1267,9 @@ contents of the \fIoptions\fP argument specifies their settings at the start of
 compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
 options can be set at the time of matching as well as at compile time.
 .P
-Other, less frequently required compile-time parameters (for example, the
+Some additional options and less frequently required compile-time parameters
-newline setting) can be provided in a compile context (as described
+(for example, the newline setting) can be provided in a compile context (as
 described
 .\" HTML <a href="#compilecontext">
 .\" </a>
 above).
@ -1325,6 +1326,11 @@ This code fragment shows a typical straightforward call to
    &erroffset,             /* for error offset */
    NULL);                  /* no compile context */
 .sp
 .
 .
 .SS "Main compile options"
 .rs
 .sp
 The following names for option bits are defined in the \fBpcre2.h\fP header
 file:
 .sp
@ -1361,6 +1367,16 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
 to match. By default, as in Perl, a hexadecimal number is always expected after
 \ex, but it may have zero, one, or two digits (so, for example, \exz matches a
 binary zero character followed by z).
 .P
 ECMAScript 6 added functionality to \eu. This can be accessed using
 the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
 .\" HTML <a href="#extracompileoptions">
 .\" </a>
 below).
 .\"
 Note that this alternative escape handling applies only to patterns. Neither of 
 these options affects the processing of replacement strings passed to 
 \fBpcre2_substitute()\fP.
 .sp
  PCRE2_ALT_CIRCUMFLEX
 .sp
@ -1788,9 +1804,8 @@ characters with code points greater than 127.
 .SS "Extra compile options"
 .rs
 .sp
-Unlike the main compile-time options, the extra options are not saved with the
+The option bits that can be set in a compile context by calling the
-compiled pattern. The option bits that can be set in a compile context by
+\fBpcre2_set_compile_extra_options()\fP function are as follows:
 calling the \fBpcre2_set_compile_extra_options()\fP function are as follows:
 .sp
  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 .sp
@ -1813,6 +1828,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
 incorporated in the compiled pattern. However, they can only match subject
 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
 .sp
  PCRE2_EXTRA_ALT_BSUX
 .sp
 The original option PCRE2_ALT_BSUX causes PCRE2 to process \eU, \eu, and \ex in 
 the way that ECMAScript (aka JavaScript) does. Additional functionality was
 defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
 PCRE2_ALT_BSUX, but in addition it recognizes \eu{hhh..} as a hexadecimal 
 character code, where hhh.. is any number of hexadecimal digits.
 .sp
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
 .sp
@ -3383,7 +3406,8 @@ capture groups and letters within \eQ...\eE quoted sequences.
 .P
 Note that case forcing sequences such as \eU...\eE do not nest. For example,
 the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
-effect.
+effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do 
 not apply to replacement strings.
 .P
 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
 flexibility to capture group substitution. The syntax is similar to that used
@ -3792,6 +3816,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
+.TH PCRE2COMPAT 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -33,8 +33,9 @@ non-newline character, and \eN{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set, \eU and \eu
+generated by default. However, if either of the PCRE2_ALT_BSUX or
-are interpreted as ECMAScript interprets them.
+PCRE2_EXTRA_ALT_BSUX options is set, \eU and \eu are interpreted as ECMAScript
 interprets them.
 .P
 5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
@ -198,6 +199,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "04 February 2019" "PCRE2 10.33"
+.TH PCRE2PATTERN 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -373,12 +373,30 @@ environment, these escapes are as follows:
  \exhh        character with hex code hh
  \ex{hhh..}   character with hex code hhh..
  \eN{U+hhh..} character with Unicode hex code point hhh..
  \euhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 .sp
-There are some legacy applications where the escape sequence \er is expected to
+By default, after \ex that is not followed by {, from zero to two hexadecimal
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
+digits are read (letters can be in upper or lower case). Any number of
-pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
+hexadecimal digits may appear between \ex{ and }. If a character other than a
-(carriage return) character.
+hexadecimal digit appears between \ex{ and }, or if there is no terminating },
 an error occurs.
 .P
 Characters whose code points are less than 256 can be defined by either of the
 two syntaxes for \ex or by an octal sequence. There is no difference in the way
 they are handled. For example, \exdc is exactly the same as \ex{dc} or \e334.
 However, using the braced versions does make such sequences easier to read.
 .P
 Support is available for some ECMAScript (aka JavaScript) escape sequences via
 two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \ex followed
 by { is not recognized. Only if \ex is followed by two hexadecimal digits is it
 recognized as a character escape. Otherwise it is interpreted as a literal "x"
 character. In this mode, support for code points greater than 256 is provided
 by \eu, which must be followed by four hexadecimal digits; otherwise it is 
 interpreted as a literal "u" character.
 .P
 PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
 \eu{hhh..} is recognized as the character specified by its hexadecimal code point.
 There may be any number of hexadecimal digits. This syntax is from ECMAScript 
 6.
 .P
 The \eN{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
 is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
@ -386,6 +404,11 @@ is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
 Note that when \eN is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
 .P
 There are some legacy applications where the escape sequence \er is expected to
 match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
 pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
 (carriage return) character.
 .P
 The precise effect of \ecx on ASCII characters is as follows: if x is a lower
 case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A),
@ -477,25 +500,6 @@ for themselves. For example, outside a character class:
 Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
 digits are ever read.
 .P
 By default, after \ex that is not followed by {, from zero to two hexadecimal
 digits are read (letters can be in upper or lower case). Any number of
 hexadecimal digits may appear between \ex{ and }. If a character other than
 a hexadecimal digit appears between \ex{ and }, or if there is no terminating
 }, an error occurs.
 .P
 If the PCRE2_ALT_BSUX option is set, the interpretation of \ex is as just
 described only when it is followed by two hexadecimal digits. Otherwise, it
 matches a literal "x" character. In this mode, support for code points greater
 than 256 is provided by \eu, which must be followed by four hexadecimal digits;
 otherwise it matches a literal "u" character. This syntax makes PCRE2 behave 
 like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
 supported.
 .P
 Characters whose value is less than 256 can be defined by either of the two
 syntaxes for \ex (or by \eu in PCRE2_ALT_BSUX mode). There is no difference in
 the way they are handled. For example, \exdc is exactly the same as \ex{dc} (or
 \eu00dc in PCRE2_ALT_BSUX mode).
 .
 .
 .SS "Constraints on character values"
@ -534,9 +538,10 @@ character class, these sequences have different meanings.
 .sp
 In Perl, the sequences \eF, \el, \eL, \eu, and \eU are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
+does not support these escape sequences in patterns. However, if either of the
-is set, \eU matches a "U" character, and \eu can be used to define a character
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \eU matches a "U"
-by code point, as described above.
+character, and \eu can be used to define a character by code point, as
 described above.
 .
 .
 .SS "Absolute and relative backreferences"
@ -3758,6 +3763,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "03 February 2019" "PCRE2 10.33"
+.TH PCRE2SYNTAX 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -22,7 +22,8 @@ documentation. This document contains a quick-reference summary of the syntax.
 .SH "ESCAPED CHARACTERS"
 .rs
 .sp
-This table applies to ASCII and Unicode environments.
+This table applies to ASCII and Unicode environments. An unrecognized escape 
 sequence causes an error.
 .sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII printing character
@ -34,12 +35,24 @@ This table applies to ASCII and Unicode environments.
  \e0dd       character with octal code 0dd
  \eddd       character with octal code ddd, or backreference
  \eo{ddd..}  character with octal code ddd..
  \eU         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
  \eN{U+hh..} character with Unicode code point hh.. (Unicode mode only)
  \euhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
  \exhh       character with hex code hh
  \ex{hh..}   character with hex code hh..
 .sp
 If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
 following are also recognized:
 .sp
  \eU         the character "U"
  \euhhhh     character with hex code hhhh
  \eu{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
 .sp
 When \ex is not followed by {, from zero to two hexadecimal digits are read,
 but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be
 recognized as a hexadecimal escape; otherwise it matches a literal "x".
 Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits 
 or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
 matches a literal "u".
 .P
 Note that \e0dd is always an octal code. The treatment of backslash followed by
 a non-zero digit is complicated; for details see the section
 .\" HTML <a href="pcre2pattern.html#digitsafterbackslash">
@ -54,12 +67,6 @@ documentation, where details of escape processing in EBCDIC environments are
 also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not
 supported in EBCDIC environments. Note that \eN not followed by an opening
 curly bracket has a different meaning (see below).
 .P
 When \ex is not followed by {, from zero to two hexadecimal digits are read,
 but if PCRE2_ALT_BSUX is set, \ex must be followed by two hexadecimal digits to
 be recognized as a hexadecimal escape; otherwise it matches a literal "x".
 Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits,
 it matches a literal "u".
 .
 .
 .SH "CHARACTER TYPES"
@ -647,6 +654,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
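The ALT_BSUX-mode rule for \x documented in the pcre2syntax.3 changes above (exactly two hexadecimal digits, otherwise a literal "x") can be illustrated with a small standalone sketch. It does not call PCRE2; the names sketch_xdigit and sketch_parse_x_alt are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: numeric value of a hex digit, or -1 if not one. */
static int sketch_xdigit(char ch)
{
if (ch >= '0' && ch <= '9') return ch - '0';
if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
return -1;
}

/* p points just after "\x"; end is one past the last pattern character.
Returns true and stores the code in *value when the next two characters are
both hex digits. Returns false when \x should be read as a literal "x",
which in this mode includes the braced form \x{...}. */
static bool sketch_parse_x_alt(const char *p, const char *end, uint32_t *value)
{
int hi, lo;
if (end - p < 2) return false;        /* Fewer than 2 characters left */
hi = sketch_xdigit(p[0]);
lo = sketch_xdigit(p[1]);
if (hi < 0 || lo < 0) return false;   /* Not two hex digits */
*value = ((uint32_t)hi << 4) | (uint32_t)lo;
return true;
}
```

With this sketch, "dc" yields 0xdc, while "z", "4", and "{dc}" all report a literal "x".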
--- a/doc/pcre2test.1
+++ b/doc/pcre2test.1
@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "03 February 2019" "PCRE 10.33"
+.TH PCRE2TEST 1 "11 February 2019" "PCRE 10.33"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@ -568,6 +568,7 @@ for a description of the effects of these options.
      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF 
  /x  extended                  set PCRE2_EXTENDED
  /xx extended_more             set PCRE2_EXTENDED_MORE
      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX 
      firstline                 set PCRE2_FIRSTLINE
      literal                   set PCRE2_LITERAL
      match_line                set PCRE2_EXTRA_MATCH_LINE
@ -2056,6 +2057,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2test.txt
+++ b/doc/pcre2test.txt
@ -547,6 +547,7 @@ PATTERN MODIFIERS
             escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
         /x  extended                  set PCRE2_EXTENDED
         /xx extended_more             set PCRE2_EXTENDED_MORE
             extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
             firstline                 set PCRE2_FIRSTLINE
             literal                   set PCRE2_LITERAL
             match_line                set PCRE2_EXTRA_MATCH_LINE
@ -1887,5 +1888,5 @@ AUTHOR
 REVISION
-       Last updated: 03 February 2019
+       Last updated: 11 February 2019
       Copyright (c) 1997-2019 University of Cambridge.
--- a/src/pcre2.h.in
+++ b/src/pcre2.h.in
@ -150,6 +150,7 @@ D   is inspected during pcre2_dfa_match() execution
 #define PCRE2_EXTRA_MATCH_WORD               0x00000004u  /* C */
 #define PCRE2_EXTRA_MATCH_LINE               0x00000008u  /* C */
 #define PCRE2_EXTRA_ESCAPED_CR_IS_LF         0x00000010u  /* C */
 #define PCRE2_EXTRA_ALT_BSUX                 0x00000020u  /* C */
 /* These are for pcre2_jit_compile(). */
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@ -764,7 +764,7 @@ are allowed. */
 #define PUBLIC_COMPILE_EXTRA_OPTIONS \
   (PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS| \
    PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES|PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL| \
-    PCRE2_EXTRA_ESCAPED_CR_IS_LF)
+    PCRE2_EXTRA_ESCAPED_CR_IS_LF|PCRE2_EXTRA_ALT_BSUX)
 /* Compile time error code numbers. They are given names so that they can more
 easily be tracked. When a new number is added, the tables called eint1 and
@ -1459,7 +1459,8 @@ Returns:         zero => a data character
 int
 PRIV(check_escape)(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *chptr,
-  int *errorcodeptr, uint32_t options, BOOL isclass, compile_block *cb)
+  int *errorcodeptr, uint32_t options, uint32_t extra_options, BOOL isclass, 
  compile_block *cb)
 {
 BOOL utf = (options & PCRE2_UTF) != 0;
 PCRE2_SPTR ptr = *ptrptr;
@ -1495,8 +1496,7 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
  if (i > 0)
    {
    c = (uint32_t)i;
-    if (cb != NULL && c == CHAR_CR &&
+    if (c == CHAR_CR && (extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
        (cb->cx->extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
      c = CHAR_LF;
    }
  else  /* Negative table entry */
@ -1551,22 +1551,28 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
 /* Escapes that need further processing, including those that are unknown, have
 a zero entry in the lookup table. When called from pcre2_substitute(), only \c,
-\o, and \x are recognized (and \u when BSUX is set). */
+\o, and \x are recognized (\u and \U can never appear as they are used for case 
 forcing). */
 else
  {
  int s;
  PCRE2_SPTR oldptr;
  BOOL overflow;
-  int s;
+  BOOL alt_bsux = 
    ((options & PCRE2_ALT_BSUX) | (extra_options & PCRE2_EXTRA_ALT_BSUX)) != 0;
  /* Filter calls from pcre2_substitute(). */
-  if (cb == NULL && c != CHAR_c && c != CHAR_o && c != CHAR_x &&
+  if (cb == NULL)
-      (c != CHAR_u || (options & PCRE2_ALT_BSUX) != 0))
+    {
    if (c != CHAR_c && c != CHAR_o && c != CHAR_x)
      {
      *errorcodeptr = ERR3;
      return 0;
      }
    alt_bsux = FALSE;   /* Do not modify \x handling */   
    }   
  switch (c)
    {
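The hunk above derives a single alt_bsux flag from two separate option words and then forces it off for calls that come from pcre2_substitute() (where cb is NULL). A minimal standalone sketch of that decision follows. The SKETCH_ constants are stand-ins assumed for illustration (real code should use the definitions in pcre2.h), and sketch_alt_bsux_active is a hypothetical name, not a PCRE2 function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for illustration only; take the real ones from pcre2.h. */
#define SKETCH_ALT_BSUX        0x00000002u   /* bit in the main options word */
#define SKETCH_EXTRA_ALT_BSUX  0x00000020u   /* bit in the extra options word */

/* Returns true when the alternative \u, \U, \x handling should apply.
The two bits live in different words, so each word is masked on its own
before the results are combined. Replacement strings never use this mode. */
static bool sketch_alt_bsux_active(uint32_t options, uint32_t extra_options,
  bool from_substitute)
{
if (from_substitute) return false;
return ((options & SKETCH_ALT_BSUX) |
        (extra_options & SKETCH_EXTRA_ALT_BSUX)) != 0;
}
```

Masking each word separately matters because the same numeric bit can mean something unrelated in the other word.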
@ -1579,14 +1585,46 @@ else
    *errorcodeptr = ERR37;
    break;
-    /* \u is unrecognized when PCRE2_ALT_BSUX is not set. When it is treated
+    /* \u is unrecognized when neither PCRE2_ALT_BSUX nor PCRE2_EXTRA_ALT_BSUX
-    specially, \u must be followed by four hex digits. Otherwise it is a
+    is set. Otherwise, \u must be followed by exactly four hex digits or, if
-    lowercase u letter. */
+    PCRE2_EXTRA_ALT_BSUX is set, by any number of hex digits in braces.
    Otherwise it is a lowercase u letter. This gives some compatibility with
    ECMAScript (aka JavaScript). */
    case CHAR_u:
-    if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37; else
+    if (!alt_bsux) *errorcodeptr = ERR37; else
      {
      uint32_t xc;
      if (*ptr == CHAR_LEFT_CURLY_BRACKET && 
          (extra_options & PCRE2_EXTRA_ALT_BSUX) != 0)
        {
        PCRE2_SPTR hptr = ptr + 1;
        cc = 0;
        while (hptr < ptrend && (xc = XDIGIT(*hptr)) != 0xff)
          { 
          if ((cc & 0xf0000000) != 0)  /* Test for 32-bit overflow */
            {
            *errorcodeptr = ERR77;
            ptr = hptr;   /* Show where */
            break;        /* *hptr != } will cause another break below */  
            } 
          cc = (cc << 4) | xc;
          hptr++; 
          } 
        if (hptr == ptr + 1 ||   /* No hex digits */
            hptr >= ptrend ||    /* Hit end of input */
            *hptr != CHAR_RIGHT_CURLY_BRACKET)  /* No } terminator */
          break;         /* Hex escape not recognized */
        c = cc;          /* Accept the code point */
        ptr = hptr + 1; 
        }
      else  /* Must be exactly 4 hex digits */
        {      
        if (ptrend - ptr < 4) break;               /* Less than 4 chars */
        if ((cc = XDIGIT(ptr[0])) == 0xff) break;  /* Not a hex digit */
        if ((xc = XDIGIT(ptr[1])) == 0xff) break;  /* Not a hex digit */
@ -1596,23 +1634,25 @@ else
        if ((xc = XDIGIT(ptr[3])) == 0xff) break;  /* Not a hex digit */
        c = (cc << 4) | xc;
        ptr += 4;
        } 
      if (utf)
        {
        if (c > 0x10ffffU) *errorcodeptr = ERR77;
        else
          if (c >= 0xd800 && c <= 0xdfff &&
-            (cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
+              (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
                *errorcodeptr = ERR73;
        }
      else if (c > MAX_NON_UTF_CHAR) *errorcodeptr = ERR77;
      }
    break;
-    /* \U is unrecognized unless PCRE2_ALT_BSUX is set, in which case it is an
+    /* \U is unrecognized unless PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set,
-    upper case letter. */
+    in which case it is an upper case letter. */
    case CHAR_U:
-    if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37;
+    if (!alt_bsux) *errorcodeptr = ERR37;
    break;
    /* In a character class, \g is just a literal "g". Outside a character
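The \u handling in the hunk above can be read as a small decision procedure: try the braced form if it is enabled, otherwise require exactly four hex digits, and fall back to a literal "u" when neither fits. The following is a simplified standalone sketch of that procedure, not the PCRE2 code itself. All sketch_ names are hypothetical, it works on plain char strings, and it leaves out the later range and surrogate checks (ERR77 and ERR73 in the hunk).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: numeric value of a hex digit, or -1 if not one. */
static int sketch_hexval(char ch)
{
if (ch >= '0' && ch <= '9') return ch - '0';
if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
return -1;
}

typedef enum { SKETCH_LITERAL_U, SKETCH_CODE_POINT, SKETCH_TOO_BIG } sketch_result;

/* p points just after "\u"; end is one past the last pattern character.
On SKETCH_CODE_POINT, *value holds the code and *used the number of
characters consumed after "\u". braces_allowed models EXTRA_ALT_BSUX. */
static sketch_result sketch_parse_u(const char *p, const char *end,
  bool braces_allowed, uint32_t *value, size_t *used)
{
uint32_t cc = 0;
int d;

if (p < end && *p == '{' && braces_allowed)
  {
  const char *h = p + 1;
  while (h < end && (d = sketch_hexval(*h)) >= 0)
    {
    /* Top nibble already in use: another shift would lose bits */
    if ((cc & 0xf0000000u) != 0) return SKETCH_TOO_BIG;
    cc = (cc << 4) | (uint32_t)d;
    h++;
    }
  if (h == p + 1 ||      /* No hex digits */
      h >= end ||        /* Hit end of input */
      *h != '}')         /* No } terminator */
    return SKETCH_LITERAL_U;
  *value = cc;
  *used = (size_t)(h + 1 - p);
  return SKETCH_CODE_POINT;
  }

if (end - p < 4) return SKETCH_LITERAL_U;   /* Fewer than 4 characters */
for (int i = 0; i < 4; i++)
  {
  if ((d = sketch_hexval(p[i])) < 0) return SKETCH_LITERAL_U;
  cc = (cc << 4) | (uint32_t)d;
  }
*value = cc;
*used = 4;
return SKETCH_CODE_POINT;
}
```

Under these assumptions, "{7a}" gives 0x7a when braces are allowed and a literal "u" when they are not, which corresponds to the two new /^\u{7a}/ cases added to testinput2 in this commit.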
@ -1791,8 +1831,8 @@ else
        }
      else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
        {
-        if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
+        if (utf && c >= 0xd800 && c <= 0xdfff &&
-            (cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
+            (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
          {
          ptr--;
          *errorcodeptr = ERR73;
@ -1806,11 +1846,11 @@ else
      }
    break;
-    /* \x is complicated. When PCRE2_ALT_BSUX is set, \x must be followed by
+    /* When PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set, \x must be followed
-    two hexadecimal digits. Otherwise it is a lowercase x letter. */
+    by two hexadecimal digits. Otherwise it is a lowercase x letter. */
    case CHAR_x:
-    if ((options & PCRE2_ALT_BSUX) != 0)
+    if (alt_bsux)
      {
      uint32_t xc;
      if (ptrend - ptr < 2) break;               /* Less than 2 characters */
@ -1818,9 +1858,9 @@ else
      if ((xc = XDIGIT(ptr[1])) == 0xff) break;  /* Not a hex digit */
      c = (cc << 4) | xc;
      ptr += 2;
-      }    /* End PCRE2_ALT_BSUX handling */
+      }
-    /* Handle \x in Perl's style. \x{ddd} is a character number which can be
+    /* Handle \x in Perl's style. \x{ddd} is a character code which can be
    greater than 0xff in UTF-8 or non-8bit mode, but only if the ddd are hex
    digits. If not, { used to be treated as a data character. However, Perl
    seems to read hex digits up to the first non-such, and ignore the rest, so
@ -1864,8 +1904,8 @@ else
          }
        else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
          {
-          if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
+          if (utf && c >= 0xd800 && c <= 0xdfff &&
-              (cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
+              (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
            {
            ptr--;
            *errorcodeptr = ERR73;
@ -2438,6 +2478,7 @@ uint32_t *parsed_pattern = cb->parsed_pattern;
 uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
 uint32_t meta_quantifier = 0;
 uint32_t add_after_mark = 0;
 uint32_t extra_options = cb->cx->extra_options;
 uint16_t nest_depth = 0;
 int after_manual_callout = 0;
 int expect_cond_assert = 0;
@ -2461,12 +2502,12 @@ nest_save *top_nest, *end_nests;
 /* Insert leading items for word and line matching (features provided for the
 benefit of pcre2grep). */
-if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
+if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
  {
  *parsed_pattern++ = META_CIRCUMFLEX;
  *parsed_pattern++ = META_NOCAPTURE;
  }
-else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
+else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
  {
  *parsed_pattern++ = META_ESCAPE + ESC_b;
  *parsed_pattern++ = META_NOCAPTURE;
@ -2631,7 +2672,7 @@ while (ptr < ptrend)
      if ((options & PCRE2_ALT_VERBNAMES) != 0)
        {
        escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
-          FALSE, cb);
+          cb->cx->extra_options, FALSE, cb);
        if (errorcode != 0) goto FAILED;
        }
      else escape = 0;   /* Treat all as literal */
@ -2821,11 +2862,11 @@ while (ptr < ptrend)
    case CHAR_BACKSLASH:
    tempptr = ptr;
    escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
-      FALSE, cb);
+      cb->cx->extra_options, FALSE, cb);
    if (errorcode != 0)
      {
      ESCAPE_FAILED:
-      if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
+      if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
        goto FAILED;
      ptr = tempptr;
      if (ptr >= ptrend) c = CHAR_BACKSLASH; else
@ -3382,12 +3423,12 @@ while (ptr < ptrend)
      else
        {
        tempptr = ptr;
-        escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
+        escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options, 
-          options, TRUE, cb);
+          cb->cx->extra_options, TRUE, cb);
        if (errorcode != 0)
          {
-          if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
+          if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
            goto FAILED;
          ptr = tempptr;
          if (ptr >= ptrend) c = CHAR_BACKSLASH; else
@ -4545,12 +4586,12 @@ parsed_pattern = manage_callouts(ptr, &previous_callout, auto_callout,
 /* Insert trailing items for word and line matching (features provided for the
 benefit of pcre2grep). */
-if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
+if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
  {
  *parsed_pattern++ = META_KET;
  *parsed_pattern++ = META_DOLLAR;
  }
-else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
+else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
  {
  *parsed_pattern++ = META_KET;
  *parsed_pattern++ = META_ESCAPE + ESC_b;
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
                       Written by Philip Hazel
     Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2018 University of Cambridge
+          New API code Copyright (c) 2016-2019 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -1942,7 +1942,7 @@ is available. */
 extern int          _pcre2_auto_possessify(PCRE2_UCHAR *, BOOL,
                      const compile_block *);
 extern int          _pcre2_check_escape(PCRE2_SPTR *, PCRE2_SPTR, uint32_t *,
-                      int *, uint32_t, BOOL, compile_block *);
+                      int *, uint32_t, uint32_t, BOOL, compile_block *);
 extern PCRE2_SPTR   _pcre2_extuni(uint32_t, PCRE2_SPTR, PCRE2_SPTR, PCRE2_SPTR,
                      BOOL, int *);
 extern PCRE2_SPTR   _pcre2_find_bracket(PCRE2_SPTR, BOOL, int);
--- a/src/pcre2_substitute.c
+++ b/src/pcre2_substitute.c
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
                       Written by Philip Hazel
     Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2018 University of Cambridge
+          New API code Copyright (c) 2016-2019 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -129,7 +129,7 @@ for (; ptr < ptrend; ptr++)
    ptr += 1;  /* Must point after \ */
    erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
-      code->overall_options, FALSE, NULL);
+      code->overall_options, code->extra_options, FALSE, NULL);
    ptr -= 1;  /* Back to last code unit of escape */
    if (errorcode != 0)
      {
@ -774,7 +774,7 @@ do
      ptr++;  /* Point after \ */
      rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
-        code->overall_options, FALSE, NULL);
+        code->overall_options, code->extra_options, FALSE, NULL);
      if (errorcode != 0) goto BADESCAPE;
      switch(rc)
--- a/src/pcre2test.c
+++ b/src/pcre2test.c
@ -646,6 +646,7 @@ static modstruct modlist[] = {
  { "expand",                     MOD_PAT,  MOD_CTL, CTL_EXPAND,                 PO(control) },
  { "extended",                   MOD_PATP, MOD_OPT, PCRE2_EXTENDED,             PO(options) },
  { "extended_more",              MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE,        PO(options) },
  { "extra_alt_bsux",             MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ALT_BSUX,       CO(extra_options) },
  { "find_limits",                MOD_DAT,  MOD_CTL, CTL_FINDLIMITS,             DO(control) },
  { "firstline",                  MOD_PAT,  MOD_OPT, PCRE2_FIRSTLINE,            PO(options) },
  { "framesize",                  MOD_PAT,  MOD_CTL, CTL_FRAMESIZE,              PO(control) },
@ -4189,10 +4190,11 @@ show_compile_extra_options(uint32_t options, const char *before,
  const char *after)
 {
 if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
-else fprintf(outfile, "%s%s%s%s%s%s%s",
+else fprintf(outfile, "%s%s%s%s%s%s%s%s",
  before,
  ((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
  ((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
  ((options & PCRE2_EXTRA_ALT_BSUX) != 0)? " extra_alt_bsux" : "",
  ((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
  ((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
  ((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",
--- a/testdata/testinput2
+++ b/testdata/testinput2
@ -2408,13 +2408,13 @@
 \= Expect no match
    cat
-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat
 /TA]/
    The ACTA] comes
-/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/TA]/allow_empty_class,match_unset_backref,dupnames
    The ACTA] comes
 /(?2)[]a()b](abc)/
@ -2446,25 +2446,25 @@
 /a[^]b/
-/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
-/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
-/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]*+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
-/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]b/allow_empty_class,match_unset_backref,dupnames
    aXb
    a\nb
 \= Expect no match
    ab
-/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]+b/allow_empty_class,match_unset_backref,dupnames
    aXb
    a\nX\nXb
 \= Expect no match
@ -2903,10 +2903,10 @@
    xxxxabcde\=ps
    xxxxabcde\=ph
-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat
-/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
    cat
 /(\3)(\1)(a)/I
@ -3419,6 +3419,14 @@
 \= Expect no match
    aAz
 /^\u{7a}/alt_bsux
    u{7a}
 \= Expect no match
    zoo 
 /^\u{7a}/extra_alt_bsux
    zoo 
 /(?(?=c)c|d)++Y/B
 /(?(?=c)c|d)*+Y/B
--- a/testdata/testinput5
+++ b/testdata/testinput5
@ -333,13 +333,13 @@
 /[[:a\x{100}b:]]/utf
-/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]b/utf,allow_empty_class,match_unset_backref
    a\x{1234}b
    a\nb
 \= Expect no match
    ab
-/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]+b/utf,allow_empty_class,match_unset_backref
    aXb
    a\nX\nX\x{1234}b
 \= Expect no match
@ -814,6 +814,9 @@
 /\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
 /^\u{0000000000010ffff}/utf,extra_alt_bsux
    \x{10ffff}
 /^a+[a\x{200}]/B,utf
    aa
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@ -8774,7 +8774,7 @@ No match
    cat
 No match
-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat
 0: a
 1: 
@ -8785,7 +8785,7 @@ No match
    The ACTA] comes
 0: TA]
-/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/TA]/allow_empty_class,match_unset_backref,dupnames
    The ACTA] comes
 0: TA]
@ -8833,22 +8833,22 @@ Failed: error 106 at offset 4: missing terminating ] for character class
 /a[^]b/
 Failed: error 106 at offset 5: missing terminating ] for character class
-/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
 No match
-/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
 No match
-/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]*+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
 No match
-/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]b/allow_empty_class,match_unset_backref,dupnames
    aXb
 0: aXb
    a\nb
@ -8857,7 +8857,7 @@ No match
    ab
 No match
-/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]+b/allow_empty_class,match_unset_backref,dupnames
    aXb
 0: aXb
    a\nX\nXb
@ -9971,17 +9971,17 @@ Partial match: abca
    xxxxabcde\=ph
 Partial match: abcde
-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat
 0: a
 1: 
 2: 
 3: a
-/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
 Capture group count = 3
 Max back reference = 3
-Options: alt_bsux allow_empty_class dupnames match_unset_backref
+Options: allow_empty_class dupnames match_unset_backref
 Last code unit = 'a'
 Subject length lower bound = 1
    cat
@ -11365,6 +11365,17 @@ No match
    aAz
 No match
 /^\u{7a}/alt_bsux
    u{7a}
 0: u{7a}
 \= Expect no match
    zoo 
 No match
 /^\u{7a}/extra_alt_bsux
    zoo 
 0: z
 /(?(?=c)c|d)++Y/B
 ------------------------------------------------------------------
        Bra
--- a/testdata/testoutput5
+++ b/testdata/testoutput5
@ -798,7 +798,7 @@ No match
 /[[:a\x{100}b:]]/utf
 Failed: error 130 at offset 3: unknown POSIX class name
-/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]b/utf,allow_empty_class,match_unset_backref
    a\x{1234}b
 0: a\x{1234}b
    a\nb
@ -807,7 +807,7 @@ Failed: error 130 at offset 3: unknown POSIX class name
    ab
 No match
-/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]+b/utf,allow_empty_class,match_unset_backref
    aXb
 0: aXb
    a\nX\nX\x{1234}b
@ -1734,6 +1734,10 @@ No match
 /\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
 Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
 /^\u{0000000000010ffff}/utf,extra_alt_bsux
    \x{10ffff}
 0: \x{10ffff}
 /^a+[a\x{200}]/B,utf
 ------------------------------------------------------------------
        Bra