Implement PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh..} syntax.

2019-02-12 17:50:19 +00:00 · 2019-02-12 17:50:19 +00:00 · 8c8deae8eb
parent d90de8b053
commit 8c8deae8eb
26 changed files with 1310 additions and 1112 deletions
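
The \u rule that this commit documents can be illustrated without PCRE2 itself. The following is a minimal C sketch, not PCRE2's actual code: `parse_bsu`, `hex_value`, and the `bsux_mode` enum are hypothetical names invented for illustration. It assumes, as the pcre2syntax text in this commit states, that a \u not followed by a valid digit sequence is a literal "u"; rejecting over-long values instead of reporting an error is a simplification of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode flags for this sketch; these are NOT the PCRE2 option bits. */
enum bsux_mode { BSUX_ALT, BSUX_EXTRA_ALT };

/* Return the value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* p points at the NUL-terminated text just after "\u".
 * Returns the number of characters consumed and stores the code point in *cp,
 * or returns 0 when the "u" should be treated as a literal character. */
size_t parse_bsu(const char *p, enum bsux_mode mode, uint32_t *cp)
{
    uint32_t value = 0;
    size_t i;

    /* \u{hhh..}: any number of hex digits, only in the "extra" mode. */
    if (mode == BSUX_EXTRA_ALT && p[0] == '{') {
        for (i = 1; hex_value(p[i]) >= 0; i++) {
            if (value > 0x0FFFFFFFu) return 0;   /* sketch: reject overflow */
            value = (value << 4) | (uint32_t)hex_value(p[i]);
        }
        if (i > 1 && p[i] == '}') {
            *cp = value;
            return i + 1;                        /* digits plus both braces */
        }
        return 0;                                /* no digits or missing '}' */
    }

    /* \uhhhh: exactly four hex digits, in both modes. */
    for (i = 0; i < 4; i++) {
        int h = hex_value(p[i]);
        if (h < 0) return 0;
        value = (value << 4) | (uint32_t)h;
    }
    *cp = value;
    return 4;
}
```

In this sketch the braced form is tried first only when the hypothetical `BSUX_EXTRA_ALT` mode is selected, so the same input `{1F600}` yields a code point in one mode and a literal "u" in the other.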
--- a/3
+++ b/3
@ -125,6 +125,9 @@ processing or a crash could result.
 names, as Perl does. There was a small bug in this new code, found by
 ClusterFuzz 12950, fixed before release.

+31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh..}
+construct.
+

 Version 10.32 10-September-2018
 -------------------------------
--- a/doc/html/pcre2_compile.html
+++ b/doc/html/pcre2_compile.html
@ -86,7 +86,12 @@ PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 </P>
 <P>
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
+<a href="pcre2_set_compile_extra_options.html"><b>pcre2_set_compile_extra_options</b></a>
+function.
+</P>
+<P>
+The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 </P>
 <P>
--- a/doc/html/pcre2_set_compile_extra_options.html
+++ b/doc/html/pcre2_set_compile_extra_options.html
@ -20,7 +20,7 @@ SYNOPSIS
 </P>
 <P>
 <b>int pcre2_set_compile_extra_options(pcre2_compile_context *<i>ccontext</i>,</b>
-<b>  PCRE2_SIZE <i>extra_options</i>);</b>
+<b>  uint32_t <i>extra_options</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
@ -31,6 +31,7 @@ housed in a compile context. It completely replaces all the bits. The extra
 options are:
 <pre>
   PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
+  PCRE2_EXTRA_ALT_BSUX                 Extended alternate \u, \U, and \x handling 
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL    Treat all invalid escapes as a literal following character
  PCRE2_EXTRA_ESCAPED_CR_IS_LF         Interpret \r as \n
  PCRE2_EXTRA_MATCH_LINE               Pattern matches whole lines
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@ -1298,7 +1298,7 @@ are needed. The <b>pcre2_code_copy_with_tables()</b> provides this facility.
 Copies of both the code and the tables are made, with the new code pointing to
 the new tables. The memory for the new tables is automatically freed when
 <b>pcre2_code_free()</b> is called for the new copy of the compiled code. If
-<b>pcre2_code_copy_withy_tables()</b> is called with a NULL argument, it returns
+<b>pcre2_code_copy_with_tables()</b> is called with a NULL argument, it returns
 NULL.
 </P>
 <P>
@ -1315,7 +1315,7 @@ PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
 </P>
 <P>
 The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
-settings that affect the compilation. It should be zero if no options are
+settings that affect the compilation. It should be zero if none of them are
 required. The available options are described below. Some of them (in
 particular, those that are compatible with Perl, but some others as well) can
 also be set and unset from within the pattern (see the detailed description in
@ -1330,8 +1330,9 @@ compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
 options can be set at the time of matching as well as at compile time.
 </P>
 <P>
-Other, less frequently required compile-time parameters (for example, the
-newline setting) can be provided in a compile context (as described
+Some additional options and less frequently required compile-time parameters
+(for example, the newline setting) can be provided in a compile context (as
+described
 <a href="#compilecontext">above).</a>
 </P>
 <P>
@ -1384,7 +1385,13 @@ This code fragment shows a typical straightforward call to
    &errorcode,             /* for error code */
    &erroffset,             /* for error offset */
    NULL);                  /* no compile context */
-</pre>
+
+</PRE>
+</P>
+<br><b>
+Main compile options
+</b><br>
+<P>
 The following names for option bits are defined in the <b>pcre2.h</b> header
 file:
 <pre>
@ -1424,6 +1431,14 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
 to match. By default, as in Perl, a hexadecimal number is always expected after
 \x, but it may have zero, one, or two digits (so, for example, \xz matches a
 binary zero character followed by z).
+</P>
+<P>
+ECMAScript 6 added further functionality to \u. This can be accessed using
+the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
+<a href="#extracompileoptions">below).</a>
+Note that this alternative escape handling applies only to patterns. Neither
+PCRE2_ALT_BSUX nor PCRE2_EXTRA_ALT_BSUX affects the processing of replacement
+strings passed to <b>pcre2_substitute()</b>.
 <pre>
  PCRE2_ALT_CIRCUMFLEX
 </pre>
@ -1830,9 +1845,8 @@ characters with code points greater than 127.
 Extra compile options
 </b><br>
 <P>
-Unlike the main compile-time options, the extra options are not saved with the
-compiled pattern. The option bits that can be set in a compile context by
-calling the <b>pcre2_set_compile_extra_options()</b> function are as follows:
+The option bits that can be set in a compile context by calling the
+<b>pcre2_set_compile_extra_options()</b> function are as follows:
 <pre>
  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 </pre>
@ -1857,6 +1871,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
 incorporated in the compiled pattern. However, they can only match subject
 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
+<pre>
+  PCRE2_EXTRA_ALT_BSUX
+</pre>
+The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and \x in
+the way that ECMAScript (aka JavaScript) does. Additional functionality was
+defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
+PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} as a hexadecimal
+character code, where hhh.. is any number of hexadecimal digits.
 <pre>
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
 </pre>
@ -3382,7 +3404,8 @@ capture groups and letters within \Q...\E quoted sequences.
 <P>
 Note that case forcing sequences such as \U...\E do not nest. For example,
 the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no
-effect.
+effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
+not apply to replacement strings.
 </P>
 <P>
 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
@ -3784,7 +3807,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@ -47,8 +47,9 @@ non-newline character, and \N{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set, \U and \u
-are interpreted as ECMAScript interprets them.
+generated by default. However, if either of the PCRE2_ALT_BSUX or
+PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript
+interprets them.
 </P>
 <P>
 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
@ -233,7 +234,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -399,12 +399,33 @@ environment, these escapes are as follows:
  \xhh        character with hex code hh
  \x{hhh..}   character with hex code hhh..
  \N{U+hhh..} character with Unicode hex code point hhh..
-  \uhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 </pre>
-There are some legacy applications where the escape sequence \r is expected to
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
-pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
-(carriage return) character.
+By default, after \x that is not followed by {, from zero to two hexadecimal
+digits are read (letters can be in upper or lower case). Any number of
+hexadecimal digits may appear between \x{ and }. If a character other than a
+hexadecimal digit appears between \x{ and }, or if there is no terminating },
+an error occurs.
+</P>
+<P>
+Characters whose code points are less than 256 can be defined by either of the
+two syntaxes for \x or by an octal sequence. There is no difference in the way
+they are handled. For example, \xdc is exactly the same as \x{dc} or \334.
+However, using the braced versions does make such sequences easier to read.
+</P>
+<P>
+Support is available for some ECMAScript (aka JavaScript) escape sequences via
+two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \x followed
+by { is not recognized. Only if \x is followed by two hexadecimal digits is it
+recognized as a character escape. Otherwise it is interpreted as a literal "x"
+character. In this mode, support for code points greater than 255 is provided
+by \u, which must be followed by four hexadecimal digits; otherwise it is 
+interpreted as a literal "u" character.
+</P>
+<P>
+PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
+\u{hhh..} is recognized as the character with the given hexadecimal code
+point. There may be any number of hexadecimal digits. This syntax is from
+ECMAScript 6.
 </P>
 <P>
 The \N{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
@ -414,6 +435,12 @@ Note that when \N is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
 </P>
 <P>
+There are some legacy applications where the escape sequence \r is expected to
+match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a
+pattern is converted to \n so that it matches a LF (linefeed) instead of a CR
+(carriage return) character.
+</P>
+<P>
 The precise effect of \cx on ASCII characters is as follows: if x is a lower
 case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
@ -500,28 +527,6 @@ Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
 digits are ever read.
 </P>
-<P>
-By default, after \x that is not followed by {, from zero to two hexadecimal
-digits are read (letters can be in upper or lower case). Any number of
-hexadecimal digits may appear between \x{ and }. If a character other than
-a hexadecimal digit appears between \x{ and }, or if there is no terminating
-}, an error occurs.
-</P>
-<P>
-If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
-described only when it is followed by two hexadecimal digits. Otherwise, it
-matches a literal "x" character. In this mode, support for code points greater
-than 256 is provided by \u, which must be followed by four hexadecimal digits;
-otherwise it matches a literal "u" character. This syntax makes PCRE2 behave 
-like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
-supported.
-</P>
-<P>
-Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \x (or by \u in PCRE2_ALT_BSUX mode). There is no difference in
-the way they are handled. For example, \xdc is exactly the same as \x{dc} (or
-\u00dc in PCRE2_ALT_BSUX mode).
-</P>
 <br><b>
 Constraints on character values
 </b><br>
@ -560,9 +565,10 @@ Unsupported escape sequences
 <P>
 In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
-is set, \U matches a "U" character, and \u can be used to define a character
-by code point, as described above.
+does not support these escape sequences in patterns. However, if either of the
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U matches a "U"
+character, and \u can be used to define a character by code point, as
+described above.
 </P>
 <br><b>
 Absolute and relative backreferences
@ -3721,7 +3727,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -58,7 +58,8 @@ documentation. This document contains a quick-reference summary of the syntax.
 </P>
 <br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
 <P>
-This table applies to ASCII and Unicode environments.
+This table applies to ASCII and Unicode environments. An unrecognized escape 
+sequence causes an error.
 <pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII printing character
@ -70,12 +71,25 @@ This table applies to ASCII and Unicode environments.
  \0dd       character with octal code 0dd
  \ddd       character with octal code ddd, or backreference
  \o{ddd..}  character with octal code ddd..
-  \U         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
-  \uhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
  \xhh       character with hex code hh
  \x{hh..}   character with hex code hh..
 </pre>
+If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
+following are also recognized:
+<pre>
+  \U         the character "U"
+  \uhhhh     character with hex code hhhh
+  \u{hh..}   character with hex code hh.. (PCRE2_EXTRA_ALT_BSUX only)
+</pre>
+When \x is not followed by {, from zero to two hexadecimal digits are read,
+but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
+recognized as a hexadecimal escape; otherwise it matches a literal "x".
+Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits 
+or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
+matches a literal "u".
+</P>
+<P>
 Note that \0dd is always an octal code. The treatment of backslash followed by
 a non-zero digit is complicated; for details see the section
 <a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
@ -86,13 +100,6 @@ also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
 supported in EBCDIC environments. Note that \N not followed by an opening
 curly bracket has a different meaning (see below).
 </P>
-<P>
-When \x is not followed by {, from zero to two hexadecimal digits are read,
-but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
-be recognized as a hexadecimal escape; otherwise it matches a literal "x".
-Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
-it matches a literal "u".
-</P>
 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
 <P>
 <pre>
@ -660,7 +667,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC28" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
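
The difference between default and ALT_BSUX handling of an unbraced \x, as summarized in the pcre2syntax hunk above, can be sketched in C. This is an illustrative assumption-laden sketch, not PCRE2's implementation: `parse_bsx` and `hex_digit` are hypothetical names, the `alt_bsux` flag stands in for either of the two options, and the braced \x{hh..} form is deliberately left out.

```c
#include <stdint.h>

/* Return the value of a hexadecimal digit, or -1 if c is not one. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* p points at the NUL-terminated text just after "\x" (unbraced form only).
 * alt_bsux is nonzero when ALT_BSUX-style handling is wanted (hypothetical
 * flag for this sketch). Returns the number of hex digits consumed (0 to 2)
 * and stores the value in *value, or returns -1 when the "x" is a literal. */
int parse_bsx(const char *p, int alt_bsux, uint32_t *value)
{
    uint32_t v = 0;
    int n = 0;

    /* Read at most two hexadecimal digits. */
    while (n < 2 && hex_digit(p[n]) >= 0) {
        v = (v << 4) | (uint32_t)hex_digit(p[n]);
        n++;
    }

    /* ALT_BSUX mode: anything other than exactly two digits is a literal "x". */
    if (alt_bsux && n != 2) return -1;

    /* Default mode: zero, one, or two digits are all accepted. */
    *value = v;
    return n;
}
```

With the default rules, `\xz` consumes no digits and yields the value zero followed by "z"; with the flag set, the same input is reported as a literal "x".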
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@ -609,6 +609,7 @@ for a description of the effects of these options.
      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF 
  /x  extended                  set PCRE2_EXTENDED
  /xx extended_more             set PCRE2_EXTENDED_MORE
+      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX 
      firstline                 set PCRE2_FIRSTLINE
      literal                   set PCRE2_LITERAL
      match_line                set PCRE2_EXTRA_MATCH_LINE
@ -2075,7 +2076,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
--- a/doc/pcre2_compile.3
+++ b/doc/pcre2_compile.3
@ -1,4 +1,4 @@
-.TH PCRE2_COMPILE 3 "16 June 2017" "PCRE2 10.30"
+.TH PCRE2_COMPILE 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -73,7 +73,13 @@ The option bits are:
 PCRE2 must be built with Unicode support (the default) in order to use
 PCRE2_UTF, PCRE2_UCP and related options.
 .P
-The yield of the function is a pointer to a private data structure that
+Additional options may be set in the compile context via the
+.\" HREF
+\fBpcre2_set_compile_extra_options\fP 
+.\"
+function.
+.P
+The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 .P
 There is a complete description of the PCRE2 native API, with more detail on
--- a/doc/pcre2_set_compile_extra_options.3
+++ b/doc/pcre2_set_compile_extra_options.3
@ -1,4 +1,4 @@
-.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "21 September 2018" "PCRE2 10.33"
+.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -8,7 +8,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .PP
 .nf
 .B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP,
-.B "  PCRE2_SIZE \fIextra_options\fP);"
+.B "  uint32_t \fIextra_options\fP);"
 .fi
 .
 .SH DESCRIPTION
@ -21,6 +21,9 @@ options are:
 .\" JOIN
   PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \ex{d800} to \ex{dfff}
                                         in UTF-8 and UTF-32 modes
+.\" JOIN
+  PCRE2_EXTRA_ALT_BSUX                 Extended alternate \eu, \eU, and \ex
+                                         handling 
 .\" JOIN
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL    Treat all invalid escapes as
                                         a literal following character
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "04 February 2019" "PCRE2 10.33"
+.TH PCRE2API 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -1231,7 +1231,7 @@ are needed. The \fBpcre2_code_copy_with_tables()\fP provides this facility.
 Copies of both the code and the tables are made, with the new code pointing to
 the new tables. The memory for the new tables is automatically freed when
 \fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
-\fBpcre2_code_copy_withy_tables()\fP is called with a NULL argument, it returns
+\fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns
 NULL.
 .P
 NOTE: When one of the matching functions is called, pointers to the compiled
@ -1252,7 +1252,7 @@ below.
 .\"
 .P
 The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
-settings that affect the compilation. It should be zero if no options are
+settings that affect the compilation. It should be zero if none of them are
 required. The available options are described below. Some of them (in
 particular, those that are compatible with Perl, but some others as well) can
 also be set and unset from within the pattern (see the detailed description in
@ -1267,8 +1267,9 @@ contents of the \fIoptions\fP argument specifies their settings at the start of
 compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK
 options can be set at the time of matching as well as at compile time.
 .P
-Other, less frequently required compile-time parameters (for example, the
-newline setting) can be provided in a compile context (as described
+Some additional options and less frequently required compile-time parameters
+(for example, the newline setting) can be provided in a compile context (as
+described
 .\" HTML <a href="#compilecontext">
 .\" </a>
 above).
@ -1325,6 +1326,11 @@ This code fragment shows a typical straightforward call to
    &erroffset,             /* for error offset */
    NULL);                  /* no compile context */
 .sp
+.
+.
+.SS "Main compile options"
+.rs
+.sp
 The following names for option bits are defined in the \fBpcre2.h\fP header
 file:
 .sp
@ -1361,6 +1367,16 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
 to match. By default, as in Perl, a hexadecimal number is always expected after
 \ex, but it may have zero, one, or two digits (so, for example, \exz matches a
 binary zero character followed by z).
+.P
+ECMAScript 6 added further functionality to \eu. This can be accessed using
+the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options"
+.\" HTML <a href="#extracompileoptions">
+.\" </a>
+below).
+.\"
+Note that this alternative escape handling applies only to patterns. Neither
+PCRE2_ALT_BSUX nor PCRE2_EXTRA_ALT_BSUX affects the processing of replacement
+strings passed to \fBpcre2_substitute()\fP.
 .sp
  PCRE2_ALT_CIRCUMFLEX
 .sp
@ -1788,9 +1804,8 @@ characters with code points greater than 127.
 .SS "Extra compile options"
 .rs
 .sp
-Unlike the main compile-time options, the extra options are not saved with the
-compiled pattern. The option bits that can be set in a compile context by
-calling the \fBpcre2_set_compile_extra_options()\fP function are as follows:
+The option bits that can be set in a compile context by calling the
+\fBpcre2_set_compile_extra_options()\fP function are as follows:
 .sp
  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
 .sp
@ -1813,6 +1828,14 @@ If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
 point values in UTF-8 and UTF-32 patterns no longer provoke errors and are
 incorporated in the compiled pattern. However, they can only match subject
 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
+.sp
+  PCRE2_EXTRA_ALT_BSUX
+.sp
+The original option PCRE2_ALT_BSUX causes PCRE2 to process \eU, \eu, and \ex in
+the way that ECMAScript (aka JavaScript) does. Additional functionality was
+defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of
+PCRE2_ALT_BSUX, but in addition it recognizes \eu{hhh..} as a hexadecimal
+character code, where hhh.. is any number of hexadecimal digits.
 .sp
  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
 .sp
@ -3383,7 +3406,8 @@ capture groups and letters within \eQ...\eE quoted sequences.
 .P
 Note that case forcing sequences such as \eU...\eE do not nest. For example,
 the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
-effect.
+effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do
+not apply to replacement strings.
 .P
 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
 flexibility to capture group substitution. The syntax is similar to that used
@ -3792,6 +3816,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "03 February 2019" "PCRE2 10.33"
+.TH PCRE2COMPAT 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -33,8 +33,9 @@ non-newline character, and \eN{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
 implemented by Perl's general string-handling and are not part of its pattern
 matching engine. If any of these are encountered by PCRE2, an error is
-generated by default. However, if the PCRE2_ALT_BSUX option is set, \eU and \eu
-are interpreted as ECMAScript interprets them.
+generated by default. However, if either of the PCRE2_ALT_BSUX or
+PCRE2_EXTRA_ALT_BSUX options is set, \eU and \eu are interpreted as ECMAScript
+interprets them.
 .P
 5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
@ -198,6 +199,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "04 February 2019" "PCRE2 10.33"
+.TH PCRE2PATTERN 3 "12 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -373,12 +373,30 @@ environment, these escapes are as follows:
  \exhh        character with hex code hh
  \ex{hhh..}   character with hex code hhh..
  \eN{U+hhh..} character with Unicode hex code point hhh..
-  \euhhhh      character with hex code hhhh (when PCRE2_ALT_BSUX is set)
 .sp
-There are some legacy applications where the escape sequence \er is expected to
-match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
-pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
-(carriage return) character.
+By default, after \ex that is not followed by {, from zero to two hexadecimal
+digits are read (letters can be in upper or lower case). Any number of
+hexadecimal digits may appear between \ex{ and }. If a character other than a
+hexadecimal digit appears between \ex{ and }, or if there is no terminating },
+an error occurs.
+.P
+Characters whose code points are less than 256 can be defined by either of the
+two syntaxes for \ex or by an octal sequence. There is no difference in the way
+they are handled. For example, \exdc is exactly the same as \ex{dc} or \e334.
+However, using the braced versions does make such sequences easier to read.
+.P
+Support is available for some ECMAScript (aka JavaScript) escape sequences via
+two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \ex followed
+by { is not recognized. Only if \ex is followed by two hexadecimal digits is it
+recognized as a character escape. Otherwise it is interpreted as a literal "x"
+character. In this mode, support for code points greater than 255 is provided
+by \eu, which must be followed by four hexadecimal digits; otherwise it is 
+interpreted as a literal "u" character.
+.P
+PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition,
+\eu{hhh..} is recognized as the character with the given hexadecimal code
+point. There may be any number of hexadecimal digits. This syntax is from
+ECMAScript 6.
 .P
 The \eN{U+hhh..} escape sequence is recognized only when the PCRE2_UTF option
 is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
@ -386,6 +404,11 @@ is set, that is, when PCRE2 is operating in a Unicode mode. Perl also uses
 Note that when \eN is not followed by an opening brace (curly bracket) it has
 an entirely different meaning, matching any character that is not a newline.
 .P
+There are some legacy applications where the escape sequence \er is expected to
+match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a
+pattern is converted to \en so that it matches a LF (linefeed) instead of a CR
+(carriage return) character.
+.P
 The precise effect of \ecx on ASCII characters is as follows: if x is a lower
 case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A),
@ -477,25 +500,6 @@ for themselves. For example, outside a character class:
 Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
 digits are ever read.
-.P
-By default, after \ex that is not followed by {, from zero to two hexadecimal
-digits are read (letters can be in upper or lower case). Any number of
-hexadecimal digits may appear between \ex{ and }. If a character other than
-a hexadecimal digit appears between \ex{ and }, or if there is no terminating
-}, an error occurs.
-.P
-If the PCRE2_ALT_BSUX option is set, the interpretation of \ex is as just
-described only when it is followed by two hexadecimal digits. Otherwise, it
-matches a literal "x" character. In this mode, support for code points greater
-than 256 is provided by \eu, which must be followed by four hexadecimal digits;
-otherwise it matches a literal "u" character. This syntax makes PCRE2 behave 
-like ECMAscript (aka JavaScript). Code points greater than 0xFFFF are not
-supported.
-.P
-Characters whose value is less than 256 can be defined by either of the two
-syntaxes for \ex (or by \eu in PCRE2_ALT_BSUX mode). There is no difference in
-the way they are handled. For example, \exdc is exactly the same as \ex{dc} (or
-\eu00dc in PCRE2_ALT_BSUX mode).
 .
 .
 .SS "Constraints on character values"
@ -534,9 +538,10 @@ character class, these sequences have different meanings.
 .sp
 In Perl, the sequences \eF, \el, \eL, \eu, and \eU are recognized by its string
 handler and used to modify the case of following characters. By default, PCRE2
-does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
-is set, \eU matches a "U" character, and \eu can be used to define a character
-by code point, as described above.
+does not support these escape sequences in patterns. However, if either of the
+PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \eU matches a "U"
+character, and \eu can be used to define a character by code point, as
+described above.
 .
 .
 .SS "Absolute and relative backreferences"
@ -3758,6 +3763,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 04 February 2019
+Last updated: 12 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "03 February 2019" "PCRE2 10.33"
+.TH PCRE2SYNTAX 3 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -22,7 +22,8 @@ documentation. This document contains a quick-reference summary of the syntax.
 .SH "ESCAPED CHARACTERS"
 .rs
 .sp
-This table applies to ASCII and Unicode environments.
+This table applies to ASCII and Unicode environments. An unrecognized escape 
+sequence causes an error.
 .sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII printing character
@ -34,12 +35,24 @@ This table applies to ASCII and Unicode environments.
  \e0dd       character with octal code 0dd
  \eddd       character with octal code ddd, or backreference
  \eo{ddd..}  character with octal code ddd..
-  \eU         "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
  \eN{U+hh..} character with Unicode code point hh.. (Unicode mode only)
-  \euhhhh     character with hex code hhhh (if PCRE2_ALT_BSUX is set)
  \exhh       character with hex code hh
  \ex{hh..}   character with hex code hh..
 .sp
+If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
+following are also recognized:
+.sp
+  \eU         the character "U"
+  \euhhhh     character with hex code hhhh
+  \eu{hh..}   character with hex code hh.. (PCRE2_EXTRA_ALT_BSUX only)
+.sp
+When \ex is not followed by {, from zero to two hexadecimal digits are read,
+but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be
+recognized as a hexadecimal escape; otherwise it matches a literal "x".
+Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits 
+or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
+matches a literal "u".
+.P
 Note that \e0dd is always an octal code. The treatment of backslash followed by
 a non-zero digit is complicated; for details see the section
 .\" HTML <a href="pcre2pattern.html#digitsafterbackslash">
@ -54,12 +67,6 @@ documentation, where details of escape processing in EBCDIC environments are
 also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not
 supported in EBCDIC environments. Note that \eN not followed by an opening
 curly bracket has a different meaning (see below).
-.P
-When \ex is not followed by {, from zero to two hexadecimal digits are read,
-but if PCRE2_ALT_BSUX is set, \ex must be followed by two hexadecimal digits to
-be recognized as a hexadecimal escape; otherwise it matches a literal "x".
-Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits,
-it matches a literal "u".
 .
 .
 .SH "CHARACTER TYPES"
@ -647,6 +654,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2test.1
+++ b/doc/pcre2test.1
@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "03 February 2019" "PCRE 10.33"
+.TH PCRE2TEST 1 "11 February 2019" "PCRE2 10.33"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@ -568,6 +568,7 @@ for a description of the effects of these options.
      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF 
  /x  extended                  set PCRE2_EXTENDED
  /xx extended_more             set PCRE2_EXTENDED_MORE
+      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX 
      firstline                 set PCRE2_FIRSTLINE
      literal                   set PCRE2_LITERAL
      match_line                set PCRE2_EXTRA_MATCH_LINE
@ -2056,6 +2057,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 February 2019
+Last updated: 11 February 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2test.txt
+++ b/doc/pcre2test.txt
@ -547,6 +547,7 @@ PATTERN MODIFIERS
             escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
         /x  extended                  set PCRE2_EXTENDED
         /xx extended_more             set PCRE2_EXTENDED_MORE
+             extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
             firstline                 set PCRE2_FIRSTLINE
             literal                   set PCRE2_LITERAL
             match_line                set PCRE2_EXTRA_MATCH_LINE
@ -1887,5 +1888,5 @@ AUTHOR

 REVISION

-       Last updated: 03 February 2019
+       Last updated: 11 February 2019
       Copyright (c) 1997-2019 University of Cambridge.
--- a/src/pcre2.h.in
+++ b/src/pcre2.h.in
@ -150,6 +150,7 @@ D   is inspected during pcre2_dfa_match() execution
 #define PCRE2_EXTRA_MATCH_WORD               0x00000004u  /* C */
 #define PCRE2_EXTRA_MATCH_LINE               0x00000008u  /* C */
 #define PCRE2_EXTRA_ESCAPED_CR_IS_LF         0x00000010u  /* C */
+#define PCRE2_EXTRA_ALT_BSUX                 0x00000020u  /* C */

 /* These are for pcre2_jit_compile(). */

--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@ -764,7 +764,7 @@ are allowed. */
 #define PUBLIC_COMPILE_EXTRA_OPTIONS \
   (PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS| \
    PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES|PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL| \
-    PCRE2_EXTRA_ESCAPED_CR_IS_LF)
+    PCRE2_EXTRA_ESCAPED_CR_IS_LF|PCRE2_EXTRA_ALT_BSUX)

 /* Compile time error code numbers. They are given names so that they can more
 easily be tracked. When a new number is added, the tables called eint1 and
@ -1459,7 +1459,8 @@ Returns:         zero => a data character

 int
 PRIV(check_escape)(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *chptr,
-  int *errorcodeptr, uint32_t options, BOOL isclass, compile_block *cb)
+  int *errorcodeptr, uint32_t options, uint32_t extra_options, BOOL isclass, 
+  compile_block *cb)
 {
 BOOL utf = (options & PCRE2_UTF) != 0;
 PCRE2_SPTR ptr = *ptrptr;
@ -1495,8 +1496,7 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
  if (i > 0)
    {
    c = (uint32_t)i;
-    if (cb != NULL && c == CHAR_CR &&
-        (cb->cx->extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
+    if (c == CHAR_CR && (extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
      c = CHAR_LF;
    }
  else  /* Negative table entry */
@ -1551,22 +1551,28 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)

 /* Escapes that need further processing, including those that are unknown, have
 a zero entry in the lookup table. When called from pcre2_substitute(), only \c,
-\o, and \x are recognized (and \u when BSUX is set). */
+\o, and \x are recognized (\u and \U can never appear as they are used for case 
+forcing). */

 else
  {
+  int s;
  PCRE2_SPTR oldptr;
  BOOL overflow;
-  int s;
+  BOOL alt_bsux = 
+    ((options & PCRE2_ALT_BSUX) | (extra_options & PCRE2_EXTRA_ALT_BSUX)) != 0;

  /* Filter calls from pcre2_substitute(). */

-  if (cb == NULL && c != CHAR_c && c != CHAR_o && c != CHAR_x &&
-      (c != CHAR_u || (options & PCRE2_ALT_BSUX) != 0))
+  if (cb == NULL)
    {
-    *errorcodeptr = ERR3;
-    return 0;
-    }
+    if (c != CHAR_c && c != CHAR_o && c != CHAR_x)
+      {
+      *errorcodeptr = ERR3;
+      return 0;
+      }
+    alt_bsux = FALSE;   /* Do not modify \x handling */   
+    }   

  switch (c)
    {
@ -1579,40 +1585,74 @@ else
    *errorcodeptr = ERR37;
    break;

-    /* \u is unrecognized when PCRE2_ALT_BSUX is not set. When it is treated
-    specially, \u must be followed by four hex digits. Otherwise it is a
-    lowercase u letter. */
+    /* \u is unrecognized when neither PCRE2_ALT_BSUX nor PCRE2_EXTRA_ALT_BSUX
+    is set. When either is set, \u must be followed by exactly four hex digits
+    or, if PCRE2_EXTRA_ALT_BSUX is set, by one or more hex digits in braces;
+    failing that, it is a literal lowercase u. This gives some compatibility
+    with ECMAScript (aka JavaScript). */

    case CHAR_u:
-    if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37; else
+    if (!alt_bsux) *errorcodeptr = ERR37; else
      {
      uint32_t xc;
-      if (ptrend - ptr < 4) break;              /* Less than 4 chars */
-      if ((cc = XDIGIT(ptr[0])) == 0xff) break;  /* Not a hex digit */
-      if ((xc = XDIGIT(ptr[1])) == 0xff) break;  /* Not a hex digit */
-      cc = (cc << 4) | xc;
-      if ((xc = XDIGIT(ptr[2])) == 0xff) break;  /* Not a hex digit */
-      cc = (cc << 4) | xc;
-      if ((xc = XDIGIT(ptr[3])) == 0xff) break;  /* Not a hex digit */
-      c = (cc << 4) | xc;
-      ptr += 4;
+      
+      if (ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET &&
+          (extra_options & PCRE2_EXTRA_ALT_BSUX) != 0)
+        {
+        PCRE2_SPTR hptr = ptr + 1;
+        cc = 0;
+        
+        while (hptr < ptrend && (xc = XDIGIT(*hptr)) != 0xff)
+          { 
+          if ((cc & 0xf0000000) != 0)  /* Test for 32-bit overflow */
+            {
+            *errorcodeptr = ERR77;
+            ptr = hptr;   /* Show where */
+            break;        /* *hptr != } will cause another break below */  
+            } 
+          cc = (cc << 4) | xc;
+          hptr++; 
+          } 
+          
+        if (hptr == ptr + 1 ||   /* No hex digits */
+            hptr >= ptrend ||    /* Hit end of input */
+            *hptr != CHAR_RIGHT_CURLY_BRACKET)  /* No } terminator */
+          break;         /* Hex escape not recognized */
+           
+        c = cc;          /* Accept the code point */
+        ptr = hptr + 1; 
+        }
+         
+      else  /* Must be exactly 4 hex digits */
+        {      
+        if (ptrend - ptr < 4) break;               /* Less than 4 chars */
+        if ((cc = XDIGIT(ptr[0])) == 0xff) break;  /* Not a hex digit */
+        if ((xc = XDIGIT(ptr[1])) == 0xff) break;  /* Not a hex digit */
+        cc = (cc << 4) | xc;
+        if ((xc = XDIGIT(ptr[2])) == 0xff) break;  /* Not a hex digit */
+        cc = (cc << 4) | xc;
+        if ((xc = XDIGIT(ptr[3])) == 0xff) break;  /* Not a hex digit */
+        c = (cc << 4) | xc;
+        ptr += 4;
+        } 
+ 
      if (utf)
        {
        if (c > 0x10ffffU) *errorcodeptr = ERR77;
        else
          if (c >= 0xd800 && c <= 0xdfff &&
-            (cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
-              *errorcodeptr = ERR73;
+              (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
+                *errorcodeptr = ERR73;
        }
      else if (c > MAX_NON_UTF_CHAR) *errorcodeptr = ERR77;
      }
    break;

-    /* \U is unrecognized unless PCRE2_ALT_BSUX is set, in which case it is an
-    upper case letter. */
+    /* \U is unrecognized unless PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set,
+    in which case it is an upper case letter. */

    case CHAR_U:
-    if ((options & PCRE2_ALT_BSUX) == 0) *errorcodeptr = ERR37;
+    if (!alt_bsux) *errorcodeptr = ERR37;
    break;

    /* In a character class, \g is just a literal "g". Outside a character
@ -1791,8 +1831,8 @@ else
        }
      else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
        {
-        if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
-            (cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
+        if (utf && c >= 0xd800 && c <= 0xdfff &&
+            (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
          {
          ptr--;
          *errorcodeptr = ERR73;
@ -1806,11 +1846,11 @@ else
      }
    break;

-    /* \x is complicated. When PCRE2_ALT_BSUX is set, \x must be followed by
-    two hexadecimal digits. Otherwise it is a lowercase x letter. */
+    /* When PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set, \x must be followed
+    by two hexadecimal digits. Otherwise it is a lowercase x letter. */

    case CHAR_x:
-    if ((options & PCRE2_ALT_BSUX) != 0)
+    if (alt_bsux)
      {
      uint32_t xc;
      if (ptrend - ptr < 2) break;               /* Less than 2 characters */
@ -1818,9 +1858,9 @@ else
      if ((xc = XDIGIT(ptr[1])) == 0xff) break;  /* Not a hex digit */
      c = (cc << 4) | xc;
      ptr += 2;
-      }    /* End PCRE2_ALT_BSUX handling */
+      }

-    /* Handle \x in Perl's style. \x{ddd} is a character number which can be
+    /* Handle \x in Perl's style. \x{ddd} is a character code which can be
    greater than 0xff in UTF-8 or non-8bit mode, but only if the ddd are hex
    digits. If not, { used to be treated as a data character. However, Perl
    seems to read hex digits up to the first non-such, and ignore the rest, so
@ -1864,8 +1904,8 @@ else
          }
        else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET)
          {
-          if (utf && c >= 0xd800 && c <= 0xdfff && (cb == NULL ||
-              (cb->cx->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0))
+          if (utf && c >= 0xd800 && c <= 0xdfff &&
+              (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0)
            {
            ptr--;
            *errorcodeptr = ERR73;
@ -2438,6 +2478,7 @@ uint32_t *parsed_pattern = cb->parsed_pattern;
 uint32_t *parsed_pattern_end = cb->parsed_pattern_end;
 uint32_t meta_quantifier = 0;
 uint32_t add_after_mark = 0;
+uint32_t extra_options = cb->cx->extra_options;
 uint16_t nest_depth = 0;
 int after_manual_callout = 0;
 int expect_cond_assert = 0;
@ -2461,12 +2502,12 @@ nest_save *top_nest, *end_nests;
 /* Insert leading items for word and line matching (features provided for the
 benefit of pcre2grep). */

-if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
+if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
  {
  *parsed_pattern++ = META_CIRCUMFLEX;
  *parsed_pattern++ = META_NOCAPTURE;
  }
-else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
+else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
  {
  *parsed_pattern++ = META_ESCAPE + ESC_b;
  *parsed_pattern++ = META_NOCAPTURE;
@ -2631,7 +2672,7 @@ while (ptr < ptrend)
      if ((options & PCRE2_ALT_VERBNAMES) != 0)
        {
        escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
-          FALSE, cb);
+          cb->cx->extra_options, FALSE, cb);
        if (errorcode != 0) goto FAILED;
        }
      else escape = 0;   /* Treat all as literal */
@ -2821,11 +2862,11 @@ while (ptr < ptrend)
    case CHAR_BACKSLASH:
    tempptr = ptr;
    escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options,
-      FALSE, cb);
+      cb->cx->extra_options, FALSE, cb);
    if (errorcode != 0)
      {
      ESCAPE_FAILED:
-      if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
+      if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
        goto FAILED;
      ptr = tempptr;
      if (ptr >= ptrend) c = CHAR_BACKSLASH; else
@ -3382,12 +3423,12 @@ while (ptr < ptrend)
      else
        {
        tempptr = ptr;
-        escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode,
-          options, TRUE, cb);
+        escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options, 
+          cb->cx->extra_options, TRUE, cb);

        if (errorcode != 0)
          {
-          if ((cb->cx->extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
+          if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0)
            goto FAILED;
          ptr = tempptr;
          if (ptr >= ptrend) c = CHAR_BACKSLASH; else
@ -4545,12 +4586,12 @@ parsed_pattern = manage_callouts(ptr, &previous_callout, auto_callout,
 /* Insert trailing items for word and line matching (features provided for the
 benefit of pcre2grep). */

-if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
+if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0)
  {
  *parsed_pattern++ = META_KET;
  *parsed_pattern++ = META_DOLLAR;
  }
-else if ((cb->cx->extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
+else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0)
  {
  *parsed_pattern++ = META_KET;
  *parsed_pattern++ = META_ESCAPE + ESC_b;
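
The \u handling added to check_escape() above can be read in isolation as a small parser. The following is only a sketch under stated assumptions, not the PCRE2 code: `hex_value` and `parse_bsu` are hypothetical names, plain `char` stands in for PCRE2's code unit type, and the return convention (characters consumed, 0 for "treat as a literal u", -1 for 32-bit overflow) is invented for illustration. The UTF and surrogate range checks that follow in the real function are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of a hex digit, or 0xff if c is not one
   (mirrors the role of the XDIGIT macro in the patch). */
static uint32_t hex_value(char c)
{
    if (c >= '0' && c <= '9') return (uint32_t)(c - '0');
    if (c >= 'a' && c <= 'f') return (uint32_t)(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return (uint32_t)(c - 'A' + 10);
    return 0xff;
}

/* Hypothetical sketch. p points just after "\u"; end is one past the last
   character. allow_braces plays the role of PCRE2_EXTRA_ALT_BSUX.
   Returns the number of characters consumed and stores the code point in
   *out, or returns 0 if the escape is not recognized (literal 'u'), or -1
   if the braced value does not fit in 32 bits. */
static int parse_bsu(const char *p, const char *end, int allow_braces,
                     uint32_t *out)
{
    uint32_t cc = 0;

    if (allow_braces && p < end && *p == '{') {
        const char *h = p + 1;
        uint32_t xc;

        while (h < end && (xc = hex_value(*h)) != 0xff) {
            if ((cc & 0xf0000000u) != 0) return -1;  /* would overflow */
            cc = (cc << 4) | xc;
            h++;
        }

        /* Need at least one digit and a closing brace. */
        if (h == p + 1 || h >= end || *h != '}') return 0;

        *out = cc;
        return (int)(h + 1 - p);
    }

    /* Otherwise exactly four hex digits are required. */
    if (end - p < 4) return 0;
    for (int i = 0; i < 4; i++) {
        uint32_t xc = hex_value(p[i]);
        if (xc == 0xff) return 0;
        cc = (cc << 4) | xc;
    }
    *out = cc;
    return 4;
}
```

Testing the top nibble before each shift means leading zeros never trigger the overflow error, which is why a pattern such as the `\u{0000000000010ffff}` test case in testinput5 is accepted.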
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
     Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2018 University of Cambridge
+          New API code Copyright (c) 2016-2019 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -1942,7 +1942,7 @@ is available. */
 extern int          _pcre2_auto_possessify(PCRE2_UCHAR *, BOOL,
                      const compile_block *);
 extern int          _pcre2_check_escape(PCRE2_SPTR *, PCRE2_SPTR, uint32_t *,
-                      int *, uint32_t, BOOL, compile_block *);
+                      int *, uint32_t, uint32_t, BOOL, compile_block *);
 extern PCRE2_SPTR   _pcre2_extuni(uint32_t, PCRE2_SPTR, PCRE2_SPTR, PCRE2_SPTR,
                      BOOL, int *);
 extern PCRE2_SPTR   _pcre2_find_bracket(PCRE2_SPTR, BOOL, int);
--- a/src/pcre2_substitute.c
+++ b/src/pcre2_substitute.c
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.

                       Written by Philip Hazel
     Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2018 University of Cambridge
+          New API code Copyright (c) 2016-2019 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@ -129,7 +129,7 @@ for (; ptr < ptrend; ptr++)

    ptr += 1;  /* Must point after \ */
    erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
-      code->overall_options, FALSE, NULL);
+      code->overall_options, code->extra_options, FALSE, NULL);
    ptr -= 1;  /* Back to last code unit of escape */
    if (errorcode != 0)
      {
@ -774,7 +774,7 @@ do

      ptr++;  /* Point after \ */
      rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
-        code->overall_options, FALSE, NULL);
+        code->overall_options, code->extra_options, FALSE, NULL);
      if (errorcode != 0) goto BADESCAPE;

      switch(rc)
--- a/src/pcre2test.c
+++ b/src/pcre2test.c
@ -646,6 +646,7 @@ static modstruct modlist[] = {
  { "expand",                     MOD_PAT,  MOD_CTL, CTL_EXPAND,                 PO(control) },
  { "extended",                   MOD_PATP, MOD_OPT, PCRE2_EXTENDED,             PO(options) },
  { "extended_more",              MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE,        PO(options) },
+  { "extra_alt_bsux",             MOD_CTC,  MOD_OPT, PCRE2_EXTRA_ALT_BSUX,       CO(extra_options) },
  { "find_limits",                MOD_DAT,  MOD_CTL, CTL_FINDLIMITS,             DO(control) },
  { "firstline",                  MOD_PAT,  MOD_OPT, PCRE2_FIRSTLINE,            PO(options) },
  { "framesize",                  MOD_PAT,  MOD_CTL, CTL_FRAMESIZE,              PO(control) },
@ -4189,10 +4190,11 @@ show_compile_extra_options(uint32_t options, const char *before,
  const char *after)
 {
 if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
-else fprintf(outfile, "%s%s%s%s%s%s%s",
+else fprintf(outfile, "%s%s%s%s%s%s%s%s",
  before,
  ((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "",
  ((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
+  ((options & PCRE2_EXTRA_ALT_BSUX) != 0)? " extra_alt_bsux" : "",
  ((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
  ((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
  ((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",
--- a/testdata/testinput2
+++ b/testdata/testinput2
@ -2408,13 +2408,13 @@
 \= Expect no match
    cat

-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat

 /TA]/
    The ACTA] comes

-/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/TA]/allow_empty_class,match_unset_backref,dupnames
    The ACTA] comes

 /(?2)[]a()b](abc)/
@ -2446,25 +2446,25 @@

 /a[^]b/

-/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab

-/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab

-/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]*+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab

-/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]b/allow_empty_class,match_unset_backref,dupnames
    aXb
    a\nb
 \= Expect no match
    ab

-/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]+b/allow_empty_class,match_unset_backref,dupnames
    aXb
    a\nX\nXb
 \= Expect no match
@ -2903,10 +2903,10 @@
    xxxxabcde\=ps
    xxxxabcde\=ph

-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat

-/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
    cat

 /(\3)(\1)(a)/I
@ -3418,6 +3418,14 @@
    aU0041z
 \= Expect no match
    aAz
+    
+/^\u{7a}/alt_bsux
+    u{7a}
+\= Expect no match
+    zoo 
+
+/^\u{7a}/extra_alt_bsux
+    zoo 

 /(?(?=c)c|d)++Y/B

--- a/testdata/testinput5
+++ b/testdata/testinput5
@ -333,13 +333,13 @@

 /[[:a\x{100}b:]]/utf

-/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]b/utf,allow_empty_class,match_unset_backref
    a\x{1234}b
    a\nb
 \= Expect no match
    ab

-/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]+b/utf,allow_empty_class,match_unset_backref
    aXb
    a\nX\nX\x{1234}b
 \= Expect no match
@ -814,6 +814,9 @@

 /\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref

+/^\u{0000000000010ffff}/utf,extra_alt_bsux
+    \x{10ffff}
+
 /^a+[a\x{200}]/B,utf
    aa

--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@ -8774,7 +8774,7 @@ No match
    cat
 No match

-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat
 0: a
 1: 
@ -8785,7 +8785,7 @@ No match
    The ACTA] comes
 0: TA]

-/TA]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/TA]/allow_empty_class,match_unset_backref,dupnames
    The ACTA] comes
 0: TA]

@ -8833,22 +8833,22 @@ Failed: error 106 at offset 4: missing terminating ] for character class
 /a[^]b/
 Failed: error 106 at offset 5: missing terminating ] for character class

-/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
 No match

-/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
 No match

-/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[]*+b/allow_empty_class,match_unset_backref,dupnames
 \= Expect no match
    ab
 No match

-/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]b/allow_empty_class,match_unset_backref,dupnames
    aXb
 0: aXb
    a\nb
@ -8857,7 +8857,7 @@ No match
    ab
 No match

-/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/a[^]+b/allow_empty_class,match_unset_backref,dupnames
    aXb
 0: aXb
    a\nX\nXb
@ -9971,17 +9971,17 @@ Partial match: abca
    xxxxabcde\=ph
 Partial match: abcde

-/(\3)(\1)(a)/alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames
    cat
 0: a
 1: 
 2: 
 3: a

-/(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
+/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames
 Capture group count = 3
 Max back reference = 3
-Options: alt_bsux allow_empty_class dupnames match_unset_backref
+Options: allow_empty_class dupnames match_unset_backref
 Last code unit = 'a'
 Subject length lower bound = 1
    cat
@ -11364,6 +11364,17 @@ No match
 \= Expect no match
    aAz
 No match
+    
+/^\u{7a}/alt_bsux
+    u{7a}
+ 0: u{7a}
+\= Expect no match
+    zoo 
+No match
+
+/^\u{7a}/extra_alt_bsux
+    zoo 
+ 0: z

 /(?(?=c)c|d)++Y/B
 ------------------------------------------------------------------
--- a/testdata/testoutput5
+++ b/testdata/testoutput5
@ -798,7 +798,7 @@ No match
 /[[:a\x{100}b:]]/utf
 Failed: error 130 at offset 3: unknown POSIX class name

-/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]b/utf,allow_empty_class,match_unset_backref
    a\x{1234}b
 0: a\x{1234}b
    a\nb
@ -807,7 +807,7 @@ Failed: error 130 at offset 3: unknown POSIX class name
    ab
 No match

-/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
+/a[^]+b/utf,allow_empty_class,match_unset_backref
    aXb
 0: aXb
    a\nX\nX\x{1234}b
@ -1734,6 +1734,10 @@ No match
 /\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
 Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)

+/^\u{0000000000010ffff}/utf,extra_alt_bsux
+    \x{10ffff}
+ 0: \x{10ffff}
+
 /^a+[a\x{200}]/B,utf
 ------------------------------------------------------------------
        Bra