Major refactoring of pcre2_compile.c; see ChangeLog and HACKING.

2016-10-02 16:01:01 +00:00 · 2016-10-02 16:01:01 +00:00 · 99264dfc23
parent dda1e79060
commit 99264dfc23
46 changed files with 7298 additions and 6268 deletions
--- a/40
+++ b/40
@ -14,6 +14,46 @@ cause all characters greater than 255 to match, whatever else is in the class.
 There was a bug that caused this not to happen if a Unicode property item was
 added to such a class, for example [\D\P{Nd}] or [\W\pL].

+3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax 
+checking is now done in the pre-pass that identifies capturing groups. This has 
+reduced the amount of duplication and made the code tidier. While doing this, 
+some minor bugs and Perl incompatibilities were fixed, including:
+
+  (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead
+      of giving an invalid quantifier error.
+  (b) {0} can now be used after a group in a lookbehind assertion; previously
+      this caused an "assertion is not fixed length" error.
+  (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with 
+      the name "DEFINE" exists. PCRE2 now does likewise.
+  (d) A recursion condition test such as (?(R2)...) must now refer to an 
+      existing subpattern.
+
+One effect of the refactoring is that some error numbers and messages have 
+changed, and the pattern offset given for compiling errors is not always the
+right-most character that has been read. In particular, for a variable-length
+lookbehind assertion it now points to the start of the assertion. Another
+change is that when a callout appears before a group, the "length of next
+pattern item" that is passed now just gives the length of the opening
+parenthesis item, not the length of the whole group. A length of zero is now
+given only for a callout at the end of the pattern. Automatic callouts are no 
+longer inserted before and after explicit callouts in the pattern.
+
+4. Back references are now permitted in lookbehind assertions when there are 
+no duplicated group numbers (that is, (?| has not been used), and, if the 
+reference is by name, there is only one group of that name. The referenced
+group must, of course, be of fixed length.
+
+5. pcre2test has been upgraded so that, when run under valgrind with valgrind 
+support enabled, reading past the end of the pattern is detected, both when 
+compiling and during callout processing.
+
+6. \g{+<number>} (e.g. \g{+2}) is now supported. It is a "forward back
+reference" and can be useful in repetitions (compare \g{-<number>}). Perl does 
+not recognize this syntax.
+
+7. Automatic callouts are no longer generated before and after callouts in the 
+pattern.
+

 Version 10.22 29-July-2016
 --------------------------
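Note on item 6 above: the arithmetic behind relative references such as
\g{-<number>} and \g{+<number>} can be sketched as follows. This is an
illustration only, not code from pcre2_compile.c. The function name and its
parameters are hypothetical. It assumes that a negative reference counts
backwards from the most recently opened capturing group and that a positive
reference counts forwards from it, so that \g{+1} names the next group to be
opened. The 65535 ceiling is the limit on capturing subpatterns documented
in pcre2limits.

```c
#include <stdint.h>

/* Hypothetical limit name; 65535 is the documented maximum number of
   capturing subpatterns. */
#define SKETCH_MAX_GROUPS 65535u

/* Hypothetical helper: resolve a relative group reference to an absolute
   group number.
     sign          '-' for \g{-n}, '+' for \g{+n}
     n             the number written in the pattern (must be >= 1)
     groups_opened number of capturing groups opened to the left of the
                   reference
   Returns the absolute group number, or 0 if the reference cannot be valid. */
uint32_t resolve_relative_ref(char sign, uint32_t n, uint32_t groups_opened)
{
    if (n == 0 || groups_opened > SKETCH_MAX_GROUPS) return 0;

    if (sign == '-') {
        /* \g{-1} is the most recently opened group. */
        if (n > groups_opened) return 0;
        return groups_opened - n + 1;
    }

    if (sign == '+') {
        /* \g{+1} is the next group to be opened (assumption stated above). */
        if (n > SKETCH_MAX_GROUPS - groups_opened) return 0;
        return groups_opened + n;
    }

    return 0;  /* Unknown sign. */
}
```

With two groups already opened, resolve_relative_ref('-', 1, 2) gives 2.
With one group opened, resolve_relative_ref('+', 2, 1) gives 3, a group that
does not exist yet at that point in the pattern, which is why the ChangeLog
calls it a "forward back reference".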
--- a/241
+++ b/241
@ -7,8 +7,8 @@ but with a revised (and incompatible) API. To avoid confusion, the original
 library is referred to as PCRE1 below. For information about testing PCRE2, see
 the pcre2test documentation and the comment at the head of the RunTest file.

-PCRE1 releases were up to 8.3x when PCRE2 was developed. The 8.xx series will
-continue for bugfixes if necessary. PCRE2 releases started at 10.00 to avoid
+PCRE1 releases were up to 8.3x when PCRE2 was developed, and later bug fix 
+releases remain in the 8.xx series. PCRE2 releases started at 10.00 to avoid
 confusion with PCRE1.


@ -16,19 +16,20 @@ Historical note 1
 -----------------

 Many years ago I implemented some regular expression functions to an algorithm
-suggested by Martin Richards. These were not Unix-like in form, and were quite
-restricted in what they could do by comparison with Perl. The interesting part
-about the algorithm was that the amount of space required to hold the compiled
-form of an expression was known in advance. The code to apply an expression did
-not operate by backtracking, as the original Henry Spencer code and current
-PCRE2 and Perl code does, but instead checked all possibilities simultaneously
-by keeping a list of current states and checking all of them as it advanced
-through the subject string. In the terminology of Jeffrey Friedl's book, it was
-a "DFA algorithm", though it was not a traditional Finite State Machine (FSM).
-When the pattern was all used up, all remaining states were possible matches,
-and the one matching the longest subset of the subject string was chosen. This
-did not necessarily maximize the individual wild portions of the pattern, as is
-expected in Unix and Perl-style regular expressions.
+suggested by Martin Richards. The rather simple patterns were not Unix-like in
+form, and were quite restricted in what they could do by comparison with Perl.
+The interesting part about the algorithm was that the amount of space required
+to hold the compiled form of an expression was known in advance. The code to
+apply an expression did not operate by backtracking, as the original Henry
+Spencer code and current PCRE2 and Perl code does, but instead checked all
+possibilities simultaneously by keeping a list of current states and checking
+all of them as it advanced through the subject string. In the terminology of
+Jeffrey Friedl's book, it was a "DFA algorithm", though it was not a
+traditional Finite State Machine (FSM). When the pattern was all used up, all
+remaining states were possible matches, and the one matching the longest subset
+of the subject string was chosen. This did not necessarily maximize the
+individual wild portions of the pattern, as is expected in Unix and Perl-style
+regular expressions.


 Historical note 2
@ -85,7 +86,7 @@ had become very complicated and hard to maintain. Indeed one of the early
 things I did for 6.8 was to fix Yet Another Bug in the memory computation. Then
 I had a flash of inspiration as to how I could run the real compile function in
 a "fake" mode that enables it to compute how much memory it would need, while
-actually only ever using a few hundred bytes of working memory, and without too
+in most cases only ever using a small amount of working memory, and without too
 many tests of the mode that might slow it down. So I refactored the compiling
 functions to work this way. This got rid of about 600 lines of source. It
 should make future maintenance and development easier. As this was such a major
@ -104,20 +105,204 @@ system stack used by the compile function, which uses recursive function calls
 for nested parenthesized groups. This is a safety feature for environments with
 small stacks where the patterns are provided by users.

-History repeated itself for release 10.20. A number of bugs relating to named 
-subpatterns had been discovered by fuzzers. Most of these were related to the 
-handling of forward references when it was not known if the named pattern was
+
+Yet another pattern scan
+------------------------
+
+History repeated itself for PCRE2 release 10.20. A number of bugs relating to
+named subpatterns had been discovered by fuzzers. Most of these were related to
+the handling of forward references when it was not known if the named group was
 unique. (References to non-unique names use a different opcode and more
 memory.) The use of duplicate group numbers (the (?| facility) also caused
-issues. 
+issues.

-To get around these problems I adopted a new approach by adding a third pass,
-really a "pre-pass", over the pattern, which does nothing other than identify
-all the named subpatterns and their corresponding group numbers. This means 
-that the actual compile (both pre-pass and real compile) have full knowledge of 
-group names and numbers throughout. Several dozen lines of messy code were 
-eliminated, though the new pre-pass is not short (skipping over [] classes is 
-complicated).
+To get around these problems I adopted a new approach by adding a third pass
+over the pattern (really a "pre-pass"), which did nothing other than identify
+all the named subpatterns and their corresponding group numbers. This means
+that the actual compile (both the memory-computing dummy run and the real
+compile) has full knowledge of group names and numbers throughout. Several
+dozen lines of messy code were eliminated, though the new pre-pass was not
+short. In particular, parsing and skipping over [] classes is complicated.
+
+While working on 10.22 I realized that I could simplify yet again by moving 
+more of the parsing into the pre-pass, thus avoiding doing it in two places, so
+after 10.22 was released, the code underwent yet another big refactoring. This
+is how it is from 10.23 onwards:
+
+The function called parse_regex() scans the pattern characters, parsing them 
+into literal data and meta characters. It converts escapes such as \x{123} 
+into literals, handles \Q...\E, and skips over comments and non-significant 
+white space. The result of the scanning is put into a vector of 32-bit unsigned 
+integers. Values less than 0x80000000 are literal data. Higher values represent 
+meta-characters. The top 16-bits of such values identify the meta-character,
+and these are given names such as META_CAPTURE. The lower 16-bits are available
+for data, for example, the capturing group number. The only situation in which 
+literal data values greater than 0x7fffffff can appear is when the 32-bit 
+library is running in non-UTF mode. This is handled by having a special 
+meta-character that is followed by the 32-bit data value.
+
+The size of the parsed pattern vector, when auto-callouts are not enabled, is
+bounded by the length of the pattern (with one exception). The code is written 
+so that each item in the pattern uses no more vector elements than the number 
+of code units in the item itself. The exception is the aforementioned large
+32-bit number handling. For this reason, 32-bit non-UTF patterns are scanned in
+advance to check for such values. When auto-callouts are enabled, the generous
+assumption is made that there will be a callout for each pattern code unit
+(which of course is only actually true if all code units are literals) plus one 
+at the end. There is a default parsed pattern vector on the stack, but if this
+is not big enough, heap memory is used.
+
+As before, the actual compiling function is run twice, the first time to 
+determine the amount of memory needed for the final compiled pattern. It 
+now processes the parsed pattern vector, not the pattern itself, although some
+of the parsed items refer to strings in the pattern - for example, group
+names. As escapes and comments have already been processed, the code is a bit 
+simpler than before.
+
+Most errors can be diagnosed during the parsing scan. For those that cannot
+(for example, "lookbehind assertion is not fixed length"), the parsed code
+contains offsets into the pattern so that the actual compiling code can
+identify where errors occur.
+
+
+The elements of the parsed pattern vector
+-----------------------------------------
+
+The word "offset" below means a code unit offset into the pattern. When 
+PCRE2_SIZE (which is usually size_t) is no bigger than uint32_t, an offset is
+stored in a single parsed pattern element. Otherwise (typically on 64-bit
+systems) it occupies two elements. The following meta items occupy just one
+element, with no data:
+
+META_ACCEPT           (*ACCEPT)
+META_ALT              | alternation 
+META_ASTERISK         *  
+META_ASTERISK_PLUS    *+ 
+META_ASTERISK_QUERY   *? 
+META_ATOMIC           (?> start of atomic group 
+META_CIRCUMFLEX       ^ metacharacter 
+META_CLASS            [ start of non-empty class 
+META_CLASS_EMPTY      [] empty class - only with PCRE2_ALLOW_EMPTY_CLASS 
+META_CLASS_EMPTY_NOT  [^] negative empty class - ditto
+META_CLASS_END        ] end of non-empty class 
+META_CLASS_NOT        [^ start non-empty negative class 
+META_COMMIT           (*COMMIT)
+META_DOLLAR           $ metacharacter 
+META_DOT              . metacharacter 
+META_END              End of pattern (this value is 0x80000000)
+META_FAIL             (*FAIL)
+META_KET              ) closing parenthesis 
+META_LOOKAHEAD        (?= start of lookahead
+META_LOOKAHEADNOT     (?! start of negative lookahead
+META_NOCAPTURE        (?: no capture parens 
+META_PLUS             +  
+META_PLUS_PLUS        ++ 
+META_PLUS_QUERY       +? 
+META_PRUNE            (*PRUNE) - no argument
+META_QUERY            ?  
+META_QUERY_PLUS       ?+ 
+META_QUERY_QUERY      ?? 
+META_RANGE_ESCAPED    hyphen in class range with at least one escape 
+META_RANGE_LITERAL    hyphen in class range defined literally 
+META_SKIP             (*SKIP) - no argument
+META_THEN             (*THEN) - no argument
+
+The two RANGE values occur only in character classes. They are positioned 
+between two literals that define the start and end of the range. In an EBCDIC 
+environment it is necessary to know whether either of the range values was 
+specified as an escape. In an ASCII/Unicode environment the distinction is not 
+relevant.
+
+The following have data in the lower 16 bits, and may be followed by other data 
+elements:
+
+META_BACKREF
+META_CAPTURE
+META_ESCAPE
+META_RECURSE
+
+META_BACKREF, META_CAPTURE, and META_RECURSE have the capture group number as
+their data in the lower 16 bits of the element.
+
+META_BACKREF is followed by an offset if the back reference group number is 10
+or more. The offsets of the first occurrences of references to groups whose 
+numbers are less than 10 are put in cb->small_ref_offset[] (only the first
+occurrence is useful). On 64-bit systems this avoids using more than two parsed
+pattern elements for items such as \3. The offset is used when an error is
+given for a reference to a non-existent group.
+
+META_RECURSE is always followed by an offset, for use in error messages.
+
+META_ESCAPE has an ESC_xxx value as its data. For ESC_P and ESC_p, the next
+element contains the 16-bit type and data property values, packed together.
+ESC_g and ESC_k are used only for named references - numerical ones are turned
+into META_RECURSE or META_BACKREF as appropriate. They are followed by a length
+and an offset into the pattern to specify the name.
+
+The following have one data item that follows in the next vector element:
+
+META_BIGVALUE         Next is a literal >= META_END 
+META_OPTIONS          (?i) and friends (data is new option bits)
+META_POSIX            POSIX class item (data identifies the class)
+META_POSIX_NEG        negative POSIX class item (ditto)
+
+The following are followed by a length element, then a number of character code
+values (which should match with the length):
+
+META_MARK             (*MARK:xxxx) 
+META_PRUNE_ARG        (*PRUNE:xxx)
+META_SKIP_ARG         (*SKIP:xxxx)
+META_THEN_ARG         (*THEN:xxxx)
+
+The following are followed by a length element, then an offset in the pattern 
+that identifies the name:
+
+META_COND_NAME        (?(<name>) or (?('name') or (?(name) 
+META_COND_RNAME       (?(R&name) 
+META_COND_RNUMBER     (?(Rdigits)
+META_RECURSE_BYNAME   (?&name) 
+META_BACKREF_BYNAME   \k'name' 
+
+META_COND_RNUMBER is used for names that start with R and continue with digits,
+because this is an ambiguous case. It could be a back reference to a group with
+that name, or it could be a recursion test on a numbered group.
+
+This one is followed by an offset, for use in error messages, then a number:
+
+META_COND_NUMBER       (?([+-]digits) 
+
+The following are followed just by an offset, for use in error messages:
+
+META_COND_ASSERT      (?(?assertion)
+META_COND_DEFINE      (?(DEFINE)
+META_LOOKBEHIND       (?<= 
+META_LOOKBEHINDNOT    (?<!
+
+In fact, META_COND_ASSERT is used for any group starting (?( that does not 
+match any of the other META_COND cases. The check that this group is an 
+assertion (optionally preceded by a callout) happens at compile time. 
+
+The following are followed by two values, the minimum and maximum. Repeat
+values are limited to 65535 (MAX_REPEAT). A maximum value of "unlimited" is
+represented by UNLIMITED_REPEAT, which is bigger than MAX_REPEAT:
+
+META_MINMAX           {n,m}  repeat 
+META_MINMAX_PLUS      {n,m}+ repeat 
+META_MINMAX_QUERY     {n,m}? repeat 
+
+This one is followed by three elements. The first is 0 for '>' and 1 for '>=';
+the next two are the major and minor numbers:
+
+META_COND_VERSION     (?(VERSION<op>x.y)
+
+Callouts are converted into one of two items:
+
+META_CALLOUT_NUMBER   (?C with numerical argument 
+META_CALLOUT_STRING   (?C with string argument 
+
+In both cases, the next two elements contain the offset and length of the next 
+item in the pattern. Then there is either one callout number, or a length and 
+an offset for the string argument. The length includes both delimiters.


 Traditional matching function
@ -606,4 +791,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
 correct length, in order to catch updating errors.

 Philip Hazel
-June 2016
+September 2016
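Note on the HACKING changes above: the parsed pattern element layout can be
sketched in a few lines of C. This is an illustration under stated
assumptions, not the code in pcre2_compile.c. All SKETCH_ and sketch_ names
are hypothetical. The value of META_END (0x80000000) and the split into a top
16-bit code and lower 16-bit data come from the text. The numeric value used
for the capture item is made up for the example, and the order of the two
halves of a wide offset (high half first) is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* META_END's value is given in HACKING; the capture value is illustrative. */
#define SKETCH_META_END      0x80000000u
#define SKETCH_META_CAPTURE  0x80080000u  /* assumption, for illustration */

/* Top 16 bits identify the meta item; lower 16 bits carry its data. */
#define SKETCH_META_CODE(x)  ((x) & 0xffff0000u)
#define SKETCH_META_DATA(x)  ((x) & 0x0000ffffu)

/* Values below 0x80000000 are literal data. */
int sketch_is_literal(uint32_t element)
{
    return element < SKETCH_META_END;
}

/* Build a capture item whose data is the group number. */
uint32_t sketch_make_capture(uint32_t group_number)
{
    return SKETCH_META_CAPTURE | (group_number & 0xffffu);
}

/* Store a pattern offset: one element when size_t fits in 32 bits,
   otherwise two (high half first). Returns the number of elements used. */
size_t sketch_put_offset(uint32_t *vec, size_t offset)
{
    if (sizeof(size_t) <= sizeof(uint32_t)) {
        vec[0] = (uint32_t)offset;
        return 1;
    }
    vec[0] = (uint32_t)((uint64_t)offset >> 32);
    vec[1] = (uint32_t)((uint64_t)offset & 0xffffffffu);
    return 2;
}

/* Read back an offset written by sketch_put_offset(). */
size_t sketch_get_offset(const uint32_t *vec)
{
    if (sizeof(size_t) <= sizeof(uint32_t)) return (size_t)vec[0];
    return (size_t)(((uint64_t)vec[0] << 32) | (uint64_t)vec[1]);
}
```

Keeping the meta code in the top half means a single comparison against
0x80000000 separates literals from meta items, and a switch on the masked
code can dispatch on the item type while the group number travels in the
same element.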
--- a/Makefile.am
+++ b/Makefile.am
@ -65,6 +65,7 @@ dist_html_DATA = \
  doc/html/pcre2_set_character_tables.html \
  doc/html/pcre2_set_compile_recursion_guard.html \
  doc/html/pcre2_set_match_limit.html \
+  doc/html/pcre2_set_max_pattern_length.html \
  doc/html/pcre2_set_offset_limit.html \
  doc/html/pcre2_set_newline.html \
  doc/html/pcre2_set_parens_nest_limit.html \
@ -146,6 +147,7 @@ dist_man_MANS = \
  doc/pcre2_set_character_tables.3 \
  doc/pcre2_set_compile_recursion_guard.3 \
  doc/pcre2_set_match_limit.3 \
+  doc/pcre2_set_max_pattern_length.3 \
  doc/pcre2_set_offset_limit.3 \
  doc/pcre2_set_newline.3 \
  doc/pcre2_set_parens_nest_limit.3 \
--- a/2
+++ b/2
@ -502,7 +502,7 @@ for bmode in "$test8" "$test16" "$test32"; do
    for opt in "" $jitopt; do
      $sim $valgrind ${opt:+$vjs} ./pcre2test -q $test2stack $bmode $opt $testdata/testinput2 testtry
      if [ $? = 0 ] ; then
-        $sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -63,-62,-2,-1,0,100,188,189 >>testtry
+        $sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -63,-62,-2,-1,0,100,188,189,190 >>testtry
        checkresult $? 2 "$opt"
      else
        echo " "
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@ -1,4 +1,4 @@
-.TH PCRE2API 3 "17 June 2016" "PCRE2 10.22"
+.TH PCRE2API 3 "30 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -693,7 +693,8 @@ functions, \fIpcre2_match()\fP and \fIpcre2_dfa_match()\fP.
 .sp
 This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
 depth of parenthesis nesting in a pattern. This limit stops rogue patterns
-using up too much system stack when being compiled.
+using up too much system stack when being compiled. The limit applies to 
+parentheses of all kinds, not just capturing parentheses.
 .sp
 .nf
 .B int pcre2_set_compile_recursion_guard(pcre2_compile_context *\fIccontext\fP,
@ -1091,7 +1092,13 @@ NULL immediately. Otherwise, the variables to which these point are set to an
 error code and an offset (number of code units) within the pattern,
 respectively, when \fBpcre2_compile()\fP returns NULL because a compilation
 error has occurred. The values are not defined when compilation is successful
-and \fBpcre2_compile()\fP returns a non-NULL value.
+and \fBpcre2_compile()\fP returns a non-NULL value. 
+.P
+The value returned in \fIerroroffset\fP is an indication of where in the
+pattern the error occurred. It is not necessarily the furthest point in the
+pattern that was read. For example, after the error "lookbehind assertion is
+not fixed length", the error offset points to the start of the failing
+assertion.
 .P
 The \fBpcre2_get_error_message()\fP function (see "Obtaining a textual error
 message"
@ -1184,8 +1191,8 @@ recognized, exactly as in the rest of the pattern.
  PCRE2_AUTO_CALLOUT
 .sp
 If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
-all with number 255, before each pattern item. For discussion of the callout
-facility, see the
+all with number 255, before each pattern item, except immediately before or
+after a callout in the pattern. For discussion of the callout facility, see the
 .\" HREF
 \fBpcre2callout\fP
 .\"
@ -3292,6 +3299,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 17 June 2016
+Last updated: 30 September 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/doc/pcre2callout.3
+++ b/doc/pcre2callout.3
@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "23 March 2015" "PCRE2 10.20"
+.TH PCRE2CALLOUT 3 "29 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -40,11 +40,20 @@ two callout points:
 .sp
 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
 automatically inserts callouts, all with number 255, before each item in the
-pattern. For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+pattern except for immediately before or after a callout item in the pattern.
+For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+.sp
+  A(?C3)B
+.sp
+it is processed as if it were
+.sp
+  (?C255)A(?C3)B(?C255)   
+.sp
+Here is a more complicated example:
 .sp
  A(\ed{2}|--)
 .sp
-it is processed as if it were
+With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
 .sp
 (?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
 .sp
@ -91,10 +100,10 @@ with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string
  No match
 .sp
 This indicates that when matching [bc] fails, there is no backtracking into a+
-and therefore the callouts that would be taken for the backtracks do not occur.
-You can disable the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS to
-\fBpcre2_compile()\fP, or starting the pattern with (*NO_AUTO_POSSESS). In this
-case, the output changes to this:
+(because it is being treated as a++) and therefore the callouts that would be
+taken for the backtracks do not occur. You can disable the auto-possessify
+feature by passing PCRE2_NO_AUTO_POSSESS to \fBpcre2_compile()\fP, or starting
+the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this:
 .sp
  --->aaaa
   +0 ^        a+
@ -220,8 +229,8 @@ but the intention is never to remove any of the existing fields.
 .sp
 For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP
 contains the number of the callout, in the range 0-255. This is the number
-that follows (?C for manual callouts; it is 255 for automatically generated
-callouts.
+that follows (?C for callouts that are part of the pattern; it is 255 for
+automatically generated callouts.
 .
 .
 .SS "Fields for string callouts"
@ -286,10 +295,15 @@ The \fIpattern_position\fP field contains the offset in the pattern string to
 the next item to be matched.
 .P
 The \fInext_item_length\fP field contains the length of the next item to be
-matched in the pattern string. When the callout immediately precedes an
-alternation bar, a closing parenthesis, or the end of the pattern, the length
-is zero. When the callout precedes an opening parenthesis, the length is that
-of the entire subpattern.
+processed in the pattern string. When the callout is at the end of the pattern,
+the length is zero. When the callout precedes an opening parenthesis, the
+length includes meta characters that follow the parenthesis. For example, in a
+callout before an assertion such as (?=ab) the length is 3. For an
+alternation bar or a closing parenthesis, the length is one, unless a closing
+parenthesis is followed by a quantifier, in which case its length is included.
+(This changed in release 10.23. In earlier releases, before an opening
+parenthesis the length was that of the entire subpattern, and before an
+alternation bar or a closing parenthesis the length was zero.)
 .P
 The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
 help in distinguishing between different automatic callouts, which all have the
@ -382,6 +396,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 March 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 29 September 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
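Note on the auto-callout rule documented above: the placement rule can be
modelled with a toy function. This is not how PCRE2 implements it. It is a
sketch that handles only patterns made of single-character literals and
explicit callouts of the form (?C<digits>), and every name in it is
hypothetical. It inserts (?C255) before each item and at the end of the
pattern, except immediately before or after an explicit callout. Applying
the same exception at the end of the pattern is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Append len bytes of s to out, keeping it NUL-terminated.
   Returns 0 on success, -1 if the buffer is too small. */
static int sketch_append(char *out, size_t outsize, size_t *used,
                         const char *s, size_t len)
{
    if (*used + len + 1 > outsize) return -1;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/* Toy model of PCRE2_AUTO_CALLOUT placement for a restricted pattern
   syntax: single-character literals and explicit "(?C<digits>)" items.
   Returns 0 on success, -1 on a malformed callout or a short buffer. */
int sketch_auto_callouts(const char *pattern, char *out, size_t outsize)
{
    static const char auto_item[] = "(?C255)";
    const size_t auto_len = sizeof(auto_item) - 1;
    size_t used = 0;
    int prev_was_callout = 0;
    const char *p = pattern;

    if (outsize == 0) return -1;
    out[0] = '\0';

    while (*p != '\0') {
        int is_callout = (strncmp(p, "(?C", 3) == 0);
        size_t item_len = 1;

        if (is_callout) {
            const char *close = strchr(p, ')');
            if (close == NULL) return -1;
            item_len = (size_t)(close - p) + 1;
        }

        /* No automatic callout immediately before or after an explicit one. */
        if (!is_callout && !prev_was_callout &&
            sketch_append(out, outsize, &used, auto_item, auto_len) != 0)
            return -1;

        if (sketch_append(out, outsize, &used, p, item_len) != 0) return -1;

        prev_was_callout = is_callout;
        p += item_len;
    }

    /* Callout at the end of the pattern, under the same exception. */
    if (!prev_was_callout &&
        sketch_append(out, outsize, &used, auto_item, auto_len) != 0)
        return -1;

    return 0;
}
```

For the pattern A(?C3)B from the documentation, the function writes
(?C255)A(?C3)B(?C255) into the output buffer, matching the expansion shown
in pcre2callout.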
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "15 March 2015" "PCRE2 10.20"
+.TH PCRE2COMPAT 3 "30 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -96,7 +96,7 @@ processed as anchored at the point where they are tested.
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
-same as PCRE2, but there are examples where it differs.
+same as PCRE2, but there are cases where it differs.
 .P
 11. Most backtracking verbs in assertions have their normal actions. They are
 not confined to the assertion.
@ -116,10 +116,11 @@ would not be possible to distinguish which parentheses matched, because both
 names map to capturing subpattern number 1. To avoid this confusing situation,
 an error is given at compile time.
 .P
-14. Perl recognizes comments in some places that PCRE2 does not, for example,
-between the ( and ? at the start of a subpattern. If the /x modifier is set,
-Perl allows white space between ( and ? (though current Perls warn that this is
-deprecated) but PCRE2 never does, even if the PCRE2_EXTENDED option is set.
+14. Perl used to recognize comments in some places that PCRE2 does not, for
+example, between the ( and ? at the start of a subpattern. If the /x modifier
+is set, Perl allowed white space between ( and ? though the latest Perls give 
+an error (for a while it was just deprecated). There may still be some cases 
+where Perl behaves differently.
 .P
 15. Perl, when in warning mode, gives warnings for character classes such as
 [A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
@ -139,35 +140,39 @@ list is with respect to Perl 5.10:
 .sp
 (a) Although lookbehind assertions in PCRE2 must match fixed length strings,
 each alternative branch of a lookbehind assertion can match a different length
-of string. Perl requires them all to have the same length.
+of string. Perl requires them all to have the same length. 
 .sp
-(b) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
+(b) From PCRE2 10.23, back references to groups of fixed length are supported
+in lookbehinds, provided that there is no possibility of referencing a
+non-unique number or name. Perl does not support backreferences in lookbehinds.
+.sp
+(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
 meta-character matches only at the very end of the string.
 .sp
-(c) A backslash followed by a letter with no special meaning is faulted. (Perl
+(d) A backslash followed by a letter with no special meaning is faulted. (Perl
 can be made to issue a warning.)
 .sp
-(d) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
+(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
 inverted, that is, by default they are not greedy, but if followed by a
 question mark they are.
 .sp
-(e) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
+(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
 only at the first matching position in the subject string.
 .sp
-(f) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
+(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
 PCRE2_NO_AUTO_CAPTURE options have no Perl equivalents.
 .sp
-(g) The \eR escape sequence can be restricted to match only CR, LF, or CRLF
+(h) The \eR escape sequence can be restricted to match only CR, LF, or CRLF
 by the PCRE2_BSR_ANYCRLF option.
 .sp
-(h) The callout facility is PCRE2-specific.
+(i) The callout facility is PCRE2-specific.
 .sp
-(i) The partial matching facility is PCRE2-specific.
+(j) The partial matching facility is PCRE2-specific.
 .sp
-(j) The alternative matching function (\fBpcre2_dfa_match()\fP matches in a
+(k) The alternative matching function (\fBpcre2_dfa_match()\fP) matches in a
 different way and is not Perl-compatible.
 .sp
-(k) PCRE2 recognizes some special sequences such as (*CR) at the start of
+(l) PCRE2 recognizes some special sequences such as (*CR) at the start of
 a pattern that set overall options that cannot be changed within the pattern.
 .
 .
@ -185,6 +190,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 15 March 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 30 September 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/doc/pcre2limits.3
+++ b/doc/pcre2limits.3
@ -1,4 +1,4 @@
-.TH PCRE2LIMITS 3 "05 November 2015" "PCRE2 10.21"
+.TH PCRE2LIMITS 3 "29 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "SIZE AND OTHER LIMITATIONS"
@ -46,19 +46,19 @@ The maximum length of a lookbehind assertion is 65535 characters.
 There is no limit to the number of parenthesized subpatterns, but there can be
 no more than 65535 capturing subpatterns. There is, however, a limit to the
 depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
-order to limit the amount of system stack used at compile time. The limit can
-be specified when PCRE2 is built; the default is 250.
-.P
-There is a limit to the number of forward references to subsequent subpatterns
-of around 200,000. Repeated forward references with fixed upper limits, for
-example, (?2){0,100} when subpattern number 2 is to the right, are included in
-the count. There is no limit to the number of backward references.
+order to limit the amount of system stack used at compile time. The default
+limit can be specified when PCRE2 is built; if it is not, the value is 250. An 
+application can change this limit by calling pcre2_set_parens_nest_limit() to 
+set the limit in a compile context.
 .P
 The maximum length of name for a named subpattern is 32 code units, and the
 maximum number of named subpatterns is 10000.
 .P
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
 is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
+.P
+The maximum length of a string argument to a callout is the largest number a 
+32-bit unsigned integer can hold.
 .
 .
 .SH AUTHOR
@ -75,6 +75,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 05 November 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 29 September 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "20 June 2016" "PCRE2 10.22"
+.TH PCRE2PATTERN 3 "30 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -508,9 +508,9 @@ by code point, as described in the previous section.
 .SS "Absolute and relative back references"
 .rs
 .sp
-The sequence \eg followed by an unsigned or a negative number, optionally
-enclosed in braces, is an absolute or relative back reference. A named back
-reference can be coded as \eg{name}. Back references are discussed
+The sequence \eg followed by a signed or unsigned number, optionally enclosed
+in braces, is an absolute or relative back reference. A named back reference
+can be coded as \eg{name}. Back references are discussed
 .\" HTML <a href="#backreferences">
 .\" </a>
 later,
@ -1325,13 +1325,33 @@ when matching character classes, whatever line-ending sequence is in use, and
 whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A
 class such as [^a] always matches one of these characters.
 .P
+The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
+\eV, \ew, and \eW may appear in a character class, and add the characters that
+they match to the class. For example, [\edABCDEF] matches any hexadecimal
+digit. In UTF modes, the PCRE2_UCP option affects the meanings of \ed, \es, \ew
+and their upper case partners, just as it does when they appear outside a
+character class, as described in the section entitled
+.\" HTML <a href="#genericchartypes">
+.\" </a>
+"Generic character types"
+.\"
+above. The escape sequence \eb has a different meaning inside a character
+class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
+are not special inside a character class. Like any other unrecognized escape
+sequences, they cause an error.
+.P
 The minus (hyphen) character can be used to specify a range of characters in a
 character class. For example, [d-m] matches any letter between d and m,
 inclusive. If a minus character is required in a class, it must be escaped with
 a backslash or appear in a position where it cannot be interpreted as
-indicating a range, typically as the first or last character in the class, or
-immediately after a range. For example, [b-d-z] matches letters in the range b
-to d, a hyphen character, or z.
+indicating a range, typically as the first or last character in the class,
+or immediately after a range. For example, [b-d-z] matches letters in the range
+b to d, a hyphen character, or z.
+.P
+Perl treats a hyphen as a literal if it appears before a POSIX class (see
+below) or a character type escape such as \ed, but gives a warning in its
+warning mode, as this is most likely a user error. As PCRE2 has no facility for
+warning, an error is given in these cases.
 .P
 It is not possible to have the literal character "]" as the end character of a
 range. A pattern such as [W-]46] is interpreted as a class of two characters
@ -1341,11 +1361,6 @@ the end of range, so [W-\e]46] is interpreted as a class containing a range
 followed by two other characters. The octal or hexadecimal representation of
 "]" can also be used to end a range.
 .P
-An error is generated if a POSIX character class (see below) or an escape
-sequence other than one that defines a single character appears at a point
-where a range ending character is expected. For example, [z-\exff] is valid,
-but [A-\ed] and [A-[:digit:]] are not.
-.P
 Ranges normally include all code points between the start and end characters,
 inclusive. They can also be used for code points specified numerically, for
 example [\e000-\e037]. Ranges can include any characters that are valid for the
@ -1365,21 +1380,6 @@ matches the letters in either case. For example, [W-c] is equivalent to
 tables for a French locale are in use, [\exc8-\excb] matches accented E
 characters in both cases.
 .P
-The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
-\eV, \ew, and \eW may appear in a character class, and add the characters that
-they match to the class. For example, [\edABCDEF] matches any hexadecimal
-digit. In UTF modes, the PCRE2_UCP option affects the meanings of \ed, \es, \ew
-and their upper case partners, just as it does when they appear outside a
-character class, as described in the section entitled
-.\" HTML <a href="#genericchartypes">
-.\" </a>
-"Generic character types"
-.\"
-above. The escape sequence \eb has a different meaning inside a character
-class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
-are not special inside a character class. Like any other unrecognized escape
-sequences, they cause an error.
-.P
 A circumflex can conveniently be used with the upper case character types to
 specify a more restricted set of characters than the matching lower case type.
 For example, the class [^\eW_] matches any letter or digit, but not underscore,
@ -2096,9 +2096,9 @@ no such problem when named parentheses are used. A back reference to any
 subpattern is possible using named parentheses (see below).
 .P
 Another way of avoiding the ambiguity inherent in the use of digits following a
-backslash is to use the \eg escape sequence. This escape must be followed by an
-unsigned number or a negative number, optionally enclosed in braces. These
-examples are all identical:
+backslash is to use the \eg escape sequence. This escape must be followed by a 
+signed or unsigned number, optionally enclosed in braces. These examples are
+all identical:
 .sp
  (ring), \e1
  (ring), \eg1
@ -2106,8 +2106,7 @@ examples are all identical:
 .sp
 An unsigned number specifies an absolute reference without the ambiguity that
 is present in the older syntax. It is also useful when literal digits follow
-the reference. A negative number is a relative reference. Consider this
-example:
+the reference. A signed number is a relative reference. Consider this example:
 .sp
  (abc(def)ghi)\eg{-1}
 .sp
@ -2117,6 +2116,10 @@ Similarly, \eg{-2} would be equivalent to \e1. The use of relative references
 can be helpful in long patterns, and also in patterns that are created by
 joining together fragments that contain references within themselves.
 .P
+The sequence \eg{+1} is a reference to the next capturing subpattern. This kind 
+of forward reference can be useful in patterns that repeat. Perl does not
+support the use of + in this way.
+.P
 A back reference matches whatever actually matched the capturing subpattern in
 the current subject string, rather than anything matching the subpattern
 itself (see
@ -2321,23 +2324,34 @@ temporarily move the current position back by the fixed length and then try to
 match. If there are insufficient characters before the current position, the
 assertion fails.
 .P
-In a UTF mode, PCRE2 does not allow the \eC escape (which matches a single code
-unit even in a UTF mode) to appear in lookbehind assertions, because it makes
-it impossible to calculate the length of the lookbehind. The \eX and \eR
-escapes, which can match different numbers of code units, are also not
-permitted.
+In UTF-8 and UTF-16 modes, PCRE2 does not allow the \eC escape (which matches a
+single code unit even in a UTF mode) to appear in lookbehind assertions,
+because it makes it impossible to calculate the length of the lookbehind. The
+\eX and \eR escapes, which can match different numbers of code units, are never
+permitted in lookbehinds.
 .P
 .\" HTML <a href="#subpatternsassubroutines">
 .\" </a>
 "Subroutine"
 .\"
 calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
-as the subpattern matches a fixed-length string.
+as the subpattern matches a fixed-length string. However,
 .\" HTML <a href="#recursion">
 .\" </a>
-Recursion,
+recursion,
 .\"
-however, is not supported.
+that is, a "subroutine" call into a group that is already active,
+is not supported.
+.P
+Perl does not support back references in lookbehinds. PCRE2 does support them,
+but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
+must not be set, there must be no use of (?| in the pattern (it creates
+duplicate subpattern numbers), and if the back reference is by name, the name
+must be unique. Of course, the referenced subpattern must itself be of fixed
+length. The following pattern matches words containing at least two characters
+that begin and end with the same character:
+.sp
+   \eb(\ew)\ew++(?<=\e1)
 .P
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
 specify efficient matching of fixed-length strings at the end of subject
@ -2476,7 +2490,9 @@ This makes the fragment independent of the parentheses in the larger pattern.
 .sp
 Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used
 subpattern by name. For compatibility with earlier versions of PCRE1, which had
-this facility before Perl, the syntax (?(name)...) is also recognized.
+this facility before Perl, the syntax (?(name)...) is also recognized. Note, 
+however, that undelimited names consisting of the letter R followed by digits
+are ambiguous (see the following section).
 .P
 Rewriting the above example to use a named subpattern gives this:
 .sp
@ -2490,33 +2506,55 @@ matched.
 .SS "Checking for pattern recursion"
 .rs
 .sp
-If the condition is the string (R), and there is no subpattern with the name R,
-the condition is true if a recursive call to the whole pattern or any
-subpattern has been made. If digits or a name preceded by ampersand follow the
-letter R, for example:
-.sp
-  (?(R3)...) or (?(R&name)...)
-.sp
-the condition is true if the most recent recursion is into a subpattern whose
-number or name is given. This condition does not check the entire recursion
-stack. If the name used in a condition of this kind is a duplicate, the test is
-applied to all subpatterns of the same name, and is true if any one of them is
-the most recent recursion.
-.P
-At "top level", all these recursion test conditions are false.
+"Recursion" in this sense refers to any subroutine-like call from one part of
+the pattern to another, whether or not it is actually recursive. See the
+sections entitled
 .\" HTML <a href="#recursion">
 .\" </a>
-The syntax for recursive patterns
+"Recursive patterns"
 .\"
-is described below.
+and
+.\" HTML <a href="#subpatternsassubroutines">
+.\" </a>
+"Subpatterns as subroutines"
+.\"
+below for details of recursion and subpattern calls.
+.P
+If a condition is the string (R), and there is no subpattern with the name R,
+the condition is true if matching is currently in a recursion or subroutine
+call to the whole pattern or any subpattern. If digits follow the letter R, and
+there is no subpattern with that name, the condition is true if the most recent
+call is into a subpattern with the given number, which must exist somewhere in 
+the overall pattern. This is a contrived example that is equivalent to a+b:
+.sp
+  ((?(R1)a+|(?1)b))
+.sp   
+However, in both cases, if there is a subpattern with a matching name, the
+condition tests for its being set, as described in the section above, instead
+of testing for recursion. For example, creating a group with the name R1 by
+adding (?<R1>) to the above pattern completely changes its meaning.
+.P
+If a name preceded by ampersand follows the letter R, for example:
+.sp
+  (?(R&name)...)
+.sp
+the condition is true if the most recent recursion is into a subpattern of that 
+name (which must exist within the pattern).
+.P
+This condition does not check the entire recursion stack. It tests only the 
+current level. If the name used in a condition of this kind is a duplicate, the
+test is applied to all subpatterns of the same name, and is true if any one of
+them is the most recent recursion.
+.P
+At "top level", all these recursion test conditions are false.
 .
 .
 .\" HTML <a name="subdefine"></a>
 .SS "Defining subpatterns for use by reference only"
 .rs
 .sp
-If the condition is the string (DEFINE), and there is no subpattern with the
-name DEFINE, the condition is always false. In this case, there may be only one
+If the condition is the string (DEFINE), the condition is always false, even if
+there is a group with the name DEFINE. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
 subroutines that can be referenced from elsewhere. (The use of
@ -2994,12 +3032,20 @@ depending on whether or not a name is present.
 By default, for compatibility with Perl, a name is any sequence of characters
 that does not include a closing parenthesis. The name is not processed in
 any way, and it is not possible to include a closing parenthesis in the name.
-However, if the PCRE2_ALT_VERBNAMES option is set, normal backslash processing
-is applied to verb names and only an unescaped closing parenthesis terminates
-the name. A closing parenthesis can be included in a name either as \e) or
-between \eQ and \eE. If the PCRE2_EXTENDED option is set, unescaped whitespace
-in verb names is skipped and #-comments are recognized, exactly as in the rest
-of the pattern.
+This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result 
+is no longer Perl-compatible. 
+.P
+When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
+and only an unescaped closing parenthesis terminates the name. However, the 
+only backslash items that are permitted are \eQ, \eE, and sequences such as 
+\ex{100} that define character code points. Character type escapes such as \ed 
+are faulted.
+.P
+A closing parenthesis can be included in a name either as \e) or between \eQ
+and \eE. In addition to backslash processing, if the PCRE2_EXTENDED option is
+also set, unescaped whitespace in verb names is skipped, and #-comments are
+recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not 
+affect verb names unless PCRE2_ALT_VERBNAMES is also set.
 .P
 The maximum length of a name is 255 in the 8-bit library and 65535 in the
 16-bit and 32-bit libraries. If the name is empty, that is, if the closing
@ -3429,6 +3475,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 June 2016
+Last updated: 30 September 2016
 Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "16 October 2015" "PCRE2 10.21"
+.TH PCRE2SYNTAX 3 "28 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -473,6 +473,9 @@ Each top-level branch of a look behind must be of a fixed length.
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
+  \eg+n            relative reference by number (PCRE2 extension)
+  \eg-n            relative reference by number
+  \eg{+n}          relative reference by number (PCRE2 extension) 
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
@ -511,13 +514,17 @@ Each top-level branch of a look behind must be of a fixed length.
  (?(-n)              relative reference condition
  (?(<name>)          named reference condition (Perl)
  (?('name')          named reference condition (Perl)
-  (?(name)            named reference condition (PCRE2)
+  (?(name)            named reference condition (PCRE2, deprecated)
  (?(R)               overall recursion condition
-  (?(Rn)              specific group recursion condition
-  (?(R&name)          specific recursion condition
+  (?(Rn)              specific numbered group recursion condition
+  (?(R&name)          specific named group recursion condition
  (?(DEFINE)          define subpattern for reference
  (?(VERSION[>]=n.m)  test PCRE2 version
  (?(assert)          assertion condition
+.sp
+Note the ambiguity of (?(R) and (?(Rn), which might be named reference
+conditions or recursion tests. Such a condition is interpreted as a reference
+condition if the relevant named group exists.
 .
 .
 .SH "BACKTRACKING CONTROL"
@ -577,6 +584,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 16 October 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 28 September 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/perltest.sh
+++ b/perltest.sh
@ -1,14 +1,17 @@
 #! /bin/sh

 # Script for testing regular expressions with perl to check that PCRE2 handles
-# them the same. The Perl code has to have "use utf8" and "require Encode" at
-# the start when running UTF-8 tests, but *not* for non-utf8 tests. (The
-# "require" would actually be OK for non-utf8-tests, but is not always
-# installed, so this way the script will always run for these tests.)
+# them the same. If the first argument to this script is "-w", Perl is also
+# called with "-w", which turns on its warning mode.
+#
+# The Perl code has to have "use utf8" and "require Encode" at the start when
+# running UTF-8 tests, but *not* for non-utf8 tests. (The "require" would
+# actually be OK for non-utf8-tests, but is not always installed, so this way
+# the script will always run for these tests.)
 #
 # The desired effect is achieved by making this a shell script that passes the
-# Perl script to Perl through a pipe. If the first argument is "-utf8", a
-# suitable prefix is set up.
+# Perl script to Perl through a pipe. If the first argument (possibly after
+# removing "-w") is "-utf8", a suitable prefix is set up.
 #
 # The remaining arguments, if any, are passed to Perl. They are an input file
 # and an output file. If there is one argument, the output is written to
@ -17,7 +20,14 @@
 # of the contorted piping input.)

 perl=perl
+perlarg=''
 prefix=''
+
+if [ $# -gt 0 -a "$1" = "-w" ] ; then
+  perlarg="-w"
+  shift
+fi
+
 if [ $# -gt 0 -a "$1" = "-utf8" ] ; then
  prefix="use utf8; require Encode;"
  shift
@ -292,6 +302,6 @@ for (;;)
 # printf $outfile "\n";

 PERLEND
-) | $perl - $@
+) | $perl $perlarg - $@

 # End
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
--- a/src/pcre2_error.c
+++ b/src/pcre2_error.c
@ -91,13 +91,13 @@ static const unsigned char compile_error_texts[] =
  "failed to allocate heap memory\0"
  "unmatched closing parenthesis\0"
  "internal error: code overflow\0"
-  "letter or underscore expected after (?< or (?'\0"
+  "missing closing parenthesis for condition\0"
  /* 25 */
  "lookbehind assertion is not fixed length\0"
-  "malformed number or name after (?(\0"
+  "a relative value of zero is not allowed\0"
  "conditional group contains more than two branches\0"
  "assertion expected after (?( or (?(?C)\0"
-  "(?R or (?[+-]digits must be followed by )\0"
+  "digit expected after (?+ or (?-\0"
  /* 30 */
  "unknown POSIX class name\0"
  "internal error in pcre2_study(): should not occur\0"
@ -105,7 +105,7 @@ static const unsigned char compile_error_texts[] =
  "parentheses are too deeply nested (stack check)\0"
  "character code point value in \\x{} or \\o{} is too large\0"
  /* 35 */
-  "invalid condition (?(0)\0"
+  "lookbehind is too complicated\0"
  "\\C is not allowed in a lookbehind assertion in UTF-" XSTRING(PCRE2_CODE_UNIT_WIDTH) " mode\0"
  "PCRE does not support \\L, \\l, \\N{name}, \\U, or \\u\0"
  "number after (?C is greater than 255\0"
@ -132,13 +132,13 @@ static const unsigned char compile_error_texts[] =
  "missing opening brace after \\o\0"
  "internal error: unknown newline setting\0"
  "\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
-  "a numbered reference must not be zero\0"
+  "(?R (recursive pattern call) must be followed by a closing parenthesis\0"
  "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0"
  /* 60 */
  "(*VERB) not recognized or malformed\0"
-  "number is too big\0"
+  "group number is too big\0"
  "subpattern name expected\0"
-  "digit expected after (?+\0"
+  "SPARE ERROR\0"
  "non-octal character in \\o{} (closing brace missing?)\0"
  /* 65 */
  "different names for subpatterns of the same number are not allowed\0"
@ -151,9 +151,9 @@ static const unsigned char compile_error_texts[] =
 #endif
  "\\k is not followed by a braced, angle-bracketed, or quoted name\0"
  /* 70 */
-  "internal error: unknown opcode in find_fixedlength()\0"
+  "internal error: unknown meta code in check_lookbehinds()\0"
  "\\N is not supported in a class\0"
-  "SPARE ERROR\0"
+  "callout string is too long\0"
  "disallowed Unicode code point (>= 0xd800 && <= 0xdfff)\0"
  "using UTF is disabled by the application\0"
  /* 75 */
@ -161,7 +161,7 @@ static const unsigned char compile_error_texts[] =
  "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0"
  "character code point value in \\u.... sequence is too large\0"
  "digits missing in \\x{} or \\o{}\0"
-  "syntax error in (?(VERSION condition\0"
+  "syntax error or number too big in (?(VERSION condition\0"
  /* 80 */
  "internal error: unknown opcode in auto_possessify()\0"
  "missing terminating delimiter for callout with string argument\0"
@ -173,6 +173,8 @@ static const unsigned char compile_error_texts[] =
  "regular expression is too complicated\0"
  "lookbehind assertion is too long\0"
  "pattern string is longer than the limit set by the application\0"
+  "internal error: unknown code in parsed pattern\0" 
+  /* 90 */
  ;

 /* Match-time and UTF error texts are in the same format. */
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@ -1298,23 +1298,16 @@ mode rather than an escape sequence. It is also used for [^] in JavaScript
 compatibility mode, and for \C in non-utf mode. In non-DOTALL mode, "." behaves
 like \N.

-The special values ESC_DU, ESC_du, etc. are used instead of ESC_D, ESC_d, etc.
-when PCRE2_UCP is set and replacement of \d etc by \p sequences is required.
-They must be contiguous, and remain in order so that the replacements can be
-looked up from a table.
-
 Negative numbers are used to encode a backreference (\1, \2, \3, etc.) in
-check_escape(). There are two tests in the code for an escape
-greater than ESC_b and less than ESC_Z to detect the types that may be
-repeated. These are the types that consume characters. If any new escapes are
-put in between that don't consume a character, that code will have to change.
-*/
+check_escape(). There are tests in the code for an escape greater than ESC_b
+and less than ESC_Z to detect the types that may be repeated. These are the
+types that consume characters. If any new escapes are put in between that don't
+consume a character, that code will have to change. */

 enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
       ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
       ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z,
-       ESC_E, ESC_Q, ESC_g, ESC_k,
-       ESC_DU, ESC_du, ESC_SU, ESC_su, ESC_WU, ESC_wu };
+       ESC_E, ESC_Q, ESC_g, ESC_k };


 /********************** Opcode definitions ******************/
@ -1380,7 +1373,8 @@ enum {
  OP_CIRC,           /* 27 Start of line - not multiline */
  OP_CIRCM,          /* 28 Start of line - multiline */

-  /* Single characters; caseful must precede the caseless ones */
+  /* Single characters; caseful must precede the caseless ones, and these
+  must remain in this order, and adjacent. */

  OP_CHAR,           /* 29 Match one character, casefully */
  OP_CHARI,          /* 30 Match one character, caselessly */
--- a/src/pcre2_intmodedep.h
+++ b/src/pcre2_intmodedep.h
@ -648,18 +648,24 @@ typedef struct pcre2_real_match_data {

 #ifndef PCRE2_PCRE2TEST

-/* Structure for checking for mutual recursion when scanning compiled code. */
+/* Structures for checking for mutual recursion when scanning compiled or 
+parsed code. */

 typedef struct recurse_check {
  struct recurse_check *prev;
  PCRE2_SPTR group;
 } recurse_check;

+typedef struct parsed_recurse_check {
+  struct parsed_recurse_check *prev;
+  uint32_t *groupptr;
+} parsed_recurse_check;
+
 /* Structure for building a cache when filling in recursion offsets. */

 typedef struct recurse_cache {
  PCRE2_SPTR group;
-  int recno;
+  int groupnumber;
 } recurse_cache;

 /* Structure for maintaining a chain of pointers to the currently incomplete
@ -693,9 +699,10 @@ typedef struct compile_block {
  PCRE2_SPTR start_code;           /* The start of the compiled code */
  PCRE2_SPTR start_pattern;        /* The start of the pattern */
  PCRE2_SPTR end_pattern;          /* The end of the pattern */
-  PCRE2_SPTR nestptr[2];           /* Pointer(s) saved for string substitution */
  PCRE2_UCHAR *name_table;         /* The name/number table */
-  size_t workspace_size;           /* Size of workspace */
+  PCRE2_SIZE workspace_size;       /* Size of workspace */
+  PCRE2_SIZE small_ref_offset[10]; /* Offsets for \1 to \9 */
+  PCRE2_SIZE erroroffset;          /* Offset of error in pattern */ 
  uint16_t names_found;            /* Number of entries so far */
  uint16_t name_entry_size;        /* Size of each entry */
  open_capitem *open_caps;         /* Chain of open capture items */
@ -703,8 +710,9 @@ typedef struct compile_block {
  uint32_t named_group_list_size;  /* Number of entries in the list */
  uint32_t external_options;       /* External (initial) options */
  uint32_t external_flags;         /* External flag bits to be set */
-  uint32_t bracount;               /* Count of capturing parens as we compile */
-  uint32_t final_bracount;         /* Saved value after first pass */
+  uint32_t bracount;               /* Count of capturing parentheses */
+  uint32_t lastcapture;            /* Last capture encountered */ 
+  uint32_t *parsed_pattern;        /* Parsed pattern buffer */ 
  uint32_t *groupinfo;             /* Group info vector */
  uint32_t top_backref;            /* Maximum back reference */
  uint32_t backref_map;            /* Bitmap of low back refs */
@ -718,9 +726,7 @@ typedef struct compile_block {
  BOOL had_accept;                 /* (*ACCEPT) encountered */
  BOOL had_pruneorskip;            /* (*PRUNE) or (*SKIP) encountered */
  BOOL had_recurse;                /* Had a recursion or subroutine call */
-  BOOL check_lookbehind;           /* Lookbehinds need later checking */
  BOOL dupnames;                   /* Duplicate names exist */
-  BOOL iscondassert;               /* Next assert is a condition */
 } compile_block;

 /* Structure for keeping the properties of the in-memory stack used
--- a/src/pcre2_substitute.c
+++ b/src/pcre2_substitute.c
@ -114,7 +114,7 @@ for (; ptr < ptrend; ptr++)
  else if (*ptr == CHAR_BACKSLASH)
    {
    int erc;
-    int errorcode = 0;
+    int errorcode;
    uint32_t ch;

    if (ptr < ptrend - 1) switch (ptr[1])
@ -127,8 +127,10 @@ for (; ptr < ptrend; ptr++)
      continue;
      }

+    ptr += 1;  /* Must point after \ */
    erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode,
      code->overall_options, FALSE, NULL);
+    ptr -= 1;  /* Back to last code unit of escape */ 
    if (errorcode != 0)
      {
      rc = errorcode;
@ -698,7 +700,7 @@ do
    else if ((suboptions & PCRE2_SUBSTITUTE_EXTENDED) != 0 &&
              *ptr == CHAR_BACKSLASH)
      {
-      int errorcode = 0;
+      int errorcode;

      if (ptr < repend - 1) switch (ptr[1])
        {
@ -728,10 +730,10 @@ do
        break;
        }

+      ptr++;  /* Point after \ */
      rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode,
        code->overall_options, FALSE, NULL);
      if (errorcode != 0) goto BADESCAPE;
-      ptr++;

      switch(rc)
        {
--- a/src/pcre2test.c
+++ b/src/pcre2test.c
@ -2808,7 +2808,7 @@ return 0;

 /* In UTF mode the input is always interpreted as a string of UTF-8 bytes using
 the original UTF-8 definition of RFC 2279, which allows for up to 6 bytes, and
-code values from 0 to 0x7fffffff. However, values greater than the later UTF 
+code values from 0 to 0x7fffffff. However, values greater than the later UTF
 limit of 0x10ffff cause an error.

 In non-UTF mode the input is interpreted as UTF-8 if the utf8_input modifier
@ -2867,7 +2867,7 @@ if (!utf && (pat_patctl.control & CTL_UTF8_INPUT) == 0)

 else while (len > 0)
  {
-  int chlen; 
+  int chlen;
  uint32_t c;
  uint32_t topbit = 0;
  if (!utf && *p == 0xff && len > 1)
@ -2875,7 +2875,7 @@ else while (len > 0)
    topbit = 0x80000000u;
    p++;
    len--;
-    }     
+    }
  chlen = utf82ord(p, &c);
  if (chlen <= 0) return -1;
  if (utf && c > 0x10ffff) return -2;
@ -4494,6 +4494,7 @@ unsigned int delimiter = *p++;
 int errorcode;
 void *use_pat_context;
 PCRE2_SIZE patlen;
+PCRE2_SIZE valgrind_access_length;
 PCRE2_SIZE erroroffset;

 /* Initialize the context and pattern/data controls for this test from the
@ -4537,7 +4538,7 @@ patlen = p - buffer - 2;
 if (!decode_modifiers(p, CTX_PAT, &pat_patctl, NULL)) return PR_SKIP;
 utf = (pat_patctl.options & PCRE2_UTF) != 0;

-/* The utf8_input modifier is not allowed in 8-bit mode, and is mutually 
+/* The utf8_input modifier is not allowed in 8-bit mode, and is mutually
 exclusive with the utf modifier. */

 if ((pat_patctl.control & CTL_UTF8_INPUT) != 0)
@ -4550,8 +4551,8 @@ if ((pat_patctl.control & CTL_UTF8_INPUT) != 0)
  if (utf)
    {
    fprintf(outfile, "** The utf and utf8_input modifiers are mutually exclusive\n");
-    return PR_SKIP; 
-    }   
+    return PR_SKIP;
+    }
  }

 /* Check for mutually exclusive modifiers. At present, these are all in the
@ -4949,11 +4950,43 @@ switch(errorcode)
  break;
  }

-/* The pattern is now in pbuffer[8|16|32], with the length in patlen. By
-default, however, we pass a zero-terminated pattern. The length is passed only
-if we had a hex pattern. */
+/* The pattern is now in pbuffer[8|16|32], with the length in code units in
+patlen. By default, however, we pass a zero-terminated pattern. The length is
+passed only if we had a hex pattern. When valgrind is supported, arrange for
+the unused part of the buffer to be marked as no access. */

-if ((pat_patctl.control & CTL_HEXPAT) == 0) patlen = PCRE2_ZERO_TERMINATED;
+valgrind_access_length = patlen;
+if ((pat_patctl.control & CTL_HEXPAT) == 0)
+  {
+  patlen = PCRE2_ZERO_TERMINATED;
+  valgrind_access_length += 1;  /* For the terminating zero */
+  }
+
+#ifdef SUPPORT_VALGRIND
+#ifdef SUPPORT_PCRE2_8
+if (test_mode == PCRE8_MODE && pbuffer8 != NULL)
+  {
+  VALGRIND_MAKE_MEM_NOACCESS(pbuffer8 + valgrind_access_length,
+    pbuffer8_size - valgrind_access_length);
+  }
+#endif
+#ifdef SUPPORT_PCRE2_16
+if (test_mode == PCRE16_MODE && pbuffer16 != NULL)
+  {
+  VALGRIND_MAKE_MEM_NOACCESS(pbuffer16 + valgrind_access_length,
+    pbuffer16_size - valgrind_access_length*sizeof(uint16_t));
+  }
+#endif
+#ifdef SUPPORT_PCRE2_32
+if (test_mode == PCRE32_MODE && pbuffer32 != NULL)
+  {
+  VALGRIND_MAKE_MEM_NOACCESS(pbuffer32 + valgrind_access_length,
+    pbuffer32_size - valgrind_access_length*sizeof(uint32_t));
+  }
+#endif
+#else  /* Valgrind not supported */
+(void)valgrind_access_length;  /* Avoid compiler warning */
+#endif

 /* If #newline_default has been used and the library was not compiled with an
 appropriate default newline setting, local_newline_default will be non-zero. We
@ -4996,6 +5029,65 @@ if (timeit > 0)
 PCRE2_COMPILE(compiled_code, pbuffer, patlen, pat_patctl.options|forbid_utf,
  &errorcode, &erroroffset, use_pat_context);

+/* Call the JIT compiler if requested. When timing, we must free and recompile
+the pattern each time because that is the only way to free the JIT compiled
+code. We know that compilation will always succeed. */
+
+if (TEST(compiled_code, !=, NULL) && pat_patctl.jit != 0)
+  {
+  if (timeit > 0)
+    {
+    register int i;
+    clock_t time_taken = 0;
+    for (i = 0; i < timeit; i++)
+      {
+      clock_t start_time;
+      SUB1(pcre2_code_free, compiled_code);
+      PCRE2_COMPILE(compiled_code, pbuffer, patlen,
+        pat_patctl.options|forbid_utf, &errorcode, &erroroffset,
+        use_pat_context);
+      start_time = clock();
+      PCRE2_JIT_COMPILE(jitrc,compiled_code, pat_patctl.jit);
+      time_taken += clock() - start_time;
+      }
+    total_jit_compile_time += time_taken;
+    fprintf(outfile, "JIT compile  %.4f milliseconds\n",
+      (((double)time_taken * 1000.0) / (double)timeit) /
+        (double)CLOCKS_PER_SEC);
+    }
+  else
+    {
+    PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
+    }
+  }
+
+/* If valgrind is supported, mark the pbuffer as accessible again. The 16-bit
+and 32-bit buffers can be marked completely undefined, but we must leave the
+pattern in the 8-bit buffer defined because it may be read from a callout
+during matching. */
+
+#ifdef SUPPORT_VALGRIND
+#ifdef SUPPORT_PCRE2_8
+if (test_mode == PCRE8_MODE)
+  {
+  VALGRIND_MAKE_MEM_UNDEFINED(pbuffer8 + valgrind_access_length, 
+    pbuffer8_size - valgrind_access_length);
+  }
+#endif
+#ifdef SUPPORT_PCRE2_16
+if (test_mode == PCRE16_MODE)
+  {
+  VALGRIND_MAKE_MEM_UNDEFINED(pbuffer16, pbuffer16_size);
+  }
+#endif
+#ifdef SUPPORT_PCRE2_32
+if (test_mode == PCRE32_MODE)
+  {
+  VALGRIND_MAKE_MEM_UNDEFINED(pbuffer32, pbuffer32_size);
+  }
+#endif
+#endif
+
 /* Compilation failed; go back for another re, skipping to blank line
 if non-interactive. */

@ -5029,38 +5121,6 @@ if (forbid_utf != 0)
 if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
  return PR_ABEND;

-/* Call the JIT compiler if requested. When timing, we must free and recompile
-the pattern each time because that is the only way to free the JIT compiled
-code. We know that compilation will always succeed. */
-
-if (pat_patctl.jit != 0)
-  {
-  if (timeit > 0)
-    {
-    register int i;
-    clock_t time_taken = 0;
-    for (i = 0; i < timeit; i++)
-      {
-      clock_t start_time;
-      SUB1(pcre2_code_free, compiled_code);
-      PCRE2_COMPILE(compiled_code, pbuffer, patlen,
-        pat_patctl.options|forbid_utf, &errorcode, &erroroffset,
-        use_pat_context);
-      start_time = clock();
-      PCRE2_JIT_COMPILE(jitrc,compiled_code, pat_patctl.jit);
-      time_taken += clock() - start_time;
-      }
-    total_jit_compile_time += time_taken;
-    fprintf(outfile, "JIT compile  %.4f milliseconds\n",
-      (((double)time_taken * 1000.0) / (double)timeit) /
-        (double)CLOCKS_PER_SEC);
-    }
-  else
-    {
-    PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
-    }
-  }
-
 /* If an explicit newline modifier was given, set the information flag in the
 pattern so that it is preserved over push/pop. */

@ -5300,10 +5360,10 @@ if (post_start > 0)
 for (i = 0; i < subject_length - pre_start - post_start + 4; i++)
  fprintf(outfile, " ");

-fprintf(outfile, "%.*s",
-  (int)((cb->next_item_length == 0)? 1 : cb->next_item_length),
-  pbuffer8 + cb->pattern_position);
-
+if (cb->next_item_length != 0)
+  fprintf(outfile, "%.*s", (int)(cb->next_item_length),
+    pbuffer8 + cb->pattern_position);
+
 fprintf(outfile, "\n");
 first_callout = FALSE;

@ -5740,18 +5800,18 @@ while ((c = *p++) != 0)
    continue;
    }

-  /* Handle a non-escaped character. In non-UTF 32-bit mode with utf8_input 
+  /* Handle a non-escaped character. In non-UTF 32-bit mode with utf8_input
  set, do the fudge for setting the top bit. */

  if (c != '\\')
    {
    uint32_t topbit = 0;
-    if (test_mode == PCRE32_MODE && c == 0xff && *p != 0) 
+    if (test_mode == PCRE32_MODE && c == 0xff && *p != 0)
      {
      topbit = 0x80000000;
      c = *p++;
-      }  
-    if ((utf || (pat_patctl.control & CTL_UTF8_INPUT) != 0) && 
+      }
+    if ((utf || (pat_patctl.control & CTL_UTF8_INPUT) != 0) &&
      HASUTF8EXTRALEN(c)) { GETUTF8INC(c, p); }
    c |= topbit;
    }
@ -6405,7 +6465,7 @@ else for (gmatched = 0;; gmatched++)
    }

  /* Otherwise just run a single match, setting up a callout if required (the
-  default). */
+  default). There is a copy of the pattern in pbuffer8 for use by callouts. */

  else
    {
@ -7583,6 +7643,10 @@ if (argc > 1 && strcmp(argv[op], "-") != 0)
    }
  }

+#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT)
+if (INTERACTIVE(infile)) using_history();
+#endif
+
 if (argc > 2)
  {
  outfile = fopen(argv[op+1], OUTPUT_MODE);
@ -7621,8 +7685,7 @@ while (notdone)
  p = buffer;

  /* If we have a pattern set up for testing, or we are skipping after a
-  compile failure, a blank line terminates this test; otherwise process the
-  line as a data line. */
+  compile failure, a blank line terminates this test. */

  if (expectdata || skipping)
    {
@ -7645,14 +7708,21 @@ while (notdone)
      skipping = FALSE;
      setlocale(LC_CTYPE, "C");
      }
+
+    /* Otherwise, if we are not skipping, and the line is not a data comment
+    line starting with "\=", process a data line. */
+
    else if (!skipping && !(p[0] == '\\' && p[1] == '=' && isspace(p[2])))
+      {
      rc = process_data();
+      }
    }

  /* We do not have a pattern set up for testing. Lines starting with # are
  either comments or special commands. Blank lines are ignored. Otherwise, the
  line must start with a valid delimiter. It is then processed as a pattern
-  line. */
+  line. A copy of the pattern is left in pbuffer8 for use by callouts. Under
+  valgrind, make the unused part of the buffer undefined, to catch overruns. */

  else if (*p == '#')
    {
@ -7713,6 +7783,10 @@ if (showtotaltimes)

 EXIT:

+#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT)
+if (infile != NULL && INTERACTIVE(infile)) clear_history();
+#endif
+
 if (infile != NULL && infile != stdin) fclose(infile);
 if (outfile != NULL && outfile != stdout) fclose(outfile);

--- a/testdata/testinput1
+++ b/testdata/testinput1
@ -5792,4 +5792,18 @@ name)/mark
    aaaccccaaa
    bccccb 

+# /x does not apply to MARK labels 
+
+/x (*MARK:ab cd # comment
+ef) x/x,mark
+    axxz
+
+/(?<=a(B){0}c)X/
+    acX
+
+/(?<DEFINE>b)(?(DEFINE)(a+))(?&DEFINE)/          
+    bbbb 
+\= Expect no match     
+    baaab
+
 # End of testinput1 
--- a/testdata/testinput15
+++ b/testdata/testinput15
@ -79,7 +79,7 @@
 /((?2))((?1))/
    abc

-/((?(R2)a+|(?1)b))/
+/((?(R2)a+|(?1)b))()/
    aaaabcde

 /(?(R)a*(?1)|((?R))b)/
--- a/testdata/testinput17
+++ b/testdata/testinput17
@ -177,7 +177,7 @@
 /((?2))((?1))/
    abc

-/((?(R2)a+|(?1)b))/
+/((?(R2)a+|(?1)b))()/
    aaaabcde

 /(?(R)a*(?1)|((?R))b)/
--- a/testdata/testinput2
+++ b/testdata/testinput2
@ -189,9 +189,9 @@
    the barfoo
    and cattlefoo

-/(?<=a+)b/
+/abc(?<=a+)b/

-/(?<=aaa|b{0,3})b/
+/12345(?<=aaa|b{0,3})b/

 /(?<!(foo)a\1)bar/

@ -4518,6 +4518,18 @@
     \ B)x/x,alt_verbnames,mark
    x  
    
+/(*: A \ and #comment
+     \ B)x/alt_verbnames,mark
+    x  
+    
+/(*: A \ and #comment
+     \ B)x/x,mark
+    x  
+    
+/(*: A \ and #comment
+     \ B)x/mark
+    x  
+    
 /(*:A
 B)x/alt_verbnames,mark 
    x
@ -4819,4 +4831,61 @@ a)"xI

 /\[AB]{6000000000000000000000}/expand

+# Hex uses pattern length, not zero-terminated. This tests for overrunning
+# the given length of a pattern.
+
+/'(*U'/hex
+
+/'(*'/hex
+
+/'('/hex
+
+//hex
+
+# These tests are here because Perl never allows a back reference in a
+# lookbehind. PCRE2 supports some limited cases.
+
+/([ab])...(?<=\1)z/
+    a11az
+    b11bz 
+\= Expect no match
+    b11az 
+    
+/(?|([ab]))...(?<=\1)z/
+
+/([ab])(\1)...(?<=\2)z/
+    aa11az
+    
+/(a\2)(b\1)(?<=\2)/ 
+ 
+/(?<A>[ab])...(?<=\k'A')z/
+    a11az
+    b11bz 
+\= Expect no match
+    b11az 
+
+/(?<A>[ab])...(?<=\k'A')(?<A>)z/dupnames
+
+# Perl does not support \g+n
+
+/((\g+1X)?([ab]))+/
+    aaXbbXa
+
+/ab(?C1)c/auto_callout
+    abc
+
+/'ab(?C1)c'/hex,auto_callout
+    abc
+    
+# Perl accepts these, but gives a warning. We can't warn, so give an error. 
+
+/[a-[:digit:]]+/
+    a-a9-a
+
+/[A-[:digit:]]+/
+    A-A9-A
+
+/[a-\d]+/
+    a-a9-a
+
 # End of testinput2 
--- a/testdata/testinput5
+++ b/testdata/testinput5
@ -1723,5 +1723,14 @@
    \x{1d7cf}
 \= Expect no match
    \x{10000}
+    
+# Hex uses pattern length, not zero-terminated. This tests for overrunning
+# the given length of a pattern.
+
+/'(*UTF)'/hex 
+
+/a(?<=A\XB)/utf
+
+/ab(?<=A\RB)/utf

 # End of testinput5 
--- a/testdata/testinput6
+++ b/testdata/testinput6
@ -4635,7 +4635,7 @@
 /((?(R)a+|(?1)b))/
    aaaabcde

-/((?(R2)a+|(?1)b))/
+/((?(R2)a+|(?1)b))()/
    aaaabcde

 /(?(R)a*(?1)|((?R))b)/
--- a/testdata/testinput8
+++ b/testdata/testinput8
@ -161,18 +161,14 @@

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000

 /(?(1)(?1)){8,}+()/debug
    abcd
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@ -9257,4 +9257,24 @@ No match
 1: b
 2: cccc

+# /x does not apply to MARK labels 
+
+/x (*MARK:ab cd # comment
+ef) x/x,mark
+    axxz
+ 0: xx
+MK: ab cd # comment\x0aef
+
+/(?<=a(B){0}c)X/
+    acX
+ 0: X
+
+/(?<DEFINE>b)(?(DEFINE)(a+))(?&DEFINE)/          
+    bbbb 
+ 0: bb
+ 1: b
+\= Expect no match     
+    baaab
+No match
+
 # End of testinput1 
--- a/testdata/testoutput12-16
+++ b/testdata/testoutput12-16
@ -557,7 +557,7 @@ Subject length lower bound = 1
 0: \x{11234}

 /(*UTF-32)\x{11234}/
-Failed: error 134 at offset 17: character code point value in \x{} or \o{} is too large
+Failed: error 160 at offset 5: (*VERB) not recognized or malformed
  abcd\x{11234}pqr

 /(*UTF-32)\x{112}/
--- a/testdata/testoutput15
+++ b/testdata/testoutput15
@ -188,7 +188,7 @@ Failed: error -53: recursion limit exceeded
    abc
 Failed: error -52: nested recursion at the same subject position

-/((?(R2)a+|(?1)b))/
+/((?(R2)a+|(?1)b))()/
    aaaabcde
 Failed: error -52: nested recursion at the same subject position

--- a/testdata/testoutput17
+++ b/testdata/testoutput17
@ -335,7 +335,7 @@ Failed: error -47: match limit exceeded
    abc
 Failed: error -46: JIT stack limit reached

-/((?(R2)a+|(?1)b))/
+/((?(R2)a+|(?1)b))()/
    aaaabcde
 Failed: error -46: JIT stack limit reached

--- a/testdata/testoutput18
+++ b/testdata/testoutput18
@ -139,7 +139,7 @@ No match: POSIX code 17: match failed
 0+ issippi

 /abc/\
-Failed: POSIX code 9: bad escape sequence at offset 3     
+Failed: POSIX code 9: bad escape sequence at offset 4     

 "(?(?C)"
 Failed: POSIX code 11: unbalanced () at offset 6     
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
--- a/testdata/testoutput21
+++ b/testdata/testoutput21
@ -76,7 +76,7 @@
 ------------------------------------------------------------------

 /ab\Cde/never_backslash_c
-Failed: error 183 at offset 3: using \C is disabled by the application
+Failed: error 183 at offset 4: using \C is disabled by the application

 /ab\Cde/info
 Capturing subpattern count = 0
--- a/testdata/testoutput22-16
+++ b/testdata/testoutput22-16
@ -17,7 +17,7 @@ Subject length lower bound = 0
 # 16-bit modes, but not in 32-bit mode.

 /(?<=ab\Cde)X/utf
-Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion in UTF-16 mode
+Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-16 mode
    ab!deXYZ

 # Autopossessification tests
--- a/testdata/testoutput22-8
+++ b/testdata/testoutput22-8
@ -17,7 +17,7 @@ Subject length lower bound = 0
 # 16-bit modes, but not in 32-bit mode.

 /(?<=ab\Cde)X/utf
-Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion in UTF-8 mode
+Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-8 mode
    ab!deXYZ

 # Autopossessification tests
--- a/testdata/testoutput23
+++ b/testdata/testoutput23
@ -3,6 +3,6 @@
 # correct error message.

 /a\Cb/
-Failed: error 185 at offset 2: using \C is disabled in this PCRE2 library
+Failed: error 185 at offset 3: using \C is disabled in this PCRE2 library

 # End of testinput23
--- a/testdata/testoutput5
+++ b/testdata/testoutput5
@ -1746,7 +1746,7 @@ No match
 ------------------------------------------------------------------

 /\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
-Failed: error 173 at offset 5: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
+Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)

 /^a+[a\x{200}]/B,utf
 ------------------------------------------------------------------
@ -3997,7 +3997,7 @@ Failed: error 122 at offset 1227: unmatched closing parenthesis
 /$(&.+[\p{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(?(R)){0,6}?|){12\x8a\X*?\x8a\x0b\xd1^9\3*+(\xc1,\k'P'\xb4)\xcc(z\z(?JJ)(?'X'8};(\x0b\xd1^9\?'3*+(\xc1.]k+\x0b'Pm'\xb4\xcc4'\xd1'(?'X'))?-%--\x95$9*\4'|\xd1(''%\x95*$9)#(?'R')3\x07?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/

 "(*UTF)(*UCP)(.UTF).+X(\V+;\^(\D|)!999}(?(?C{7(?C')\H*\S*/^\x5\xa\\xd3\x85n?(;\D*(?m).[^mH+((*UCP)(*U:F)})(?!^)(?'"
-Failed: error 124 at offset 113: letter or underscore expected after (?< or (?'
+Failed: error 162 at offset 113: subpattern name expected

 /[\pS#moq]/
    =
@ -4159,5 +4159,16 @@ No match
 \= Expect no match
    \x{10000}
 No match
+    
+# Hex uses pattern length, not zero-terminated. This tests for overrunning
+# the given length of a pattern.
+
+/'(*UTF)'/hex 
+
+/a(?<=A\XB)/utf
+Failed: error 125 at offset 1: lookbehind assertion is not fixed length
+
+/ab(?<=A\RB)/utf
+Failed: error 125 at offset 2: lookbehind assertion is not fixed length

 # End of testinput5 
--- a/testdata/testoutput6
+++ b/testdata/testoutput6
@ -713,7 +713,7 @@ No match
 /(ab|cd){3,4}/auto_callout
  ababab
 --->ababab
- +0 ^          (ab|cd){3,4}
+ +0 ^          (
 +1 ^          a
 +4 ^          c
 +2 ^^         b
@ -732,7 +732,7 @@ No match
 0: ababab
  abcdabcd
 --->abcdabcd
- +0 ^            (ab|cd){3,4}
+ +0 ^            (
 +1 ^            a
 +4 ^            c
 +2 ^^           b
@ -740,7 +740,7 @@ No match
 +1 ^ ^          a
 +4 ^ ^          c
 +5 ^  ^         d
- +6 ^   ^        )
+ +6 ^   ^        ){3,4}
 +1 ^   ^        a
 +4 ^   ^        c
 +2 ^    ^       b
@ -749,13 +749,13 @@ No match
 +1 ^     ^      a
 +4 ^     ^      c
 +5 ^      ^     d
- +6 ^       ^    )
+ +6 ^       ^    ){3,4}
 +12 ^       ^    
 0: abcdabcd
 1: abcdab
  abcdcdcdcdcd  
 --->abcdcdcdcdcd
- +0 ^                (ab|cd){3,4}
+ +0 ^                (
 +1 ^                a
 +4 ^                c
 +2 ^^               b
@ -763,16 +763,16 @@ No match
 +1 ^ ^              a
 +4 ^ ^              c
 +5 ^  ^             d
- +6 ^   ^            )
+ +6 ^   ^            ){3,4}
 +1 ^   ^            a
 +4 ^   ^            c
 +5 ^    ^           d
- +6 ^     ^          )
+ +6 ^     ^          ){3,4}
 +12 ^     ^          
 +1 ^     ^          a
 +4 ^     ^          c
 +5 ^      ^         d
- +6 ^       ^        )
+ +6 ^       ^        ){3,4}
 +12 ^       ^        
 0: abcdcdcd
 1: abcdcd
@ -6712,26 +6712,26 @@ No match
 --->"ab"
 +0 ^        ^
 +1 ^        "
- +2 ^^       ((?(?=[a])[^"])|b)*
+ +2 ^^       (
 +21 ^^       "
- +3 ^^       (?(?=[a])[^"])
+ +3 ^^       (?
 +18 ^^       b
- +5 ^^       (?=[a])
+ +5 ^^       (?=
 +8  ^       [a]
 +11  ^^      )
 +12 ^^       [^"]
 +16 ^ ^      )
 +17 ^ ^      |
 +21 ^ ^      "
- +3 ^ ^      (?(?=[a])[^"])
+ +3 ^ ^      (?
 +18 ^ ^      b
- +5 ^ ^      (?=[a])
+ +5 ^ ^      (?=
 +8   ^      [a]
-+19 ^  ^     )
+19 ^  ^     )*
 +21 ^  ^     "
- +3 ^  ^     (?(?=[a])[^"])
+ +3 ^  ^     (?
 +18 ^  ^     b
- +5 ^  ^     (?=[a])
+ +5 ^  ^     (?=
 +8    ^     [a]
 +17 ^  ^     |
 +22 ^   ^    $
@ -7154,7 +7154,7 @@ Failed: error -52: nested recursion at the same subject position
    aaaabcde
 0: aaaab

-/((?(R2)a+|(?1)b))/
+/((?(R2)a+|(?1)b))()/
    aaaabcde
 Failed: error -40: backreference condition or recursion test is not supported for DFA matching

@ -7548,7 +7548,7 @@ Callout (10): {AB} last capture = 0
        Bra
        ^
        Cond
-        Callout 25 9 7
+        Callout 25 9 3
        Assert
        abc
        Ket
@ -7561,11 +7561,11 @@ Callout (10): {AB} last capture = 0
 ------------------------------------------------------------------
    abcdefg
 --->abcdefg
- 25 ^           (?=abc)
+ 25 ^           (?=
 0: abcd
    xyz123 
 --->xyz123
- 25 ^          (?=abc)
+ 25 ^          (?=
 0: xyz

 /^(?(?C$abc$)(?=abc)abcd|xyz)/B
@ -7573,7 +7573,7 @@ Callout (10): {AB} last capture = 0
        Bra
        ^
        Cond
-        CalloutStr $abc$ 7 12 7
+        CalloutStr $abc$ 7 12 3
        Assert
        abc
        Ket
@ -7587,12 +7587,12 @@ Callout (10): {AB} last capture = 0
    abcdefg
 Callout (7): $abc$
 --->abcdefg
-    ^           (?=abc)
+    ^           (?=
 0: abcd
    xyz123 
 Callout (7): $abc$
 --->xyz123
-    ^          (?=abc)
+    ^          (?=
 0: xyz

 /^ab(?C'first')cd(?C"second")ef/
@ -7609,13 +7609,13 @@ Callout (20): "second"
    aaaXY
 Callout (8): `code`
 --->aaaXY
-    ^^        )
+    ^^        ){3}
 Callout (8): `code`
 --->aaaXY
-    ^ ^       )
+    ^ ^       ){3}
 Callout (8): `code`
 --->aaaXY
-    ^  ^      )
+    ^  ^      ){3}
 0: aaaX

 # Binary zero in callout string
--- a/testdata/testoutput8-16-2
+++ b/testdata/testoutput8-16-2
@ -854,23 +854,17 @@ Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too de

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
-Failed: error 186 at offset 594: regular expression is too complicated
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5813: regular expression is too complicated

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
-Failed: error 186 at offset 594: regular expression is too complicated
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5820: regular expression is too complicated

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-Failed: error 186 at offset 594: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 594: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 594: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
@ -1031,6 +1025,5 @@ Subject length lower bound = 0
 Failed: error 114 at offset 509: missing closing parenthesis

 /([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
-Failed: error 186 at offset 490: regular expression is too complicated

 # End of testinput8
--- a/testdata/testoutput8-16-3
+++ b/testdata/testoutput8-16-3
@ -853,20 +853,15 @@ Memory allocation (code space): 18

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 1147: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 1147: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput8-16-4
+++ b/testdata/testoutput8-16-4
@ -853,20 +853,15 @@ Memory allocation (code space): 18

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 1147: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 1147: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput8-32-2
+++ b/testdata/testoutput8-32-2
@ -853,20 +853,17 @@ Memory allocation (code space): 28

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5813: regular expression is too complicated

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5820: regular expression is too complicated

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 979: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 979: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput8-32-3
+++ b/testdata/testoutput8-32-3
@ -853,20 +853,17 @@ Memory allocation (code space): 28

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5813: regular expression is too complicated

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5820: regular expression is too complicated

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 979: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 979: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput8-32-4
+++ b/testdata/testoutput8-32-4
@ -853,20 +853,17 @@ Memory allocation (code space): 28

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5813: regular expression is too complicated

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5820: regular expression is too complicated

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 979: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 979: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput8-8-2
+++ b/testdata/testoutput8-8-2
@ -854,22 +854,16 @@ Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too de

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
-Failed: error 186 at offset 637: regular expression is too complicated
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
+Failed: error 186 at offset 5820: regular expression is too complicated

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-Failed: error 186 at offset 637: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 637: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 637: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput8-8-3
+++ b/testdata/testoutput8-8-3
@ -853,21 +853,15 @@ Memory allocation (code space): 12

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-Failed: error 186 at offset 936: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-Failed: error 186 at offset 936: regular expression is too complicated
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 936: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput8-8-4
+++ b/testdata/testoutput8-8-4
@ -853,19 +853,15 @@ Memory allocation (code space): 14

 # Use "expand" to create some very long patterns with nested parentheses, in
 # order to test workspace overflow. Again, this varies with code unit width,
-# and even with it fails in two modes, the error offset differs. It also varies
+# and even when it fails in two modes, the error offset differs. It also varies
 # with link size - hence multiple tests with different values.

-/(?'ABC'\[[bar](]{105}*THEN:\[A]{255}\[)]{106}/expand,-fullbincode
+/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{106}*THEN:\[A]{255}\[)]{107}/expand,-fullbincode
+/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000

-/(?'ABC'\[[bar](]{159}*THEN:\[A]{255}\[)]{160}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{199}*THEN:\[A]{255}\[)]{200}/expand,-fullbincode
-
-/(?'ABC'\[[bar](]{299}*THEN:\[A]{255}\[)]{300}/expand,-fullbincode
-Failed: error 186 at offset 1224: regular expression is too complicated
+/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
+Failed: error 186 at offset 12820: regular expression is too complicated

 /(?(1)(?1)){8,}+()/debug
 ------------------------------------------------------------------
--- a/testdata/testoutput9
+++ b/testdata/testoutput9
@ -307,14 +307,14 @@ Subject length lower bound = 1
 ------------------------------------------------------------------

 /\777/I
-Failed: error 151 at offset 3: octal value is greater than \377 in 8-bit non-UTF-8 mode
+Failed: error 151 at offset 4: octal value is greater than \377 in 8-bit non-UTF-8 mode

 /(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
 Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
    XX
     
 /(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames
-Failed: error 176 at offset 258: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
+Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
    XX
     
 /(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
@ -328,10 +328,10 @@ MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789AB
 MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE

 /\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames
-Failed: error 177 at offset 5: character code point value in \u.... sequence is too large
+Failed: error 177 at offset 6: character code point value in \u.... sequence is too large

 /[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
-Failed: error 177 at offset 6: character code point value in \u.... sequence is too large
+Failed: error 177 at offset 7: character code point value in \u.... sequence is too large

 /[^\x00-a]{12,}[^b-\xff]*/B
 ------------------------------------------------------------------