Implement Perl 5.28's alphabetic lookaround syntax, e.g. (*pla:...) and also (*atomic:...).
2018-09-24 16:23:53 +00:00 · 2018-09-24 16:23:53 +00:00 · f26b0b0bae
parent 69254c77f1
commit f26b0b0bae
21 changed files with 1218 additions and 734 deletions
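
Note on the name lookup added in src/pcre2_compile.c: the names are stored back to back in one NUL-separated string, and a parallel table holds each name's length and code. The following is a minimal standalone sketch of that lookup, not the code from the commit. The identifiers (KIND_*, alpha_item, lookup_alpha_name) are hypothetical, plain strncmp() stands in for the internal comparison function, and the script-run entries are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the META_ codes used by the compiler. */
enum { KIND_NONE = 0, KIND_PLA, KIND_PLB, KIND_NLA, KIND_NLB, KIND_ATOMIC };

typedef struct alpha_item {
  unsigned int len;   /* Length of the name */
  int kind;           /* Code for the construct the name selects */
} alpha_item;

/* Names back to back, each followed by a NUL, in the same order as the
item table below. The final NUL comes from the string literal itself. */
static const char alpha_names[] =
  "pla\0" "plb\0" "nla\0" "nlb\0"
  "positive_lookahead\0" "positive_lookbehind\0"
  "negative_lookahead\0" "negative_lookbehind\0"
  "atomic";

static const alpha_item alpha_items[] = {
  {  3, KIND_PLA }, {  3, KIND_PLB }, {  3, KIND_NLA }, {  3, KIND_NLB },
  { 18, KIND_PLA }, { 19, KIND_PLB }, { 18, KIND_NLA }, { 19, KIND_NLB },
  {  6, KIND_ATOMIC }
};

static const int alpha_count = sizeof(alpha_items)/sizeof(alpha_items[0]);

/* Return the code for a name of the given length, or KIND_NONE if the name
is not in the table. The name need not be NUL-terminated. */
int lookup_alpha_name(const char *name, size_t namelen)
{
  const char *vn = alpha_names;
  for (int i = 0; i < alpha_count; i++)
    {
    if (namelen == alpha_items[i].len && strncmp(name, vn, namelen) == 0)
      return alpha_items[i].kind;
    vn += alpha_items[i].len + 1;   /* Step over this name and its NUL */
    }
  return KIND_NONE;
}
```

Comparing the length first means that a prefix such as "pl" or an upper case spelling such as "PLA" falls through to the "not recognized" result, which corresponds to the new error 195 in the commit.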
--- a/14
+++ b/14
@ -5,8 +5,8 @@ Change Log for PCRE2
 Version 10.33-RC1 15-September-2018
 -----------------------------------

-1. Added "allvector" to pcre2test to make it easy to check the part of the 
-ovector that shouldn't be changed, in particular after substitute and failed or 
+1. Added "allvector" to pcre2test to make it easy to check the part of the
+ovector that shouldn't be changed, in particular after substitute and failed or
 partial matches.

 2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
@ -15,13 +15,21 @@ a greater than 1 fixed quantifier. This issue was found by Yunho Kim.
 3. Added support for callouts from pcre2_substitute().

 4. The POSIX functions are now all called pcre2_regcomp() etc., with wrappers
-that use the standard POSIX names. This should help avoid linking with the 
+that use the standard POSIX names. This should help avoid linking with the
 wrong library in some environments.

 5. Fix an xclass matching issue in JIT.

 6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).

+7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
+lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
+characterized by a lower case letter following (* and to simplify coding for
+this, the character tables created by pcre2_maketables() were updated to add a
+new "is lower case letter" bit. At the same time, the now unused "is
+hexadecimal digit" bit was removed. The default tables in
+src/pcre2_chartables.c.dist are updated.
+

 Version 10.32 10-September-2018
 -------------------------------
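
Note on item 7: a rough sketch of how a per-character class table with the new "is lower case letter" bit could be built and tested. The bit values are the ones this commit defines; the names CT_*, build_ctypes and starts_alpha_assertion are hypothetical. It is an assumption here that the bit is derived from the C library's islower(), and this is not the body of pcre2_maketables().

```c
#include <ctype.h>
#include <stdint.h>

/* Bit values as defined after this commit; the CT_ names are hypothetical. */
#define CT_SPACE    0x01
#define CT_LETTER   0x02
#define CT_LCLETTER 0x04
#define CT_DIGIT    0x08
#define CT_WORD     0x10   /* alphanumeric or '_' */

/* Fill a 256-entry table with class bits for the current locale. */
void build_ctypes(uint8_t table[256])
{
  for (int i = 0; i < 256; i++)
    {
    uint8_t x = 0;
    if (isspace(i)) x |= CT_SPACE;
    if (isalpha(i)) x |= CT_LETTER;
    if (islower(i)) x |= CT_LCLETTER;
    if (isdigit(i)) x |= CT_DIGIT;
    if (isalnum(i) || i == '_') x |= CT_WORD;
    table[i] = x;
    }
}

/* Test applied to the character after "(*": code units above 255 have no
table entry, so they can never start an alphabetic name. */
int starts_alpha_assertion(const uint8_t table[256], unsigned int c)
{
  return c <= 255 && (table[c] & CT_LCLETTER) != 0;
}
```

In the default "C" locale on an ASCII system this gives 0x16 for 'a' to 'z', 0x12 for 'A' to 'Z' and 0x18 for the digits, which agrees with the updated rows of src/pcre2_chartables.c.dist in this commit.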
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -2120,6 +2120,11 @@ special parenthesis, starting with (?&#62; as in this example:
 <pre>
  (?&#62;\d+)foo
 </pre>
+Perl 5.28 introduced an experimental alphabetic form starting with (* which may
+be easier to remember:
+<pre>
+  (*atomic:\d+)foo
+</pre>
 This kind of parenthesis "locks up" the  part of the pattern it contains once
 it has matched, and a failure further into the pattern is prevented from
 backtracking into it. Backtracking past it to previous items, however, works as
@ -2342,11 +2347,17 @@ coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described
 <P>
 More complicated assertions are coded as subpatterns. There are two kinds:
 those that look ahead of the current position in the subject string, and those
-that look behind it, and in each case an assertion may be positive (must
-succeed for matching to continue) or negative (must not succeed for matching to
-continue). An assertion subpattern is matched in the normal way, except that,
-when matching continues after a successful assertion, the matching position in
-the subject string is as it was before the assertion was processed.
+that look behind it, and in each case an assertion may be positive (must match
+for the assertion to be true) or negative (must not match for the assertion to
+be true). An assertion subpattern is matched in the normal way, and if it is
+true, matching continues after it, but with the matching position in the
+subject string as it was before the assertion was processed.
+</P>
+<P>
+A lookaround assertion may also appear as the condition in a
+<a href="#conditions">conditional subpattern</a>
+(see below). In this case, the result of matching the assertion determines
+which branch of the condition is followed.
 </P>
 <P>
 Assertion subpatterns are not capturing subpatterns. If an assertion contains
@ -2359,7 +2370,7 @@ adjacent characters are the same.
 <P>
 When a branch within an assertion fails to match, any substrings that were
 captured are discarded (as happens with any pattern branch that fails to
-match). A negative assertion succeeds only when all its branches fail to match;
+match). A negative assertion is true only when all its branches fail to match;
 this means that no captured substrings are ever retained after a successful
 negative assertion. When an assertion contains a matching branch, what happens
 depends on the type of assertion.
@ -2368,7 +2379,7 @@ depends on the type of assertion.
 For a positive assertion, internally captured substrings in the successful
 branch are retained, and matching continues with the next pattern item after
 the assertion. For a negative assertion, a matching branch means that the
-assertion has failed. If the assertion is being used as a condition in a
+assertion is not true. If such an assertion is being used as a condition in a
 <a href="#conditions">conditional subpattern</a>
 (see below), captured substrings are retained, because matching continues with
 the "no" branch of the condition. For other failing negative assertions,
@ -2398,6 +2409,25 @@ without the assertion, the order depending on the greediness of the quantifier.
 The assertion is obeyed just once when encountered during matching.
 </P>
 <br><b>
+Alphabetic assertion names
+</b><br>
+<P>
+Traditionally, symbolic sequences such as (?= and (?&#60;= have been used to specify
+lookaround assertions. Perl 5.28 introduced some experimental alphabetic
+alternatives which might be easier to remember. They all start with (* instead
+of (? and must be written using lower case letters. PCRE2 supports the
+following synonyms:
+<pre>
+  (*positive_lookahead:  or (*pla: is the same as (?=
+  (*negative_lookahead:  or (*nla: is the same as (?!
+  (*positive_lookbehind: or (*plb: is the same as (?&#60;=
+  (*negative_lookbehind: or (*nlb: is the same as (?&#60;!
+</pre>
+For example, (*pla:foo) is the same assertion as (?=foo). However, in the
+following sections, the various assertions are described using the original
+symbolic forms.
+</P>
+<br><b>
 Lookahead assertions
 </b><br>
 <P>
@ -3630,7 +3660,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 September 2018
+Last updated: 24 September 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -436,6 +436,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 <P>
 <pre>
  (?&#62;...)         atomic, non-capturing group
+  (*atomic:...)   atomic, non-capturing group
 </PRE>
 </P>
 <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
@ -514,12 +515,23 @@ setting with a similar syntax.
 <br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <pre>
-  (?=...)         positive look ahead
-  (?!...)         negative look ahead
-  (?&#60;=...)        positive look behind
-  (?&#60;!...)        negative look behind
+  (?=...)                     )
+  (*pla:...)                  ) positive lookahead
+  (*positive_lookahead:...)   )
+
+  (?!...)                     )
+  (*nla:...)                  ) negative lookahead
+  (*negative_lookahead:...)   )
+
+  (?&#60;=...)                    )
+  (*plb:...)                  ) positive lookbehind
+  (*positive_lookbehind:...)  )
+
+  (?&#60;!...)                    )
+  (*nlb:...)                  ) negative lookbehind
+  (*negative_lookbehind:...)  )
 </pre>
-Each top-level branch of a look behind must be of a fixed length.
+Each top-level branch of a lookbehind must be of a fixed length.
 </P>
 <br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
 <P>
@ -634,7 +646,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 02 September 2018
+Last updated: 24 September 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "21 September 2018" "PCRE2 10.33"
+.TH PCRE2PATTERN 3 "24 September 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -2124,6 +2124,11 @@ special parenthesis, starting with (?> as in this example:
 .sp
  (?>\ed+)foo
 .sp
+Perl 5.28 introduced an experimental alphabetic form starting with (* which may
+be easier to remember:
+.sp
+  (*atomic:\ed+)foo
+.sp
 This kind of parenthesis "locks up" the  part of the pattern it contains once
 it has matched, and a failure further into the pattern is prevented from
 backtracking into it. Backtracking past it to previous items, however, works as
@ -2351,11 +2356,19 @@ above.
 .P
 More complicated assertions are coded as subpatterns. There are two kinds:
 those that look ahead of the current position in the subject string, and those
-that look behind it, and in each case an assertion may be positive (must
-succeed for matching to continue) or negative (must not succeed for matching to
-continue). An assertion subpattern is matched in the normal way, except that,
-when matching continues after a successful assertion, the matching position in
-the subject string is as it was before the assertion was processed.
+that look behind it, and in each case an assertion may be positive (must match
+for the assertion to be true) or negative (must not match for the assertion to
+be true). An assertion subpattern is matched in the normal way, and if it is
+true, matching continues after it, but with the matching position in the
+subject string as it was before the assertion was processed.
+.P
+A lookaround assertion may also appear as the condition in a
+.\" HTML <a href="#conditions">
+.\" </a>
+conditional subpattern
+.\"
+(see below). In this case, the result of matching the assertion determines
+which branch of the condition is followed.
 .P
 Assertion subpatterns are not capturing subpatterns. If an assertion contains
 capturing subpatterns within it, these are counted for the purposes of
@ -2366,7 +2379,7 @@ adjacent characters are the same.
 .P
 When a branch within an assertion fails to match, any substrings that were
 captured are discarded (as happens with any pattern branch that fails to
-match). A negative assertion succeeds only when all its branches fail to match;
+match). A negative assertion is true only when all its branches fail to match;
 this means that no captured substrings are ever retained after a successful
 negative assertion. When an assertion contains a matching branch, what happens
 depends on the type of assertion.
@ -2374,7 +2387,7 @@ depends on the type of assertion.
 For a positive assertion, internally captured substrings in the successful
 branch are retained, and matching continues with the next pattern item after
 the assertion. For a negative assertion, a matching branch means that the
-assertion has failed. If the assertion is being used as a condition in a
+assertion is not true. If such an assertion is being used as a condition in a
 .\" HTML <a href="#conditions">
 .\" </a>
 conditional subpattern
@ -2406,6 +2419,25 @@ without the assertion, the order depending on the greediness of the quantifier.
 The assertion is obeyed just once when encountered during matching.
 .
 .
+.SS "Alphabetic assertion names"
+.rs
+.sp
+Traditionally, symbolic sequences such as (?= and (?<= have been used to specify
+lookaround assertions. Perl 5.28 introduced some experimental alphabetic
+alternatives which might be easier to remember. They all start with (* instead
+of (? and must be written using lower case letters. PCRE2 supports the
+following synonyms:
+.sp
+  (*positive_lookahead:  or (*pla: is the same as (?=
+  (*negative_lookahead:  or (*nla: is the same as (?!
+  (*positive_lookbehind: or (*plb: is the same as (?<=
+  (*negative_lookbehind: or (*nlb: is the same as (?<!
+.sp
+For example, (*pla:foo) is the same assertion as (?=foo). However, in the
+following sections, the various assertions are described using the original
+symbolic forms.
+.
+.
 .SS "Lookahead assertions"
 .rs
 .sp
@ -3660,6 +3692,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 September 2018
+Last updated: 24 September 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "02 September 2018" "PCRE2 10.32"
+.TH PCRE2SYNTAX 3 "24 September 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -411,6 +411,7 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
 .rs
 .sp
  (?>...)         atomic, non-capturing group
+  (*atomic:...)   atomic, non-capturing group
 .
 .
 .SH "COMMENT"
@ -491,12 +492,23 @@ setting with a similar syntax.
 .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
 .rs
 .sp
-  (?=...)         positive look ahead
-  (?!...)         negative look ahead
-  (?<=...)        positive look behind
-  (?<!...)        negative look behind
+  (?=...)                     )
+  (*pla:...)                  ) positive lookahead
+  (*positive_lookahead:...)   )
 .sp
-Each top-level branch of a look behind must be of a fixed length.
+  (?!...)                     )
+  (*nla:...)                  ) negative lookahead
+  (*negative_lookahead:...)   )
+.sp
+  (?<=...)                    )
+  (*plb:...)                  ) positive lookbehind
+  (*positive_lookbehind:...)  )
+.sp
+  (?<!...)                    )
+  (*nlb:...)                  ) negative lookbehind
+  (*negative_lookbehind:...)  )
+.sp
+Each top-level branch of a lookbehind must be of a fixed length.
 .
 .
 .SH "BACKREFERENCES"
@ -621,6 +633,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 02 September 2018
+Last updated: 24 September 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
--- a/maint/pcre2_chartables.c.non-standard
+++ b/maint/pcre2_chartables.c.non-standard
@ -103,19 +103,22 @@ const unsigned char _pcre_default_tables[] = {
 0,0,0,0,0,0,0,128,
 255,255,255,255,0,0,0,0,
 0,0,0,0,0,0,0,0,
+
+/* Fiddled by hand when the table bits changed. May be broken! */
+
 128,0,0,0,0,0,0,0,
-0,1,1,0,1,1,0,0,
+0,1,1,1,1,1,0,0,
 0,0,0,0,0,0,0,0,
 0,0,0,0,0,0,0,0,
 1,0,0,0,128,0,0,0,
 128,128,128,128,0,0,128,0,
-28,28,28,28,28,28,28,28,
-28,28,0,0,0,0,0,128,
-0,26,26,26,26,26,26,18,
+24,24,24,24,24,24,24,24,
+24,24,0,0,0,0,0,128,
+0,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,128,128,0,128,16,
-0,26,26,26,26,26,26,18,
+0,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,128,128,0,0,0,
@ -125,8 +128,8 @@ const unsigned char _pcre_default_tables[] = {
 0,0,0,0,0,0,0,0,
 1,0,0,0,0,0,0,0,
 0,0,18,0,0,0,0,0,
-0,0,20,20,0,18,0,0,
-0,20,18,0,0,0,0,0,
+0,0,24,24,0,18,0,0,
+0,24,18,0,0,0,0,0,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,18,
 18,18,18,18,18,18,18,0,
--- a/perltest.sh
+++ b/perltest.sh
@ -75,6 +75,10 @@ fi

 (echo "$prefix" ; cat <<'PERLEND'

+# The alpha assertions currently give warnings even when -w is not specified.
+
+no warnings "experimental::alpha_assertions";
+
 # Function for turning a string into a string of printing chars.

 sub pchars {
@ -129,6 +133,9 @@ else { $outfile = "STDOUT"; }

 printf($outfile "Perl $] Regular Expressions\n\n");

+$extra_modifiers = "";
+$default_show_mark = 0;
+
 # Main loop

 NEXT_RE:
@ -370,7 +377,10 @@ for (;;)
    }
  }

-# printf $outfile "\n";
+# By closing OUTFILE explicitly, we avoid a Perl warning in -w mode:
+# "main::OUTFILE" used only once.
+
+close(OUTFILE) if $outfile eq "OUTFILE"; 

 PERLEND
 ) | $perl $perlarg - $@
--- a/src/dftables.c
+++ b/src/dftables.c
@ -183,10 +183,10 @@ fprintf(f,
  "/* This table identifies various classes of character by individual bits:\n"
  "  0x%02x   white space character\n"
  "  0x%02x   letter\n"
+  "  0x%02x   lower case letter\n"
  "  0x%02x   decimal digit\n"
-  "  0x%02x   hexadecimal digit\n"
  "  0x%02x   alphanumeric or '_'\n*/\n\n",
-  ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word);
+  ctype_space, ctype_letter, ctype_lcletter, ctype_digit, ctype_word);

 fprintf(f, "  ");
 for (i = 0; i < 256; i++)
--- a/src/pcre2.h.in
+++ b/src/pcre2.h.in
@ -320,6 +320,7 @@ pcre2_pattern_convert(). */
 #define PCRE2_ERROR_BAD_LITERAL_OPTIONS            192
 #define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE      193
 #define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS      194
+#define PCRE2_ERROR_ALPHA_ASSERTION_UNKNOWN        195


 /* "Expected" matching error codes: no match and partial match. */
--- a/src/pcre2_chartables.c.dist
+++ b/src/pcre2_chartables.c.dist
@ -157,8 +157,8 @@ graph print, punct, and cntrl. Other classes are built from combinations. */
 /* This table identifies various classes of character by individual bits:
  0x01   white space character
  0x02   letter
-  0x04   decimal digit
-  0x08   hexadecimal digit
+  0x04   lower case letter
+  0x08   decimal digit
  0x10   alphanumeric or '_'
 */

@ -168,16 +168,16 @@ graph print, punct, and cntrl. Other classes are built from combinations. */
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*  24- 31 */
  0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*    - '  */
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*  ( - /  */
-  0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /*  0 - 7  */
-  0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x00, /*  8 - ?  */
-  0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /*  @ - G  */
+  0x18,0x18,0x18,0x18,0x18,0x18,0x18,0x18, /*  0 - 7  */
+  0x18,0x18,0x00,0x00,0x00,0x00,0x00,0x00, /*  8 - ?  */
+  0x00,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /*  @ - G  */
  0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /*  H - O  */
  0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /*  P - W  */
  0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x10, /*  X - _  */
-  0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /*  ` - g  */
-  0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /*  h - o  */
-  0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /*  p - w  */
-  0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x00, /*  x -127 */
+  0x00,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /*  ` - g  */
+  0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /*  h - o  */
+  0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /*  p - w  */
+  0x16,0x16,0x16,0x00,0x00,0x00,0x00,0x00, /*  x -127 */
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@ -615,6 +615,46 @@ static const uint32_t verbops[] = {
  OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE,
  OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG };

+/* Table of "alpha assertions" like (*pla:...), similar to the (*VERB) table. */
+
+typedef struct alasitem {
+  unsigned int len;          /* Length of name */
+  uint32_t meta;             /* Base META_ code */
+} alasitem;
+
+static const char alasnames[] =
+  STRING_pla0
+  STRING_plb0
+  STRING_nla0
+  STRING_nlb0
+  STRING_positive_lookahead0
+  STRING_positive_lookbehind0
+  STRING_negative_lookahead0
+  STRING_negative_lookbehind0
+  STRING_atomic0
+  STRING_sr0
+  STRING_asr0
+  STRING_script_run0
+  STRING_atomic_script_run;
+
+static const alasitem alasmeta[] = {
+  {  3, META_LOOKAHEAD     },
+  {  3, META_LOOKBEHIND    },
+  {  3, META_LOOKAHEADNOT  },
+  {  3, META_LOOKBEHINDNOT },
+  { 18, META_LOOKAHEAD     },
+  { 19, META_LOOKBEHIND    },
+  { 18, META_LOOKAHEADNOT  },
+  { 19, META_LOOKBEHINDNOT },
+  {  6, META_ATOMIC        },
+  {  2, 0                  }, /* sr = script run */
+  {  3, 0                  }, /* asr = atomic script run */
+  { 10, 0                  }, /* script run */
+  { 17, 0                  }  /* atomic script run */
+};
+
+static const int alascount = sizeof(alasmeta)/sizeof(alasitem);
+
 /* Offsets from OP_STAR for case-independent and negative repeat opcodes. */

 static uint32_t chartypeoffset[] = {
@ -732,7 +772,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
       ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
       ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
       ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
-       ERR91, ERR92, ERR93, ERR94 };
+       ERR91, ERR92, ERR93, ERR94, ERR95 };

 /* This is a table of start-of-pattern options such as (*UTF) and settings such
 as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@ -1447,9 +1487,9 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
    c = (uint32_t)i;
    if (cb != NULL && c == CHAR_CR &&
        (cb->cx->extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)
-      c = CHAR_LF;   
+      c = CHAR_LF;
    }
-  else  /* Negative table entry */  
+  else  /* Negative table entry */
    {
    escape = -i;                    /* Else return a special escape */
    if (cb != NULL && (escape == ESC_P || escape == ESC_p || escape == ESC_X))
@ -1499,7 +1539,7 @@ else if ((i = escapes[c - ESCAPES_FIRST]) != 0)
    }
  }

-/* Escapes that need further processing, including those that are unknown, have 
+/* Escapes that need further processing, including those that are unknown, have
 a zero entry in the lookup table. When called from pcre2_substitute(), only \c,
 \o, and \x are recognized (and \u when BSUX is set). */

@ -2133,9 +2173,10 @@ return -1;
 *************************************************/

 /* This function is called from parse_regex() below whenever it needs to read
-the name of a subpattern or a (*VERB). The initial pointer must be to the
-character before the name. If that character is '*' we are reading a verb name.
-The pointer is updated to point after the name, for a VERB, or after tha name's
+the name of a subpattern or a (*VERB) or an (*alpha_assertion). The initial
+pointer must be to the character before the name. If that character is '*' we
+are reading a verb or alpha assertion name. The pointer is updated to point
+after the name, for a VERB or alpha assertion name, or after the name's
 terminator for a subpattern name. Returning both the offset and the name
 pointer is redundant information, but some callers use one and some the other,
 so it is simplest just to return both.
@ -2160,27 +2201,29 @@ read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t terminator,
  int *errorcodeptr, compile_block *cb)
 {
 PCRE2_SPTR ptr = *ptrptr;
-BOOL is_verb = (*ptr == CHAR_ASTERISK);
+BOOL is_group = (*ptr != CHAR_ASTERISK);
 uint32_t namelen = 0;
-uint32_t ctype = is_verb? ctype_letter : ctype_word;

-if (++ptr >= ptrend)
+if (++ptr >= ptrend)               /* No characters in name */
  {
-  *errorcodeptr = is_verb? ERR60:  /* Verb not recognized or malformed */
-                           ERR62;  /* Subpattern name expected */
+  *errorcodeptr = is_group? ERR62: /* Subpattern name expected */
+                            ERR60; /* Verb not recognized or malformed */
  goto FAILED;
  }
+  
+/* A group name must not start with a digit. If either of the others starts
+with a digit it just won't be recognized. */
+  
+if (is_group && IS_DIGIT(*ptr))
+  {
+  *errorcodeptr = ERR44;
+  goto FAILED;
+  }   

 *nameptr = ptr;
 *offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern);

-if (IS_DIGIT(*ptr))
-  {
-  *errorcodeptr = ERR44;   /* Group name must not start with digit */
-  goto FAILED;
-  }
-
-while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype) != 0)
+while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0)
  {
  ptr++;
  namelen++;
@ -2192,9 +2235,9 @@ while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype) != 0)
  }

 /* Subpattern names must not be empty, and their terminator is checked here.
-(What follows a verb name is checked separately.) */
+(What follows a verb or alpha assertion name is checked separately.) */

-if (!is_verb)
+if (is_group)
  {
  if (namelen == 0)
    {
@ -2652,24 +2695,31 @@ while (ptr < ptrend)
  if (expect_cond_assert > 0)
    {
    BOOL ok = c == CHAR_LEFT_PARENTHESIS && ptrend - ptr >= 3 &&
-              ptr[0] == CHAR_QUESTION_MARK;
-    if (ok) switch(ptr[1])
+              (ptr[0] == CHAR_QUESTION_MARK || ptr[0] == CHAR_ASTERISK);
+    if (ok)
      {
-      case CHAR_C:
-      ok = expect_cond_assert == 2;
-      break;
-
-      case CHAR_EQUALS_SIGN:
-      case CHAR_EXCLAMATION_MARK:
-      break;
-
-      case CHAR_LESS_THAN_SIGN:
-      ok = ptr[2] == CHAR_EQUALS_SIGN || ptr[2] == CHAR_EXCLAMATION_MARK;
-      break;
-
-      default:
-      ok = FALSE;
-      }
+      if (ptr[0] == CHAR_ASTERISK)  /* New alpha assertion format, possibly */
+        {
+        ok = MAX_255(ptr[1]) && (cb->ctypes[ptr[1]] & ctype_lcletter) != 0;
+        }
+      else switch(ptr[1])  /* Traditional symbolic format */
+        {
+        case CHAR_C:
+        ok = expect_cond_assert == 2;
+        break;
+       
+        case CHAR_EQUALS_SIGN:
+        case CHAR_EXCLAMATION_MARK:
+        break;
+       
+        case CHAR_LESS_THAN_SIGN:
+        ok = ptr[2] == CHAR_EQUALS_SIGN || ptr[2] == CHAR_EXCLAMATION_MARK;
+        break;
+       
+        default:
+        ok = FALSE;
+        }
+      }      

    if (!ok)
      {
@ -3453,7 +3503,8 @@ while (ptr < ptrend)
    case CHAR_LEFT_PARENTHESIS:
    if (ptr >= ptrend) goto UNCLOSED_PARENTHESIS;

-    /* If ( is not followed by ? it is either a capture or a special verb. */
+    /* If ( is not followed by ? it is either a capture or a special verb or an
+    alpha assertion. */

    if (*ptr != CHAR_QUESTION_MARK)
      {
@ -3473,13 +3524,88 @@ while (ptr < ptrend)
        else *parsed_pattern++ = META_NOCAPTURE;
        }

+      /* Do nothing for (* followed by end of pattern or ) so it gives a "bad
+      quantifier" error rather than "(*MARK) must have an argument". */
+
+      else if (ptrend - ptr <= 1 || (c = ptr[1]) == CHAR_RIGHT_PARENTHESIS)
+        break;
+
+      /* Handle "alpha assertions" such as (*pla:...). Most of these are
+      synonyms for the historical symbolic assertions, but the script run ones
+      are new. They are distinguished by starting with a lower case letter.
+      Checking both ends of the alphabet makes this work in all character 
+      codes. */
+
+      else if (CHMAX_255(c) && (cb->ctypes[c] & ctype_lcletter) != 0)
+        {
+        uint32_t meta;
+          
+        vn = alasnames;
+        if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
+          cb)) goto FAILED;
+        if (ptr >= ptrend || *ptr != CHAR_COLON)
+          {
+          errorcode = ERR95;  /* Malformed */
+          goto FAILED;
+          }
+
+        /* Scan the table of alpha assertion names */
+        
+        for (i = 0; i < alascount; i++)
+          {
+          if (namelen == alasmeta[i].len &&
+              PRIV(strncmp_c8)(name, vn, namelen) == 0)
+            break;
+          vn += alasmeta[i].len + 1;
+          }
+
+        if (i >= alascount)
+          {
+          errorcode = ERR95;  /* Alpha assertion not recognized */
+          goto FAILED;
+          }
+          
+        /* Check for expecting an assertion condition. If so, only lookaround 
+        assertions are valid. */
+         
+        meta = alasmeta[i].meta;
+        if (prev_expect_cond_assert > 0 && 
+            (meta < META_LOOKAHEAD || meta > META_LOOKBEHINDNOT))
+          {
+          errorcode = ERR28;  /* Assertion expected */
+          goto FAILED;  
+          }                                  
+
+        switch(meta)
+          {
+          case META_ATOMIC:
+          goto ATOMIC_GROUP; 
+
+          case META_LOOKAHEAD:
+          goto POSITIVE_LOOK_AHEAD;
+          
+          case META_LOOKAHEADNOT:
+          goto NEGATIVE_LOOK_AHEAD;
+          
+          case META_LOOKBEHIND:
+          case META_LOOKBEHINDNOT: 
+          *parsed_pattern++ = meta; 
+          ptr--;
+          goto LOOKBEHIND;  
+          
+          /* FIXME: Script Run stuff ... */ 
+            
+          
+ 
+
+ 
+          }  
+        }
+

      /* ---- Handle (*VERB) and (*VERB:NAME) ---- */

-      /* Do nothing for (*) so it gives a "bad quantifier" error rather than
-      "(*MARK) must have an argument". */
-
-      else if (ptrend - ptr > 1 && ptr[1] != CHAR_RIGHT_PARENTHESIS)
+      else
        {
        vn = verbnames;
        if (!read_name(&ptr, ptrend, 0, &offset, &name, &namelen, &errorcode,
@ -3946,14 +4072,15 @@ while (ptr < ptrend)
      if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS;
      nest_depth++;

-      /* If the next character is ? there must be an assertion next (optionally
-      preceded by a callout). We do not check this here, but instead we set
-      expect_cond_assert to 2. If this is still greater than zero (callouts
-      decrement it) when the next assertion is read, it will be marked as a
-      condition that must not be repeated. A value greater than zero also
-      causes checking that an assertion (possibly with callout) follows. */
+      /* If the next character is ? or * there must be an assertion next
+      (optionally preceded by a callout). We do not check this here, but
+      instead we set expect_cond_assert to 2. If this is still greater than
+      zero (callouts decrement it) when the next assertion is read, it will be
+      marked as a condition that must not be repeated. A value greater than
+      zero also causes checking that an assertion (possibly with callout)
+      follows. */

-      if (*ptr == CHAR_QUESTION_MARK)
+      if (*ptr == CHAR_QUESTION_MARK || *ptr == CHAR_ASTERISK)
        {
        *parsed_pattern++ = META_COND_ASSERT;
        ptr--;   /* Pull pointer back to the opening parenthesis. */
@ -4099,6 +4226,7 @@ while (ptr < ptrend)
      /* ---- Atomic group ---- */

      case CHAR_GREATER_THAN_SIGN:
+      ATOMIC_GROUP:                          /* Come from (*atomic: */
      *parsed_pattern++ = META_ATOMIC;
      nest_depth++;
      ptr++;
@ -4108,11 +4236,13 @@ while (ptr < ptrend)
      /* ---- Lookahead assertions ---- */

      case CHAR_EQUALS_SIGN:
+      POSITIVE_LOOK_AHEAD:                   /* Come from (*pla: */
      *parsed_pattern++ = META_LOOKAHEAD;
      ptr++;
      goto POST_ASSERTION;

      case CHAR_EXCLAMATION_MARK:
+      NEGATIVE_LOOK_AHEAD:                   /* Come from (*nla: */
      *parsed_pattern++ = META_LOOKAHEADNOT;
      ptr++;
      goto POST_ASSERTION;
@ -4132,6 +4262,8 @@ while (ptr < ptrend)
        }
      *parsed_pattern++ = (ptr[1] == CHAR_EQUALS_SIGN)?
        META_LOOKBEHIND : META_LOOKBEHINDNOT;
+        
+      LOOKBEHIND:                /* Come from (*plb: and (*nlb: */
      *has_lookbehind = TRUE;
      offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2);
      PUTOFFSET(offset, parsed_pattern);
--- a/src/pcre2_error.c
+++ b/src/pcre2_error.c
@ -181,6 +181,8 @@ static const unsigned char compile_error_texts[] =
  "invalid option bits with PCRE2_LITERAL\0"
  "\\N{U+dddd} is supported only in Unicode (UTF) mode\0"
  "invalid hyphen in option setting\0"
+  /* 95 */
+  "(*alpha_assertion) not recognized\0"  
  ;

 /* Match-time and UTF error texts are in the same format. */
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@ -569,11 +569,11 @@ these tables. */
 without checking pcre2_jit_compile.c, which has an assertion to ensure that
 ctype_word has the value 16. */

-#define ctype_space   0x01
-#define ctype_letter  0x02
-#define ctype_digit   0x04
-#define ctype_xdigit  0x08    /* not actually used any more */
-#define ctype_word    0x10    /* alphanumeric or '_' */
+#define ctype_space    0x01
+#define ctype_letter   0x02
+#define ctype_lcletter 0x04
+#define ctype_digit    0x08
+#define ctype_word     0x10    /* alphanumeric or '_' */

 /* Offsets of the various tables from the base tables pointer, and
 total length of the tables. */
@ -874,34 +874,48 @@ a positive value. */
 #define STR_RIGHT_CURLY_BRACKET     "}"
 #define STR_TILDE                   "~"

-#define STRING_ACCEPT0              "ACCEPT\0"
-#define STRING_COMMIT0              "COMMIT\0"
-#define STRING_F0                   "F\0"
-#define STRING_FAIL0                "FAIL\0"
-#define STRING_MARK0                "MARK\0"
-#define STRING_PRUNE0               "PRUNE\0"
-#define STRING_SKIP0                "SKIP\0"
-#define STRING_THEN                 "THEN"
+#define STRING_ACCEPT0               "ACCEPT\0"
+#define STRING_COMMIT0               "COMMIT\0"
+#define STRING_F0                    "F\0"
+#define STRING_FAIL0                 "FAIL\0"
+#define STRING_MARK0                 "MARK\0"
+#define STRING_PRUNE0                "PRUNE\0"
+#define STRING_SKIP0                 "SKIP\0"
+#define STRING_THEN                  "THEN"

-#define STRING_alpha0               "alpha\0"
-#define STRING_lower0               "lower\0"
-#define STRING_upper0               "upper\0"
-#define STRING_alnum0               "alnum\0"
-#define STRING_ascii0               "ascii\0"
-#define STRING_blank0               "blank\0"
-#define STRING_cntrl0               "cntrl\0"
-#define STRING_digit0               "digit\0"
-#define STRING_graph0               "graph\0"
-#define STRING_print0               "print\0"
-#define STRING_punct0               "punct\0"
-#define STRING_space0               "space\0"
-#define STRING_word0                "word\0"
-#define STRING_xdigit               "xdigit"
+#define STRING_atomic0               "atomic\0"
+#define STRING_pla0                  "pla\0"
+#define STRING_plb0                  "plb\0"
+#define STRING_nla0                  "nla\0"
+#define STRING_nlb0                  "nlb\0"
+#define STRING_sr0                   "sr\0"
+#define STRING_asr0                  "asr\0"
+#define STRING_positive_lookahead0   "positive_lookahead\0"
+#define STRING_positive_lookbehind0  "positive_lookbehind\0"
+#define STRING_negative_lookahead0   "negative_lookahead\0"
+#define STRING_negative_lookbehind0  "negative_lookbehind\0"
+#define STRING_script_run0           "script_run\0"
+#define STRING_atomic_script_run     "atomic_script_run"

-#define STRING_DEFINE               "DEFINE"
-#define STRING_VERSION              "VERSION"
-#define STRING_WEIRD_STARTWORD      "[:<:]]"
-#define STRING_WEIRD_ENDWORD        "[:>:]]"
+#define STRING_alpha0                "alpha\0"
+#define STRING_lower0                "lower\0"
+#define STRING_upper0                "upper\0"
+#define STRING_alnum0                "alnum\0"
+#define STRING_ascii0                "ascii\0"
+#define STRING_blank0                "blank\0"
+#define STRING_cntrl0                "cntrl\0"
+#define STRING_digit0                "digit\0"
+#define STRING_graph0                "graph\0"
+#define STRING_print0                "print\0"
+#define STRING_punct0                "punct\0"
+#define STRING_space0                "space\0"
+#define STRING_word0                 "word\0"
+#define STRING_xdigit                "xdigit"
+
+#define STRING_DEFINE                "DEFINE"
+#define STRING_VERSION               "VERSION"
+#define STRING_WEIRD_STARTWORD       "[:<:]]"
+#define STRING_WEIRD_ENDWORD         "[:>:]]"

 #define STRING_CR_RIGHTPAR                "CR)"
 #define STRING_LF_RIGHTPAR                "LF)"
@ -1150,34 +1164,48 @@ only. */
 #define STR_RIGHT_CURLY_BRACKET     "\175"
 #define STR_TILDE                   "\176"

-#define STRING_ACCEPT0              STR_A STR_C STR_C STR_E STR_P STR_T "\0"
-#define STRING_COMMIT0              STR_C STR_O STR_M STR_M STR_I STR_T "\0"
-#define STRING_F0                   STR_F "\0"
-#define STRING_FAIL0                STR_F STR_A STR_I STR_L "\0"
-#define STRING_MARK0                STR_M STR_A STR_R STR_K "\0"
-#define STRING_PRUNE0               STR_P STR_R STR_U STR_N STR_E "\0"
-#define STRING_SKIP0                STR_S STR_K STR_I STR_P "\0"
-#define STRING_THEN                 STR_T STR_H STR_E STR_N
+#define STRING_ACCEPT0               STR_A STR_C STR_C STR_E STR_P STR_T "\0"
+#define STRING_COMMIT0               STR_C STR_O STR_M STR_M STR_I STR_T "\0"
+#define STRING_F0                    STR_F "\0"
+#define STRING_FAIL0                 STR_F STR_A STR_I STR_L "\0"
+#define STRING_MARK0                 STR_M STR_A STR_R STR_K "\0"
+#define STRING_PRUNE0                STR_P STR_R STR_U STR_N STR_E "\0"
+#define STRING_SKIP0                 STR_S STR_K STR_I STR_P "\0"
+#define STRING_THEN                  STR_T STR_H STR_E STR_N

-#define STRING_alpha0               STR_a STR_l STR_p STR_h STR_a "\0"
-#define STRING_lower0               STR_l STR_o STR_w STR_e STR_r "\0"
-#define STRING_upper0               STR_u STR_p STR_p STR_e STR_r "\0"
-#define STRING_alnum0               STR_a STR_l STR_n STR_u STR_m "\0"
-#define STRING_ascii0               STR_a STR_s STR_c STR_i STR_i "\0"
-#define STRING_blank0               STR_b STR_l STR_a STR_n STR_k "\0"
-#define STRING_cntrl0               STR_c STR_n STR_t STR_r STR_l "\0"
-#define STRING_digit0               STR_d STR_i STR_g STR_i STR_t "\0"
-#define STRING_graph0               STR_g STR_r STR_a STR_p STR_h "\0"
-#define STRING_print0               STR_p STR_r STR_i STR_n STR_t "\0"
-#define STRING_punct0               STR_p STR_u STR_n STR_c STR_t "\0"
-#define STRING_space0               STR_s STR_p STR_a STR_c STR_e "\0"
-#define STRING_word0                STR_w STR_o STR_r STR_d       "\0"
-#define STRING_xdigit               STR_x STR_d STR_i STR_g STR_i STR_t
+#define STRING_atomic0               STR_a STR_t STR_o STR_m STR_i STR_c "\0"
+#define STRING_pla0                  STR_p STR_l STR_a "\0"
+#define STRING_plb0                  STR_p STR_l STR_b "\0"
+#define STRING_nla0                  STR_n STR_l STR_a "\0"
+#define STRING_nlb0                  STR_n STR_l STR_b "\0"
+#define STRING_sr0                   STR_s STR_r "\0"
+#define STRING_asr0                  STR_a STR_s STR_r "\0"
+#define STRING_positive_lookahead0   STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
+#define STRING_positive_lookbehind0  STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
+#define STRING_negative_lookahead0   STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
+#define STRING_negative_lookbehind0  STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
+#define STRING_script_run0           STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n "\0"
+#define STRING_atomic_script_run     STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n

-#define STRING_DEFINE               STR_D STR_E STR_F STR_I STR_N STR_E
-#define STRING_VERSION              STR_V STR_E STR_R STR_S STR_I STR_O STR_N
-#define STRING_WEIRD_STARTWORD      STR_LEFT_SQUARE_BRACKET STR_COLON STR_LESS_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET
-#define STRING_WEIRD_ENDWORD        STR_LEFT_SQUARE_BRACKET STR_COLON STR_GREATER_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET
+#define STRING_alpha0                STR_a STR_l STR_p STR_h STR_a "\0"
+#define STRING_lower0                STR_l STR_o STR_w STR_e STR_r "\0"
+#define STRING_upper0                STR_u STR_p STR_p STR_e STR_r "\0"
+#define STRING_alnum0                STR_a STR_l STR_n STR_u STR_m "\0"
+#define STRING_ascii0                STR_a STR_s STR_c STR_i STR_i "\0"
+#define STRING_blank0                STR_b STR_l STR_a STR_n STR_k "\0"
+#define STRING_cntrl0                STR_c STR_n STR_t STR_r STR_l "\0"
+#define STRING_digit0                STR_d STR_i STR_g STR_i STR_t "\0"
+#define STRING_graph0                STR_g STR_r STR_a STR_p STR_h "\0"
+#define STRING_print0                STR_p STR_r STR_i STR_n STR_t "\0"
+#define STRING_punct0                STR_p STR_u STR_n STR_c STR_t "\0"
+#define STRING_space0                STR_s STR_p STR_a STR_c STR_e "\0"
+#define STRING_word0                 STR_w STR_o STR_r STR_d       "\0"
+#define STRING_xdigit                STR_x STR_d STR_i STR_g STR_i STR_t
+
+#define STRING_DEFINE                STR_D STR_E STR_F STR_I STR_N STR_E
+#define STRING_VERSION               STR_V STR_E STR_R STR_S STR_I STR_O STR_N
+#define STRING_WEIRD_STARTWORD       STR_LEFT_SQUARE_BRACKET STR_COLON STR_LESS_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET
+#define STRING_WEIRD_ENDWORD         STR_LEFT_SQUARE_BRACKET STR_COLON STR_GREATER_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET

 #define STRING_CR_RIGHTPAR                STR_C STR_R STR_RIGHT_PARENTHESIS
 #define STRING_LF_RIGHTPAR                STR_L STR_F STR_RIGHT_PARENTHESIS
--- a/src/pcre2_maketables.c
+++ b/src/pcre2_maketables.c
@ -138,8 +138,8 @@ for (i = 0; i < 256; i++)
  int x = 0;
  if (isspace(i)) x += ctype_space;
  if (isalpha(i)) x += ctype_letter;
+  if (islower(i)) x += ctype_lcletter;
  if (isdigit(i)) x += ctype_digit;
-  if (isxdigit(i)) x += ctype_xdigit;
  if (isalnum(i) || i == '_') x += ctype_word;
  *p++ = x;
  }
--- a/testdata/testinput1
+++ b/testdata/testinput1
@ -6263,4 +6263,69 @@ ef) x/x,mark
    aBCDEF
    AbCDe f

+/(*pla:foo).{6}/
+    abcfoobarxyz
+\= Expect no match
+    abcfooba
+
+/(*positive_lookahead:foo).{6}/
+    abcfoobarxyz
+    
+/(?(*pla:foo).{6}|a..)/
+    foobarbaz
+    abcfoobar       
+
+/(?(*positive_lookahead:foo).{6}|a..)/
+    foobarbaz
+    abcfoobar       
+    
+/(*plb:foo)bar/
+    abcfoobar
+\= Expect no match
+    abcbarfoo     
+
+/(*positive_lookbehind:foo)bar/
+    abcfoobar
+\= Expect no match
+    abcbarfoo
+    
+/(?(*plb:foo)bar|baz)/
+    abcfoobar
+    bazfoobar
+    abcbazfoobar
+    foobazfoobar    
+ 
+/(?(*positive_lookbehind:foo)bar|baz)/
+    abcfoobar
+    bazfoobar
+    abcbazfoobar
+    foobazfoobar    
+ 
+/(*nlb:foo)bar/
+    abcbarfoo     
+\= Expect no match
+    abcfoobar
+
+/(*negative_lookbehind:foo)bar/
+    abcbarfoo     
+\= Expect no match
+    abcfoobar
+    
+/(?(*nlb:foo)bar|baz)/
+    abcfoobaz 
+    abcbarbaz 
+\= Expect no match
+    abcfoobar
+ 
+/(?(*negative_lookbehind:foo)bar|baz)/
+    abcfoobaz 
+    abcbarbaz 
+\= Expect no match
+    abcfoobar
+ 
+/(*atomic:a+)\w/
+    aaab
+\= Expect no match
+    aaaa      
+
 # End of testinput1 
--- a/testdata/testinput2
+++ b/testdata/testinput2
@ -5525,4 +5525,10 @@ a)"xI
 \= Expect no match     
    abc\ndef\nxyz

+/(?(*ACCEPT)xxx)/
+
+/(?(*atomic:xx)xxx)/
+
+/(?(*script_run:xxx)zzz)/
+
 # End of testinput2
--- a/testdata/testoutput1
+++ b/testdata/testoutput1
@ -9929,4 +9929,100 @@ No match
    AbCDe f
 No match

+/(*pla:foo).{6}/
+    abcfoobarxyz
+ 0: foobar
+\= Expect no match
+    abcfooba
+No match
+
+/(*positive_lookahead:foo).{6}/
+    abcfoobarxyz
+ 0: foobar
+    
+/(?(*pla:foo).{6}|a..)/
+    foobarbaz
+ 0: foobar
+    abcfoobar       
+ 0: abc
+
+/(?(*positive_lookahead:foo).{6}|a..)/
+    foobarbaz
+ 0: foobar
+    abcfoobar       
+ 0: abc
+    
+/(*plb:foo)bar/
+    abcfoobar
+ 0: bar
+\= Expect no match
+    abcbarfoo     
+No match
+
+/(*positive_lookbehind:foo)bar/
+    abcfoobar
+ 0: bar
+\= Expect no match
+    abcbarfoo
+No match
+    
+/(?(*plb:foo)bar|baz)/
+    abcfoobar
+ 0: bar
+    bazfoobar
+ 0: baz
+    abcbazfoobar
+ 0: baz
+    foobazfoobar    
+ 0: bar
+ 
+/(?(*positive_lookbehind:foo)bar|baz)/
+    abcfoobar
+ 0: bar
+    bazfoobar
+ 0: baz
+    abcbazfoobar
+ 0: baz
+    foobazfoobar    
+ 0: bar
+ 
+/(*nlb:foo)bar/
+    abcbarfoo     
+ 0: bar
+\= Expect no match
+    abcfoobar
+No match
+
+/(*negative_lookbehind:foo)bar/
+    abcbarfoo     
+ 0: bar
+\= Expect no match
+    abcfoobar
+No match
+    
+/(?(*nlb:foo)bar|baz)/
+    abcfoobaz 
+ 0: baz
+    abcbarbaz 
+ 0: bar
+\= Expect no match
+    abcfoobar
+No match
+ 
+/(?(*negative_lookbehind:foo)bar|baz)/
+    abcfoobaz 
+ 0: baz
+    abcbarbaz 
+ 0: bar
+\= Expect no match
+    abcfoobar
+No match
+ 
+/(*atomic:a+)\w/
+    aaab
+ 0: aaab
+\= Expect no match
+    aaaa      
+No match
+
 # End of testinput1 
--- a/testdata/testoutput12-16
+++ b/testdata/testoutput12-16
@ -575,7 +575,7 @@ Last code unit = 'b'
 Subject length lower bound = 3

 /(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
-Failed: error 160 at offset 12: (*VERB) not recognized or malformed
+Failed: error 160 at offset 14: (*VERB) not recognized or malformed

 /\h/I,utf
 Capturing subpattern count = 0
--- a/testdata/testoutput12-32
+++ b/testdata/testoutput12-32
@ -538,7 +538,7 @@ No match
 Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2

 /(*UTF16)\x{11234}/
-Failed: error 160 at offset 5: (*VERB) not recognized or malformed
+Failed: error 160 at offset 7: (*VERB) not recognized or malformed
  abcd\x{11234}pqr

 /(*UTF)\x{11234}/I
@ -559,7 +559,7 @@ Failed: error 160 at offset 5: (*VERB) not recognized or malformed
  abcd\x{11234}pqr

 /(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I
-Failed: error 160 at offset 12: (*VERB) not recognized or malformed
+Failed: error 160 at offset 14: (*VERB) not recognized or malformed

 /(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I
 Capturing subpattern count = 0
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@ -16812,6 +16812,15 @@ No match
    abc\ndef\nxyz
 No match

+/(?(*ACCEPT)xxx)/
+Failed: error 128 at offset 2: assertion expected after (?( or (?(?C)
+
+/(?(*atomic:xx)xxx)/
+Failed: error 128 at offset 10: assertion expected after (?( or (?(?C)
+
+/(?(*script_run:xxx)zzz)/
+Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)
+
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data