Implement non-atomic positive assertions.

2019-07-13 11:12:03 +00:00 · 2019-07-13 11:12:03 +00:00 · 620f3a1307
parent 691aca7a86
commit 620f3a1307
21 changed files with 1134 additions and 683 deletions
--- a/2
+++ b/2
@ -88,6 +88,8 @@ otherwise), an atomic group, or a recursion.
 17. Check for integer overflow when computing lookbehind lengths. Fixes 
 Clusterfuzz issue 15636.

+18. Implement non-atomic positive lookaround assertions.
+

 Version 10.33 16-April-2019
 ---------------------------
--- a/34
+++ b/34
@ -195,6 +195,7 @@ META_END              End of pattern (this value is 0x80000000)
 META_FAIL             (*FAIL)
 META_KET              ) closing parenthesis
 META_LOOKAHEAD        (?= start of lookahead
+META_LOOKAHEAD_NA     (*napla: start of non-atomic lookahead
 META_LOOKAHEADNOT     (?! start of negative lookahead
 META_NOCAPTURE        (?: no capture parens
 META_PLUS             +
@ -286,8 +287,9 @@ The following are also followed just by an offset, but also the lower 16 bits
 of the main word contain the length of the first branch of the lookbehind
 group; this is used when generating OP_REVERSE for that branch.

-META_LOOKBEHIND       (?<=
-META_LOOKBEHINDNOT    (?<!
+META_LOOKBEHIND       (?<=      start of lookbehind
+META_LOOKBEHIND_NA    (*naplb:  start of non-atomic lookbehind
+META_LOOKBEHINDNOT    (?<!      start of negative lookbehind

 The following are followed by two elements, the minimum and maximum. Repeat
 values are limited to 65535 (MAX_REPEAT). A maximum value of "unlimited" is
@ -715,13 +717,15 @@ Assertions
 ----------

 Forward assertions are also just like other subpatterns, but starting with one
-of the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
-OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
-is OP_REVERSE, followed by a count of the number of characters to move back the
-pointer in the subject string. In ASCII or UTF-32 mode, the count is also the
-number of code units, but in UTF-8/16 mode each character may occupy more than
-one code unit. A separate count is present in each alternative of a lookbehind
-assertion, allowing them to have different (but fixed) lengths.
+of the opcodes OP_ASSERT, OP_ASSERT_NA (non-atomic assertion), or
+OP_ASSERT_NOT. Backward assertions use the opcodes OP_ASSERTBACK,
+OP_ASSERTBACK_NA, and OP_ASSERTBACK_NOT, and the first opcode inside the
+assertion is OP_REVERSE, followed by a count of the number of characters to
+move back the pointer in the subject string. In ASCII or UTF-32 mode, the count
+is also the number of code units, but in UTF-8/16 mode each character may
+occupy more than one code unit. A separate count is present in each alternative
+of a lookbehind assertion, allowing each branch to have a different (but fixed)
+length.


 Conditional subpatterns
@ -754,11 +758,11 @@ tests the PCRE2 version number. This compiles into one of the opcodes OP_TRUE
 or OP_FALSE.

 If a condition is not a back reference, recursion test, DEFINE, or VERSION, it
-must start with a parenthesized assertion, whose opcode normally immediately
-follows OP_COND or OP_SCOND. However, if automatic callouts are enabled, a
-callout is inserted immediately before the assertion. It is also possible to
-insert a manual callout at this point. Only assertion conditions may have
-callouts preceding the condition.
+must start with a parenthesized atomic assertion, whose opcode normally
+immediately follows OP_COND or OP_SCOND. However, if automatic callouts are
+enabled, a callout is inserted immediately before the assertion. It is also
+possible to insert a manual callout at this point. Only assertion conditions
+may have callouts preceding the condition.

 A condition that is the negative assertion (?!) is optimized to OP_FAIL in all
 parts of the pattern, so this is another opcode that may appear as a condition.
@ -823,4 +827,4 @@ not a real opcode, but is used to check at compile time that tables indexed by
 opcode are the correct length, in order to catch updating errors.

 Philip Hazel
-20 July 2018
+12 July 2019
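The HACKING text above says that the lower 16 bits of a lookbehind's main META word hold the length of its first branch, and this commit lists META_LOOKBEHIND_NA under that same rule alongside META_LOOKBEHIND and META_LOOKBEHINDNOT. The C sketch below illustrates that packing with the three values taken from the pcre2_compile.c hunk. It assumes, as those 0x....0000 values suggest, that the upper 16 bits identify the item. The helper names meta_code(), meta_data(), make_lookbehind() and is_lookbehind() are hypothetical and are not PCRE2's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the pcre2_compile.c hunk of this commit. */
#define META_LOOKBEHIND     0x80250000u  /* (?<=     */
#define META_LOOKBEHINDNOT  0x80260000u  /* (?<!     */
#define META_LOOKBEHIND_NA  0x80280000u  /* (*naplb: */

/* Hypothetical helpers (assumption, not PCRE2's macros): the upper
   16 bits identify the item, the lower 16 bits carry its data. */
static uint32_t meta_code(uint32_t word) { return word & 0xffff0000u; }
static uint32_t meta_data(uint32_t word) { return word & 0x0000ffffu; }

/* Build the main word for a lookbehind whose first branch has a fixed
   length; the length must fit in the lower 16 bits. */
static uint32_t make_lookbehind(uint32_t code, uint32_t first_branch_len)
{
  assert(meta_data(code) == 0);          /* code must have no data bits */
  assert(first_branch_len <= 0xffffu);   /* length must fit in 16 bits */
  return code | first_branch_len;
}

/* Nonzero if the word starts any of the three lookbehind kinds,
   whatever first-branch length is stored in its lower bits. */
static int is_lookbehind(uint32_t word)
{
  uint32_t c = meta_code(word);
  return c == META_LOOKBEHIND || c == META_LOOKBEHINDNOT ||
         c == META_LOOKBEHIND_NA;
}
```

For example, (*naplb:abc) has a first branch of length 3, so under these assumptions make_lookbehind(META_LOOKBEHIND_NA, 3) yields 0x80280003, from which meta_code() recovers META_LOOKBEHIND_NA and meta_data() recovers 3.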
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@ -205,6 +205,11 @@ different way and is not Perl-compatible.
 (l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
 the start of a pattern that set overall options that cannot be changed within
 the pattern.
+<br>
+<br>
+(m) PCRE2 supports non-atomic positive lookaround assertions. This is an
+extension to the lookaround facilities. The default, Perl-compatible
+lookarounds are atomic.
 </P>
 <P>
 18. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
@ -234,7 +239,7 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 12 February 2019
+Last updated: 13 July 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@ -33,17 +33,18 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC18" href="#SEC18">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
 <li><a name="TOC19" href="#SEC19">BACKREFERENCES</a>
 <li><a name="TOC20" href="#SEC20">ASSERTIONS</a>
-<li><a name="TOC21" href="#SEC21">SCRIPT RUNS</a>
-<li><a name="TOC22" href="#SEC22">CONDITIONAL GROUPS</a>
-<li><a name="TOC23" href="#SEC23">COMMENTS</a>
-<li><a name="TOC24" href="#SEC24">RECURSIVE PATTERNS</a>
-<li><a name="TOC25" href="#SEC25">GROUPS AS SUBROUTINES</a>
-<li><a name="TOC26" href="#SEC26">ONIGURUMA SUBROUTINE SYNTAX</a>
-<li><a name="TOC27" href="#SEC27">CALLOUTS</a>
-<li><a name="TOC28" href="#SEC28">BACKTRACKING CONTROL</a>
-<li><a name="TOC29" href="#SEC29">SEE ALSO</a>
-<li><a name="TOC30" href="#SEC30">AUTHOR</a>
-<li><a name="TOC31" href="#SEC31">REVISION</a>
+<li><a name="TOC21" href="#SEC21">NON-ATOMIC ASSERTIONS</a>
+<li><a name="TOC22" href="#SEC22">SCRIPT RUNS</a>
+<li><a name="TOC23" href="#SEC23">CONDITIONAL GROUPS</a>
+<li><a name="TOC24" href="#SEC24">COMMENTS</a>
+<li><a name="TOC25" href="#SEC25">RECURSIVE PATTERNS</a>
+<li><a name="TOC26" href="#SEC26">GROUPS AS SUBROUTINES</a>
+<li><a name="TOC27" href="#SEC27">ONIGURUMA SUBROUTINE SYNTAX</a>
+<li><a name="TOC28" href="#SEC28">CALLOUTS</a>
+<li><a name="TOC29" href="#SEC29">BACKTRACKING CONTROL</a>
+<li><a name="TOC30" href="#SEC30">SEE ALSO</a>
+<li><a name="TOC31" href="#SEC31">AUTHOR</a>
+<li><a name="TOC32" href="#SEC32">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION DETAILS</a><br>
 <P>
@ -2364,19 +2365,23 @@ those that look behind it, and in each case an assertion may be positive (must
 match for the assertion to be true) or negative (must not match for the
 assertion to be true). An assertion group is matched in the normal way,
 and if it is true, matching continues after it, but with the matching position
-in the subject string is was it was before the assertion was processed.
+in the subject string reset to what it was before the assertion was processed.
 </P>
 <P>
-A lookaround assertion may also appear as the condition in a
+The Perl-compatible lookaround assertions are atomic. If an assertion is true,
+but there is a subsequent matching failure, there is no backtracking into the
+assertion. However, there are some cases where non-atomic assertions can be
+useful. PCRE2 has some support for these, described in the section entitled
+<a href="#nonatomicassertions">"Non-atomic assertions"</a>
+below, but they are not Perl-compatible.
+</P>
+<P>
+A lookaround assertion may appear as the condition in a
 <a href="#conditions">conditional group</a>
 (see below). In this case, the result of matching the assertion determines
 which branch of the condition is followed.
 </P>
 <P>
-Lookaround assertions are atomic. If an assertion is true, but there is a
-subsequent matching failure, there is no backtracking into the assertion.
-</P>
-<P>
 Assertion groups are not capture groups. If an assertion contains capture
 groups within it, these are counted for the purposes of numbering the capture
 groups in the whole pattern. Within each branch of an assertion, locally
@ -2429,11 +2434,11 @@ The assertion is obeyed just once when encountered during matching.
 Alphabetic assertion names
 </b><br>
 <P>
-Traditionally, symbolic sequences such as (?= and (?&#60;= have been used to specify
-lookaround assertions. Perl 5.28 introduced some experimental alphabetic
-alternatives which might be easier to remember. They all start with (* instead
-of (? and must be written using lower case letters. PCRE2 supports the
-following synonyms:
+Traditionally, symbolic sequences such as (?= and (?&#60;= have been used to
+specify lookaround assertions. Perl 5.28 introduced some experimental
+alphabetic alternatives which might be easier to remember. They all start with
+(* instead of (? and must be written using lower case letters. PCRE2 supports
+the following synonyms:
 <pre>
  (*positive_lookahead:  or (*pla: is the same as (?=
  (*negative_lookahead:  or (*nla: is the same as (?!
@ -2606,8 +2611,63 @@ preceded by "foo", while
 </pre>
 is another pattern that matches "foo" preceded by three digits and any three
 characters that are not "999".
+<a name="nonatomicassertions"></a></P>
+<br><a name="SEC21" href="#TOC1">NON-ATOMIC ASSERTIONS</a><br>
+<P>
+The traditional Perl-compatible lookaround assertions are atomic. That is, if
+an assertion is true, but there is a subsequent matching failure, there is no
+backtracking into the assertion. However, there are some cases where non-atomic
+positive assertions can be useful. PCRE2 provides these using the following
+syntax:
+<pre>
+  (*non_atomic_positive_lookahead:  or (*napla:
+  (*non_atomic_positive_lookbehind: or (*naplb:
+</pre>
+Consider the problem of finding the right-most word in a string that also
+appears earlier in the string, that is, it must appear at least twice in total.
+This pattern returns the required result as captured substring 1:
+<pre>
+  ^(?x)(*napla: .* \b(\w++)) (?&#62; .*? \b\1\b ){2}
+</pre>
+For a subject such as "word1 word2 word3 word2 word3 word4" the result is
+"word3". How does it work? At the start, ^(?x) anchors the pattern and sets the
+"x" option, which causes white space (introduced for readability) to be
+ignored. Inside the assertion, the greedy .* at first consumes the entire
+string, but then has to backtrack until the rest of the assertion can match a
+word, which is captured by group 1. In other words, when the assertion first
+succeeds, it captures the right-most word in the string.
 </P>
-<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
+<P>
+The current matching point is then reset to the start of the subject, and the
+rest of the pattern match checks for two occurrences of the captured word,
+using an ungreedy .*? to scan from the left. If this succeeds, we are done, but
+if the last word in the string does not occur twice, this part of the pattern
+fails. If a traditional atomic lookahead (?= or (*pla: had been used, the
+assertion could not be re-entered, and the whole match would fail. The pattern
+would succeed only if the very last word in the subject was found twice.
+</P>
+<P>
+Using a non-atomic lookahead, however, means that when the last word does not
+occur twice in the string, the lookahead can backtrack and find the second-last
+word, and so on, until either the match succeeds, or all words have been
+tested.
+</P>
+<P>
+Two conditions must be met for a non-atomic assertion to be useful: the
+contents of one or more capturing groups must change after a backtrack into the
+assertion, and there must be a backreference to a changed group later in the
+pattern. If this is not the case, the rest of the pattern match fails exactly
+as before because nothing has changed, so using a non-atomic assertion just
+wastes resources.
+</P>
+<P>
+Non-atomic assertions are not supported by the alternative matching function
+<b>pcre2_dfa_match()</b>. They are also not supported by JIT (but may be in
+future). Note that assertions that appear as conditions for
+<a href="#conditions">conditional groups</a>
+(see below) must be atomic.
+</P>
+<br><a name="SEC22" href="#TOC1">SCRIPT RUNS</a><br>
 <P>
 In concept, a script run is a sequence of characters that are all from the same
 Unicode script such as Latin or Greek. However, because some scripts are
@ -2669,7 +2729,7 @@ parentheses.
 should not be used within a script run group, because it causes an immediate
 exit from the group, bypassing the script run checking.
 <a name="conditions"></a></P>
-<br><a name="SEC22" href="#TOC1">CONDITIONAL GROUPS</a><br>
+<br><a name="SEC23" href="#TOC1">CONDITIONAL GROUPS</a><br>
 <P>
 It is possible to cause the matching process to obey a pattern fragment
 conditionally or to choose between two alternative fragments, depending on
@ -2845,8 +2905,13 @@ Assertion conditions
 <P>
 If the condition is not in any of the above formats, it must be a parenthesized
 assertion. This may be a positive or negative lookahead or lookbehind
-assertion. Consider this pattern, again containing non-significant white space,
-and with the two alternatives on the second line:
+assertion. However, it must be a traditional atomic assertion, not one of the
+PCRE2-specific
+<a href="#nonatomicassertions">non-atomic assertions.</a>
+</P>
+<P>
+Consider this pattern, again containing non-significant white space, and with
+the two alternatives on the second line:
 <pre>
  (?(?=[^a-z]*[a-z])
  \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
@ -2865,7 +2930,7 @@ positive and negative assertions, because matching always continues after the
 assertion, whether it succeeds or fails. (Compare non-conditional assertions,
 for which captures are retained only for positive assertions that succeed.)
 <a name="comments"></a></P>
-<br><a name="SEC23" href="#TOC1">COMMENTS</a><br>
+<br><a name="SEC24" href="#TOC1">COMMENTS</a><br>
 <P>
 There are two ways of including comments in patterns that are processed by
 PCRE2. In both cases, the start of the comment must not be in a character
@ -2895,7 +2960,7 @@ a newline in the pattern. The sequence \n is still literal at this stage, so
 it does not terminate the comment. Only an actual character with the code value
 0x0a (the default newline) does so.
 <a name="recursion"></a></P>
-<br><a name="SEC24" href="#TOC1">RECURSIVE PATTERNS</a><br>
+<br><a name="SEC25" href="#TOC1">RECURSIVE PATTERNS</a><br>
 <P>
 Consider the problem of matching a string in parentheses, allowing for
 unlimited nested parentheses. Without the use of recursion, the best that can
@ -3083,7 +3148,7 @@ alternative matches "a" and then recurses. In the recursion, \1 does now match
 "b" and so the whole match succeeds. This match used to fail in Perl, but in
 later versions (I tried 5.024) it now works.
 <a name="groupsassubroutines"></a></P>
-<br><a name="SEC25" href="#TOC1">GROUPS AS SUBROUTINES</a><br>
+<br><a name="SEC26" href="#TOC1">GROUPS AS SUBROUTINES</a><br>
 <P>
 If the syntax for a recursive group call (either by number or by name) is used
 outside the parentheses to which it refers, it operates a bit like a subroutine
@ -3131,7 +3196,7 @@ in groups when called as subroutines is described in the section entitled
 <a href="#btsub">"Backtracking verbs in subroutines"</a>
 below.
 <a name="onigurumasubroutines"></a></P>
-<br><a name="SEC26" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
+<br><a name="SEC27" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
 <P>
 For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
 a number enclosed either in angle brackets or single quotes, is an alternative
@ -3149,7 +3214,7 @@ plus or a minus sign it is taken as a relative reference. For example:
 Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
 synonymous. The former is a backreference; the latter is a subroutine call.
 </P>
-<br><a name="SEC27" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC28" href="#TOC1">CALLOUTS</a><br>
 <P>
 Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
 code to be obeyed in the middle of matching a regular expression. This makes it
@ -3225,7 +3290,7 @@ example:
 </pre>
 The doubling is removed before the string is passed to the callout function.
 <a name="backtrackcontrol"></a></P>
-<br><a name="SEC28" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC29" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
@ -3739,12 +3804,12 @@ enclosing group that has alternatives (its normal behaviour). However, if there
 is no such group within the subroutine's group, the subroutine match fails and
 there is a backtrack at the outer level.
 </P>
-<br><a name="SEC29" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC30" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2callout</b>(3), <b>pcre2matching</b>(3),
 <b>pcre2syntax</b>(3), <b>pcre2</b>(3).
 </P>
-<br><a name="SEC30" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC31" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -3753,9 +3818,9 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC31" href="#TOC1">REVISION</a><br>
+<br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 June 2019
+Last updated: 13 July 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
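The "Non-atomic assertions" example in the pcre2pattern.html hunk above can be cross-checked without PCRE2. The C sketch below is a hand-written emulation of what that pattern computes, not how PCRE2 matches it. It assumes an ASCII, single-line subject and approximates \w as letters, digits and underscore. It walks the candidate words from right to left, which is the order in which backtracking into the greedy .* inside (*napla: produces them, and for each candidate counts whole-word occurrences, which is what (?> .*? \b\1\b ){2} checks when scanning from the left. A nonzero atomic flag stops after the first candidate, mirroring (?= or (*pla:. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII approximation of \w: letters, digits, underscore. */
static int is_word_char(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/* Count whole-word occurrences of s[start..start+len) in s, i.e. the
   maximal runs of word characters equal to it (what \b\1\b matches). */
static size_t count_word(const char *s, size_t start, size_t len)
{
  size_t n = 0, i = 0;
  while (s[i] != '\0')
  {
    size_t j;
    if (!is_word_char((unsigned char)s[i])) { i++; continue; }
    j = i;
    while (is_word_char((unsigned char)s[j])) j++;   /* word is s[i..j) */
    if (j - i == len && memcmp(s + i, s + start, len) == 0) n++;
    i = j;
  }
  return n;
}

/* Emulate ^(?x)(*napla: .* \b(\w++)) (?> .*? \b\1\b ){2}.
   On success store the captured word's offset and length, return 1. */
static int rightmost_repeated(const char *s, int atomic,
                              size_t *start, size_t *len)
{
  size_t end;
  assert(s != NULL && start != NULL && len != NULL);
  end = strlen(s);
  for (;;)
  {
    size_t b;
    /* Skip non-word characters to reach the end of the previous word. */
    while (end > 0 && !is_word_char((unsigned char)s[end - 1])) end--;
    if (end == 0) return 0;               /* no (more) words: no match */
    b = end;
    while (b > 0 && is_word_char((unsigned char)s[b - 1])) b--;
    if (count_word(s, b, end - b) >= 2)   /* the {2} part succeeds */
    {
      *start = b;
      *len = end - b;
      return 1;
    }
    if (atomic) return 0;                 /* no backtracking into (?= */
    end = b;                              /* backtrack: previous word */
  }
}
```

With the subject "word1 word2 word3 word2 word3 word4", the non-atomic call reports "word3" at offset 24, while the atomic call fails because word4 occurs only once, which agrees with the behaviour the documentation describes for (?=.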
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@ -32,15 +32,16 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
 <li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
 <li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
-<li><a name="TOC20" href="#SEC20">SCRIPT RUNS</a>
-<li><a name="TOC21" href="#SEC21">BACKREFERENCES</a>
-<li><a name="TOC22" href="#SEC22">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
-<li><a name="TOC23" href="#SEC23">CONDITIONAL PATTERNS</a>
-<li><a name="TOC24" href="#SEC24">BACKTRACKING CONTROL</a>
-<li><a name="TOC25" href="#SEC25">CALLOUTS</a>
-<li><a name="TOC26" href="#SEC26">SEE ALSO</a>
-<li><a name="TOC27" href="#SEC27">AUTHOR</a>
-<li><a name="TOC28" href="#SEC28">REVISION</a>
+<li><a name="TOC20" href="#SEC20">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
+<li><a name="TOC21" href="#SEC21">SCRIPT RUNS</a>
+<li><a name="TOC22" href="#SEC22">BACKREFERENCES</a>
+<li><a name="TOC23" href="#SEC23">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC24" href="#SEC24">CONDITIONAL PATTERNS</a>
+<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
+<li><a name="TOC26" href="#SEC26">CALLOUTS</a>
+<li><a name="TOC27" href="#SEC27">SEE ALSO</a>
+<li><a name="TOC28" href="#SEC28">AUTHOR</a>
+<li><a name="TOC29" href="#SEC29">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
 <P>
@ -544,7 +545,18 @@ setting with a similar syntax.
 </pre>
 Each top-level branch of a lookbehind must be of a fixed length.
 </P>
-<br><a name="SEC20" href="#TOC1">SCRIPT RUNS</a><br>
+<br><a name="SEC20" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
+<P>
+These assertions are specific to PCRE2 and are not Perl-compatible.
+<pre>
+  (*napla:...)
+  (*non_atomic_positive_lookahead:...)
+
+  (*naplb:...)
+  (*non_atomic_positive_lookbehind:...)
+</PRE>
+</P>
+<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
 <P>
 <pre>
  (*script_run:...)           ) script run, can be backtracked into
@ -554,7 +566,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
  (*asr:...)                  )
 </PRE>
 </P>
-<br><a name="SEC21" href="#TOC1">BACKREFERENCES</a><br>
+<br><a name="SEC22" href="#TOC1">BACKREFERENCES</a><br>
 <P>
 <pre>
  \n              reference by number (can be ambiguous)
@ -571,7 +583,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
  (?P=name)       reference by name (Python)
 </PRE>
 </P>
-<br><a name="SEC22" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<br><a name="SEC23" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
 <P>
 <pre>
  (?R)            recurse whole pattern
@ -590,7 +602,7 @@ Each top-level branch of a lookbehind must be of a fixed length.
  \g'-n'          call subroutine by relative number (PCRE2 extension)
 </PRE>
 </P>
-<br><a name="SEC23" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<br><a name="SEC24" href="#TOC1">CONDITIONAL PATTERNS</a><br>
 <P>
 <pre>
  (?(condition)yes-pattern)
@ -613,7 +625,7 @@ Note the ambiguity of (?(R) and (?(Rn) which might be named reference
 conditions or recursion tests. Such a condition is interpreted as a reference
 condition if the relevant named group exists.
 </P>
-<br><a name="SEC24" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
 name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
@ -640,7 +652,7 @@ pattern is not anchored.
 The effect of one of these verbs in a group called as a subroutine is confined
 to the subroutine call.
 </P>
-<br><a name="SEC25" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
 <P>
 <pre>
  (?C)            callout (assumed number 0)
@ -651,12 +663,12 @@ The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
 start and the end), and the starting delimiter { matched with the ending
 delimiter }. To encode the ending delimiter within the string, double it.
 </P>
-<br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC27" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
 <b>pcre2matching</b>(3), <b>pcre2</b>(3).
 </P>
-<br><a name="SEC27" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC28" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -665,9 +677,9 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC28" href="#TOC1">REVISION</a><br>
+<br><a name="SEC29" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 11 February 2019
+Last updated: 12 July 2019
 <br>
 Copyright &copy; 1997-2019 University of Cambridge.
 <br>
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "12 February 2019" "PCRE2 10.33"
+.TH PCRE2COMPAT 3 "13 July 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -170,6 +170,10 @@ different way and is not Perl-compatible.
 (l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at
 the start of a pattern that set overall options that cannot be changed within
 the pattern.
+.sp
+(m) PCRE2 supports non-atomic positive lookaround assertions. This is an
+extension to the lookaround facilities. The default, Perl-compatible
+lookarounds are atomic.
 .P
 18. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
 modifier restricts /i case-insensitive matching to pure ascii, ignoring Unicode
@ -199,6 +203,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 12 February 2019
+Last updated: 13 July 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "22 June 2019" "PCRE2 10.34"
+.TH PCRE2PATTERN 3 "13 July 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -2370,9 +2370,19 @@ those that look behind it, and in each case an assertion may be positive (must
 match for the assertion to be true) or negative (must not match for the
 assertion to be true). An assertion group is matched in the normal way,
 and if it is true, matching continues after it, but with the matching position
-in the subject string is was it was before the assertion was processed.
+in the subject string reset to what it was before the assertion was processed.
 .P
-A lookaround assertion may also appear as the condition in a
+The Perl-compatible lookaround assertions are atomic. If an assertion is true,
+but there is a subsequent matching failure, there is no backtracking into the
+assertion. However, there are some cases where non-atomic assertions can be
+useful. PCRE2 has some support for these, described in the section entitled
+.\" HTML <a href="#nonatomicassertions">
+.\" </a>
+"Non-atomic assertions"
+.\"
+below, but they are not Perl-compatible.
+.P
+A lookaround assertion may appear as the condition in a
 .\" HTML <a href="#conditions">
 .\" </a>
 conditional group
@ -2380,9 +2390,6 @@ conditional group
 (see below). In this case, the result of matching the assertion determines
 which branch of the condition is followed.
 .P
-Lookaround assertions are atomic. If an assertion is true, but there is a
-subsequent matching failure, there is no backtracking into the assertion.
-.P
 Assertion groups are not capture groups. If an assertion contains capture
 groups within it, these are counted for the purposes of numbering the capture
 groups in the whole pattern. Within each branch of an assertion, locally
@ -2435,11 +2442,11 @@ The assertion is obeyed just once when encountered during matching.
 .SS "Alphabetic assertion names"
 .rs
 .sp
-Traditionally, symbolic sequences such as (?= and (?<= have been used to specify
-lookaround assertions. Perl 5.28 introduced some experimental alphabetic
-alternatives which might be easier to remember. They all start with (* instead
-of (? and must be written using lower case letters. PCRE2 supports the
-following synonyms:
+Traditionally, symbolic sequences such as (?= and (?<= have been used to
+specify lookaround assertions. Perl 5.28 introduced some experimental
+alphabetic alternatives which might be easier to remember. They all start with
+(* instead of (? and must be written using lower case letters. PCRE2 supports
+the following synonyms:
 .sp
  (*positive_lookahead:  or (*pla: is the same as (?=
  (*negative_lookahead:  or (*nla: is the same as (?!
@ -2616,6 +2623,63 @@ is another pattern that matches "foo" preceded by three digits and any three
 characters that are not "999".
 .
 .
+.\" HTML <a name="nonatomicassertions"></a>
+.SH "NON-ATOMIC ASSERTIONS"
+.rs
+.sp
+The traditional Perl-compatible lookaround assertions are atomic. That is, if
+an assertion is true, but there is a subsequent matching failure, there is no
+backtracking into the assertion. However, there are some cases where non-atomic
+positive assertions can be useful. PCRE2 provides these using the following
+syntax:
+.sp
+  (*non_atomic_positive_lookahead:  or (*napla:
+  (*non_atomic_positive_lookbehind: or (*naplb:
+.sp
+Consider the problem of finding the right-most word in a string that also
+appears earlier in the string, that is, it must appear at least twice in total.
+This pattern returns the required result as captured substring 1:
+.sp
+  ^(?x)(*napla: .* \eb(\ew++)) (?> .*? \eb\e1\eb ){2}
+.sp
+For a subject such as "word1 word2 word3 word2 word3 word4" the result is
+"word3". How does it work? At the start, ^(?x) anchors the pattern and sets the
+"x" option, which causes white space (introduced for readability) to be
+ignored. Inside the assertion, the greedy .* at first consumes the entire
+string, but then has to backtrack until the rest of the assertion can match a
+word, which is captured by group 1. In other words, when the assertion first
+succeeds, it captures the right-most word in the string.
+.P
+The current matching point is then reset to the start of the subject, and the
+rest of the pattern match checks for two occurrences of the captured word,
+using an ungreedy .*? to scan from the left. If this succeeds, we are done, but
+if the last word in the string does not occur twice, this part of the pattern
+fails. If a traditional atomic lookahead (?= or (*pla: had been used, the
+assertion could not be re-entered, and the whole match would fail. The pattern
+would succeed only if the very last word in the subject was found twice.
+.P
+Using a non-atomic lookahead, however, means that when the last word does not
+occur twice in the string, the lookahead can backtrack and find the second-last
+word, and so on, until either the match succeeds, or all words have been
+tested.
+.P
+Two conditions must be met for a non-atomic assertion to be useful: the
+contents of one or more capturing groups must change after a backtrack into the
+assertion, and there must be a backreference to a changed group later in the
+pattern. If this is not the case, the rest of the pattern match fails exactly
+as before because nothing has changed, so using a non-atomic assertion just
+wastes resources.
+.P
+Non-atomic assertions are not supported by the alternative matching function
+\fBpcre2_dfa_match()\fP. They are also not supported by JIT (but may be in
+future). Note that assertions that appear as conditions for
+.\" HTML <a href="#conditions">
+.\" </a>
+conditional groups
+.\"
+(see below) must be atomic.
+.
+.
 .SH "SCRIPT RUNS"
 .rs
 .sp
@ -2867,8 +2931,15 @@ than two digits.
 .sp
 If the condition is not in any of the above formats, it must be a parenthesized
 assertion. This may be a positive or negative lookahead or lookbehind
-assertion. Consider this pattern, again containing non-significant white space,
-and with the two alternatives on the second line:
+assertion. However, it must be a traditional atomic assertion, not one of the
+PCRE2-specific
+.\" HTML <a href="#nonatomicassertions">
+.\" </a>
+non-atomic assertions.
+.\"
+.P
+Consider this pattern, again containing non-significant white space, and with
+the two alternatives on the second line:
 .sp
  (?(?=[^a-z]*[a-z])
  \ed{2}-[a-z]{3}-\ed{2}  |  \ed{2}-\ed{2}-\ed{2} )
@ -3788,6 +3859,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 22 June 2019
+Last updated: 13 July 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "11 February 2019" "PCRE2 10.33"
+.TH PCRE2SYNTAX 3 "12 July 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -522,6 +522,18 @@ setting with a similar syntax.
 Each top-level branch of a lookbehind must be of a fixed length.
 .
 .
+.SH "NON-ATOMIC LOOKAROUND ASSERTIONS"
+.rs
+.sp
+These assertions are specific to PCRE2 and are not Perl-compatible.
+.sp
+  (*napla:...)
+  (*non_atomic_positive_lookahead:...)
+.sp
+  (*naplb:...)
+  (*non_atomic_positive_lookbehind:...)
+.
+.
 .SH "SCRIPT RUNS"
 .rs
 .sp
@ -654,6 +666,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 11 February 2019
+Last updated: 12 July 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
--- a/src/pcre2.h.in
+++ b/src/pcre2.h.in
@ -307,6 +307,7 @@ pcre2_pattern_convert(). */
 #define PCRE2_ERROR_ALPHA_ASSERTION_UNKNOWN        195
 #define PCRE2_ERROR_SCRIPT_RUN_NOT_AVAILABLE       196
 #define PCRE2_ERROR_TOO_MANY_CAPTURES              197
+#define PCRE2_ERROR_CONDITION_ATOMIC_ASSERTION_EXPECTED  198


 /* "Expected" matching error codes: no match and partial match. */
--- a/src/pcre2_auto_possess.c
+++ b/src/pcre2_auto_possess.c
@ -624,6 +624,13 @@ for(;;)
      case OP_ASSERTBACK_NOT:
      case OP_ONCE:
      return !entered_a_group;
+
+      /* Non-atomic assertions - don't possessify last iterator. This needs
+      more thought. */
+
+      case OP_ASSERT_NA:
+      case OP_ASSERTBACK_NA:
+      return FALSE;
      }

    /* Skip over the bracket and inspect what comes next. */
--- a/src/pcre2_compile.c
+++ b/src/pcre2_compile.c
@ -250,36 +250,41 @@ is present where expected in a conditional group. */
 #define META_LOOKBEHIND       0x80250000u  /* (?<= */
 #define META_LOOKBEHINDNOT    0x80260000u  /* (?<! */

+/* These cannot be conditions */
+
+#define META_LOOKAHEAD_NA     0x80270000u  /* (*napla: */
+#define META_LOOKBEHIND_NA    0x80280000u  /* (*naplb: */
+
 /* These must be kept in this order, with consecutive values, and the _ARG
 versions of COMMIT, PRUNE, SKIP, and THEN immediately after their non-argument
 versions. */

-#define META_MARK             0x80270000u  /* (*MARK) */
-#define META_ACCEPT           0x80280000u  /* (*ACCEPT) */
-#define META_FAIL             0x80290000u  /* (*FAIL) */
-#define META_COMMIT           0x802a0000u  /* These               */
-#define META_COMMIT_ARG       0x802b0000u  /*   pairs             */
-#define META_PRUNE            0x802c0000u  /*     must            */
-#define META_PRUNE_ARG        0x802d0000u  /*       be            */
-#define META_SKIP             0x802e0000u  /*         kept        */
-#define META_SKIP_ARG         0x802f0000u  /*           in        */
-#define META_THEN             0x80300000u  /*             this    */
-#define META_THEN_ARG         0x80310000u  /*               order */
+#define META_MARK             0x80290000u  /* (*MARK) */
+#define META_ACCEPT           0x802a0000u  /* (*ACCEPT) */
+#define META_FAIL             0x802b0000u  /* (*FAIL) */
+#define META_COMMIT           0x802c0000u  /* These               */
+#define META_COMMIT_ARG       0x802d0000u  /*   pairs             */
+#define META_PRUNE            0x802e0000u  /*     must            */
+#define META_PRUNE_ARG        0x802f0000u  /*       be            */
+#define META_SKIP             0x80300000u  /*         kept        */
+#define META_SKIP_ARG         0x80310000u  /*           in        */
+#define META_THEN             0x80320000u  /*             this    */
+#define META_THEN_ARG         0x80330000u  /*               order */

 /* These must be kept in groups of adjacent 3 values, and all together. */

-#define META_ASTERISK         0x80320000u  /* *  */
-#define META_ASTERISK_PLUS    0x80330000u  /* *+ */
-#define META_ASTERISK_QUERY   0x80340000u  /* *? */
-#define META_PLUS             0x80350000u  /* +  */
-#define META_PLUS_PLUS        0x80360000u  /* ++ */
-#define META_PLUS_QUERY       0x80370000u  /* +? */
-#define META_QUERY            0x80380000u  /* ?  */
-#define META_QUERY_PLUS       0x80390000u  /* ?+ */
-#define META_QUERY_QUERY      0x803a0000u  /* ?? */
-#define META_MINMAX           0x803b0000u  /* {n,m}  repeat */
-#define META_MINMAX_PLUS      0x803c0000u  /* {n,m}+ repeat */
-#define META_MINMAX_QUERY     0x803d0000u  /* {n,m}? repeat */
+#define META_ASTERISK         0x80340000u  /* *  */
+#define META_ASTERISK_PLUS    0x80350000u  /* *+ */
+#define META_ASTERISK_QUERY   0x80360000u  /* *? */
+#define META_PLUS             0x80370000u  /* +  */
+#define META_PLUS_PLUS        0x80380000u  /* ++ */
+#define META_PLUS_QUERY       0x80390000u  /* +? */
+#define META_QUERY            0x803a0000u  /* ?  */
+#define META_QUERY_PLUS       0x803b0000u  /* ?+ */
+#define META_QUERY_QUERY      0x803c0000u  /* ?? */
+#define META_MINMAX           0x803d0000u  /* {n,m}  repeat */
+#define META_MINMAX_PLUS      0x803e0000u  /* {n,m}+ repeat */
+#define META_MINMAX_QUERY     0x803f0000u  /* {n,m}? repeat */

 #define META_FIRST_QUANTIFIER META_ASTERISK
 #define META_LAST_QUANTIFIER  META_MINMAX_QUERY
@@ -335,6 +340,8 @@ static unsigned char meta_extra_lengths[] = {
  0,             /* META_LOOKAHEADNOT */
  SIZEOFFSET,    /* META_LOOKBEHIND */
  SIZEOFFSET,    /* META_LOOKBEHINDNOT */
+  0,             /* META_LOOKAHEAD_NA */
+  SIZEOFFSET,    /* META_LOOKBEHIND_NA */
  1,             /* META_MARK - plus the string length */
  0,             /* META_ACCEPT */
  0,             /* META_FAIL */
@@ -637,10 +644,14 @@ typedef struct alasitem {
 static const char alasnames[] =
  STRING_pla0
  STRING_plb0
+  STRING_napla0
+  STRING_naplb0
  STRING_nla0
  STRING_nlb0
  STRING_positive_lookahead0
  STRING_positive_lookbehind0
+  STRING_non_atomic_positive_lookahead0
+  STRING_non_atomic_positive_lookbehind0
  STRING_negative_lookahead0
  STRING_negative_lookbehind0
  STRING_atomic0
@@ -652,10 +663,14 @@ static const char alasnames[] =
 static const alasitem alasmeta[] = {
  {  3, META_LOOKAHEAD         },
  {  3, META_LOOKBEHIND        },
+  {  5, META_LOOKAHEAD_NA      },
+  {  5, META_LOOKBEHIND_NA     },
  {  3, META_LOOKAHEADNOT      },
  {  3, META_LOOKBEHINDNOT     },
  { 18, META_LOOKAHEAD         },
  { 19, META_LOOKBEHIND        },
+  { 29, META_LOOKAHEAD_NA      },
+  { 30, META_LOOKBEHIND_NA     },
  { 18, META_LOOKAHEADNOT      },
  { 19, META_LOOKBEHINDNOT     },
  {  6, META_ATOMIC            },
@@ -784,7 +799,7 @@ enum { ERR0 = COMPILE_ERROR_BASE,
       ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70,
       ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80,
       ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90,
-       ERR91, ERR92, ERR93, ERR94, ERR95, ERR96, ERR97 };
+       ERR91, ERR92, ERR93, ERR94, ERR95, ERR96, ERR97, ERR98 };

 /* This is a table of start-of-pattern options such as (*UTF) and settings such
 as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward
@@ -1015,6 +1030,7 @@ for (;;)
    case META_NOCAPTURE: fprintf(stderr, "META (?:"); break;
    case META_LOOKAHEAD: fprintf(stderr, "META (?="); break;
    case META_LOOKAHEADNOT: fprintf(stderr, "META (?!"); break;
+    case META_LOOKAHEAD_NA: fprintf(stderr, "META (*napla:"); break;
    case META_SCRIPT_RUN: fprintf(stderr, "META (*sr:"); break;
    case META_KET: fprintf(stderr, "META )"); break;
    case META_ALT: fprintf(stderr, "META | %d", meta_arg); break;
@@ -1046,6 +1062,12 @@ for (;;)
    fprintf(stderr, "%zd", offset);
    break;

+    case META_LOOKBEHIND_NA:
+    fprintf(stderr, "META (*naplb: %d offset=", meta_arg);
+    GETOFFSET(offset, pptr);
+    fprintf(stderr, "%zd", offset);
+    break;
+
    case META_LOOKBEHINDNOT:
    fprintf(stderr, "META (?<! %d offset=", meta_arg);
    GETOFFSET(offset, pptr);
@@ -3695,19 +3717,20 @@ while (ptr < ptrend)
          goto FAILED;
          }

-        /* Check for expecting an assertion condition. If so, only lookaround
-        assertions are valid. */
+        /* Check for expecting an assertion condition. If so, only atomic
+        lookaround assertions are valid. */

        meta = alasmeta[i].meta;
        if (prev_expect_cond_assert > 0 &&
            (meta < META_LOOKAHEAD || meta > META_LOOKBEHINDNOT))
          {
-          errorcode = ERR28;  /* Assertion expected */
+          errorcode = (meta == META_LOOKAHEAD_NA || meta == META_LOOKBEHIND_NA)?
+            ERR98 : ERR28;  /* (Atomic) assertion expected */
          goto FAILED;
          }

-        /* The lookaround alphabetic synonyms can be almost entirely handled by
-        jumping to the code that handles the traditional symbolic forms. */
+        /* The lookaround alphabetic synonyms can mostly be handled by jumping
+        to the code that handles the traditional symbolic forms. */

        switch(meta)
          {
@@ -3721,11 +3744,17 @@ while (ptr < ptrend)
          case META_LOOKAHEAD:
          goto POSITIVE_LOOK_AHEAD;

+          case META_LOOKAHEAD_NA:
+          *parsed_pattern++ = meta;
+          ptr++;
+          goto POST_ASSERTION;
+
          case META_LOOKAHEADNOT:
          goto NEGATIVE_LOOK_AHEAD;

          case META_LOOKBEHIND:
          case META_LOOKBEHINDNOT:
+          case META_LOOKBEHIND_NA:
          *parsed_pattern++ = meta;
          ptr--;
          goto POST_LOOKBEHIND;
@@ -4429,7 +4458,7 @@ while (ptr < ptrend)
      *parsed_pattern++ = (ptr[1] == CHAR_EQUALS_SIGN)?
        META_LOOKBEHIND : META_LOOKBEHINDNOT;

-      POST_LOOKBEHIND:              /* Come from (*plb: and (*nlb: */
+      POST_LOOKBEHIND:              /* Come from (*plb:, (*naplb:, and (*nlb: */
      *has_lookbehind = TRUE;
      offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2);
      PUTOFFSET(offset, parsed_pattern);
@@ -6300,6 +6329,11 @@ for (;; pptr++)
    cb->assert_depth += 1;
    goto GROUP_PROCESS;

+    case META_LOOKAHEAD_NA:
+    bravalue = OP_ASSERT_NA;
+    cb->assert_depth += 1;
+    goto GROUP_PROCESS;
+
    /* Optimize (?!) to (*FAIL) unless it is quantified - which is a weird
    thing to do, but Perl allows all assertions to be quantified, and when
    they contain capturing parentheses there may be a potential use for
@@ -6331,6 +6365,11 @@ for (;; pptr++)
    cb->assert_depth += 1;
    goto GROUP_PROCESS;

+    case META_LOOKBEHIND_NA:
+    bravalue = OP_ASSERTBACK_NA;
+    cb->assert_depth += 1;
+    goto GROUP_PROCESS;
+
    case META_ATOMIC:
    bravalue = OP_ONCE;
    goto GROUP_PROCESS_NOTE_EMPTY;
@@ -7931,7 +7970,10 @@ length = 2 + 2*LINK_SIZE + skipunits;
 /* Remember if this is a lookbehind assertion, and if it is, save its length
 and skip over the pattern offset. */

-lookbehind = *code == OP_ASSERTBACK || *code == OP_ASSERTBACK_NOT;
+lookbehind = *code == OP_ASSERTBACK ||
+             *code == OP_ASSERTBACK_NOT ||
+             *code == OP_ASSERTBACK_NA;
+
 if (lookbehind)
  {
  lookbehindlength = META_DATA(pptr[-1]);
@@ -8802,8 +8844,10 @@ for (;; pptr++)
    case META_COND_VERSION:
    case META_LOOKAHEAD:
    case META_LOOKAHEADNOT:
+    case META_LOOKAHEAD_NA:
    case META_LOOKBEHIND:
    case META_LOOKBEHINDNOT:
+    case META_LOOKBEHIND_NA:
    case META_NOCAPTURE:
    case META_SCRIPT_RUN:
    nestlevel++;
@@ -9064,6 +9108,7 @@ for (;; pptr++)

    case META_LOOKAHEAD:
    case META_LOOKAHEADNOT:
+    case META_LOOKAHEAD_NA:
    pptr = parsed_skip(pptr + 1, PSKIP_KET);
    if (pptr == NULL) goto PARSED_SKIP_FAILED;

@@ -9102,6 +9147,7 @@ for (;; pptr++)

    case META_LOOKBEHIND:
    case META_LOOKBEHINDNOT:
+    case META_LOOKBEHIND_NA:
    if (!set_lookbehind_lengths(&pptr, &max, errcodeptr, lcptr, recurses, cb))
      return -1;
    if (max - branchlength > extra) extra = max - branchlength;
@@ -9453,6 +9499,7 @@ for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)
    case META_KET:
    case META_LOOKAHEAD:
    case META_LOOKAHEADNOT:
+    case META_LOOKAHEAD_NA:
    case META_NOCAPTURE:
    case META_PLUS:
    case META_PLUS_PLUS:
@@ -9514,6 +9561,7 @@ for (pptr = cb->parsed_pattern; *pptr != META_END; pptr++)

    case META_LOOKBEHIND:
    case META_LOOKBEHINDNOT:
+    case META_LOOKBEHIND_NA:
    if (!set_lookbehind_lengths(&pptr, &max, &errorcode, &loopcount, NULL, cb))
      return errorcode;
    break;
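The META_ values above keep the item code in the top 16 bits and, for lookbehinds, the length of the first branch in the bottom 16 bits (read back with META_DATA). Because the two _NA codes are placed just after META_LOOKBEHINDNOT, the existing range test in the (?( condition check already rejects them; the new code only chooses the more specific ERR98. A minimal sketch of both points, assuming META_LOOKAHEAD and META_LOOKAHEADNOT are 0x80230000u and 0x80240000u (they are not shown in this diff), assuming META_CODE is the complementary mask of META_DATA, and taking COMPILE_ERROR_BASE as 100, which is consistent with error 198 in testoutput2. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define META_LOOKAHEAD        0x80230000u  /* assumed, not in this diff */
#define META_LOOKAHEADNOT     0x80240000u  /* assumed, not in this diff */
#define META_LOOKBEHIND       0x80250000u
#define META_LOOKBEHINDNOT    0x80260000u
#define META_LOOKAHEAD_NA     0x80270000u
#define META_LOOKBEHIND_NA    0x80280000u

/* Code in the top 16 bits, per-item data in the bottom 16. */
#define META_CODE(x)  ((x) & 0xffff0000u)
#define META_DATA(x)  ((x) & 0x0000ffffu)

/* Reported error numbers, taking COMPILE_ERROR_BASE as 100 (assumed). */
enum { ERR_NONE = 0, ERR28 = 128, ERR98 = 198 };

/* Store the length of a lookbehind's first branch in the low 16 bits.
   Assumes the length has already been checked to fit. */
static uint32_t with_branch_length(uint32_t meta, uint32_t length)
{
  assert(length <= 0xffffu);
  return META_CODE(meta) | length;
}

/* The check made when (?( must be followed by an assertion: only the four
   atomic lookarounds lie inside the range; the _NA codes sit just above it
   and get their own, more specific error. */
static int condition_assertion_error(uint32_t meta)
{
  if (meta < META_LOOKAHEAD || meta > META_LOOKBEHINDNOT)
    return (meta == META_LOOKAHEAD_NA || meta == META_LOOKBEHIND_NA)?
      ERR98 : ERR28;
  return ERR_NONE;
}
```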
--- a/src/pcre2_dfa_match.c
+++ b/src/pcre2_dfa_match.c
@@ -173,6 +173,8 @@ static const uint8_t coptable[] = {
  0,                             /* Assert not                             */
  0,                             /* Assert behind                          */
  0,                             /* Assert behind not                      */
+  0,                             /* NA assert                              */
+  0,                             /* NA assert behind                       */
  0,                             /* ONCE                                   */
  0,                             /* SCRIPT_RUN                             */
  0, 0, 0, 0, 0,                 /* BRA, BRAPOS, CBRA, CBRAPOS, COND       */
@@ -248,6 +250,8 @@ static const uint8_t poptable[] = {
  0,                             /* Assert not                             */
  0,                             /* Assert behind                          */
  0,                             /* Assert behind not                      */
+  0,                             /* NA assert                              */
+  0,                             /* NA assert behind                       */
  0,                             /* ONCE                                   */
  0,                             /* SCRIPT_RUN                             */
  0, 0, 0, 0, 0,                 /* BRA, BRAPOS, CBRA, CBRAPOS, COND       */
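coptable and poptable are indexed by opcode, so inserting OP_ASSERT_NA and OP_ASSERTBACK_NA requires a matching pair of entries here; the note in pcre2_internal.h says these tables must be updated whenever the opcode list changes, and OP_TABLE_LENGTH exists to catch a missed update. The sketch below shows the idea with a miniature, hypothetical opcode enum and a C11 _Static_assert; PCRE2's own length check may be written differently. The table marks the three lookbehind openers, mirroring the lookbehind = ... test in pcre2_compile.c.

```c
#include <assert.h>
#include <stdint.h>

/* Miniature, hypothetical opcode list in the same relative order as the
   assertion block of pcre2_internal.h (the real enum is much longer). */
enum {
  XOP_REVERSE,
  XOP_ASSERT,
  XOP_ASSERT_NOT,
  XOP_ASSERTBACK,
  XOP_ASSERTBACK_NOT,
  XOP_ASSERT_NA,         /* new in this commit */
  XOP_ASSERTBACK_NA,     /* new in this commit */
  XOP_ONCE,
  XOP_TABLE_LENGTH       /* not an opcode: the number of entries above */
};

/* One entry per opcode, like coptable and poptable. Here the entry is 1
   for the three lookbehind group openers. */
static const uint8_t lookbehind_table[] = {
  0,   /* REVERSE */
  0,   /* ASSERT */
  0,   /* ASSERT_NOT */
  1,   /* ASSERTBACK */
  1,   /* ASSERTBACK_NOT */
  0,   /* ASSERT_NA */
  1,   /* ASSERTBACK_NA */
  0    /* ONCE */
};

/* Compilation fails if an opcode is added without a table entry. */
_Static_assert(sizeof(lookbehind_table) == XOP_TABLE_LENGTH,
               "lookbehind_table needs one entry per opcode");

static int is_lookbehind(unsigned op)
{
  assert(op < XOP_TABLE_LENGTH);
  return lookbehind_table[op];
}
```

Deleting the ASSERTBACK_NA entry makes the _Static_assert fail at compile time, instead of silently shifting every later entry by one.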
--- a/src/pcre2_error.c
+++ b/src/pcre2_error.c
@@ -185,6 +185,7 @@ static const unsigned char compile_error_texts[] =
  "(*alpha_assertion) not recognized\0"
  "script runs require Unicode support, which this version of PCRE2 does not have\0"
  "too many capturing groups (maximum 65535)\0"
+  "atomic assertion expected after (?( or (?(?C)\0"
  ;

 /* Match-time and UTF error texts are in the same format. */
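compile_error_texts is one long string of NUL-separated messages, so a message is found by its position rather than by a key: the new text has to be appended directly after the entry for error 197 (PCRE2_ERROR_TOO_MANY_CAPTURES) to become error 198. A sketch of index-based lookup over a shortened, made-up list; nth_text is a hypothetical helper, and treating the empty string left by the final implicit NUL as the end of the list is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shortened, made-up stand-in for compile_error_texts: messages back to
   back, each ending in NUL, in error-number order. The implicit NUL after
   the last literal leaves an empty string that marks the end. */
static const char texts[] =
  "no error\0"
  "first message\0"
  "second message\0"
  "atomic assertion expected after (?( or (?(?C)\0";

/* Return message n by skipping n messages, or NULL past the end. */
static const char *nth_text(const char *list, unsigned n)
{
  const char *p = list;
  for (; n > 0; n--)
    {
    p += strlen(p) + 1;          /* step over one message and its NUL */
    if (*p == '\0') return NULL; /* hit the empty end marker */
    }
  return p;
}
```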
--- a/src/pcre2_internal.h
+++ b/src/pcre2_internal.h
@@ -883,12 +883,16 @@ a positive value. */
 #define STRING_atomic0               "atomic\0"
 #define STRING_pla0                  "pla\0"
 #define STRING_plb0                  "plb\0"
+#define STRING_napla0                "napla\0"
+#define STRING_naplb0                "naplb\0"
 #define STRING_nla0                  "nla\0"
 #define STRING_nlb0                  "nlb\0"
 #define STRING_sr0                   "sr\0"
 #define STRING_asr0                  "asr\0"
 #define STRING_positive_lookahead0   "positive_lookahead\0"
 #define STRING_positive_lookbehind0  "positive_lookbehind\0"
+#define STRING_non_atomic_positive_lookahead0   "non_atomic_positive_lookahead\0"
+#define STRING_non_atomic_positive_lookbehind0  "non_atomic_positive_lookbehind\0"
 #define STRING_negative_lookahead0   "negative_lookahead\0"
 #define STRING_negative_lookbehind0  "negative_lookbehind\0"
 #define STRING_script_run0           "script_run\0"
@@ -1173,12 +1177,16 @@ only. */
 #define STRING_atomic0               STR_a STR_t STR_o STR_m STR_i STR_c "\0"
 #define STRING_pla0                  STR_p STR_l STR_a "\0"
 #define STRING_plb0                  STR_p STR_l STR_b "\0"
+#define STRING_napla0                STR_n STR_a STR_p STR_l STR_a "\0"
+#define STRING_naplb0                STR_n STR_a STR_p STR_l STR_b "\0"
 #define STRING_nla0                  STR_n STR_l STR_a "\0"
 #define STRING_nlb0                  STR_n STR_l STR_b "\0"
 #define STRING_sr0                   STR_s STR_r "\0"
 #define STRING_asr0                  STR_a STR_s STR_r "\0"
 #define STRING_positive_lookahead0   STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
 #define STRING_positive_lookbehind0  STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
+#define STRING_non_atomic_positive_lookahead0   STR_n STR_o STR_n STR_UNDERSCORE STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
+#define STRING_non_atomic_positive_lookbehind0  STR_n STR_o STR_n STR_UNDERSCORE STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
 #define STRING_negative_lookahead0   STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0"
 #define STRING_negative_lookbehind0  STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0"
 #define STRING_script_run0           STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n "\0"
@@ -1303,7 +1311,7 @@ enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
 Starting from 1 (i.e. after OP_END), the values up to OP_EOD must correspond in
 order to the list of escapes immediately above. Furthermore, values up to
 OP_DOLLM must not be changed without adjusting the table called autoposstab in
-pcre2_auto_possess.c
+pcre2_auto_possess.c.

 Whenever this list is updated, the two macro definitions that follow must be
 updated to match. The possessification table called "opcode_possessify" in
@@ -1501,80 +1509,81 @@ enum {
  OP_KETRMIN,        /* 123 order. They are for groups the repeat for ever. */
  OP_KETRPOS,        /* 124 Possessive unlimited repeat. */

-  /* The assertions must come before BRA, CBRA, ONCE, and COND, and the four
-  asserts must remain in order. */
+  /* The assertions must come before BRA, CBRA, ONCE, and COND. */

  OP_REVERSE,        /* 125 Move pointer back - used in lookbehind assertions */
  OP_ASSERT,         /* 126 Positive lookahead */
  OP_ASSERT_NOT,     /* 127 Negative lookahead */
  OP_ASSERTBACK,     /* 128 Positive lookbehind */
  OP_ASSERTBACK_NOT, /* 129 Negative lookbehind */
+  OP_ASSERT_NA,      /* 130 Positive non-atomic lookahead */
+  OP_ASSERTBACK_NA,  /* 131 Positive non-atomic lookbehind */

  /* ONCE, SCRIPT_RUN, BRA, BRAPOS, CBRA, CBRAPOS, and COND must come
  immediately after the assertions, with ONCE first, as there's a test for >=
  ONCE for a subpattern that isn't an assertion. The POS versions must
  immediately follow the non-POS versions in each case. */

-  OP_ONCE,           /* 130 Atomic group, contains captures */
-  OP_SCRIPT_RUN,     /* 131 Non-capture, but check characters' scripts */
-  OP_BRA,            /* 132 Start of non-capturing bracket */
-  OP_BRAPOS,         /* 133 Ditto, with unlimited, possessive repeat */
-  OP_CBRA,           /* 134 Start of capturing bracket */
-  OP_CBRAPOS,        /* 135 Ditto, with unlimited, possessive repeat */
-  OP_COND,           /* 136 Conditional group */
+  OP_ONCE,           /* 132 Atomic group, contains captures */
+  OP_SCRIPT_RUN,     /* 133 Non-capture, but check characters' scripts */
+  OP_BRA,            /* 134 Start of non-capturing bracket */
+  OP_BRAPOS,         /* 135 Ditto, with unlimited, possessive repeat */
+  OP_CBRA,           /* 136 Start of capturing bracket */
+  OP_CBRAPOS,        /* 137 Ditto, with unlimited, possessive repeat */
+  OP_COND,           /* 138 Conditional group */

  /* These five must follow the previous five, in the same order. There's a
  check for >= SBRA to distinguish the two sets. */

-  OP_SBRA,           /* 137 Start of non-capturing bracket, check empty  */
-  OP_SBRAPOS,        /* 138 Ditto, with unlimited, possessive repeat */
-  OP_SCBRA,          /* 139 Start of capturing bracket, check empty */
-  OP_SCBRAPOS,       /* 140 Ditto, with unlimited, possessive repeat */
-  OP_SCOND,          /* 141 Conditional group, check empty */
+  OP_SBRA,           /* 139 Start of non-capturing bracket, check empty  */
+  OP_SBRAPOS,        /* 140 Ditto, with unlimited, possessive repeat */
+  OP_SCBRA,          /* 141 Start of capturing bracket, check empty */
+  OP_SCBRAPOS,       /* 142 Ditto, with unlimited, possessive repeat */
+  OP_SCOND,          /* 143 Conditional group, check empty */

  /* The next two pairs must (respectively) be kept together. */

-  OP_CREF,           /* 142 Used to hold a capture number as condition */
-  OP_DNCREF,         /* 143 Used to point to duplicate names as a condition */
-  OP_RREF,           /* 144 Used to hold a recursion number as condition */
-  OP_DNRREF,         /* 145 Used to point to duplicate names as a condition */
-  OP_FALSE,          /* 146 Always false (used by DEFINE and VERSION) */
-  OP_TRUE,           /* 147 Always true (used by VERSION) */
+  OP_CREF,           /* 144 Used to hold a capture number as condition */
+  OP_DNCREF,         /* 145 Used to point to duplicate names as a condition */
+  OP_RREF,           /* 146 Used to hold a recursion number as condition */
+  OP_DNRREF,         /* 147 Used to point to duplicate names as a condition */
+  OP_FALSE,          /* 148 Always false (used by DEFINE and VERSION) */
+  OP_TRUE,           /* 149 Always true (used by VERSION) */

-  OP_BRAZERO,        /* 148 These two must remain together and in this */
-  OP_BRAMINZERO,     /* 149 order. */
-  OP_BRAPOSZERO,     /* 150 */
+  OP_BRAZERO,        /* 150 These two must remain together and in this */
+  OP_BRAMINZERO,     /* 151 order. */
+  OP_BRAPOSZERO,     /* 152 */

  /* These are backtracking control verbs */

-  OP_MARK,           /* 151 always has an argument */
-  OP_PRUNE,          /* 152 */
-  OP_PRUNE_ARG,      /* 153 same, but with argument */
-  OP_SKIP,           /* 154 */
-  OP_SKIP_ARG,       /* 155 same, but with argument */
-  OP_THEN,           /* 156 */
-  OP_THEN_ARG,       /* 157 same, but with argument */
-  OP_COMMIT,         /* 158 */
-  OP_COMMIT_ARG,     /* 159 same, but with argument */
+  OP_MARK,           /* 153 always has an argument */
+  OP_PRUNE,          /* 154 */
+  OP_PRUNE_ARG,      /* 155 same, but with argument */
+  OP_SKIP,           /* 156 */
+  OP_SKIP_ARG,       /* 157 same, but with argument */
+  OP_THEN,           /* 158 */
+  OP_THEN_ARG,       /* 159 same, but with argument */
+  OP_COMMIT,         /* 160 */
+  OP_COMMIT_ARG,     /* 161 same, but with argument */

  /* These are forced failure and success verbs. FAIL and ACCEPT do accept an
  argument, but these cases can be compiled as, for example, (*MARK:X)(*FAIL)
  without the need for a special opcode. */

-  OP_FAIL,           /* 160 */
-  OP_ACCEPT,         /* 161 */
-  OP_ASSERT_ACCEPT,  /* 162 Used inside assertions */
-  OP_CLOSE,          /* 163 Used before OP_ACCEPT to close open captures */
+  OP_FAIL,           /* 162 */
+  OP_ACCEPT,         /* 163 */
+  OP_ASSERT_ACCEPT,  /* 164 Used inside assertions */
+  OP_CLOSE,          /* 165 Used before OP_ACCEPT to close open captures */

  /* This is used to skip a subpattern with a {0} quantifier */

-  OP_SKIPZERO,       /* 164 */
+  OP_SKIPZERO,       /* 166 */

  /* This is used to identify a DEFINE group during compilation so that it can
  be checked for having only one branch. It is changed to OP_FALSE before
  compilation finishes. */

-  OP_DEFINE,         /* 165 */
+  OP_DEFINE,         /* 167 */

  /* This is not an opcode, but is used to check that tables indexed by opcode
  are the correct length, in order to catch updating errors - there have been
@@ -1587,7 +1596,7 @@ enum {
 /* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro
 definitions that follow must also be updated to match. There are also tables
 called "opcode_possessify" in pcre2_compile.c and "coptable" and "poptable" in
-pcre2_dfa_exec.c that must be updated. */
+pcre2_dfa_match.c that must be updated. */


 /* This macro defines textual names for all the opcodes. These are used only
@@ -1620,7 +1629,9 @@ some cases doesn't actually use these names at all). */
  "class", "nclass", "xclass", "Ref", "Refi", "DnRef", "DnRefi",  \
  "Recurse", "Callout", "CalloutStr",                             \
  "Alt", "Ket", "KetRmax", "KetRmin", "KetRpos",                  \
-  "Reverse", "Assert", "Assert not", "AssertB", "AssertB not",    \
+  "Reverse", "Assert", "Assert not",                              \
+  "Assert back", "Assert back not",                               \
+  "Non-atomic assert", "Non-atomic assert back",                  \
  "Once",                                                         \
  "Script run",                                                   \
  "Bra", "BraPos", "CBra", "CBraPos",                             \
@@ -1705,6 +1716,8 @@ in UTF-8 mode. The code that uses this table must know about such things. */
  1+LINK_SIZE,                   /* Assert not                             */ \
  1+LINK_SIZE,                   /* Assert behind                          */ \
  1+LINK_SIZE,                   /* Assert behind not                      */ \
+  1+LINK_SIZE,                   /* NA Assert                              */ \
+  1+LINK_SIZE,                   /* NA Assert behind                       */ \
  1+LINK_SIZE,                   /* ONCE                                   */ \
  1+LINK_SIZE,                   /* SCRIPT_RUN                             */ \
  1+LINK_SIZE,                   /* BRA                                    */ \
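The new STRING_napla0 and related macros feed alasnames in pcre2_compile.c, which packs the alphabetic assertion names into one NUL-separated string, while alasmeta holds each name's length and META code in the same order. A lookup can walk both in step, skipping len + 1 characters per entry, so every length (5 for napla and naplb, 29 and 30 for the long forms) must be exact. A sketch of such a walk over a subset of the names; the enum values and the lookup function are hypothetical stand-ins, and PCRE2's own loop may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the META_ values: only identity matters. */
enum meta_code { M_NONE, M_LOOKAHEAD, M_LOOKBEHIND,
                 M_LOOKAHEAD_NA, M_LOOKBEHIND_NA };

/* Names packed back to back, each ending in NUL, in the same order as
   the length/code table below (a subset of the real list). */
static const char names[] =
  "pla\0"
  "plb\0"
  "napla\0"
  "naplb\0"
  "positive_lookahead\0"
  "positive_lookbehind\0"
  "non_atomic_positive_lookahead\0"
  "non_atomic_positive_lookbehind\0";

typedef struct { unsigned len; enum meta_code meta; } name_item;

static const name_item items[] = {
  {  3, M_LOOKAHEAD     },
  {  3, M_LOOKBEHIND    },
  {  5, M_LOOKAHEAD_NA  },
  {  5, M_LOOKBEHIND_NA },
  { 18, M_LOOKAHEAD     },
  { 19, M_LOOKBEHIND    },
  { 29, M_LOOKAHEAD_NA  },
  { 30, M_LOOKBEHIND_NA },
};

/* Look up the name that follows "(*" (n characters, not NUL-terminated). */
static enum meta_code lookup(const char *name, size_t n)
{
  const char *p = names;
  for (size_t i = 0; i < sizeof(items) / sizeof(items[0]); i++)
    {
    assert(strlen(p) == items[i].len);   /* catches a wrong length entry */
    if (n == items[i].len && memcmp(name, p, n) == 0) return items[i].meta;
    p += items[i].len + 1;               /* skip the name and its NUL */
    }
  return M_NONE;
}
```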
--- a/src/pcre2_match.c
+++ b/src/pcre2_match.c
@@ -5127,6 +5127,8 @@ fprintf(stderr, "++ op=%d\n", *Fecode);

    case OP_ASSERT:
    case OP_ASSERTBACK:
+    case OP_ASSERT_NA:
+    case OP_ASSERTBACK_NA:
    Lframe_type = GF_NOCAPTURE | Fop;
    for (;;)
      {
@@ -5497,10 +5499,20 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
      case OP_SCOND:
      break;

-      /* Positive assertions are like OP_ONCE, except that in addition the
+      /* Non-atomic positive assertions are like OP_BRA, except that the
      subject pointer must be put back to where it was at the start of the
      assertion. */

+      case OP_ASSERT_NA:
+      case OP_ASSERTBACK_NA:
+      if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
+      Feptr = P->eptr;
+      break;
+
+      /* Atomic positive assertions are like OP_ONCE, except that in addition
+      the subject pointer must be put back to where it was at the start of the
+      assertion. */
+
      case OP_ASSERT:
      case OP_ASSERTBACK:
      if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
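Because OP_ASSERT_NA is handled like OP_BRA at its closing Ket, a later failure can backtrack into the assertion. The \A(*pla:.*\b(\w++))(?>.*?\b\1\b){3} tests in testoutput2 show the effect: the atomic version commits to the last word, word4, and fails, while the non-atomic version backs .* off to word3, which occurs three times. The sketch below hand-models those two patterns in plain C. It is not PCRE2 code, it assumes the subject contains no newlines, and its function names are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_word(char c) { return isalnum((unsigned char)c) || c == '_'; }

/* Models (?>.*?\b\1\b){count} from offset 0: each atomic iteration takes
   the first whole-word occurrence of w[0..wlen) after the previous one.
   Returns the end offset of the last occurrence, or -1. */
static long nth_word_end(const char *s, const char *w, size_t wlen, int count)
{
  size_t len = strlen(s), pos = 0;
  assert(wlen > 0);
  for (int n = 0; n < count; n++)
    {
    bool found = false;
    for (size_t p = pos; p + wlen <= len; p++)
      {
      bool left = p == 0 || !is_word(s[p - 1]);
      bool right = p + wlen == len || !is_word(s[p + wlen]);
      if (left && right && memcmp(s + p, w, wlen) == 0)
        {
        pos = p + wlen;
        found = true;
        break;
        }
      }
    if (!found) return -1;
    }
  return (long)pos;
}

/* Models \A(*napla:.*\b(\w++))(?>.*?\b\1\b){3} (atomic == false) and the
   (*pla:...) form (atomic == true). Greedy .* offers word starts from the
   last one backwards; \w++ always takes the whole word. */
static long thrice_match(const char *s, bool atomic, size_t *cstart, size_t *clen)
{
  size_t len = strlen(s);
  for (size_t p = len; p-- > 0; )
    {
    if (!is_word(s[p]) || (p > 0 && is_word(s[p - 1]))) continue;
    size_t e = p;
    while (e < len && is_word(s[e])) e++;
    long end = nth_word_end(s, s + p, e - p, 3);
    if (end >= 0) { *cstart = p; *clen = e - p; return end; }
    if (atomic) return -1;   /* no backtracking into an atomic lookahead */
    }
  return -1;
}
```

On the testinput2 subject, thrice_match returns -1 with atomic set and 53 without it, with the capture at offset 48 ("word3"), which corresponds to the two results recorded in testoutput2.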
--- a/src/pcre2_printint.c
+++ b/src/pcre2_printint.c
@@ -392,6 +392,8 @@ for(;;)
    case OP_ASSERT_NOT:
    case OP_ASSERTBACK:
    case OP_ASSERTBACK_NOT:
+    case OP_ASSERT_NA:
+    case OP_ASSERTBACK_NA:
    case OP_ONCE:
    case OP_SCRIPT_RUN:
    case OP_COND:
--- a/src/pcre2_study.c
+++ b/src/pcre2_study.c
@@ -240,6 +240,8 @@ for (;;)
    case OP_ASSERT_NOT:
    case OP_ASSERTBACK:
    case OP_ASSERTBACK_NOT:
+    case OP_ASSERT_NA:
+    case OP_ASSERTBACK_NA:
    do cc += GET(cc, 1); while (*cc == OP_ALT);
    /* Fall through */

@@ -1089,6 +1091,7 @@ do
      case OP_ONCE:
      case OP_SCRIPT_RUN:
      case OP_ASSERT:
+      case OP_ASSERT_NA:
      rc = set_start_bits(re, tcode, utf);
      if (rc == SSB_FAIL || rc == SSB_UNKNOWN) return rc;
      if (rc == SSB_DONE) try_next = FALSE; else
@@ -1131,6 +1134,7 @@ do
      case OP_ASSERT_NOT:
      case OP_ASSERTBACK:
      case OP_ASSERTBACK_NOT:
+      case OP_ASSERTBACK_NA:
      do tcode += GET(tcode, 1); while (*tcode == OP_ALT);
      tcode += 1 + LINK_SIZE;
      break;
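In set_start_bits, OP_ASSERTBACK_NA now joins the groups that are stepped over rather than inspected: do tcode += GET(tcode, 1); while (*tcode == OP_ALT); follows each branch's link, and tcode += 1 + LINK_SIZE then moves past the closing OP_KET. A sketch of that walk over hand-assembled code for a two-branch (*naplb:a|b), assuming the 8-bit library with LINK_SIZE 2 (two-byte big-endian links). OP_ALT = 120, OP_KET = 121, and OP_ASSERTBACK_NA = 131 are inferred from the renumbered opcode list above; OP_LIT is a made-up placeholder, the OP_REVERSE item that starts each lookbehind branch is left out, and the bytes only approximate what pcre2_compile() would emit.

```c
#include <assert.h>
#include <stddef.h>

#define LINK_SIZE 2
/* Modelled on the 8-bit, LINK_SIZE 2 form: a big-endian two-byte link. */
#define GET(a, n) ((unsigned int)(((a)[n] << 8) | (a)[(n) + 1]))

/* OP_LIT is a hypothetical placeholder, not a real PCRE2 opcode value. */
enum { OP_END = 0, OP_ALT = 120, OP_KET = 121, OP_ASSERTBACK_NA = 131,
       OP_LIT = 200 };

/* Hand-assembled approximation of a two-branch group. Each link is the
   distance from its opcode to the next OP_ALT or OP_KET; the OP_KET link
   points back to the start of the group. */
static const unsigned char demo_code[] = {
  OP_ASSERTBACK_NA, 0, 5,  OP_LIT, 'a',   /* offsets 0-4   */
  OP_ALT,           0, 5,  OP_LIT, 'b',   /* offsets 5-9   */
  OP_KET,           0, 10,                /* offsets 10-12 */
  OP_END                                  /* offset 13     */
};

/* Skip a whole bracketed group that starts at code[start]. */
static size_t skip_group(const unsigned char *code, size_t start)
{
  const unsigned char *cc = code + start;
  do cc += GET(cc, 1); while (*cc == OP_ALT);   /* land on OP_KET */
  cc += 1 + LINK_SIZE;                          /* step over OP_KET */
  return (size_t)(cc - code);
}
```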
--- a/testdata/testinput2
+++ b/testdata/testinput2
@@ -5653,4 +5653,33 @@ a)"xI
 # Multiplication overflow
 /(X{65535})(?<=\1{32770})/

+# ---- Non-atomic assertion tests ----
+
+# Expect error: not allowed as a condition
+/(?(*napla:xx)bc)/
+
+/\A(*pla:.*\b(\w++))(?>.*?\b\1\b){3}/
+    word1 word3 word1 word2 word3 word2 word2 word1 word3 word4
+
+/\A(*napla:.*\b(\w++))(?>.*?\b\1\b){3}/
+    word1 word3 word1 word2 word3 word2 word2 word1 word3 word4
+
+/(*plb:(.)..|(.)...)(\1|\2)/
+    abcdb\=offset=4 
+    abcda\=offset=4 
+
+/(*naplb:(.)..|(.)...)(\1|\2)/
+    abcdb\=offset=4 
+    abcda\=offset=4 
+    
+/(*non_atomic_positive_lookahead:ab)/B
+ 
+/(*non_atomic_positive_lookbehind:ab)/B 
+
+/(*pla:ab+)/B
+
+/(*napla:ab+)/B
+
+# ----
+
 # End of testinput2
--- a/testdata/testoutput2
+++ b/testdata/testoutput2
@@ -11117,7 +11117,7 @@ Matched, but too many substrings
 ------------------------------------------------------------------
        Bra
        Brazero
-        AssertB
+        Assert back
        Reverse
        CBra 1
        abc
@@ -13346,7 +13346,7 @@ Failed: error 144 at offset 5: subpattern name must start with a non-digit
        Ket
        red
        \b
-        AssertB
+        Assert back
        Reverse
        \w
        Ket
@@ -13403,7 +13403,7 @@ Failed: error 133 at offset 7: parentheses are too deeply nested (stack check)
        Once
        \s*+
        Ket
-        AssertB
+        Assert back
        Reverse
        \w
        Ket
@@ -16619,7 +16619,7 @@ No match
 /(?<=(?=.){4,5}x)/B
 ------------------------------------------------------------------
        Bra
-        AssertB
+        Assert back
        Reverse
        Assert
        Any
@@ -17086,6 +17086,87 @@ Failed: error 187 at offset 15: lookbehind assertion is too long
 /(X{65535})(?<=\1{32770})/
 Failed: error 187 at offset 10: lookbehind assertion is too long

+# ---- Non-atomic assertion tests ----
+
+# Expect error: not allowed as a condition
+/(?(*napla:xx)bc)/
+Failed: error 198 at offset 9: atomic assertion expected after (?( or (?(?C)
+
+/\A(*pla:.*\b(\w++))(?>.*?\b\1\b){3}/
+    word1 word3 word1 word2 word3 word2 word2 word1 word3 word4
+No match
+
+/\A(*napla:.*\b(\w++))(?>.*?\b\1\b){3}/
+    word1 word3 word1 word2 word3 word2 word2 word1 word3 word4
+ 0: word1 word3 word1 word2 word3 word2 word2 word1 word3
+ 1: word3
+
+/(*plb:(.)..|(.)...)(\1|\2)/
+    abcdb\=offset=4 
+ 0: b
+ 1: b
+ 2: <unset>
+ 3: b
+    abcda\=offset=4 
+No match
+
+/(*naplb:(.)..|(.)...)(\1|\2)/
+    abcdb\=offset=4 
+ 0: b
+ 1: b
+ 2: <unset>
+ 3: b
+    abcda\=offset=4 
+ 0: a
+ 1: <unset>
+ 2: a
+ 3: a
+    
+/(*non_atomic_positive_lookahead:ab)/B
+------------------------------------------------------------------
+        Bra
+        Non-atomic assert
+        ab
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+ 
+/(*non_atomic_positive_lookbehind:ab)/B 
+------------------------------------------------------------------
+        Bra
+        Non-atomic assert back
+        Reverse
+        ab
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
+/(*pla:ab+)/B
+------------------------------------------------------------------
+        Bra
+        Assert
+        a
+        b++
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
+/(*napla:ab+)/B
+------------------------------------------------------------------
+        Bra
+        Non-atomic assert
+        a
+        b+
+        Ket
+        Ket
+        End
+------------------------------------------------------------------
+
+# ----
+
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data
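The (*plb:...) and (*naplb:...) results for "abcda" differ because, once (\1|\2) fails with the first branch's capture, only the non-atomic lookbehind can be re-entered to try its second branch, and capture 1 is unset again when that branch is taken. A hand-written C model of these two test patterns at a single start offset; it is not PCRE2 code, result and lookbehind_model are illustrative names, and -1 stands for <unset>.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Outcome of the model: capture slots hold an offset in s, -1 = <unset>. */
typedef struct { bool matched; int cap1, cap2, cap3; } result;

/* Models (*naplb:(.)..|(.)...)(\1|\2), or the atomic (*plb:...) form when
   atomic is true, tried at one start offset only. */
static result lookbehind_model(const char *s, size_t offset, bool atomic)
{
  result r = { false, -1, -1, -1 };
  size_t len = strlen(s);
  static const size_t branch_len[2] = { 3, 4 };   /* (.).. and (.)... */

  for (int b = 0; b < 2; b++)
    {
    if (offset < branch_len[b]) continue;          /* branch cannot fit */
    int cap = (int)(offset - branch_len[b]);       /* the (.) character */
    int c1 = (b == 0)? cap : -1;                   /* other group unset */
    int c2 = (b == 1)? cap : -1;

    /* (\1|\2) at offset: an unset group never matches. */
    if (offset < len)
      {
      if (c1 >= 0 && s[c1] == s[offset])
        return (result){ true, c1, c2, (int)offset };
      if (c2 >= 0 && s[c2] == s[offset])
        return (result){ true, c1, c2, (int)offset };
      }
    if (atomic) break;    /* an atomic lookbehind is not re-entered */
    }
  return r;
}
```

For "abcda" at offset 4, the non-atomic call reports cap1 == -1, cap2 == 0 and cap3 == 4, that is <unset>, "a", "a", as in testoutput2, while the atomic call reports no match.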
--- a/testdata/testoutput5
+++ b/testdata/testoutput5
@@ -4017,7 +4017,7 @@ MK: a\x{12345}b\x{09}(d)c
 ------------------------------------------------------------------
        Bra
        \b
-        AssertB
+        Assert back
        Reverse
        prop Xwd
        Ket
@@ -4196,7 +4196,7 @@ Failed: error 125 at offset 2: lookbehind assertion is not fixed length
 ------------------------------------------------------------------
        Bra
        ^
-        AssertB not
+        Assert back not
        Assert
        \x{10385c}
        Ket
@@ -4828,7 +4828,7 @@ MK: ABC
 /(?<!)(*sr:)/B
 ------------------------------------------------------------------
        Bra
-        AssertB not
+        Assert back not
        Ket
        Script run
        Ket
@@ -4839,7 +4839,7 @@ MK: ABC
 /(?<=abc(?=X(*sr:BXY)CCC)XBXYCCC)./B
 ------------------------------------------------------------------
        Bra
-        AssertB
+        Assert back
        Reverse
        abc
        Assert