Update definition of partial match and fix \z and \Z (as documented).
parent 344056baf8
commit c84a06c96e
10 ChangeLog
@@ -97,6 +97,16 @@ within it, the nested lookbehind was not correctly processed. For example, if
 
 20. Implemented pcre2_get_match_data_size().
 
+21. Two alterations to partial matching (not yet done by JIT):
+
+(a) The definition of a partial match is slightly changed: if a pattern
+contains any lookbehinds, an empty partial match may be given, because this
+is another situation where adding characters to the current subject can
+lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".
+
+(b) An empty string partial hard match can be returned for \z and \Z as it
+is documented that they shouldn't match.
+
 
 Version 10.33 16-April-2019
 ---------------------------
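Item 21 changes when a partial match may be reported. As a rough illustration, the condition can be written as a small C predicate. This is a hypothetical model, not code taken from PCRE2: the function name and its parameters are invented here, and it assumes the caller has already established that matching reached the end of the subject and could not continue without more characters. The `hard_end_assertion` flag stands for the PCRE2_PARTIAL_HARD case in which \z, \Z, \b, \B or $ is met at the end of the subject.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the partial-match condition documented in this
   commit; it is not PCRE2 source code. Call it only after matching has
   reached the end of the subject and cannot continue without more input.

   hard_end_assertion: PCRE2_PARTIAL_HARD is set and \z, \Z, \b, \B or $
                       was encountered at the end of the subject.
   inspected:          number of subject characters inspected so far.
   has_lookbehind:     the pattern contains at least one lookbehind.   */
bool partial_allowed(bool hard_end_assertion, size_t inspected,
                     bool has_lookbehind)
{
  /* Hard mode: partial "whether or not any characters have been inspected". */
  if (hard_end_assertion)
    return true;

  /* Otherwise at least one inspected character is needed, unless the
     pattern has a lookbehind, in which case an empty partial match can be
     reported because more input could still complete a match. */
  return inspected > 0 || has_lookbehind;
}
```

Under the previous wording, both branches required `inspected > 0`; the `has_lookbehind` term and the unconditional hard-mode branch are what this commit adds.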

@@ -2725,12 +2725,16 @@ Your program may crash or loop indefinitely or give wrong results.
 </pre>
 These options turn on the partial matching feature. A partial match occurs if
 the end of the subject string is reached successfully, but there are not enough
-subject characters to complete the match. If this happens when
-PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) is set, matching continues by
-testing any remaining alternatives. Only if no complete match can be found is
-PCRE2_ERROR_PARTIAL returned instead of PCRE2_ERROR_NOMATCH. In other words,
-PCRE2_PARTIAL_SOFT specifies that the caller is prepared to handle a partial
-match, but only if no complete match can be found.
+subject characters to complete the match. In addition, either at least one
+character must have been inspected or the pattern must contain a lookbehind.
+</P>
+<P>
+If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD)
+is set, matching continues by testing any remaining alternatives. Only if no
+complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
+PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that the
+caller is prepared to handle a partial match, but only if no complete match can
+be found.
 </P>
 <P>
 If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
@@ -3846,7 +3850,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 25 June 2019
+Last updated: 20 July 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
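The changed pcre2api text distinguishes the two options by what happens after a partial match is found: PCRE2_PARTIAL_SOFT keeps trying other alternatives and reports the partial match only if nothing completes, while PCRE2_PARTIAL_HARD returns at once. A simplified model in C follows. It treats matching as a flat list of paths tried in order, which is an assumption made for illustration; `pick_result()` and its enums are hypothetical names, not part of the PCRE2 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What one matching path produced (hypothetical model). */
enum outcome { OUT_NONE, OUT_PARTIAL, OUT_COMPLETE };
enum result  { RES_NOMATCH, RES_PARTIAL, RES_MATCH };

/* paths[] lists outcomes in the order the matcher tries them. *which is set
   to the index of the path that decides the result, or n if none does. */
enum result pick_result(const enum outcome *paths, size_t n, bool hard,
                        size_t *which)
{
  size_t first_partial = n;
  assert(which != NULL);

  for (size_t i = 0; i < n; i++) {
    if (paths[i] == OUT_COMPLETE) {        /* a complete match ends the search */
      *which = i;
      return RES_MATCH;
    }
    if (paths[i] == OUT_PARTIAL) {
      if (hard) {                          /* PCRE2_PARTIAL_HARD: return at once */
        *which = i;
        return RES_PARTIAL;
      }
      if (first_partial == n)              /* PCRE2_PARTIAL_SOFT: remember the */
        first_partial = i;                 /* first one and keep going         */
    }
  }
  *which = first_partial;
  return first_partial < n ? RES_PARTIAL : RES_NOMATCH;
}
```

For /123\w+X|dogY/ against "abc123dog", the two paths that reach the end of the subject are the first alternative starting at offset 3 and the second starting at offset 6. Both are partial, and the soft-mode model picks the first, which agrees with the documentation reporting "123dog" (offsets 3 and 9).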

@@ -45,7 +45,7 @@ as soon as a mistake is made, by beeping and not reflecting the character that
 has been typed, for example. This immediate feedback is likely to be a better
 user interface than a check that is delayed until the entire string has been
 entered. Partial matching can also be useful when the subject string is very
-long and is not all available at once.
+long and is not all available at once, as discussed below.
 </P>
 <P>
 PCRE2 supports partial matching by means of the PCRE2_PARTIAL_SOFT and
@@ -79,13 +79,18 @@ is also disabled for partial matching.
 <P>
 A partial match occurs during a call to <b>pcre2_match()</b> when the end of the
 subject string is reached successfully, but matching cannot continue because
-more characters are needed. However, at least one character in the subject must
-have been inspected. This character need not form part of the final matched
-string; lookbehind assertions and the \K escape sequence provide ways of
-inspecting characters before the start of a matched string. The requirement for
-inspecting at least one character exists because an empty string can always be
-matched; without such a restriction there would always be a partial match of an
-empty string at the end of the subject.
+more characters are needed, and in addition, either at least one character in
+the subject has been inspected or the pattern contains a lookbehind. An
+inspected character need not form part of the final matched string; lookbehind
+assertions and the \K escape sequence provide ways of inspecting characters
+before the start of a matched string.
+</P>
+<P>
+The two additional requirements define the cases where adding more characters
+to the existing subject may complete the match. Without these conditions there
+would be a partial match of an empty string at the end of the subject for all
+unanchored patterns (and also for anchored patterns if the subject itself is
+empty).
 </P>
 <P>
 When a partial match is returned, the first two elements in the ovector point
@@ -104,7 +109,7 @@ characters.
 </P>
 <P>
 What happens when a partial match is identified depends on which of the two
-partial matching options are set.
+partial matching options is set.
 </P>
 <br><b>
 PCRE2_PARTIAL_SOFT WITH pcre2_match()
@@ -128,12 +133,12 @@ the data that is returned. Consider this pattern:
 <pre>
 /123\w+X|dogY/
 </pre>
-If this is matched against the subject string "abc123dog", both
-alternatives fail to match, but the end of the subject is reached during
-matching, so PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9,
-identifying "123dog" as the first partial match that was found. (In this
-example, there are two partial matches, because "dog" on its own partially
-matches the second alternative.)
+If this is matched against the subject string "abc123dog", both alternatives
+fail to match, but the end of the subject is reached during matching, so
+PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying
+"123dog" as the first partial match that was found. (In this example, there are
+two partial matches, because "dog" on its own partially matches the second
+alternative.)
 </P>
 <br><b>
 PCRE2_PARTIAL_HARD WITH pcre2_match()
@@ -145,8 +150,8 @@ possible complete matches. This option is "hard" because it prefers an earlier
 partial match over a later complete match. For this reason, the assumption is
 made that the end of the supplied subject string may not be the true end of the
 available data, and so, if \z, \Z, \b, \B, or $ are encountered at the end
-of the subject, the result is PCRE2_ERROR_PARTIAL, provided that at least one
-character in the subject has been inspected.
+of the subject, the result is PCRE2_ERROR_PARTIAL, whether or not any
+characters have been inspected.
 </P>
 <br><b>
 Comparing hard and soft partial matching
@@ -346,44 +351,25 @@ string "xx123ab", the ovector offsets are 5 and 7 ("ab"). The maximum
 lookbehind count is 3, so all characters before offset 2 can be discarded. The
 value of <b>startoffset</b> for the next match should be 3. When <b>pcre2test</b>
 displays a partial match, it indicates the lookbehind characters with '<'
-characters:
+characters if the "allusedtext" modifier is set:
 <pre>
 re> "(?<=123)abc"
-data> xx123ab\=ph
+data> xx123ab\=ph,allusedtext
 Partial match: 123ab
 <<<
-</PRE>
-</P>
-<P>
-3. The maximum lookbehind count is also important when the result of a partial
-match attempt is "no match". In this case, the maximum lookbehind characters
-from the end of the current segment must be retained at the start of the next
-segment, in case the lookbehind is at the start of the pattern. Matching the
-next segment must then start at the appropriate offset.
-</P>
-<P>
-4. Because a partial match must always contain at least one character, what
-might be considered a partial match of an empty string actually gives a "no
-match" result. For example:
-<pre>
-re> /c(?<=abc)x/
-data> ab\=ps
-No match
 </pre>
-If the next segment begins "cx", a match should be found, but this will only
-happen if characters from the previous segment are retained. For this reason, a
-"no match" result should be interpreted as "partial match of an empty string"
-when the pattern contains lookbehinds.
+However, the "allusedtext" modifier is not available for JIT matching, because
+JIT matching does not maintain the first and last consulted characters.
 </P>
 <P>
-5. Matching a subject string that is split into multiple segments may not
-always produce exactly the same result as matching over one single long string,
-especially when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and
-Word Boundaries" above describes an issue that arises if the pattern ends with
-\b or \B. Another kind of difference may occur when there are multiple
-matching possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result
-is given only when there are no completed matches. This means that as soon as
-the shortest match has been found, continuation to a new subject segment is no
+3. Matching a subject string that is split into multiple segments may not
+always produce exactly the same result as matching over one single long string
+when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and Word
+Boundaries" above describes an issue that arises if the pattern ends with \b
+or \B. Another kind of difference may occur when there are multiple matching
+possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result is given
+only when there are no completed matches. This means that as soon as the
+shortest match has been found, continuation to a new subject segment is no
 longer possible. Consider this <b>pcre2test</b> example:
 <pre>
 re> /dog(sbody)?/
@@ -418,7 +404,7 @@ multi-segment data. The example above then behaves differently:
 data> gsb\=ph,dfa,dfa_restart
 Partial match: gsb
 </pre>
-6. Patterns that contain alternatives at the top level which do not all start
+4. Patterns that contain alternatives at the top level which do not all start
 with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
 used. For example, consider this pattern:
 <pre>
@@ -463,7 +449,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC10" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 June 2019
+Last updated: 21 July 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
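The multi-segment arithmetic described in this documentation (partial match starting at offset 5 in "xx123ab", maximum lookbehind 3, so 2 characters can be discarded and the next match starts at offset 3) can be sketched in C. `discard_count()` and `prepare_next_segment()` are hypothetical helpers, not PCRE2 functions; in a real program the partial match start would come from the ovector (via `pcre2_get_ovector_pointer()`) and the lookbehind length from `pcre2_pattern_info()` with `PCRE2_INFO_MAXLOOKBEHIND`. The sketch assumes 8-bit code units and a caller-managed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): how many leading code units
   of the buffer can be dropped, given the start of the partial match
   (ovector[0]) and the pattern's maximum lookbehind. */
size_t discard_count(size_t partial_start, size_t max_lookbehind)
{
  return partial_start > max_lookbehind ? partial_start - max_lookbehind : 0;
}

/* Drops the unneeded prefix, appends the next segment, and returns the
   startoffset to use for the next pcre2_match() call. The buffer is not
   NUL-terminated: pcre2_match() is given an explicit subject length. */
size_t prepare_next_segment(char *buf, size_t *len, size_t cap,
                            size_t partial_start, size_t max_lookbehind,
                            const char *next, size_t next_len)
{
  size_t drop = discard_count(partial_start, max_lookbehind);
  assert(partial_start <= *len);
  size_t keep = *len - drop;
  assert(keep + next_len <= cap);

  memmove(buf, buf + drop, keep);        /* source and destination overlap */
  memcpy(buf + keep, next, next_len);
  *len = keep + next_len;
  return partial_start - drop;
}
```

With "xx123ab", a partial start of 5 and a lookbehind of 3, the buffer becomes "123ab" followed by the next segment, and the returned start offset is 3.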

@@ -2661,13 +2661,16 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
 
 These options turn on the partial matching feature. A partial match oc-
 curs if the end of the subject string is reached successfully, but
-there are not enough subject characters to complete the match. If this
-happens when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) is set,
-matching continues by testing any remaining alternatives. Only if no
-complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
-PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that
-the caller is prepared to handle a partial match, but only if no com-
-plete match can be found.
+there are not enough subject characters to complete the match. In addi-
+tion, either at least one character must have been inspected or the
+pattern must contain a lookbehind.
+
+If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PAR-
+TIAL_HARD) is set, matching continues by testing any remaining alterna-
+tives. Only if no complete match can be found is PCRE2_ERROR_PARTIAL
+returned instead of PCRE2_ERROR_NOMATCH. In other words, PCRE2_PAR-
+TIAL_SOFT specifies that the caller is prepared to handle a partial
+match, but only if no complete match can be found.
 
 If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this
 case, if a partial match is found, pcre2_match() immediately returns
@@ -3702,7 +3705,7 @@ AUTHOR
 
 REVISION
 
-Last updated: 25 June 2019
+Last updated: 20 July 2019
 Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
 
@@ -5665,7 +5668,7 @@ PARTIAL MATCHING IN PCRE2
 feedback is likely to be a better user interface than a check that is
 delayed until the entire string has been entered. Partial matching can
 also be useful when the subject string is very long and is not all
-available at once.
+available at once, as discussed below.
 
 PCRE2 supports partial matching by means of the PCRE2_PARTIAL_SOFT and
 PCRE2_PARTIAL_HARD options, which can be set when calling a matching
@@ -5698,14 +5701,18 @@ PARTIAL MATCHING USING pcre2_match()
 
 A partial match occurs during a call to pcre2_match() when the end of
 the subject string is reached successfully, but matching cannot con-
-tinue because more characters are needed. However, at least one charac-
-ter in the subject must have been inspected. This character need not
-form part of the final matched string; lookbehind assertions and the \K
-escape sequence provide ways of inspecting characters before the start
-of a matched string. The requirement for inspecting at least one char-
-acter exists because an empty string can always be matched; without
-such a restriction there would always be a partial match of an empty
-string at the end of the subject.
+tinue because more characters are needed, and in addition, either at
+least one character in the subject has been inspected or the pattern
+contains a lookbehind. An inspected character need not form part of the
+final matched string; lookbehind assertions and the \K escape sequence
+provide ways of inspecting characters before the start of a matched
+string.
+
+The two additional requirements define the cases where adding more
+characters to the existing subject may complete the match. Without
+these conditions there would be a partial match of an empty string at
+the end of the subject for all unanchored patterns (and also for an-
+chored patterns if the subject itself is empty).
 
 When a partial match is returned, the first two elements in the ovector
 point to the portion of the subject that was matched, but the values in
@@ -5722,7 +5729,7 @@ PARTIAL MATCHING USING pcre2_match()
 quent re-match with additional characters.
 
 What happens when a partial match is identified depends on which of the
-two partial matching options are set.
+two partial matching options is set.
 
 PCRE2_PARTIAL_SOFT WITH pcre2_match()
 
@@ -5759,8 +5766,8 @@ PARTIAL MATCHING USING pcre2_match()
 reason, the assumption is made that the end of the supplied subject
 string may not be the true end of the available data, and so, if \z,
 \Z, \b, \B, or $ are encountered at the end of the subject, the result
-is PCRE2_ERROR_PARTIAL, provided that at least one character in the
-subject has been inspected.
+is PCRE2_ERROR_PARTIAL, whether or not any characters have been in-
+spected.
 
 Comparing hard and soft partial matching
 
@@ -5963,43 +5970,25 @@ ISSUES WITH MULTI-SEGMENT MATCHING
 mum lookbehind count is 3, so all characters before offset 2 can be
 discarded. The value of startoffset for the next match should be 3.
 When pcre2test displays a partial match, it indicates the lookbehind
-characters with '<' characters:
+characters with '<' characters if the "allusedtext" modifier is set:
 
 re> "(?<=123)abc"
-data> xx123ab\=ph
+data> xx123ab\=ph,allusedtext
 Partial match: 123ab
-<<<
+<<< However, the "allusedtext" modifier is not avail-
+able for JIT matching, because JIT matching does not maintain the first
+and last consulted characters.
 
-3. The maximum lookbehind count is also important when the result of a
-partial match attempt is "no match". In this case, the maximum lookbe-
-hind characters from the end of the current segment must be retained at
-the start of the next segment, in case the lookbehind is at the start
-of the pattern. Matching the next segment must then start at the appro-
-priate offset.
-
-4. Because a partial match must always contain at least one character,
-what might be considered a partial match of an empty string actually
-gives a "no match" result. For example:
-
-re> /c(?<=abc)x/
-data> ab\=ps
-No match
-
-If the next segment begins "cx", a match should be found, but this will
-only happen if characters from the previous segment are retained. For
-this reason, a "no match" result should be interpreted as "partial
-match of an empty string" when the pattern contains lookbehinds.
-
-5. Matching a subject string that is split into multiple segments may
+3. Matching a subject string that is split into multiple segments may
 not always produce exactly the same result as matching over one single
-long string, especially when PCRE2_PARTIAL_SOFT is used. The section
-"Partial Matching and Word Boundaries" above describes an issue that
-arises if the pattern ends with \b or \B. Another kind of difference
-may occur when there are multiple matching possibilities, because (for
-PCRE2_PARTIAL_SOFT) a partial match result is given only when there are
-no completed matches. This means that as soon as the shortest match has
-been found, continuation to a new subject segment is no longer possi-
-ble. Consider this pcre2test example:
+long string when PCRE2_PARTIAL_SOFT is used. The section "Partial
+Matching and Word Boundaries" above describes an issue that arises if
+the pattern ends with \b or \B. Another kind of difference may occur
+when there are multiple matching possibilities, because (for PCRE2_PAR-
+TIAL_SOFT) a partial match result is given only when there are no com-
+pleted matches. This means that as soon as the shortest match has been
+found, continuation to a new subject segment is no longer possible.
+Consider this pcre2test example:
 
 re> /dog(sbody)?/
 data> dogsb\=ps
@@ -6034,7 +6023,7 @@ ISSUES WITH MULTI-SEGMENT MATCHING
 data> gsb\=ph,dfa,dfa_restart
 Partial match: gsb
 
-6. Patterns that contain alternatives at the top level which do not all
+4. Patterns that contain alternatives at the top level which do not all
 start with the same pattern item may not work as expected when
 PCRE2_DFA_RESTART is used. For example, consider this pattern:
 
@@ -6079,7 +6068,7 @@ AUTHOR
 
 REVISION
 
-Last updated: 21 June 2019
+Last updated: 21 July 2019
 Copyright (c) 1997-2019 University of Cambridge.
 ------------------------------------------------------------------------------
 

@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "25 June 2019" "PCRE2 10.34"
+.TH PCRE2API 3 "20 July 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -2719,12 +2719,15 @@ Your program may crash or loop indefinitely or give wrong results.
 .sp
 These options turn on the partial matching feature. A partial match occurs if
 the end of the subject string is reached successfully, but there are not enough
-subject characters to complete the match. If this happens when
-PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) is set, matching continues by
-testing any remaining alternatives. Only if no complete match can be found is
-PCRE2_ERROR_PARTIAL returned instead of PCRE2_ERROR_NOMATCH. In other words,
-PCRE2_PARTIAL_SOFT specifies that the caller is prepared to handle a partial
-match, but only if no complete match can be found.
+subject characters to complete the match. In addition, either at least one
+character must have been inspected or the pattern must contain a lookbehind.
+.P
+If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD)
+is set, matching continues by testing any remaining alternatives. Only if no
+complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
+PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that the
+caller is prepared to handle a partial match, but only if no complete match can
+be found.
 .P
 If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
 a partial match is found, \fBpcre2_match()\fP immediately returns
@@ -3859,6 +3862,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 25 June 2019
+Last updated: 20 July 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi

@@ -1,4 +1,4 @@
-.TH PCRE2PARTIAL 3 "21 June 2019" "PCRE2 10.34"
+.TH PCRE2PARTIAL 3 "21 July 2019" "PCRE2 10.34"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions
 .SH "PARTIAL MATCHING IN PCRE2"
@@ -22,7 +22,7 @@ as soon as a mistake is made, by beeping and not reflecting the character that
 has been typed, for example. This immediate feedback is likely to be a better
 user interface than a check that is delayed until the entire string has been
 entered. Partial matching can also be useful when the subject string is very
-long and is not all available at once.
+long and is not all available at once, as discussed below.
 .P
 PCRE2 supports partial matching by means of the PCRE2_PARTIAL_SOFT and
 PCRE2_PARTIAL_HARD options, which can be set when calling a matching function.
@@ -55,13 +55,17 @@ is also disabled for partial matching.
 .sp
 A partial match occurs during a call to \fBpcre2_match()\fP when the end of the
 subject string is reached successfully, but matching cannot continue because
-more characters are needed. However, at least one character in the subject must
-have been inspected. This character need not form part of the final matched
-string; lookbehind assertions and the \eK escape sequence provide ways of
-inspecting characters before the start of a matched string. The requirement for
-inspecting at least one character exists because an empty string can always be
-matched; without such a restriction there would always be a partial match of an
-empty string at the end of the subject.
+more characters are needed, and in addition, either at least one character in
+the subject has been inspected or the pattern contains a lookbehind. An
+inspected character need not form part of the final matched string; lookbehind
+assertions and the \eK escape sequence provide ways of inspecting characters
+before the start of a matched string.
+.P
+The two additional requirements define the cases where adding more characters
+to the existing subject may complete the match. Without these conditions there
+would be a partial match of an empty string at the end of the subject for all
+unanchored patterns (and also for anchored patterns if the subject itself is
+empty).
 .P
 When a partial match is returned, the first two elements in the ovector point
 to the portion of the subject that was matched, but the values in the rest of
@@ -78,7 +82,7 @@ these characters are needed for a subsequent re-match with additional
 characters.
 .P
 What happens when a partial match is identified depends on which of the two
-partial matching options are set.
+partial matching options is set.
 .
 .
 .SS "PCRE2_PARTIAL_SOFT WITH pcre2_match()"
@@ -100,12 +104,12 @@ the data that is returned. Consider this pattern:
 .sp
 /123\ew+X|dogY/
 .sp
-If this is matched against the subject string "abc123dog", both
-alternatives fail to match, but the end of the subject is reached during
-matching, so PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9,
-identifying "123dog" as the first partial match that was found. (In this
-example, there are two partial matches, because "dog" on its own partially
-matches the second alternative.)
+If this is matched against the subject string "abc123dog", both alternatives
+fail to match, but the end of the subject is reached during matching, so
+PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying
+"123dog" as the first partial match that was found. (In this example, there are
+two partial matches, because "dog" on its own partially matches the second
+alternative.)
 .
 .
 .SS "PCRE2_PARTIAL_HARD WITH pcre2_match()"
@@ -117,8 +121,8 @@ possible complete matches. This option is "hard" because it prefers an earlier
 partial match over a later complete match. For this reason, the assumption is
 made that the end of the supplied subject string may not be the true end of the
 available data, and so, if \ez, \eZ, \eb, \eB, or $ are encountered at the end
-of the subject, the result is PCRE2_ERROR_PARTIAL, provided that at least one
-character in the subject has been inspected.
+of the subject, the result is PCRE2_ERROR_PARTIAL, whether or not any
+characters have been inspected.
 .
 .
 .SS "Comparing hard and soft partial matching"
@@ -319,40 +323,23 @@ string "xx123ab", the ovector offsets are 5 and 7 ("ab"). The maximum
 lookbehind count is 3, so all characters before offset 2 can be discarded. The
 value of \fBstartoffset\fP for the next match should be 3. When \fBpcre2test\fP
 displays a partial match, it indicates the lookbehind characters with '<'
|
||||||
characters:
|
characters if the "allusedtext" modifier is set:
|
||||||
.sp
|
.sp
|
||||||
re> "(?<=123)abc"
|
re> "(?<=123)abc"
|
||||||
data> xx123ab\e=ph
|
data> xx123ab\e=ph,allusedtext
|
||||||
Partial match: 123ab
|
Partial match: 123ab
|
||||||
<<<
|
<<<
|
||||||
|
However, the "allusedtext" modifier is not available for JIT matching, because
|
||||||
|
JIT matching does not maintain the first and last consulted characters.
|
||||||
.P
|
.P
|
||||||
3. The maximum lookbehind count is also important when the result of a partial
|
3. Matching a subject string that is split into multiple segments may not
|
||||||
match attempt is "no match". In this case, the maximum lookbehind characters
|
always produce exactly the same result as matching over one single long string
|
||||||
from the end of the current segment must be retained at the start of the next
|
when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and Word
|
||||||
segment, in case the lookbehind is at the start of the pattern. Matching the
|
Boundaries" above describes an issue that arises if the pattern ends with \eb
|
||||||
next segment must then start at the appropriate offset.
|
or \eB. Another kind of difference may occur when there are multiple matching
|
||||||
.P
|
possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result is given
|
||||||
4. Because a partial match must always contain at least one character, what
|
only when there are no completed matches. This means that as soon as the
|
||||||
might be considered a partial match of an empty string actually gives a "no
|
shortest match has been found, continuation to a new subject segment is no
|
||||||
match" result. For example:
|
|
||||||
.sp
|
|
||||||
re> /c(?<=abc)x/
|
|
||||||
data> ab\e=ps
|
|
||||||
No match
|
|
||||||
.sp
|
|
||||||
If the next segment begins "cx", a match should be found, but this will only
|
|
||||||
happen if characters from the previous segment are retained. For this reason, a
|
|
||||||
"no match" result should be interpreted as "partial match of an empty string"
|
|
||||||
when the pattern contains lookbehinds.
|
|
||||||
.P
|
|
||||||
5. Matching a subject string that is split into multiple segments may not
|
|
||||||
always produce exactly the same result as matching over one single long string,
|
|
||||||
especially when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and
|
|
||||||
Word Boundaries" above describes an issue that arises if the pattern ends with
|
|
||||||
\eb or \eB. Another kind of difference may occur when there are multiple
|
|
||||||
matching possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result
|
|
||||||
is given only when there are no completed matches. This means that as soon as
|
|
||||||
the shortest match has been found, continuation to a new subject segment is no
|
|
||||||
longer possible. Consider this \fBpcre2test\fP example:
|
longer possible. Consider this \fBpcre2test\fP example:
|
||||||
.sp
|
.sp
|
||||||
re> /dog(sbody)?/
|
re> /dog(sbody)?/
|
||||||
|
@ -386,7 +373,7 @@ multi-segment data. The example above then behaves differently:
|
||||||
data> gsb\e=ph,dfa,dfa_restart
|
data> gsb\e=ph,dfa,dfa_restart
|
||||||
Partial match: gsb
|
Partial match: gsb
|
||||||
.sp
|
.sp
|
||||||
6. Patterns that contain alternatives at the top level which do not all start
|
4. Patterns that contain alternatives at the top level which do not all start
|
||||||
with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
|
with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
|
||||||
used. For example, consider this pattern:
|
used. For example, consider this pattern:
|
||||||
.sp
|
.sp
|
||||||
|
@ -435,6 +422,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 21 June 2019
|
Last updated: 21 July 2019
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2019 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
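The pcre2partial text above explains how to retain lookbehind characters when a subject is split into segments. For "(?<=123)abc" against "xx123ab", the partial match starts at offset 5 and the maximum lookbehind is 3. Characters before offset 2 can therefore be discarded, and the next startoffset is 3. The sketch below only restates that arithmetic. struct carry_over and plan_carry_over() are hypothetical names and are not part of the PCRE2 API. It also assumes the maximum lookbehind has been obtained separately, for example with pcre2_pattern_info() and PCRE2_INFO_MAXLOOKBEHIND.

```c
#include <stddef.h>

/* Hypothetical helper, not PCRE2 API: plan which part of the current
   segment to keep before appending the next one. */
struct carry_over {
  size_t keep_from;   /* characters before this offset may be discarded */
  size_t next_start;  /* startoffset to use in the retained buffer */
};

static struct carry_over plan_carry_over(size_t match_start,
                                         size_t max_lookbehind)
{
  struct carry_over c;
  /* Keep max_lookbehind characters before the partial match, but never
     go below the start of the current buffer. */
  c.keep_from = (match_start > max_lookbehind) ? match_start - max_lookbehind : 0;
  /* Once the first keep_from characters are dropped, the partial match
     begins this far into the retained text. */
  c.next_start = match_start - c.keep_from;
  return c;
}
```

With the values from the text, plan_carry_over(5, 3) gives keep_from 2 and next_start 3.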
@ -966,7 +966,7 @@ for (;;)
|
||||||
if (ptr >= end_subject)
|
if (ptr >= end_subject)
|
||||||
{
|
{
|
||||||
if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0)
|
if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0)
|
||||||
could_continue = TRUE;
|
return PCRE2_ERROR_PARTIAL;
|
||||||
else { ADD_ACTIVE(state_offset + 1, 0); }
|
else { ADD_ACTIVE(state_offset + 1, 0); }
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
|
@ -1015,10 +1015,12 @@ for (;;)
|
||||||
|
|
||||||
/*-----------------------------------------------------------------*/
|
/*-----------------------------------------------------------------*/
|
||||||
case OP_EODN:
|
case OP_EODN:
|
||||||
if (clen == 0 && (mb->moptions & PCRE2_PARTIAL_HARD) != 0)
|
if (clen == 0 || (IS_NEWLINE(ptr) && ptr == end_subject - mb->nllen))
|
||||||
could_continue = TRUE;
|
{
|
||||||
else if (clen == 0 || (IS_NEWLINE(ptr) && ptr == end_subject - mb->nllen))
|
if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0)
|
||||||
{ ADD_ACTIVE(state_offset + 1, 0); }
|
return PCRE2_ERROR_PARTIAL;
|
||||||
|
ADD_ACTIVE(state_offset + 1, 0);
|
||||||
|
}
|
||||||
break;
|
break;
|
||||||
|
|
||||||
/*-----------------------------------------------------------------*/
|
/*-----------------------------------------------------------------*/
|
||||||
|
@ -3181,9 +3183,12 @@ for (;;)
|
||||||
partial_newline || /* Either partial NL */
|
partial_newline || /* Either partial NL */
|
||||||
( /* or ... */
|
( /* or ... */
|
||||||
ptr >= end_subject && /* End of subject and */
|
ptr >= end_subject && /* End of subject and */
|
||||||
ptr > mb->start_used_ptr) /* Inspected non-empty string */
|
( /* either */
|
||||||
|
ptr > mb->start_used_ptr || /* Inspected non-empty string */
|
||||||
|
mb->haslookbehind /* or pattern has lookbehind */
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
))
|
||||||
match_count = PCRE2_ERROR_PARTIAL;
|
match_count = PCRE2_ERROR_PARTIAL;
|
||||||
break; /* Exit from loop along the subject string */
|
break; /* Exit from loop along the subject string */
|
||||||
}
|
}
|
||||||
|
@ -3412,6 +3417,7 @@ mb->tables = re->tables;
|
||||||
mb->start_subject = subject;
|
mb->start_subject = subject;
|
||||||
mb->end_subject = end_subject;
|
mb->end_subject = end_subject;
|
||||||
mb->start_offset = start_offset;
|
mb->start_offset = start_offset;
|
||||||
|
mb->haslookbehind = (re->max_lookbehind > 0);
|
||||||
mb->moptions = options;
|
mb->moptions = options;
|
||||||
mb->poptions = re->overall_options;
|
mb->poptions = re->overall_options;
|
||||||
mb->match_call_count = 0;
|
mb->match_call_count = 0;
|
||||||
|
|
|
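The new OP_EODN (\Z) case in pcre2_dfa_match.c above has three possible outcomes. The state is not carried forward, or it is advanced with ADD_ACTIVE, or PCRE2_ERROR_PARTIAL is returned at once under PCRE2_PARTIAL_HARD. The sketch below models this as a small decision function. It is a simplification, not PCRE2 code: it assumes a single LF newline, whereas the real code uses IS_NEWLINE() and mb->nllen, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the OP_EODN state in the DFA matcher. */
enum eodn_action { EODN_FAIL, EODN_ADVANCE, EODN_PARTIAL };

/* Assumes the newline is a single '\n'; pos is the current offset into a
   subject of length len. */
static enum eodn_action dfa_eodn(const char *subject, size_t len,
                                 size_t pos, bool partial_hard)
{
  bool at_end = (pos >= len);                                   /* clen == 0 */
  bool at_final_nl = (pos + 1 == len && subject[pos] == '\n');  /* \n before end */
  if (!at_end && !at_final_nl)
    return EODN_FAIL;      /* state is simply not carried forward */
  if (partial_hard)
    return EODN_PARTIAL;   /* return PCRE2_ERROR_PARTIAL immediately */
  return EODN_ADVANCE;     /* ADD_ACTIVE(state_offset + 1, 0) */
}
```

The OP_EOD (\z) change earlier in the same file follows the same pattern, but only the end-of-subject test applies. The hard case at a final newline is the one exercised by the new abc\n\=ph test lines.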
@ -854,6 +854,7 @@ typedef struct match_block {
|
||||||
uint32_t match_call_count; /* Number of times a new frame is created */
|
uint32_t match_call_count; /* Number of times a new frame is created */
|
||||||
BOOL hitend; /* Hit the end of the subject at some point */
|
BOOL hitend; /* Hit the end of the subject at some point */
|
||||||
BOOL hasthen; /* Pattern contains (*THEN) */
|
BOOL hasthen; /* Pattern contains (*THEN) */
|
||||||
|
BOOL haslookbehind; /* Pattern contains significant lookbehind */
|
||||||
const uint8_t *lcc; /* Points to lower casing table */
|
const uint8_t *lcc; /* Points to lower casing table */
|
||||||
const uint8_t *fcc; /* Points to case-flipping table */
|
const uint8_t *fcc; /* Points to case-flipping table */
|
||||||
const uint8_t *ctypes; /* Points to table of type maps */
|
const uint8_t *ctypes; /* Points to table of type maps */
|
||||||
|
@ -909,6 +910,7 @@ typedef struct dfa_match_block {
|
||||||
uint32_t poptions; /* Pattern options */
|
uint32_t poptions; /* Pattern options */
|
||||||
uint32_t nltype; /* Newline type */
|
uint32_t nltype; /* Newline type */
|
||||||
uint32_t nllen; /* Newline string length */
|
uint32_t nllen; /* Newline string length */
|
||||||
|
BOOL haslookbehind; /* Pattern contains significant lookbehind */
|
||||||
PCRE2_UCHAR nl[4]; /* Newline string when fixed */
|
PCRE2_UCHAR nl[4]; /* Newline string when fixed */
|
||||||
uint16_t bsr_convention; /* \R interpretation */
|
uint16_t bsr_convention; /* \R interpretation */
|
||||||
pcre2_callout_block *cb; /* Points to a callout block */
|
pcre2_callout_block *cb; /* Points to a callout block */
|
||||||
|
|
|
@ -416,7 +416,6 @@ if (caseless)
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* Not in UTF mode */
|
/* Not in UTF mode */
|
||||||
|
|
||||||
{
|
{
|
||||||
for (; length > 0; length--)
|
for (; length > 0; length--)
|
||||||
{
|
{
|
||||||
|
@ -491,11 +490,16 @@ heap is used for a larger vector.
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* These macros pack up tests that are used for partial matching several times
|
/* These macros pack up tests that are used for partial matching several times
|
||||||
in the code. We set the "hit end" flag if the pointer is at the end of the
|
in the code. The second one is used when we already know we are past the end of
|
||||||
subject and also past the earliest inspected character (i.e. something has been
|
the subject. We set the "hit end" flag if the pointer is at the end of the
|
||||||
matched, even if not part of the actual matched string). For hard partial
|
subject and either (a) the pointer is past the earliest inspected character
|
||||||
matching, we then return immediately. The second one is used when we already
|
(i.e. something has been matched, even if not part of the actual matched
|
||||||
know we are past the end of the subject. */
|
string), or (b) the pattern contains a lookbehind. These are the conditions for
|
||||||
|
which adding more characters may allow the current match to continue.
|
||||||
|
|
||||||
|
For hard partial matching, we immediately return a partial match. Otherwise,
|
||||||
|
carrying on means that a complete match on the current subject will be sought.
|
||||||
|
A partial match is returned only if no complete match can be found. */
|
||||||
|
|
||||||
#define CHECK_PARTIAL()\
|
#define CHECK_PARTIAL()\
|
||||||
if (Feptr >= mb->end_subject) \
|
if (Feptr >= mb->end_subject) \
|
||||||
|
@ -503,31 +507,13 @@ know we are past the end of the subject. */
|
||||||
SCHECK_PARTIAL(); \
|
SCHECK_PARTIAL(); \
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Original version that allows hard partial to continue if no inspected
|
|
||||||
characters. */
|
|
||||||
|
|
||||||
#define SCHECK_PARTIAL()\
|
#define SCHECK_PARTIAL()\
|
||||||
if (mb->partial != 0 && Feptr > mb->start_used_ptr) \
|
if (mb->partial != 0 && (Feptr > mb->start_used_ptr || mb->haslookbehind)) \
|
||||||
{ \
|
{ \
|
||||||
mb->hitend = TRUE; \
|
mb->hitend = TRUE; \
|
||||||
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; \
|
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; \
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Experimental version that makes hard partial give no match instead of
|
|
||||||
continuing if no characters have been inspected. */
|
|
||||||
|
|
||||||
#ifdef NEVERNEVER
|
|
||||||
#define SCHECK_PARTIAL()\
|
|
||||||
if (mb->partial != 0) \
|
|
||||||
{ \
|
|
||||||
if (Feptr > mb->start_used_ptr) \
|
|
||||||
{ \
|
|
||||||
mb->hitend = TRUE; \
|
|
||||||
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; \
|
|
||||||
} \
|
|
||||||
else if (mb->partial > 1) RRETURN(MATCH_NOMATCH); \
|
|
||||||
}
|
|
||||||
#endif /* NEVERNEVER */
|
|
||||||
|
|
||||||
/* These macros are used to implement backtracking. They simulate a recursive
|
/* These macros are used to implement backtracking. They simulate a recursive
|
||||||
call to the match() function by means of a local vector of frames which
|
call to the match() function by means of a local vector of frames which
|
||||||
|
@ -5670,7 +5656,11 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
|
|
||||||
case OP_EOD:
|
case OP_EOD:
|
||||||
if (Feptr < mb->end_subject) RRETURN(MATCH_NOMATCH);
|
if (Feptr < mb->end_subject) RRETURN(MATCH_NOMATCH);
|
||||||
SCHECK_PARTIAL();
|
if (mb->partial != 0)
|
||||||
|
{
|
||||||
|
mb->hitend = TRUE;
|
||||||
|
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL;
|
||||||
|
}
|
||||||
Fecode++;
|
Fecode++;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
@ -5695,7 +5685,11 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
|
|
||||||
/* Either at end of string or \n before end. */
|
/* Either at end of string or \n before end. */
|
||||||
|
|
||||||
SCHECK_PARTIAL();
|
if (mb->partial != 0)
|
||||||
|
{
|
||||||
|
mb->hitend = TRUE;
|
||||||
|
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL;
|
||||||
|
}
|
||||||
Fecode++;
|
Fecode++;
|
||||||
break;
|
break;
|
||||||
|
|
||||||
|
@ -6457,6 +6451,7 @@ mb->start_subject = subject;
|
||||||
mb->start_offset = start_offset;
|
mb->start_offset = start_offset;
|
||||||
mb->end_subject = end_subject;
|
mb->end_subject = end_subject;
|
||||||
mb->hasthen = (re->flags & PCRE2_HASTHEN) != 0;
|
mb->hasthen = (re->flags & PCRE2_HASTHEN) != 0;
|
||||||
|
mb->haslookbehind = (re->max_lookbehind > 0);
|
||||||
mb->poptions = re->overall_options; /* Pattern options */
|
mb->poptions = re->overall_options; /* Pattern options */
|
||||||
mb->ignore_skip_arg = 0;
|
mb->ignore_skip_arg = 0;
|
||||||
mb->mark = mb->nomatch_mark = NULL; /* In case never set */
|
mb->mark = mb->nomatch_mark = NULL; /* In case never set */
|
||||||
|
|
|
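The revised SCHECK_PARTIAL() macro and the unconditional check that now replaces it for OP_EOD and OP_EODN in pcre2_match.c can be restated as plain functions, which makes the two conditions explicit. struct mb_model is a hypothetical stand-in for the relevant match_block fields and uses offsets instead of pointers. Its partial field is assumed to be 0 for none, 1 for soft and 2 for hard, which is consistent with the mb->partial > 1 test in the macro.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the match_block fields used by the partial checks. */
struct mb_model {
  int partial;          /* 0 = none, 1 = soft, 2 = hard (assumed encoding) */
  size_t start_used;    /* offset of the earliest inspected character */
  bool haslookbehind;   /* re->max_lookbehind > 0 */
  bool hitend;          /* set when a partial match becomes possible */
};

/* Mirrors the revised SCHECK_PARTIAL(): returns true when the caller
   should return PCRE2_ERROR_PARTIAL at once (hard partial matching). */
static bool scheck_partial(struct mb_model *mb, size_t eptr)
{
  if (mb->partial != 0 && (eptr > mb->start_used || mb->haslookbehind)) {
    mb->hitend = true;
    return mb->partial > 1;
  }
  return false;
}

/* Mirrors the new OP_EOD code: the caller already knows it is at the end
   of the subject, and any partial mode sets hitend even if no character
   has been inspected. */
static bool eod_partial(struct mb_model *mb)
{
  if (mb->partial != 0) {
    mb->hitend = true;
    return mb->partial > 1;
  }
  return false;
}
```

With no lookbehind and nothing inspected, scheck_partial() leaves hitend unset. eod_partial() sets it regardless, which is why /\z/ with abc\=ph now reports an empty partial match instead of a complete empty match.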
@ -5690,10 +5690,33 @@ a)"xI
|
||||||
|
|
||||||
# ----
|
# ----
|
||||||
|
|
||||||
/(?<=(?=.(?<=x)))/
|
|
||||||
ab\=ph
|
|
||||||
|
|
||||||
# Expect error (recursion => not fixed length)
|
# Expect error (recursion => not fixed length)
|
||||||
/(\2)((?=(?<=\1)))/
|
/(\2)((?=(?<=\1)))/
|
||||||
|
|
||||||
|
/c*+(?<=[bc])/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
ab\=ph,no_jit
|
||||||
|
abc\=ps,no_jit
|
||||||
|
ab\=ps,no_jit
|
||||||
|
|
||||||
|
/c++(?<=[bc])/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
ab\=ph,no_jit
|
||||||
|
|
||||||
|
/(?<=(?=.(?<=x)))/
|
||||||
|
abx
|
||||||
|
ab\=ph,no_jit
|
||||||
|
bxyz
|
||||||
|
xyz
|
||||||
|
|
||||||
|
/\z/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
abc\=ps
|
||||||
|
|
||||||
|
/\Z/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
abc\=ps
|
||||||
|
abc\n\=ph,no_jit
|
||||||
|
abc\n\=ps
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -4994,4 +4994,30 @@
|
||||||
ab\=ps
|
ab\=ps
|
||||||
abcx
|
abcx
|
||||||
|
|
||||||
|
/\z/
|
||||||
|
abc\=ph
|
||||||
|
abc\=ps
|
||||||
|
|
||||||
|
/\Z/
|
||||||
|
abc\=ph
|
||||||
|
abc\=ps
|
||||||
|
abc\n\=ph
|
||||||
|
abc\n\=ps
|
||||||
|
|
||||||
|
/c*+(?<=[bc])/
|
||||||
|
abc\=ph
|
||||||
|
ab\=ph
|
||||||
|
abc\=ps
|
||||||
|
ab\=ps
|
||||||
|
|
||||||
|
/c++(?<=[bc])/
|
||||||
|
abc\=ph
|
||||||
|
ab\=ph
|
||||||
|
|
||||||
|
/(?<=(?=.(?<=x)))/
|
||||||
|
abx
|
||||||
|
ab\=ph
|
||||||
|
bxyz
|
||||||
|
xyz
|
||||||
|
|
||||||
# End of testinput6
|
# End of testinput6
|
||||||
|
|
|
@ -17185,14 +17185,52 @@ Subject length lower bound = 1
|
||||||
|
|
||||||
# ----
|
# ----
|
||||||
|
|
||||||
/(?<=(?=.(?<=x)))/
|
|
||||||
ab\=ph
|
|
||||||
No match
|
|
||||||
|
|
||||||
# Expect error (recursion => not fixed length)
|
# Expect error (recursion => not fixed length)
|
||||||
/(\2)((?=(?<=\1)))/
|
/(\2)((?=(?<=\1)))/
|
||||||
Failed: error 125 at offset 8: lookbehind assertion is not fixed length
|
Failed: error 125 at offset 8: lookbehind assertion is not fixed length
|
||||||
|
|
||||||
|
/c*+(?<=[bc])/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
Partial match: c
|
||||||
|
ab\=ph,no_jit
|
||||||
|
Partial match:
|
||||||
|
abc\=ps,no_jit
|
||||||
|
0: c
|
||||||
|
ab\=ps,no_jit
|
||||||
|
0:
|
||||||
|
|
||||||
|
/c++(?<=[bc])/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
Partial match: c
|
||||||
|
ab\=ph,no_jit
|
||||||
|
Partial match:
|
||||||
|
|
||||||
|
/(?<=(?=.(?<=x)))/
|
||||||
|
abx
|
||||||
|
0:
|
||||||
|
ab\=ph,no_jit
|
||||||
|
Partial match:
|
||||||
|
bxyz
|
||||||
|
0:
|
||||||
|
xyz
|
||||||
|
0:
|
||||||
|
|
||||||
|
/\z/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
Partial match:
|
||||||
|
abc\=ps
|
||||||
|
0:
|
||||||
|
|
||||||
|
/\Z/
|
||||||
|
abc\=ph,no_jit
|
||||||
|
Partial match:
|
||||||
|
abc\=ps
|
||||||
|
0:
|
||||||
|
abc\n\=ph,no_jit
|
||||||
|
Partial match: \x0a
|
||||||
|
abc\n\=ps
|
||||||
|
0:
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||||
Error -62: bad serialized data
|
Error -62: bad serialized data
|
||||||
|
|
|
@ -7845,4 +7845,46 @@ Partial match: ab
|
||||||
abcx
|
abcx
|
||||||
0: abcx
|
0: abcx
|
||||||
|
|
||||||
|
/\z/
|
||||||
|
abc\=ph
|
||||||
|
Partial match:
|
||||||
|
abc\=ps
|
||||||
|
0:
|
||||||
|
|
||||||
|
/\Z/
|
||||||
|
abc\=ph
|
||||||
|
Partial match:
|
||||||
|
abc\=ps
|
||||||
|
0:
|
||||||
|
abc\n\=ph
|
||||||
|
Partial match: \x0a
|
||||||
|
abc\n\=ps
|
||||||
|
0:
|
||||||
|
|
||||||
|
/c*+(?<=[bc])/
|
||||||
|
abc\=ph
|
||||||
|
Partial match: c
|
||||||
|
ab\=ph
|
||||||
|
Partial match:
|
||||||
|
abc\=ps
|
||||||
|
0: c
|
||||||
|
ab\=ps
|
||||||
|
0:
|
||||||
|
|
||||||
|
/c++(?<=[bc])/
|
||||||
|
abc\=ph
|
||||||
|
Partial match: c
|
||||||
|
ab\=ph
|
||||||
|
Partial match:
|
||||||
|
|
||||||
|
/(?<=(?=.(?<=x)))/
|
||||||
|
abx
|
||||||
|
0:
|
||||||
|
ab\=ph
|
||||||
|
Partial match:
|
||||||
|
bxyz
|
||||||
|
0:
|
||||||
|
xyz
|
||||||
|
0:
|
||||||
|
|
||||||
# End of testinput6
|
# End of testinput6
|
||||||
|
|