Update definition of partial match and fix \z and \Z (as documented).
parent
344056baf8
commit
c84a06c96e
10
ChangeLog
|
@ -97,6 +97,16 @@ within it, the nested lookbehind was not correctly processed. For example, if
|
|||
|
||||
20. Implemented pcre2_get_match_data_size().
|
||||
|
||||
21. Two alterations to partial matching (not yet done by JIT):
|
||||
|
||||
(a) The definition of a partial match is slightly changed: if a pattern
|
||||
contains any lookbehinds, an empty partial match may be given, because this
|
||||
is another situation where adding characters to the current subject can
|
||||
lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".
|
||||
|
||||
(b) An empty string partial hard match can be returned for \z and \Z as it
|
||||
is documented that they shouldn't match.
|
||||
|
||||
|
||||
Version 10.33 16-April-2019
|
||||
---------------------------
|
||||
|
|
|
@ -2725,12 +2725,16 @@ Your program may crash or loop indefinitely or give wrong results.
|
|||
</pre>
|
||||
These options turn on the partial matching feature. A partial match occurs if
|
||||
the end of the subject string is reached successfully, but there are not enough
|
||||
subject characters to complete the match. If this happens when
|
||||
PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) is set, matching continues by
|
||||
testing any remaining alternatives. Only if no complete match can be found is
|
||||
PCRE2_ERROR_PARTIAL returned instead of PCRE2_ERROR_NOMATCH. In other words,
|
||||
PCRE2_PARTIAL_SOFT specifies that the caller is prepared to handle a partial
|
||||
match, but only if no complete match can be found.
|
||||
subject characters to complete the match. In addition, either at least one
|
||||
character must have been inspected or the pattern must contain a lookbehind.
|
||||
</P>
|
||||
<P>
|
||||
If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD)
|
||||
is set, matching continues by testing any remaining alternatives. Only if no
|
||||
complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
|
||||
PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that the
|
||||
caller is prepared to handle a partial match, but only if no complete match can
|
||||
be found.
|
||||
</P>
|
||||
<P>
|
||||
If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
|
||||
|
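The difference between the two options described in this hunk can be sketched in isolation. The code below is a simplified model, not PCRE2 code: it assumes the matcher tries a sequence of paths in order, and the names `outcome`, `partial_mode`, `result` and `resolve` are hypothetical. It only illustrates that soft partial matching reports a partial match when no complete match exists, while hard partial matching returns at the first partial match it meets.

```c
#include <assert.h>   /* used for the precondition check below */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in PCRE2. */
enum outcome { OUT_FAIL, OUT_PARTIAL, OUT_COMPLETE };
enum partial_mode { MODE_NONE, MODE_SOFT, MODE_HARD };
enum result { RES_NOMATCH, RES_PARTIAL, RES_MATCH };

/* Walk the outcomes of the attempted paths in the order they are tried. */
static enum result resolve(const enum outcome *paths, size_t n,
                           enum partial_mode mode)
{
  bool saw_partial = false;
  assert(paths != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
    if (paths[i] == OUT_COMPLETE) return RES_MATCH;   /* complete match wins */
    if (paths[i] == OUT_PARTIAL && mode != MODE_NONE)
      {
      if (mode == MODE_HARD) return RES_PARTIAL;      /* hard: stop at once */
      saw_partial = true;                             /* soft: keep looking */
      }
    }
  return saw_partial ? RES_PARTIAL : RES_NOMATCH;
}
```

In this model, a partial path followed by a complete path resolves to `RES_MATCH` under `MODE_SOFT` and to `RES_PARTIAL` under `MODE_HARD`, which is the "hard prefers an earlier partial match" behaviour the documentation describes.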
@ -3846,7 +3850,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 25 June 2019
|
||||
Last updated: 20 July 2019
|
||||
<br>
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -45,7 +45,7 @@ as soon as a mistake is made, by beeping and not reflecting the character that
|
|||
has been typed, for example. This immediate feedback is likely to be a better
|
||||
user interface than a check that is delayed until the entire string has been
|
||||
entered. Partial matching can also be useful when the subject string is very
|
||||
long and is not all available at once.
|
||||
long and is not all available at once, as discussed below.
|
||||
</P>
|
||||
<P>
|
||||
PCRE2 supports partial matching by means of the PCRE2_PARTIAL_SOFT and
|
||||
|
@ -79,13 +79,18 @@ is also disabled for partial matching.
|
|||
<P>
|
||||
A partial match occurs during a call to <b>pcre2_match()</b> when the end of the
|
||||
subject string is reached successfully, but matching cannot continue because
|
||||
more characters are needed. However, at least one character in the subject must
|
||||
have been inspected. This character need not form part of the final matched
|
||||
string; lookbehind assertions and the \K escape sequence provide ways of
|
||||
inspecting characters before the start of a matched string. The requirement for
|
||||
inspecting at least one character exists because an empty string can always be
|
||||
matched; without such a restriction there would always be a partial match of an
|
||||
empty string at the end of the subject.
|
||||
more characters are needed, and in addition, either at least one character in
|
||||
the subject has been inspected or the pattern contains a lookbehind. An
|
||||
inspected character need not form part of the final matched string; lookbehind
|
||||
assertions and the \K escape sequence provide ways of inspecting characters
|
||||
before the start of a matched string.
|
||||
</P>
|
||||
<P>
|
||||
The two additional requirements define the cases where adding more characters
|
||||
to the existing subject may complete the match. Without these conditions there
|
||||
would be a partial match of an empty string at the end of the subject for all
|
||||
unanchored patterns (and also for anchored patterns if the subject itself is
|
||||
empty).
|
||||
</P>
|
||||
<P>
|
||||
When a partial match is returned, the first two elements in the ovector point
|
||||
|
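The revised definition in this hunk reduces to a small predicate. The sketch below is an illustration only: the function name and its parameters are hypothetical, and offsets stand in for the pointers used in the real code. It mirrors the condition the commit adds to the matcher (end of subject reached, and either something was inspected or the pattern has a lookbehind).

```c
#include <assert.h>   /* only needed if the checks in the note below are pasted into a main() */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2.
   pos             current offset in the subject
   subject_len     length of the subject
   first_inspected lowest offset the matcher has looked at so far
   has_lookbehind  true if the pattern contains a lookbehind */
static bool partial_match_possible(size_t pos, size_t subject_len,
                                   size_t first_inspected,
                                   bool has_lookbehind)
{
  return pos >= subject_len &&
         (pos > first_inspected || has_lookbehind);
}
```

Under these assumptions, `partial_match_possible(2, 2, 2, false)` is false (nothing inspected, no lookbehind), while `partial_match_possible(2, 2, 2, true)` is true, which corresponds to the empty partial match now reported for /c*+(?<=[bc])/ with subject "ab".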
@ -104,7 +109,7 @@ characters.
|
|||
</P>
|
||||
<P>
|
||||
What happens when a partial match is identified depends on which of the two
|
||||
partial matching options are set.
|
||||
partial matching options is set.
|
||||
</P>
|
||||
<br><b>
|
||||
PCRE2_PARTIAL_SOFT WITH pcre2_match()
|
||||
|
@ -128,12 +133,12 @@ the data that is returned. Consider this pattern:
|
|||
<pre>
|
||||
/123\w+X|dogY/
|
||||
</pre>
|
||||
If this is matched against the subject string "abc123dog", both
|
||||
alternatives fail to match, but the end of the subject is reached during
|
||||
matching, so PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9,
|
||||
identifying "123dog" as the first partial match that was found. (In this
|
||||
example, there are two partial matches, because "dog" on its own partially
|
||||
matches the second alternative.)
|
||||
If this is matched against the subject string "abc123dog", both alternatives
|
||||
fail to match, but the end of the subject is reached during matching, so
|
||||
PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying
|
||||
"123dog" as the first partial match that was found. (In this example, there are
|
||||
two partial matches, because "dog" on its own partially matches the second
|
||||
alternative.)
|
||||
</P>
|
||||
<br><b>
|
||||
PCRE2_PARTIAL_HARD WITH pcre2_match()
|
||||
|
@ -145,8 +150,8 @@ possible complete matches. This option is "hard" because it prefers an earlier
|
|||
partial match over a later complete match. For this reason, the assumption is
|
||||
made that the end of the supplied subject string may not be the true end of the
|
||||
available data, and so, if \z, \Z, \b, \B, or $ are encountered at the end
|
||||
of the subject, the result is PCRE2_ERROR_PARTIAL, provided that at least one
|
||||
character in the subject has been inspected.
|
||||
of the subject, the result is PCRE2_ERROR_PARTIAL, whether or not any
|
||||
characters have been inspected.
|
||||
</P>
|
||||
<br><b>
|
||||
Comparing hard and soft partial matching
|
||||
|
@ -346,44 +351,25 @@ string "xx123ab", the ovector offsets are 5 and 7 ("ab"). The maximum
|
|||
lookbehind count is 3, so all characters before offset 2 can be discarded. The
|
||||
value of <b>startoffset</b> for the next match should be 3. When <b>pcre2test</b>
|
||||
displays a partial match, it indicates the lookbehind characters with '<'
|
||||
characters:
|
||||
characters if the "allusedtext" modifier is set:
|
||||
<pre>
|
||||
re> "(?<=123)abc"
|
||||
data> xx123ab\=ph
|
||||
data> xx123ab\=ph,allusedtext
|
||||
Partial match: 123ab
|
||||
<<<
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
3. The maximum lookbehind count is also important when the result of a partial
|
||||
match attempt is "no match". In this case, the maximum lookbehind characters
|
||||
from the end of the current segment must be retained at the start of the next
|
||||
segment, in case the lookbehind is at the start of the pattern. Matching the
|
||||
next segment must then start at the appropriate offset.
|
||||
</P>
|
||||
<P>
|
||||
4. Because a partial match must always contain at least one character, what
|
||||
might be considered a partial match of an empty string actually gives a "no
|
||||
match" result. For example:
|
||||
<pre>
|
||||
re> /c(?<=abc)x/
|
||||
data> ab\=ps
|
||||
No match
|
||||
</pre>
|
||||
If the next segment begins "cx", a match should be found, but this will only
|
||||
happen if characters from the previous segment are retained. For this reason, a
|
||||
"no match" result should be interpreted as "partial match of an empty string"
|
||||
when the pattern contains lookbehinds.
|
||||
However, the "allusedtext" modifier is not available for JIT matching, because
|
||||
JIT matching does not maintain the first and last consulted characters.
|
||||
</P>
|
||||
<P>
|
||||
5. Matching a subject string that is split into multiple segments may not
|
||||
always produce exactly the same result as matching over one single long string,
|
||||
especially when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and
|
||||
Word Boundaries" above describes an issue that arises if the pattern ends with
|
||||
\b or \B. Another kind of difference may occur when there are multiple
|
||||
matching possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result
|
||||
is given only when there are no completed matches. This means that as soon as
|
||||
the shortest match has been found, continuation to a new subject segment is no
|
||||
3. Matching a subject string that is split into multiple segments may not
|
||||
always produce exactly the same result as matching over one single long string
|
||||
when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and Word
|
||||
Boundaries" above describes an issue that arises if the pattern ends with \b
|
||||
or \B. Another kind of difference may occur when there are multiple matching
|
||||
possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result is given
|
||||
only when there are no completed matches. This means that as soon as the
|
||||
shortest match has been found, continuation to a new subject segment is no
|
||||
longer possible. Consider this <b>pcre2test</b> example:
|
||||
<pre>
|
||||
re> /dog(sbody)?/
|
||||
|
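The retention arithmetic in the "xx123ab" example earlier in this hunk can be written out as a short helper. This is a sketch under stated assumptions: `retained` and `retain_for_next_segment` are hypothetical names, and the caller is assumed to have obtained the partial-match start offset from the ovector and the maximum lookbehind count from the compiled pattern.

```c
#include <assert.h>   /* only needed if the checks in the note below are pasted into a main() */
#include <stddef.h>

/* Hypothetical result type, not part of PCRE2. */
struct retained {
  size_t discard;      /* number of leading subject characters to drop */
  size_t startoffset;  /* startoffset for the next match attempt */
};

/* match_start is ovector[0] of the partial match; max_lookbehind is the
   pattern's maximum lookbehind count. */
static struct retained retain_for_next_segment(size_t match_start,
                                               size_t max_lookbehind)
{
  struct retained r;
  /* Never keep more lookbehind characters than actually precede the match. */
  size_t keep_back = max_lookbehind < match_start ? max_lookbehind
                                                  : match_start;
  r.discard = match_start - keep_back;
  r.startoffset = keep_back;
  return r;
}
```

With a match start of 5 and a maximum lookbehind of 3, the helper gives `discard == 2` and `startoffset == 3`, matching the figures quoted for "xx123ab".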
@ -418,7 +404,7 @@ multi-segment data. The example above then behaves differently:
|
|||
data> gsb\=ph,dfa,dfa_restart
|
||||
Partial match: gsb
|
||||
</pre>
|
||||
6. Patterns that contain alternatives at the top level which do not all start
|
||||
4. Patterns that contain alternatives at the top level which do not all start
|
||||
with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
|
||||
used. For example, consider this pattern:
|
||||
<pre>
|
||||
|
@ -463,7 +449,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 21 June 2019
|
||||
Last updated: 21 July 2019
|
||||
<br>
|
||||
Copyright © 1997-2019 University of Cambridge.
|
||||
<br>
|
||||
|
|
869
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "25 June 2019" "PCRE2 10.34"
|
||||
.TH PCRE2API 3 "20 July 2019" "PCRE2 10.34"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -2719,12 +2719,15 @@ Your program may crash or loop indefinitely or give wrong results.
|
|||
.sp
|
||||
These options turn on the partial matching feature. A partial match occurs if
|
||||
the end of the subject string is reached successfully, but there are not enough
|
||||
subject characters to complete the match. If this happens when
|
||||
PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) is set, matching continues by
|
||||
testing any remaining alternatives. Only if no complete match can be found is
|
||||
PCRE2_ERROR_PARTIAL returned instead of PCRE2_ERROR_NOMATCH. In other words,
|
||||
PCRE2_PARTIAL_SOFT specifies that the caller is prepared to handle a partial
|
||||
match, but only if no complete match can be found.
|
||||
subject characters to complete the match. In addition, either at least one
|
||||
character must have been inspected or the pattern must contain a lookbehind.
|
||||
.P
|
||||
If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD)
|
||||
is set, matching continues by testing any remaining alternatives. Only if no
|
||||
complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
|
||||
PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that the
|
||||
caller is prepared to handle a partial match, but only if no complete match can
|
||||
be found.
|
||||
.P
|
||||
If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
|
||||
a partial match is found, \fBpcre2_match()\fP immediately returns
|
||||
|
@ -3859,6 +3862,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 25 June 2019
|
||||
Last updated: 20 July 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PARTIAL 3 "21 June 2019" "PCRE2 10.34"
|
||||
.TH PCRE2PARTIAL 3 "21 July 2019" "PCRE2 10.34"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions
|
||||
.SH "PARTIAL MATCHING IN PCRE2"
|
||||
|
@ -22,7 +22,7 @@ as soon as a mistake is made, by beeping and not reflecting the character that
|
|||
has been typed, for example. This immediate feedback is likely to be a better
|
||||
user interface than a check that is delayed until the entire string has been
|
||||
entered. Partial matching can also be useful when the subject string is very
|
||||
long and is not all available at once.
|
||||
long and is not all available at once, as discussed below.
|
||||
.P
|
||||
PCRE2 supports partial matching by means of the PCRE2_PARTIAL_SOFT and
|
||||
PCRE2_PARTIAL_HARD options, which can be set when calling a matching function.
|
||||
|
@ -55,13 +55,17 @@ is also disabled for partial matching.
|
|||
.sp
|
||||
A partial match occurs during a call to \fBpcre2_match()\fP when the end of the
|
||||
subject string is reached successfully, but matching cannot continue because
|
||||
more characters are needed. However, at least one character in the subject must
|
||||
have been inspected. This character need not form part of the final matched
|
||||
string; lookbehind assertions and the \eK escape sequence provide ways of
|
||||
inspecting characters before the start of a matched string. The requirement for
|
||||
inspecting at least one character exists because an empty string can always be
|
||||
matched; without such a restriction there would always be a partial match of an
|
||||
empty string at the end of the subject.
|
||||
more characters are needed, and in addition, either at least one character in
|
||||
the subject has been inspected or the pattern contains a lookbehind. An
|
||||
inspected character need not form part of the final matched string; lookbehind
|
||||
assertions and the \eK escape sequence provide ways of inspecting characters
|
||||
before the start of a matched string.
|
||||
.P
|
||||
The two additional requirements define the cases where adding more characters
|
||||
to the existing subject may complete the match. Without these conditions there
|
||||
would be a partial match of an empty string at the end of the subject for all
|
||||
unanchored patterns (and also for anchored patterns if the subject itself is
|
||||
empty).
|
||||
.P
|
||||
When a partial match is returned, the first two elements in the ovector point
|
||||
to the portion of the subject that was matched, but the values in the rest of
|
||||
|
@ -78,7 +82,7 @@ these characters are needed for a subsequent re-match with additional
|
|||
characters.
|
||||
.P
|
||||
What happens when a partial match is identified depends on which of the two
|
||||
partial matching options are set.
|
||||
partial matching options is set.
|
||||
.
|
||||
.
|
||||
.SS "PCRE2_PARTIAL_SOFT WITH pcre2_match()"
|
||||
|
@ -100,12 +104,12 @@ the data that is returned. Consider this pattern:
|
|||
.sp
|
||||
/123\ew+X|dogY/
|
||||
.sp
|
||||
If this is matched against the subject string "abc123dog", both
|
||||
alternatives fail to match, but the end of the subject is reached during
|
||||
matching, so PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9,
|
||||
identifying "123dog" as the first partial match that was found. (In this
|
||||
example, there are two partial matches, because "dog" on its own partially
|
||||
matches the second alternative.)
|
||||
If this is matched against the subject string "abc123dog", both alternatives
|
||||
fail to match, but the end of the subject is reached during matching, so
|
||||
PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying
|
||||
"123dog" as the first partial match that was found. (In this example, there are
|
||||
two partial matches, because "dog" on its own partially matches the second
|
||||
alternative.)
|
||||
.
|
||||
.
|
||||
.SS "PCRE2_PARTIAL_HARD WITH pcre2_match()"
|
||||
|
@ -117,8 +121,8 @@ possible complete matches. This option is "hard" because it prefers an earlier
|
|||
partial match over a later complete match. For this reason, the assumption is
|
||||
made that the end of the supplied subject string may not be the true end of the
|
||||
available data, and so, if \ez, \eZ, \eb, \eB, or $ are encountered at the end
|
||||
of the subject, the result is PCRE2_ERROR_PARTIAL, provided that at least one
|
||||
character in the subject has been inspected.
|
||||
of the subject, the result is PCRE2_ERROR_PARTIAL, whether or not any
|
||||
characters have been inspected.
|
||||
.
|
||||
.
|
||||
.SS "Comparing hard and soft partial matching"
|
||||
|
@ -319,40 +323,23 @@ string "xx123ab", the ovector offsets are 5 and 7 ("ab"). The maximum
|
|||
lookbehind count is 3, so all characters before offset 2 can be discarded. The
|
||||
value of \fBstartoffset\fP for the next match should be 3. When \fBpcre2test\fP
|
||||
displays a partial match, it indicates the lookbehind characters with '<'
|
||||
characters:
|
||||
characters if the "allusedtext" modifier is set:
|
||||
.sp
|
||||
re> "(?<=123)abc"
|
||||
data> xx123ab\e=ph
|
||||
data> xx123ab\e=ph,allusedtext
|
||||
Partial match: 123ab
|
||||
<<<
|
||||
However, the "allusedtext" modifier is not available for JIT matching, because
|
||||
JIT matching does not maintain the first and last consulted characters.
|
||||
.P
|
||||
3. The maximum lookbehind count is also important when the result of a partial
|
||||
match attempt is "no match". In this case, the maximum lookbehind characters
|
||||
from the end of the current segment must be retained at the start of the next
|
||||
segment, in case the lookbehind is at the start of the pattern. Matching the
|
||||
next segment must then start at the appropriate offset.
|
||||
.P
|
||||
4. Because a partial match must always contain at least one character, what
|
||||
might be considered a partial match of an empty string actually gives a "no
|
||||
match" result. For example:
|
||||
.sp
|
||||
re> /c(?<=abc)x/
|
||||
data> ab\e=ps
|
||||
No match
|
||||
.sp
|
||||
If the next segment begins "cx", a match should be found, but this will only
|
||||
happen if characters from the previous segment are retained. For this reason, a
|
||||
"no match" result should be interpreted as "partial match of an empty string"
|
||||
when the pattern contains lookbehinds.
|
||||
.P
|
||||
5. Matching a subject string that is split into multiple segments may not
|
||||
always produce exactly the same result as matching over one single long string,
|
||||
especially when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and
|
||||
Word Boundaries" above describes an issue that arises if the pattern ends with
|
||||
\eb or \eB. Another kind of difference may occur when there are multiple
|
||||
matching possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result
|
||||
is given only when there are no completed matches. This means that as soon as
|
||||
the shortest match has been found, continuation to a new subject segment is no
|
||||
3. Matching a subject string that is split into multiple segments may not
|
||||
always produce exactly the same result as matching over one single long string
|
||||
when PCRE2_PARTIAL_SOFT is used. The section "Partial Matching and Word
|
||||
Boundaries" above describes an issue that arises if the pattern ends with \eb
|
||||
or \eB. Another kind of difference may occur when there are multiple matching
|
||||
possibilities, because (for PCRE2_PARTIAL_SOFT) a partial match result is given
|
||||
only when there are no completed matches. This means that as soon as the
|
||||
shortest match has been found, continuation to a new subject segment is no
|
||||
longer possible. Consider this \fBpcre2test\fP example:
|
||||
.sp
|
||||
re> /dog(sbody)?/
|
||||
|
@ -386,7 +373,7 @@ multi-segment data. The example above then behaves differently:
|
|||
data> gsb\e=ph,dfa,dfa_restart
|
||||
Partial match: gsb
|
||||
.sp
|
||||
6. Patterns that contain alternatives at the top level which do not all start
|
||||
4. Patterns that contain alternatives at the top level which do not all start
|
||||
with the same pattern item may not work as expected when PCRE2_DFA_RESTART is
|
||||
used. For example, consider this pattern:
|
||||
.sp
|
||||
|
@ -435,6 +422,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 21 June 2019
|
||||
Last updated: 21 July 2019
|
||||
Copyright (c) 1997-2019 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -174,7 +174,7 @@ static const uint8_t coptable[] = {
|
|||
0, /* Assert behind */
|
||||
0, /* Assert behind not */
|
||||
0, /* NA assert */
|
||||
0, /* NA assert behind */
|
||||
0, /* NA assert behind */
|
||||
0, /* ONCE */
|
||||
0, /* SCRIPT_RUN */
|
||||
0, 0, 0, 0, 0, /* BRA, BRAPOS, CBRA, CBRAPOS, COND */
|
||||
|
@ -251,7 +251,7 @@ static const uint8_t poptable[] = {
|
|||
0, /* Assert behind */
|
||||
0, /* Assert behind not */
|
||||
0, /* NA assert */
|
||||
0, /* NA assert behind */
|
||||
0, /* NA assert behind */
|
||||
0, /* ONCE */
|
||||
0, /* SCRIPT_RUN */
|
||||
0, 0, 0, 0, 0, /* BRA, BRAPOS, CBRA, CBRAPOS, COND */
|
||||
|
@ -966,7 +966,7 @@ for (;;)
|
|||
if (ptr >= end_subject)
|
||||
{
|
||||
if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0)
|
||||
could_continue = TRUE;
|
||||
return PCRE2_ERROR_PARTIAL;
|
||||
else { ADD_ACTIVE(state_offset + 1, 0); }
|
||||
}
|
||||
break;
|
||||
|
@ -1015,10 +1015,12 @@ for (;;)
|
|||
|
||||
/*-----------------------------------------------------------------*/
|
||||
case OP_EODN:
|
||||
if (clen == 0 && (mb->moptions & PCRE2_PARTIAL_HARD) != 0)
|
||||
could_continue = TRUE;
|
||||
else if (clen == 0 || (IS_NEWLINE(ptr) && ptr == end_subject - mb->nllen))
|
||||
{ ADD_ACTIVE(state_offset + 1, 0); }
|
||||
if (clen == 0 || (IS_NEWLINE(ptr) && ptr == end_subject - mb->nllen))
|
||||
{
|
||||
if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0)
|
||||
return PCRE2_ERROR_PARTIAL;
|
||||
ADD_ACTIVE(state_offset + 1, 0);
|
||||
}
|
||||
break;
|
||||
|
||||
/*-----------------------------------------------------------------*/
|
||||
|
@ -3175,15 +3177,18 @@ for (;;)
|
|||
(mb->moptions & PCRE2_PARTIAL_HARD) != 0 /* Hard partial */
|
||||
|| /* or... */
|
||||
((mb->moptions & PCRE2_PARTIAL_SOFT) != 0 && /* Soft partial and */
|
||||
match_count < 0) /* no matches */
|
||||
match_count < 0) /* no matches */
|
||||
) && /* And... */
|
||||
(
|
||||
partial_newline || /* Either partial NL */
|
||||
( /* or ... */
|
||||
ptr >= end_subject && /* End of subject and */
|
||||
ptr > mb->start_used_ptr) /* Inspected non-empty string */
|
||||
partial_newline || /* Either partial NL */
|
||||
( /* or ... */
|
||||
ptr >= end_subject && /* End of subject and */
|
||||
( /* either */
|
||||
ptr > mb->start_used_ptr || /* Inspected non-empty string */
|
||||
mb->haslookbehind /* or pattern has lookbehind */
|
||||
)
|
||||
)
|
||||
)
|
||||
))
|
||||
match_count = PCRE2_ERROR_PARTIAL;
|
||||
break; /* Exit from loop along the subject string */
|
||||
}
|
||||
|
@ -3412,6 +3417,7 @@ mb->tables = re->tables;
|
|||
mb->start_subject = subject;
|
||||
mb->end_subject = end_subject;
|
||||
mb->start_offset = start_offset;
|
||||
mb->haslookbehind = (re->max_lookbehind > 0);
|
||||
mb->moptions = options;
|
||||
mb->poptions = re->overall_options;
|
||||
mb->match_call_count = 0;
|
||||
|
|
|
@ -854,6 +854,7 @@ typedef struct match_block {
|
|||
uint32_t match_call_count; /* Number of times a new frame is created */
|
||||
BOOL hitend; /* Hit the end of the subject at some point */
|
||||
BOOL hasthen; /* Pattern contains (*THEN) */
|
||||
BOOL haslookbehind; /* Pattern contains significant lookbehind */
|
||||
const uint8_t *lcc; /* Points to lower casing table */
|
||||
const uint8_t *fcc; /* Points to case-flipping table */
|
||||
const uint8_t *ctypes; /* Points to table of type maps */
|
||||
|
@ -909,6 +910,7 @@ typedef struct dfa_match_block {
|
|||
uint32_t poptions; /* Pattern options */
|
||||
uint32_t nltype; /* Newline type */
|
||||
uint32_t nllen; /* Newline string length */
|
||||
BOOL haslookbehind; /* Pattern contains significant lookbehind */
|
||||
PCRE2_UCHAR nl[4]; /* Newline string when fixed */
|
||||
uint16_t bsr_convention; /* \R interpretation */
|
||||
pcre2_callout_block *cb; /* Points to a callout block */
|
||||
|
|
|
@ -415,8 +415,7 @@ if (caseless)
|
|||
else
|
||||
#endif
|
||||
|
||||
/* Not in UTF mode */
|
||||
|
||||
/* Not in UTF mode */
|
||||
{
|
||||
for (; length > 0; length--)
|
||||
{
|
||||
|
@ -491,11 +490,16 @@ heap is used for a larger vector.
|
|||
*************************************************/
|
||||
|
||||
/* These macros pack up tests that are used for partial matching several times
|
||||
in the code. We set the "hit end" flag if the pointer is at the end of the
|
||||
subject and also past the earliest inspected character (i.e. something has been
|
||||
matched, even if not part of the actual matched string). For hard partial
|
||||
matching, we then return immediately. The second one is used when we already
|
||||
know we are past the end of the subject. */
|
||||
in the code. The second one is used when we already know we are past the end of
|
||||
the subject. We set the "hit end" flag if the pointer is at the end of the
|
||||
subject and either (a) the pointer is past the earliest inspected character
|
||||
(i.e. something has been matched, even if not part of the actual matched
|
||||
string), or (b) the pattern contains a lookbehind. These are the conditions for
|
||||
which adding more characters may allow the current match to continue.
|
||||
|
||||
For hard partial matching, we immediately return a partial match. Otherwise,
|
||||
carrying on means that a complete match on the current subject will be sought.
|
||||
A partial match is returned only if no complete match can be found. */
|
||||
|
||||
#define CHECK_PARTIAL()\
|
||||
if (Feptr >= mb->end_subject) \
|
||||
|
@ -503,31 +507,13 @@ know we are past the end of the subject. */
|
|||
SCHECK_PARTIAL(); \
|
||||
}
|
||||
|
||||
/* Original version that allows hard partial to continue if no inspected
|
||||
characters. */
|
||||
|
||||
#define SCHECK_PARTIAL()\
|
||||
if (mb->partial != 0 && Feptr > mb->start_used_ptr) \
|
||||
if (mb->partial != 0 && (Feptr > mb->start_used_ptr || mb->haslookbehind)) \
|
||||
{ \
|
||||
mb->hitend = TRUE; \
|
||||
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; \
|
||||
}
|
||||
|
||||
/* Experimental version that makes hard partial give no match instead of
|
||||
continuing if no characters have been inspected. */
|
||||
|
||||
#ifdef NEVERNEVER
|
||||
#define SCHECK_PARTIAL()\
|
||||
if (mb->partial != 0) \
|
||||
{ \
|
||||
if (Feptr > mb->start_used_ptr) \
|
||||
{ \
|
||||
mb->hitend = TRUE; \
|
||||
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; \
|
||||
} \
|
||||
else if (mb->partial > 1) RRETURN(MATCH_NOMATCH); \
|
||||
}
|
||||
#endif /* NEVERNEVER */
|
||||
|
||||
/* These macros are used to implement backtracking. They simulate a recursive
|
||||
call to the match() function by means of a local vector of frames which
|
||||
|
@ -5670,7 +5656,11 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
|||
|
||||
case OP_EOD:
|
||||
if (Feptr < mb->end_subject) RRETURN(MATCH_NOMATCH);
|
||||
SCHECK_PARTIAL();
|
||||
if (mb->partial != 0)
|
||||
{
|
||||
mb->hitend = TRUE;
|
||||
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL;
|
||||
}
|
||||
Fecode++;
|
||||
break;
|
||||
|
||||
|
@ -5695,7 +5685,11 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
|||
|
||||
/* Either at end of string or \n before end. */
|
||||
|
||||
SCHECK_PARTIAL();
|
||||
if (mb->partial != 0)
|
||||
{
|
||||
mb->hitend = TRUE;
|
||||
if (mb->partial > 1) return PCRE2_ERROR_PARTIAL;
|
||||
}
|
||||
Fecode++;
|
||||
break;
|
||||
|
||||
|
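The OP_EOD change above drops the "at least one character inspected" test for \z. A minimal standalone model of the new control flow is sketched below. The enum and function names are hypothetical, offsets replace pointers, and the newline case handled for \Z is left out. The three mode values follow the way the diff tests mb->partial (non-zero for any partial mode, greater than 1 for hard).

```c
#include <assert.h>   /* used for the precondition check below */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names that mirror the roles of mb->partial and mb->hitend. */
enum partial_mode { PARTIAL_NONE = 0, PARTIAL_SOFT = 1, PARTIAL_HARD = 2 };
enum eod_result { EOD_NOMATCH, EOD_CONTINUE, EOD_PARTIAL };

/* Model of \z: fail before the end of the subject; at the end, record that
   the end was hit and, for hard partial matching, report a partial match
   whether or not any character has been inspected. */
static enum eod_result check_eod(size_t pos, size_t subject_len,
                                 enum partial_mode mode, bool *hitend)
{
  assert(hitend != NULL);
  if (pos < subject_len) return EOD_NOMATCH;
  if (mode != PARTIAL_NONE)
    {
    *hitend = true;
    if (mode == PARTIAL_HARD) return EOD_PARTIAL;
    }
  return EOD_CONTINUE;   /* carry on with the rest of the pattern */
}
```

In this model, hard mode at the end of "abc" yields `EOD_PARTIAL` and soft mode yields `EOD_CONTINUE`, in line with the expected test output for /\z/ ("Partial match:" for \=ph and "0:" for \=ps).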
@ -6457,9 +6451,10 @@ mb->start_subject = subject;
|
|||
mb->start_offset = start_offset;
|
||||
mb->end_subject = end_subject;
|
||||
mb->hasthen = (re->flags & PCRE2_HASTHEN) != 0;
|
||||
mb->poptions = re->overall_options; /* Pattern options */
|
||||
mb->haslookbehind = (re->max_lookbehind > 0);
|
||||
mb->poptions = re->overall_options; /* Pattern options */
|
||||
mb->ignore_skip_arg = 0;
|
||||
mb->mark = mb->nomatch_mark = NULL; /* In case never set */
|
||||
mb->mark = mb->nomatch_mark = NULL; /* In case never set */
|
||||
|
||||
/* The name table is needed for finding all the numbers associated with a
|
||||
given name, for condition testing. The code follows the name table. */
|
||||
|
|
|
@ -5690,10 +5690,33 @@ a)"xI
|
|||
|
||||
# ----
|
||||
|
||||
/(?<=(?=.(?<=x)))/
|
||||
ab\=ph
|
||||
|
||||
# Expect error (recursion => not fixed length)
|
||||
/(\2)((?=(?<=\1)))/
|
||||
|
||||
/c*+(?<=[bc])/
|
||||
abc\=ph,no_jit
|
||||
ab\=ph,no_jit
|
||||
abc\=ps,no_jit
|
||||
ab\=ps,no_jit
|
||||
|
||||
/c++(?<=[bc])/
|
||||
abc\=ph,no_jit
|
||||
ab\=ph,no_jit
|
||||
|
||||
/(?<=(?=.(?<=x)))/
|
||||
abx
|
||||
ab\=ph,no_jit
|
||||
bxyz
|
||||
xyz
|
||||
|
||||
/\z/
|
||||
abc\=ph,no_jit
|
||||
abc\=ps
|
||||
|
||||
/\Z/
|
||||
abc\=ph,no_jit
|
||||
abc\=ps
|
||||
abc\n\=ph,no_jit
|
||||
abc\n\=ps
|
||||
|
||||
# End of testinput2
|
||||
|
|
|
@ -4994,4 +4994,30 @@
|
|||
ab\=ps
|
||||
abcx
|
||||
|
||||
/\z/
|
||||
abc\=ph
|
||||
abc\=ps
|
||||
|
||||
/\Z/
|
||||
abc\=ph
|
||||
abc\=ps
|
||||
abc\n\=ph
|
||||
abc\n\=ps
|
||||
|
||||
/c*+(?<=[bc])/
|
||||
abc\=ph
|
||||
ab\=ph
|
||||
abc\=ps
|
||||
ab\=ps
|
||||
|
||||
/c++(?<=[bc])/
|
||||
abc\=ph
|
||||
ab\=ph
|
||||
|
||||
/(?<=(?=.(?<=x)))/
|
||||
abx
|
||||
ab\=ph
|
||||
bxyz
|
||||
xyz
|
||||
|
||||
# End of testinput6
|
||||
|
|
|
@ -17185,14 +17185,52 @@ Subject length lower bound = 1
|
|||
|
||||
# ----
|
||||
|
||||
/(?<=(?=.(?<=x)))/
|
||||
ab\=ph
|
||||
No match
|
||||
|
||||
# Expect error (recursion => not fixed length)
|
||||
/(\2)((?=(?<=\1)))/
|
||||
Failed: error 125 at offset 8: lookbehind assertion is not fixed length
|
||||
|
||||
/c*+(?<=[bc])/
|
||||
abc\=ph,no_jit
|
||||
Partial match: c
|
||||
ab\=ph,no_jit
|
||||
Partial match:
|
||||
abc\=ps,no_jit
|
||||
0: c
|
||||
ab\=ps,no_jit
|
||||
0:
|
||||
|
||||
/c++(?<=[bc])/
|
||||
abc\=ph,no_jit
|
||||
Partial match: c
|
||||
ab\=ph,no_jit
|
||||
Partial match:
|
||||
|
||||
/(?<=(?=.(?<=x)))/
|
||||
abx
|
||||
0:
|
||||
ab\=ph,no_jit
|
||||
Partial match:
|
||||
bxyz
|
||||
0:
|
||||
xyz
|
||||
0:
|
||||
|
||||
/\z/
|
||||
abc\=ph,no_jit
|
||||
Partial match:
|
||||
abc\=ps
|
||||
0:
|
||||
|
||||
/\Z/
|
||||
abc\=ph,no_jit
|
||||
Partial match:
|
||||
abc\=ps
|
||||
0:
|
||||
abc\n\=ph,no_jit
|
||||
Partial match: \x0a
|
||||
abc\n\=ps
|
||||
0:
|
||||
|
||||
# End of testinput2
|
||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||
Error -62: bad serialized data
|
||||
|
|
|
@ -7845,4 +7845,46 @@ Partial match: ab
|
|||
abcx
|
||||
0: abcx
|
||||
|
||||
/\z/
|
||||
abc\=ph
|
||||
Partial match:
|
||||
abc\=ps
|
||||
0:
|
||||
|
||||
/\Z/
|
||||
abc\=ph
|
||||
Partial match:
|
||||
abc\=ps
|
||||
0:
|
||||
abc\n\=ph
|
||||
Partial match: \x0a
|
||||
abc\n\=ps
|
||||
0:
|
||||
|
||||
/c*+(?<=[bc])/
|
||||
abc\=ph
|
||||
Partial match: c
|
||||
ab\=ph
|
||||
Partial match:
|
||||
abc\=ps
|
||||
0: c
|
||||
ab\=ps
|
||||
0:
|
||||
|
||||
/c++(?<=[bc])/
|
||||
abc\=ph
|
||||
Partial match: c
|
||||
ab\=ph
|
||||
Partial match:
|
||||
|
||||
/(?<=(?=.(?<=x)))/
|
||||
abx
|
||||
0:
|
||||
ab\=ph
|
||||
Partial match:
|
||||
bxyz
|
||||
0:
|
||||
xyz
|
||||
0:
|
||||
|
||||
# End of testinput6
|
||||
|
|