Make pcre2test show the characters actually consulted before the start of a
partial match, instead of always showing as many characters as the longest
lookbehind. This is controlled by the "allusedtext" modifier.
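The effect on the partial-match output can be sketched in isolation. The following is a minimal, self-contained C illustration and not the pcre2test source: `format_partial` is a hypothetical helper, and it assumes one printable byte per character (pcre2test itself copes with UTF and escapes through its own output macros). With "allusedtext" set, `leftchar` would be the offset of the leftmost consulted character; without it, `leftchar` equals the match start and no marker line is produced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of pcre2test). Writes a partial-match report
   into out: subject[leftchar..start) is consulted text that precedes the
   match and is flagged with '<' on a second line; subject[start..] is the
   partially matched text. Returns the number of characters written, or -1 if
   out is too small. Assumes one printable byte per character. */
int format_partial(const char *subject, size_t leftchar, size_t start,
                   char *out, size_t outsize)
{
  static const char rubric[] = "Partial match: ";
  const size_t rubriclength = sizeof(rubric) - 1;   /* 15 characters */
  size_t back;
  int n;

  assert(leftchar <= start && start <= strlen(subject));
  back = start - leftchar;          /* consulted characters before the match */

  /* First line: the rubric, then the subject from leftchar to its end. */
  n = snprintf(out, outsize, "%s%s\n", rubric, subject + leftchar);
  if (n < 0 || (size_t)n >= outsize) return -1;

  /* Second line, only when something before the match was consulted:
     pad to the rubric width, then one '<' per preceding character. */
  if (back > 0)
    {
    size_t need = rubriclength + back + 1;          /* padding, markers, newline */
    if ((size_t)n + need >= outsize) return -1;
    memset(out + n, ' ', rubriclength);
    memset(out + n + rubriclength, '<', back);
    out[(size_t)n + need - 1] = '\n';
    out[(size_t)n + need] = '\0';
    n += (int)need;
    }

  return n;
}
```

For the subject "xyzabc12" with the match starting at offset 6, passing a `leftchar` of 3 gives "Partial match: abc12" with "<<<" beneath "abc", while passing a `leftchar` of 6 gives just "Partial match: 12". These correspond to the with and without "allusedtext" outputs for /(?<=abc)123/ in the test data of this commit.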
parent d21f7daf9b
commit 434e3f7468
@@ -71,6 +71,14 @@ lookbehind value. For example /(?<=a(?<=ba)c)/ previously set a maximum
 lookbehind of 2, because that is the largest individual lookbehind. Now it sets
 it to 3, because matching looks back 3 characters.
 
+14. For partial matches, pcre2test was always showing the maximum lookbehind
+characters, flagged with "<", which is misleading when the lookbehind didn't
+actually look behind the start (because it was later in the pattern). Showing
+all consulted preceding characters for partial matches is now controlled by the
+existing "allusedtext" modifier and, as for complete matches, this facility is
+available only for non-JIT matching, because JIT does not maintain the first
+and last consulted characters.
+
 
 Version 10.33 16-April-2019
 ---------------------------
@@ -1252,22 +1252,27 @@ following line with a plus character following the capture number.
 </P>
 <P>
 The <b>allusedtext</b> modifier requests that all the text that was consulted
-during a successful pattern match by the interpreter should be shown. This
-feature is not supported for JIT matching, and if requested with JIT it is
-ignored (with a warning message). Setting this modifier affects the output if
-there is a lookbehind at the start of a match, or a lookahead at the end, or if
-\K is used in the pattern. Characters that precede or follow the start and end
-of the actual match are indicated in the output by '<' or '>' characters
-underneath them. Here is an example:
+during a successful pattern match by the interpreter should be shown, for both
+full and partial matches. This feature is not supported for JIT matching, and
+if requested with JIT it is ignored (with a warning message). Setting this
+modifier affects the output if there is a lookbehind at the start of a match,
+or, for a complete match, a lookahead at the end, or if \K is used in the
+pattern. Characters that precede or follow the start and end of the actual
+match are indicated in the output by '<' or '>' characters underneath them.
+Here is an example:
 <pre>
 re> /(?<=pqr)abc(?=xyz)/
 data> 123pqrabcxyz456\=allusedtext
 0: pqrabcxyz
    <<<   >>>
+data> 123pqrabcxy\=ph,allusedtext
+Partial match: pqrabcxy
+               <<<
 </pre>
-This shows that the matched string is "abc", with the preceding and following
-strings "pqr" and "xyz" having been consulted during the match (when processing
-the assertions).
+The first, complete match shows that the matched string is "abc", with the
+preceding and following strings "pqr" and "xyz" having been consulted during
+the match (when processing the assertions). The partial match can indicate only
+the preceding string.
 </P>
 <P>
 The <b>startchar</b> modifier requests that the starting character for the match
@@ -2081,7 +2086,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 June 2019
+Last updated: 26 June 2019
 <br>
 Copyright © 1997-2019 University of Cambridge.
 <br>
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "20 June 2019" "PCRE 10.34"
+.TH PCRE2TEST 1 "26 June 2019" "PCRE 10.34"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -1220,22 +1220,27 @@ well as the main matched substring. In each case the remainder is output on the
 following line with a plus character following the capture number.
 .P
 The \fBallusedtext\fP modifier requests that all the text that was consulted
-during a successful pattern match by the interpreter should be shown. This
-feature is not supported for JIT matching, and if requested with JIT it is
-ignored (with a warning message). Setting this modifier affects the output if
-there is a lookbehind at the start of a match, or a lookahead at the end, or if
-\eK is used in the pattern. Characters that precede or follow the start and end
-of the actual match are indicated in the output by '<' or '>' characters
-underneath them. Here is an example:
+during a successful pattern match by the interpreter should be shown, for both
+full and partial matches. This feature is not supported for JIT matching, and
+if requested with JIT it is ignored (with a warning message). Setting this
+modifier affects the output if there is a lookbehind at the start of a match,
+or, for a complete match, a lookahead at the end, or if \eK is used in the
+pattern. Characters that precede or follow the start and end of the actual
+match are indicated in the output by '<' or '>' characters underneath them.
+Here is an example:
 .sp
 re> /(?<=pqr)abc(?=xyz)/
 data> 123pqrabcxyz456\e=allusedtext
 0: pqrabcxyz
    <<<   >>>
+data> 123pqrabcxy\e=ph,allusedtext
+Partial match: pqrabcxy
+               <<<
 .sp
-This shows that the matched string is "abc", with the preceding and following
-strings "pqr" and "xyz" having been consulted during the match (when processing
-the assertions).
+The first, complete match shows that the matched string is "abc", with the
+preceding and following strings "pqr" and "xyz" having been consulted during
+the match (when processing the assertions). The partial match can indicate only
+the preceding string.
 .P
 The \fBstartchar\fP modifier requests that the starting character for the match
 be indicated, if it is different to the start of the matched string. The only
@@ -2062,6 +2067,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 June 2019
+Last updated: 26 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
 .fi
@@ -1122,23 +1122,27 @@ SUBJECT MODIFIERS
 capture number.
 
 The allusedtext modifier requests that all the text that was consulted
-during a successful pattern match by the interpreter should be shown.
-This feature is not supported for JIT matching, and if requested with
-JIT it is ignored (with a warning message). Setting this modifier af-
-fects the output if there is a lookbehind at the start of a match, or a
-lookahead at the end, or if \K is used in the pattern. Characters that
-precede or follow the start and end of the actual match are indicated
-in the output by '<' or '>' characters underneath them. Here is an ex-
-ample:
+during a successful pattern match by the interpreter should be shown,
+for both full and partial matches. This feature is not supported for
+JIT matching, and if requested with JIT it is ignored (with a warning
+message). Setting this modifier affects the output if there is a look-
+behind at the start of a match, or, for a complete match, a lookahead
+at the end, or if \K is used in the pattern. Characters that precede or
+follow the start and end of the actual match are indicated in the out-
+put by '<' or '>' characters underneath them. Here is an example:
 
 re> /(?<=pqr)abc(?=xyz)/
 data> 123pqrabcxyz456\=allusedtext
 0: pqrabcxyz
    <<<   >>>
+data> 123pqrabcxy\=ph,allusedtext
+Partial match: pqrabcxy
+               <<<
 
-This shows that the matched string is "abc", with the preceding and
-following strings "pqr" and "xyz" having been consulted during the
-match (when processing the assertions).
+The first, complete match shows that the matched string is "abc", with
+the preceding and following strings "pqr" and "xyz" having been con-
+sulted during the match (when processing the assertions). The partial
+match can indicate only the preceding string.
 
 The startchar modifier requests that the starting character for the
 match be indicated, if it is different to the start of the matched
@@ -1893,5 +1897,5 @@ AUTHOR
 
 REVISION
 
-Last updated: 20 June 2019
+Last updated: 26 June 2019
 Copyright (c) 1997-2019 University of Cambridge.
@@ -7761,14 +7761,22 @@ for (gmatched = 0;; gmatched++)
 } /* End of handling a successful match */
 
 /* There was a partial match. The value of ovector[0] is the bumpalong point,
-that is, startchar, not any \K point that might have been passed. */
+that is, startchar, not any \K point that might have been passed. When JIT is
+not in use, "allusedtext" may be set, in which case we indicate the leftmost
+consulted character. */
 
 else if (capcount == PCRE2_ERROR_PARTIAL)
 {
-PCRE2_SIZE poffset;
+PCRE2_SIZE leftchar;
 int backlength;
 int rubriclength = 0;
 
+if ((dat_datctl.control & CTL_ALLUSEDTEXT) != 0)
+{
+leftchar = FLD(match_data, leftchar);
+}
+else leftchar = ovector[0];
+
 fprintf(outfile, "Partial match");
 if ((dat_datctl.control & CTL_MARK) != 0 &&
 TESTFLD(match_data, mark, !=, NULL))
@@ -7781,8 +7789,7 @@ for (gmatched = 0;; gmatched++)
 fprintf(outfile, ": ");
 rubriclength += 15;
 
-poffset = backchars(pp, ovector[0], maxlookbehind, utf);
-PCHARS(backlength, pp, poffset, ovector[0] - poffset, utf, outfile);
+PCHARS(backlength, pp, leftchar, ovector[0] - leftchar, utf, outfile);
 PCHARSV(pp, ovector[0], ulen - ovector[0], utf, outfile);
 
 if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
@@ -122,6 +122,44 @@
 /abc(?=abcde)(?=ab)/allusedtext
 abcabcdefg
 
+#subject allusedtext
+
+/(?<=abc)123/
+xyzabc123pqr
+xyzabc12\=ps
+xyzabc12\=ph
+
+/\babc\b/
++++abc+++
++++ab\=ps
++++ab\=ph
+
+/(?<=abc)def/
+abc\=ph
+
+/(?<=123)(*MARK:xx)abc/mark
+xxxx123a\=ph
+xxxx123a\=ps
+
+/(?<=(?<=a)b)c.*/I
+abc\=ph
+\= Expect no match
+xbc\=ph
+
+/(?<=ab)c.*/I
+abc\=ph
+\= Expect no match
+xbc\=ph
+
+/abc(?<=bc)def/
+xxxabcd\=ph
+
+/(?<=ab)cdef/
+xxabcd\=ph
+
+#subject
+# -------------------------------------------------------------------
+
 # These tests provoke recursion loops, which give a different error message
 # when JIT is used.
 
@@ -486,7 +486,7 @@
 def\=dfa_restart
 
 /(?<=foo)bar/
-foob\=ps,offset=2
+foob\=ps,offset=2,allusedtext
 foobar...\=ps,dfa_restart,offset=4
 foobar\=offset=2
 \= Expect no match
@@ -4415,12 +4415,12 @@
 /abc\K123/
 xyzabc123pqr
 
-/(?<=abc)123/
+/(?<=abc)123/allusedtext
 xyzabc123pqr
 xyzabc12\=ps
 xyzabc12\=ph
 
-/\babc\b/
+/\babc\b/allusedtext
 +++abc+++
 +++ab\=ps
 +++ab\=ph
@@ -4490,7 +4490,7 @@
 /^(?(?!a(*SKIP)b))/
 ac
 
-/(?<=abc)def/
+/(?<=abc)def/allusedtext
 abc\=ph
 
 /abc$/
@@ -265,6 +265,79 @@ Failed: error -52: nested recursion at the same subject position
 0: abcabcde
       >>>>>
 
+#subject allusedtext
+
+/(?<=abc)123/
+xyzabc123pqr
+0: abc123
+   <<<
+xyzabc12\=ps
+Partial match: abc12
+               <<<
+xyzabc12\=ph
+Partial match: abc12
+               <<<
+
+/\babc\b/
++++abc+++
+0: +abc+
+   <   >
++++ab\=ps
+Partial match: +ab
+               <
++++ab\=ph
+Partial match: +ab
+               <
+
+/(?<=abc)def/
+abc\=ph
+Partial match: abc
+               <<<
+
+/(?<=123)(*MARK:xx)abc/mark
+xxxx123a\=ph
+Partial match, mark=xx: 123a
+                        <<<
+xxxx123a\=ps
+Partial match, mark=xx: 123a
+                        <<<
+
+/(?<=(?<=a)b)c.*/I
+Capture group count = 0
+Max lookbehind = 2
+First code unit = 'c'
+Subject length lower bound = 1
+abc\=ph
+Partial match: abc
+               <<
+\= Expect no match
+xbc\=ph
+No match
+
+/(?<=ab)c.*/I
+Capture group count = 0
+Max lookbehind = 2
+First code unit = 'c'
+Subject length lower bound = 1
+abc\=ph
+Partial match: abc
+               <<
+\= Expect no match
+xbc\=ph
+No match
+
+/abc(?<=bc)def/
+xxxabcd\=ph
+Partial match: abcd
+
+/(?<=ab)cdef/
+xxabcd\=ph
+Partial match: abcd
+               <<
+
+#subject
+# -------------------------------------------------------------------
+
 # These tests provoke recursion loops, which give a different error message
 # when JIT is used.
 
@@ -9369,21 +9369,17 @@ Partial match: abc12
 xyzabc123pqr
 0: 123
 xyzabc12\=ps
-Partial match: abc12
-               <<<
+Partial match: 12
 xyzabc12\=ph
-Partial match: abc12
-               <<<
+Partial match: 12
 
 /\babc\b/
 +++abc+++
 0: abc
 +++ab\=ps
-Partial match: +ab
-               <
+Partial match: ab
 +++ab\=ph
-Partial match: +ab
-               <
+Partial match: ab
 
 /(?&word)(?&element)(?(DEFINE)(?<element><[^m][^>]>[^<])(?<word>\w*+))/B
 ------------------------------------------------------------------
@@ -10401,8 +10397,7 @@ No match
 
 /(?<=abc)def/
 abc\=ph
-Partial match: abc
-               <<<
+Partial match:
 
 /abc$/
 abc
@@ -11959,11 +11954,9 @@ Callout 2: last capture = 0
 
 /(?<=123)(*MARK:xx)abc/mark
 xxxx123a\=ph
-Partial match, mark=xx: 123a
-                        <<<
+Partial match, mark=xx: a
 xxxx123a\=ps
-Partial match, mark=xx: 123a
-                        <<<
+Partial match, mark=xx: a
 
 /123\Kabc/startchar
 xxxx123a\=ph
@@ -17045,8 +17038,7 @@ Max lookbehind = 2
 First code unit = 'c'
 Subject length lower bound = 1
 abc\=ph
-Partial match: abc
-               <<
+Partial match: c
 \= Expect no match
 xbc\=ph
 No match
@@ -17057,8 +17049,7 @@ Max lookbehind = 2
 First code unit = 'c'
 Subject length lower bound = 1
 abc\=ph
-Partial match: abc
-               <<
+Partial match: c
 \= Expect no match
 xbc\=ph
 No match
@@ -876,7 +876,7 @@ Partial match: abc
 0: def
 
 /(?<=foo)bar/
-foob\=ps,offset=2
+foob\=ps,offset=2,allusedtext
 Partial match: foob
                <<<
 foobar...\=ps,dfa_restart,offset=4
@@ -6803,9 +6803,10 @@ Partial match: dogs
 xyzabc123pqr
 Failed: error -42: pattern contains an item that is not supported for DFA matching
 
-/(?<=abc)123/
+/(?<=abc)123/allusedtext
 xyzabc123pqr
-0: 123
+0: abc123
+   <<<
 xyzabc12\=ps
 Partial match: abc12
                <<<
@@ -6813,9 +6814,10 @@ Partial match: abc12
 Partial match: abc12
                <<<
 
-/\babc\b/
+/\babc\b/allusedtext
 +++abc+++
-0: abc
+0: +abc+
+   <   >
 +++ab\=ps
 Partial match: +ab
                <
@@ -6932,7 +6934,7 @@ Failed: error -42: pattern contains an item that is not supported for DFA matching
 ac
 Failed: error -42: pattern contains an item that is not supported for DFA matching
 
-/(?<=abc)def/
+/(?<=abc)def/allusedtext
 abc\=ph
 Partial match: abc
                <<<