Remove leftchar/rightchar from the public API.

This commit is contained in:
Philip.Hazel 2014-10-10 11:55:28 +00:00
parent 0907fc6e92
commit 62d728bb0b
13 changed files with 112 additions and 153 deletions

View File

@ -64,7 +64,7 @@ title10="Test 10: Specials for the 8-bit library with UTF-8 and UCP support"
title11="Test 11: Specials for the basic 16-bit and 32-bit libraries" title11="Test 11: Specials for the basic 16-bit and 32-bit libraries"
title12="Test 12: Specials for the 16-bit and 32-bit libraries UTF and UCP support" title12="Test 12: Specials for the 16-bit and 32-bit libraries UTF and UCP support"
title13="Test 13: DFA specials for the basic 16-bit and 32-bit libraries" title13="Test 13: DFA specials for the basic 16-bit and 32-bit libraries"
title14="Test 14: Non-JIT limits tests" title14="Test 14: Non-JIT limits and other non-JIT tests"
title15="Test 15: JIT-specific features when JIT is not available" title15="Test 15: JIT-specific features when JIT is not available"
title16="Test 16: JIT-specific features when JIT is available" title16="Test 16: JIT-specific features when JIT is available"
title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP" title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP"

View File

@ -1,4 +1,4 @@
.TH PCRE2API 3 "05 October 2014" "PCRE2 10.00" .TH PCRE2API 3 "10 October 2014" "PCRE2 10.00"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -47,16 +47,12 @@ document for an overview of all the PCRE2 documentation.
.rs .rs
.sp .sp
.nf .nf
.B PCRE2_SIZE pcre2_get_leftchar(pcre2_match_data *\fImatch_data\fP);
.sp
.B PCRE2_SPTR pcre2_get_mark(pcre2_match_data *\fImatch_data\fP); .B PCRE2_SPTR pcre2_get_mark(pcre2_match_data *\fImatch_data\fP);
.sp .sp
.B uint32_t pcre2_get_ovector_count(pcre2_match_data *\fImatch_data\fP); .B uint32_t pcre2_get_ovector_count(pcre2_match_data *\fImatch_data\fP);
.sp .sp
.B PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *\fImatch_data\fP); .B PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *\fImatch_data\fP);
.sp .sp
.B PCRE2_SIZE pcre2_get_rightchar(pcre2_match_data *\fImatch_data\fP);
.sp
.B PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *\fImatch_data\fP); .B PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *\fImatch_data\fP);
.fi .fi
. .
@ -2054,10 +2050,6 @@ had.
.nf .nf
.B PCRE2_SPTR pcre2_get_mark(pcre2_match_data *\fImatch_data\fP); .B PCRE2_SPTR pcre2_get_mark(pcre2_match_data *\fImatch_data\fP);
.sp .sp
.B PCRE2_SIZE pcre2_get_leftchar(pcre2_match_data *\fImatch_data\fP);
.sp
.B PCRE2_SIZE pcre2_get_rightchar(pcre2_match_data *\fImatch_data\fP);
.sp
.B PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *\fImatch_data\fP); .B PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *\fImatch_data\fP);
.fi .fi
.P .P
@ -2069,35 +2061,15 @@ pointer to the zero-terminated name, which is within the compiled pattern.
Otherwise NULL is returned. A (*MARK) name may be available after a failed Otherwise NULL is returned. A (*MARK) name may be available after a failed
match or a partial match, as well as after a successful one. match or a partial match, as well as after a successful one.
.P .P
The other three functions yield values that give information about the part of The offset of the character at which the successful or partial match started is
the subject string that was inspected during a successful match or a partial returned by \fBpcre2_get_startchar()\fP. This can be different to the value of
match. Their results are undefined after a failed match. They return the \fIovector[0]\fP if the pattern contains the \eK escape sequence. This
following values, respectively: information is needed when doing partial matching over multiple data segments
.sp (see the
(1) The offset of the leftmost character that was inspected during the match. .\" HREF
This can be earlier than the point at which the match started if the pattern \fBpcre2partial\fP
contains lookbehind assertions or \eb or \eB at the start. .\"
.sp documentation).
(2) The offset of the character that follows the rightmost character that was
inspected during the match. This can be after the end of the match if the
pattern contains lookahead assertions.
.sp
(3) The offset of the character at which the successful or partial match
started. This can be different to the value of \fIovector[0]\fP if the pattern
contains the \eK escape sequence.
.P
For example, if the pattern (?<=abc)xx\eKyy(?=def) is matched against the
string "123abcxxyydef123", the resulting offsets are:
.sp
ovector[0] 8
ovector[1] 10
leftchar 3
rightchar 13
startchar 6
.sp
The \fBallusedtext\fP modifier in \fBpcre2test\fP can be used to display a
longer string that shows the leftmost and rightmost characters in a match
instead of just the matched string.
. .
. .
.\" HTML <a name="errorlist"></a> .\" HTML <a name="errorlist"></a>
@ -2654,6 +2626,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 05 October 2014 Last updated: 10 October 2014
Copyright (c) 1997-2014 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi

View File

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "05 October 2014" "PCRE 10.00" .TH PCRE2TEST 1 "10 October 2014" "PCRE 10.00"
.SH NAME .SH NAME
pcre2test - a program for testing Perl-compatible regular expressions. pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
@ -688,7 +688,7 @@ pattern.
aftertext show text after match aftertext show text after match
allaftertext show text after captures allaftertext show text after captures
allcaptures show all captures allcaptures show all captures
allusedtext show all consulted text allusedtext show all consulted text (non-JIT only)
altglobal alternative global matching altglobal alternative global matching
callout_capture show captures at callout time callout_capture show captures at callout time
callout_data=<n> set a value to pass via callouts callout_data=<n> set a value to pass via callouts
@ -725,10 +725,12 @@ substring. In each case the remainder is output on the following line with a
plus character following the capture number. plus character following the capture number.
.P .P
The \fBallusedtext\fP modifier requests that all the text that was consulted The \fBallusedtext\fP modifier requests that all the text that was consulted
during a successful pattern match be shown. This affects the output if there during a successful pattern match by the interpreter should be shown. This
is a lookbehind at the start of a match, or a lookahead at the end, or if \eK feature is not supported for JIT matching, and if requested with JIT it is
is used in the pattern. Characters that precede or follow the start and end of ignored (with a warning message). Setting this modifier affects the output if
the actual match are indicated in the output by '<' or '>' characters there is a lookbehind at the start of a match, or a lookahead at the end, or if
\eK is used in the pattern. Characters that precede or follow the start and end
of the actual match are indicated in the output by '<' or '>' characters
underneath them. Here is an example: underneath them. Here is an example:
.sp .sp
/(?<=pqr)abc(?=xyz)/ /(?<=pqr)abc(?=xyz)/
@ -1151,6 +1153,6 @@ Cambridge CB2 3QH, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 05 October 2014 Last updated: 10 October 2014
Copyright (c) 1997-2014 University of Cambridge. Copyright (c) 1997-2014 University of Cambridge.
.fi .fi

View File

@ -415,11 +415,9 @@ PCRE2_EXP_DECL int pcre2_match(const pcre2_code *, \
PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, uint32_t, \ PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, uint32_t, \
pcre2_match_data *, pcre2_match_context *); \ pcre2_match_data *, pcre2_match_context *); \
PCRE2_EXP_DECL void pcre2_match_data_free(pcre2_match_data *); \ PCRE2_EXP_DECL void pcre2_match_data_free(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SIZE pcre2_get_leftchar(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SPTR pcre2_get_mark(pcre2_match_data *); \ PCRE2_EXP_DECL PCRE2_SPTR pcre2_get_mark(pcre2_match_data *); \
PCRE2_EXP_DECL uint32_t pcre2_get_ovector_count(pcre2_match_data *); \ PCRE2_EXP_DECL uint32_t pcre2_get_ovector_count(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *); \ PCRE2_EXP_DECL PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SIZE pcre2_get_rightchar(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *); PCRE2_EXP_DECL PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *);
@ -525,11 +523,9 @@ pcre2_compile are called by application code. */
#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_) #define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_)
#define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_) #define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_)
#define pcre2_get_error_message PCRE2_SUFFIX(pcre2_get_error_message_) #define pcre2_get_error_message PCRE2_SUFFIX(pcre2_get_error_message_)
#define pcre2_get_leftchar PCRE2_SUFFIX(pcre2_get_leftchar_)
#define pcre2_get_mark PCRE2_SUFFIX(pcre2_get_mark_) #define pcre2_get_mark PCRE2_SUFFIX(pcre2_get_mark_)
#define pcre2_get_ovector_pointer PCRE2_SUFFIX(pcre2_get_ovector_pointer_) #define pcre2_get_ovector_pointer PCRE2_SUFFIX(pcre2_get_ovector_pointer_)
#define pcre2_get_ovector_count PCRE2_SUFFIX(pcre2_get_ovector_count_) #define pcre2_get_ovector_count PCRE2_SUFFIX(pcre2_get_ovector_count_)
#define pcre2_get_rightchar PCRE2_SUFFIX(pcre2_get_rightchar_)
#define pcre2_get_startchar PCRE2_SUFFIX(pcre2_get_startchar_) #define pcre2_get_startchar PCRE2_SUFFIX(pcre2_get_startchar_)
#define pcre2_jit_compile PCRE2_SUFFIX(pcre2_jit_compile_) #define pcre2_jit_compile PCRE2_SUFFIX(pcre2_jit_compile_)
#define pcre2_jit_match PCRE2_SUFFIX(pcre2_jit_match_) #define pcre2_jit_match PCRE2_SUFFIX(pcre2_jit_match_)

View File

@ -515,10 +515,10 @@ same response. */
/* These macros pack up tests that are used for partial matching, and which /* These macros pack up tests that are used for partial matching, and which
appear several times in the code. We set the "hit end" flag if the pointer is appear several times in the code. We set the "hit end" flag if the pointer is
at the end of the subject and also past the start of the subject (i.e. at the end of the subject and also past the earliest inspected character (i.e.
something has been matched). For hard partial matching, we then return something has been matched, even if not part of the actual matched string). For
immediately. The second one is used when we already know we are past the end of hard partial matching, we then return immediately. The second one is used when
the subject. */ we already know we are past the end of the subject. */
#define CHECK_PARTIAL()\ #define CHECK_PARTIAL()\
if (mb->partial != 0 && eptr >= mb->end_subject && \ if (mb->partial != 0 && eptr >= mb->end_subject && \

View File

@ -94,18 +94,6 @@ if (match_data != NULL)
/*************************************************
* Get left-most code unit in match *
*************************************************/
PCRE2_EXP_DEFN PCRE2_SIZE PCRE2_CALL_CONVENTION
pcre2_get_leftchar(pcre2_match_data *match_data)
{
return match_data->leftchar;
}
/************************************************* /*************************************************
* Get last mark in match * * Get last mark in match *
*************************************************/ *************************************************/
@ -142,18 +130,6 @@ return match_data->oveccount;
/*************************************************
* Get right-most code unit in match *
*************************************************/
PCRE2_EXP_DEFN PCRE2_SIZE PCRE2_CALL_CONVENTION
pcre2_get_rightchar(pcre2_match_data *match_data)
{
return match_data->rightchar;
}
/************************************************* /*************************************************
* Get starting code unit in match * * Get starting code unit in match *
*************************************************/ *************************************************/

View File

@ -4381,10 +4381,17 @@ modifiers. */
if ((dat_datctl.control & (CTL_DFA|CTL_FINDLIMITS)) == (CTL_DFA|CTL_FINDLIMITS)) if ((dat_datctl.control & (CTL_DFA|CTL_FINDLIMITS)) == (CTL_DFA|CTL_FINDLIMITS))
{ {
printf("** Finding match limits is not relevant for DFA matching: ignored\n"); fprintf(outfile, "** Finding match limits is not relevant for DFA matching: ignored\n");
dat_datctl.control &= ~CTL_FINDLIMITS; dat_datctl.control &= ~CTL_FINDLIMITS;
} }
if ((dat_datctl.control & CTL_ALLUSEDTEXT) != 0 &&
FLD(compiled_code, executable_jit) != NULL)
{
fprintf(outfile, "** Showing all consulted text is not supported by JIT: ignored\n");
dat_datctl.control &= ~CTL_ALLUSEDTEXT;
}
/* As pcre2_match_data_create() imposes a minimum of 1 on the ovector count, we /* As pcre2_match_data_create() imposes a minimum of 1 on the ovector count, we
must do so too. */ must do so too. */

30
testdata/testinput14 vendored
View File

@ -1,7 +1,11 @@
# These are tests of the match-limiting features. The results are different for # These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The # interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 16. # same tests are run using JIT in test 16.
# (2) Other tests that must not be run with JIT.
/(a+)*zz/I /(a+)*zz/I
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
aaaaaaaaaaaaaz\=find_limits aaaaaaaaaaaaaz\=find_limits
@ -81,4 +85,28 @@
/(?(R)a*(?1)|((?R))b)/ /(?(R)a*(?1)|((?R))b)/
aaaabcde aaaabcde
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
abcxyzpqr\=aftertext
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
xyzpqrabcxyzpqr\=aftertext
/a\b/
a.\=allusedtext
a\=allusedtext
/abc\Kxyz/
abcxyz\=allusedtext
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
# End of testinput14 # End of testinput14

21
testdata/testinput2 vendored
View File

@ -3955,27 +3955,6 @@ a random value. /Ix
aaaabcde aaaabcde
aaaabcde\=ovector=100 aaaabcde\=ovector=100
/abc(?=xyz)/allusedtext
abcxyzpqr
abcxyzpqr\=aftertext
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
xyzpqrabcxyzpqr\=aftertext
/a\b/
a.\=allusedtext
a\=allusedtext
/abc\Kxyz/
abcxyz\=allusedtext
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
/a*?b*?/ /a*?b*?/
ab ab

3
testdata/testinput5 vendored
View File

@ -1625,7 +1625,4 @@
/\X?abc/utf,no_start_optimize /\X?abc/utf,no_start_optimize
\xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\=no_utf_check,offset=06 \xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\=no_utf_check,offset=06
/(?<=\x{100})\x{200}(?=\x{300})/utf,allusedtext
\x{100}\x{200}\x{300}
# End of testinput5 # End of testinput5

49
testdata/testoutput14 vendored
View File

@ -1,7 +1,11 @@
# These are tests of the match-limiting features. The results are different for # These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The # interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 16. # same tests are run using JIT in test 16.
# (2) Other tests that must not be run with JIT.
/(a+)*zz/I /(a+)*zz/I
Capturing subpattern count = 1 Capturing subpattern count = 1
Starting code units: a z Starting code units: a z
@ -192,4 +196,47 @@ Failed: error -49: nested recursion at the same subject position
aaaabcde aaaabcde
Failed: error -49: nested recursion at the same subject position Failed: error -49: nested recursion at the same subject position
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
0: abcxyz
>>>
abcxyzpqr\=aftertext
0: abcxyz
>>>
0+ xyzpqr
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
0: pqrabcxyz
<<< >>>
xyzpqrabcxyzpqr\=aftertext
0: pqrabcxyz
<<< >>>
0+ xyzpqr
/a\b/
a.\=allusedtext
0: a.
>
a\=allusedtext
0: a
/abc\Kxyz/
abcxyz\=allusedtext
0: abcxyz
<<<
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
0: abcxyz
>>>
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
0: abcabcde
>>>>>
# End of testinput14 # End of testinput14

40
testdata/testoutput2 vendored
View File

@ -13463,46 +13463,6 @@ No match
aaaabcde\=ovector=100 aaaabcde\=ovector=100
0: aaaab 0: aaaab
/abc(?=xyz)/allusedtext
abcxyzpqr
0: abcxyz
>>>
abcxyzpqr\=aftertext
0: abcxyz
>>>
0+ xyzpqr
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
0: pqrabcxyz
<<< >>>
xyzpqrabcxyzpqr\=aftertext
0: pqrabcxyz
<<< >>>
0+ xyzpqr
/a\b/
a.\=allusedtext
0: a.
>
a\=allusedtext
0: a
/abc\Kxyz/
abcxyz\=allusedtext
0: abcxyz
<<<
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
0: abcxyz
>>>
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
0: abcabcde
>>>>>
/a*?b*?/ /a*?b*?/
ab ab
0: 0:

View File

@ -3991,9 +3991,4 @@ Subject length lower bound = 1
\xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\=no_utf_check,offset=06 \xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\=no_utf_check,offset=06
0: A\x{300}abc 0: A\x{300}abc
/(?<=\x{100})\x{200}(?=\x{300})/utf,allusedtext
\x{100}\x{200}\x{300}
0: \x{100}\x{200}\x{300}
<<<<<<< >>>>>>>
# End of testinput5 # End of testinput5