Impose a minimum of 1 for the number of pairs in the ovector.

Philip.Hazel 2014-10-05 17:55:25 +00:00
parent 4ca4ad688d
commit 4bdfd990af
6 changed files with 61 additions and 46 deletions

@ -1,4 +1,4 @@
.TH PCRE2API 3 "01 October 2014" "PCRE2 10.00"
.TH PCRE2API 3 "05 October 2014" "PCRE2 10.00"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -1655,8 +1655,10 @@ match data block by calling one of the creation functions above. For
\fBpcre2_match_data_create()\fP, the first argument is the number of pairs of
offsets in the \fIovector\fP. One pair of offsets is required to identify the
string that matched the whole pattern, with another pair for each captured
substring. For example, a value of 4 creates enough space to record the
matched portion of the subject plus three captured substrings.
substring. For example, a value of 4 creates enough space to record the matched
portion of the subject plus three captured substrings. A minimum of 1 pair is
imposed by \fBpcre2_match_data_create()\fP, so it is always possible to
return the overall matched string.
.P
For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
pointer to a compiled pattern. In this case the ovector is created to be
@ -2015,13 +2017,13 @@ operation, it is the last portion of the string that it matched that is
returned.
.P
If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If neither
the actual string matched nor any captured substrings are of interest,
\fBpcre2_match()\fP may be called with a match data block whose ovector is of
zero length. However, if the pattern contains back references and the
\fIovector\fP is not big enough to remember the related substrings, PCRE2 has
to get additional memory for use during matching. Thus it is usually advisable
to set up a match data block containing an ovector of reasonable size.
as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, \fBpcre2_match()\fP may be called with a match
data block whose ovector is of minimum length (that is, one pair). However, if
the pattern contains back references and the \fIovector\fP is not big enough to
remember the related substrings, PCRE2 has to get additional memory for use
during matching. Thus it is usually advisable to set up a match data block
containing an ovector of reasonable size.
.P
It is possible for capturing subpattern number \fIn+1\fP to match some part of
the subject when subpattern \fIn\fP has not been used at all. For example, if
@ -2652,6 +2654,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
Last updated: 01 October 2014
Last updated: 05 October 2014
Copyright (c) 1997-2014 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "19 August 2014" "PCRE 10.00"
.TH PCRE2TEST 1 "05 October 2014" "PCRE 10.00"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -881,6 +881,12 @@ The \fBovector\fP modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a
\fB#subject\fP command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15.
.P
At least one pair of offsets is always created by
\fBpcre2_match_data_create()\fP, for matching with PCRE2's native API, so a
value of 0 is the same as 1. However, a value of 0 is useful when testing the
POSIX API because it causes \fBregexec()\fP to be called with a NULL capture
vector.
.
.
.SH "THE ALTERNATIVE MATCHING FUNCTION"
@ -1145,6 +1151,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
Last updated: 19 August 2014
Last updated: 05 October 2014
Copyright (c) 1997-2014 University of Cambridge.
.fi

@ -51,10 +51,14 @@ POSSIBILITY OF SUCH DAMAGE.
* Create a match data block given ovector size *
*************************************************/
/* A minimum of 1 is imposed on the number of ovector triplets. */
PCRE2_EXP_DEFN pcre2_match_data * PCRE2_CALL_CONVENTION
pcre2_match_data_create(uint32_t oveccount, pcre2_general_context *gcontext)
{
pcre2_match_data *yield = PRIV(memctl_malloc)(
pcre2_match_data *yield;
if (oveccount < 1) oveccount = 1;
yield = PRIV(memctl_malloc)(
sizeof(pcre2_match_data) + 3*oveccount*sizeof(PCRE2_SIZE),
(pcre2_memctl *)gcontext);
yield->oveccount = oveccount;
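
The clamp in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the PCRE2 implementation: `sketch_match_data` and `sketch_match_data_create()` are hypothetical stand-ins that use plain `malloc()` instead of `PRIV(memctl_malloc)` and `size_t` in place of `PCRE2_SIZE`. It only mirrors the two steps shown in the diff: raise a requested count of 0 to 1, then size the block at three slots per requested pair.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for pcre2_match_data; the real structure has more
   fields and is private to the library. */
typedef struct {
  uint32_t oveccount;   /* number of offset pairs available to the caller */
  size_t ovector[];     /* offsets follow the header (C99 flexible array) */
} sketch_match_data;

/* Mirrors the logic in the hunk: a request for fewer than 1 pair is raised
   to 1 before the block is sized and allocated. Returns NULL on failure. */
sketch_match_data *sketch_match_data_create(uint32_t oveccount)
{
  sketch_match_data *yield;

  if (oveccount < 1) oveccount = 1;

  /* Three slots per pair, as in the diff's 3*oveccount*sizeof(PCRE2_SIZE). */
  yield = malloc(sizeof(sketch_match_data) +
                 3 * (size_t)oveccount * sizeof(size_t));
  if (yield == NULL) return NULL;

  yield->oveccount = oveccount;
  return yield;
}
```

With this sketch, a caller that passes 0 gets back a block whose `oveccount` field is 1, so slot pair 0 (the overall match) is always available. The caller releases the block with `free()`.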

@ -4385,11 +4385,10 @@ if ((dat_datctl.control & (CTL_DFA|CTL_FINDLIMITS)) == (CTL_DFA|CTL_FINDLIMITS))
dat_datctl.control &= ~CTL_FINDLIMITS;
}
if ((dat_datctl.control & CTL_ANYGLOB) != 0 && dat_datctl.oveccount < 1)
{
printf("** Global matching requires a non-zero ovector count: ignored\n");
dat_datctl.control &= ~CTL_ANYGLOB;
}
/* As pcre2_match_data_create() imposes a minimum of 1 on the ovector count, we
must do so too. */
if (dat_datctl.oveccount < 1) dat_datctl.oveccount = 1;
/* Enable display of malloc/free if wanted. */
@ -4875,8 +4874,7 @@ for (gmatched = 0;; gmatched++)
If that is the case, this is not necessarily the end. We want to advance the
start offset, and continue. We won't be at the end of the string - that was
checked before setting g_notempty. We achieve the effect by pretending that a
single character was matched. We know that match_data->oveccount is at least
1 because that was checked above.
single character was matched.
Complication arises in the case when the newline convention is "any", "crlf",
or "anycrlf". If the previous match was at the end of a line terminated by

@ -245,6 +245,7 @@ Subject length lower bound = 4
3: c
abcb\=ovector=0
Matched, but too many substrings
0: abcb
abcb\=ovector=1
Matched, but too many substrings
0: abcb
@ -273,6 +274,7 @@ Subject length lower bound = 3
1: a
abc\=ovector=0
Matched, but too many substrings
0: abc
abc\=ovector=1
Matched, but too many substrings
0: abc
@ -286,6 +288,7 @@ Matched, but too many substrings
3: b
aba\=ovector=0
Matched, but too many substrings
0: aba
aba\=ovector=1
Matched, but too many substrings
0: aba
@ -7404,6 +7407,7 @@ Subject length lower bound = 3
No match
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4\=ovector=0
Matched, but too many substrings
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4
/^a.b/newline=lf
a\rb
@ -10922,6 +10926,7 @@ Minimum recursion limit = 4
3: baz
bazfooX\=ovector=0
Matched, but too many substrings
0: fooX
bazfooX\=ovector=1
Matched, but too many substrings
0: fooX
@ -11970,7 +11975,7 @@ Callout 2: last capture = 0
/(ab)x|ab/
ab\=ovector=0
Matched, but too many substrings
0: ab
ab\=ovector=1
0: ab

@ -7611,7 +7611,7 @@ Failed: error -37: invalid data in workspace for DFA restart
/abcd/
abcd\=ovector=0
Matched, but offsets vector is too small to show all matches
0: abcd
# These tests show up auto-possessification