Update and improve substring handling and its documentation.

Philip.Hazel 2014-12-14 17:17:06 +00:00
parent a85d15cbd1
commit cb8865d247
17 changed files with 337 additions and 164 deletions


@ -1,4 +1,4 @@
.TH PCRE2API 3 "13 December 2014" "PCRE2 10.00"
.TH PCRE2API 3 "14 December 2014" "PCRE2 10.00"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -921,6 +921,16 @@ PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
contains the compiled pattern and related data. The caller must free the memory
by calling \fBpcre2_code_free()\fP when it is no longer needed.
.P
NOTE: When one of the matching functions is called, pointers to the compiled
pattern and the subject string are set in the match data block so that they can
be referenced by the extraction functions. After running a match, you must not
free a compiled pattern (or a subject string) until after all operations on the
.\" HTML <a href="#matchdatablock">
.\" </a>
match data block
.\"
have taken place.
.P
If the compile context argument \fIccontext\fP is NULL, memory for the compiled
pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
the same memory function that was used for the compile context.
@ -1683,7 +1693,7 @@ pattern with the JIT compiler does not alter the value returned by this option.
.B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP);
.fi
.P
Information about successful and unsuccessful matches is placed in a match
Information about a successful or unsuccessful match is placed in a match
data block, which is an opaque structure that is accessed by function calls. In
particular, the match data block contains a vector of offsets into the subject
string that define the matched part of the subject and any substrings that were
@ -1713,10 +1723,8 @@ memory is obtained using the same allocator that was used for the compiled
pattern (custom or default).
.P
A match data block can be used many times, with the same or different compiled
patterns. When it is no longer needed, it should be freed by calling
\fBpcre2_match_data_free()\fP. You can extract information from a match data
block after a match operation has finished, using functions that are described
in the sections on
patterns. You can extract information from a match data block after a match
operation has finished, using functions that are described in the sections on
.\" HTML <a href="#matchedstrings">
.\" </a>
matched strings
@ -1727,6 +1735,15 @@ and
other match data
.\"
below.
.P
When one of the matching functions is called, pointers to the compiled pattern
and the subject string are set in the match data block so that they can be
referenced by the extraction functions. After running a match, you must not
free a compiled pattern or a subject string until after all operations on the
match data block (for that match) have taken place.
.P
When a match data block itself is no longer needed, it should be freed by
calling \fBpcre2_match_data_free()\fP.
.
.
.SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION"
@ -2053,8 +2070,13 @@ returned value is 3. If there are no capturing subpatterns, the return value
from a successful match is 1, indicating that just the first pair of offsets
has been set.
.P
If a capturing subpattern is matched repeatedly within a single match
operation, it is the last portion of the string that it matched that is
If a pattern uses the \eK escape sequence within a positive assertion, the
reported start of the match can be greater than the end of the match. For
example, if the pattern (?=ab\eK) is matched against "ab", the start and end
offset values for the match are 2 and 0.
.P
If a capturing subpattern group is matched repeatedly within a single match
operation, it is the last portion of the subject that it matched that is
returned.
.P
If the ovector is too small to hold all the captured substring offsets, as much
@ -2268,23 +2290,31 @@ above.
.\"
For convenience, auxiliary functions are provided for extracting captured
substrings as new, separate, zero-terminated strings. The functions in this
section identify substrings by number. The next section describes similar
functions for extracting substrings by name. A substring that contains a binary
zero is correctly extracted and has a further zero added on the end, but the
result is not, of course, a C string.
section identify substrings by number. The number zero refers to the entire
matched substring, with higher numbers referring to substrings captured by
parenthesized groups. The next section describes similar functions for
extracting captured substrings by name. A substring that contains a binary zero
is correctly extracted and has a further zero added on the end, but the result
is not, of course, a C string.
.P
If a pattern uses the \eK escape sequence within a positive assertion, the
reported start of the match can be greater than the end of the match. For
example, if the pattern (?=ab\eK) is matched against "ab", the start and end
offset values for the match are 2 and 0. In this situation, calling these
functions with a zero substring number extracts a zero-length empty string.
.P
You can find the length in code units of a captured substring without
extracting it by calling \fBpcre2_substring_length_bynumber()\fP. The first
argument is a pointer to the match data block, the second is the group number,
and the third is a pointer to a variable into which the length is placed.
and the third is a pointer to a variable into which the length is placed. If
you just want to know whether or not the substring has been captured, you can
pass the third argument as NULL.
.P
The \fBpcre2_substring_copy_bynumber()\fP function copies one string into a
supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it into
new memory, obtained using the same memory allocation function that was used
for the match data block. The first two arguments of these functions are a
pointer to the match data block and a capturing group number. A group number of
zero extracts the substring that matched the entire pattern, and higher values
extract the captured substrings.
The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring
into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it
into new memory, obtained using the same memory allocation function that was
used for the match data block. The first two arguments of these functions are a
pointer to the match data block and a capturing group number.
.P
The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
the buffer and a pointer to a variable that contains its length in code units.
@ -2297,8 +2327,9 @@ of code units that comprise the substring, again excluding the terminating
zero. When the substring is no longer needed, the memory should be freed by
calling \fBpcre2_substring_free()\fP.
.P
The return value from these functions is zero for success, or one of these
error codes:
The return value from all these functions is zero for success, or a negative
error code. If the pattern match failed, the match failure code is returned.
Other possible error codes are:
.sp
PCRE2_ERROR_NOMEMORY
.sp
@ -2319,7 +2350,8 @@ could not be captured.
PCRE2_ERROR_UNSET
.sp
The substring did not participate in the match. For example, if the pattern is
(abc)|(def) and the subject is "def", substring number 1 is unset.
(abc)|(def) and the subject is "def", and the ovector contains at least two
capturing slots, substring number 1 is unset.
.
.
.SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS"
@ -2388,15 +2420,20 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
that name.
that name. Given the number, you can extract the substring directly, or use one
of the functions described above.
.P
Given the number, you can extract the substring directly, or use one of the
functions described above. For convenience, there are also "byname" functions
that correspond to the "bynumber" functions, the only difference being that the
second argument is a name instead of a number. If PCRE2_DUPNAMES is
set and there are duplicate names, these functions return the first named
string that is set. PCRE2_ERROR_UNSET is returned only if all groups of the
same name are unset.
For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
names, these functions scan all the groups with the given name, and return the
first named string that is set.
.P
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
returned. If all groups with the name have numbers that are greater than the
number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there
is at least one group with a slot in the ovector, but no group is found to be
set, PCRE2_ERROR_UNSET is returned.
.P
\fBWarning:\fP If the pattern uses the (?| feature to set up multiple
subpatterns with the same number, as described in the
@ -2660,17 +2697,36 @@ is matched against the string
.sp
the three matched strings are
.sp
<something>
<something> <something else>
<something> <something else> <something further>
<something> <something else>
<something>
.sp
On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The offsets of the substrings are returned in
the ovector, and can be extracted in the same way as for \fBpcre2_match()\fP.
They are returned in reverse order of length; that is, the longest
matching string is given first. If there were too many matches to fit into
the ovector, the yield of the function is zero, and the vector is filled with
the longest matches.
the ovector, and can be extracted by number in the same way as for
\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups
that may exist in the pattern, because DFA matching does not support group
capture.
.P
Calls to the convenience functions that extract substrings by name
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
DFA match. The convenience functions that extract substrings by number never
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
slightly different:
.sp
PCRE2_ERROR_UNAVAILABLE
.sp
The ovector is not big enough to include a slot for the given substring number.
.sp
PCRE2_ERROR_UNSET
.sp
There is a slot in the ovector for this substring, but there were insufficient
matches to fill it.
.P
The matched strings are stored in the ovector in reverse order of length; that
is, the longest matching string is first. If there were too many matches to fit
into the ovector, the yield of the function is zero, and the vector is filled
with the longest matches.
.P
NOTE: PCRE2's "auto-possessification" optimization usually applies to character
repeats at the end of a pattern (as well as internally). For example, the
@ -2746,6 +2802,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 13 December 2014
Last updated: 14 December 2014
Copyright (c) 1997-2014 University of Cambridge.
.fi
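The DFA ordering rules described in the man page change above can be modelled in a few lines of C. This is a minimal sketch, not PCRE2 code. `span`, `dfa_fill_ovector()` and the sample offsets are hypothetical, chosen to match the `<something>` example: three matches, all starting at offset 0 and ending at offsets 11, 28 and 48. The sketch shows the longest match landing in pair 0, and the yield dropping to zero when the ovector is too small to hold every match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one DFA match: start and end offsets in the subject. */
typedef struct { size_t start, end; } span;

/* qsort comparator: longer spans sort first. */
static int longer_first(const void *a, const void *b) {
  const span *x = a, *y = b;
  size_t lx = x->end - x->start, ly = y->end - y->start;
  return (lx < ly) - (lx > ly);
}

/* Store up to oveccount matches in ovector (two entries per pair), longest
   first. Returns the number of matches, or 0 if they did not all fit. */
static int dfa_fill_ovector(span *matches, int nmatches,
                            size_t *ovector, int oveccount) {
  assert(nmatches > 0 && oveccount > 0);
  qsort(matches, (size_t)nmatches, sizeof(span), longer_first);
  int stored = (nmatches < oveccount) ? nmatches : oveccount;
  for (int i = 0; i < stored; i++) {
    ovector[2 * i] = matches[i].start;
    ovector[2 * i + 1] = matches[i].end;
  }
  return (nmatches > oveccount) ? 0 : nmatches;
}
```

With a two-pair ovector the same three matches yield 0, and only the two longest (ending at 48 and 28) are kept. Because the numbers are match ranks rather than group numbers, they say nothing about capturing groups in the pattern.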


@ -212,20 +212,21 @@ context functions. */
#define PCRE2_ERROR_DFA_BADRESTART (-38)
#define PCRE2_ERROR_DFA_RECURSE (-39)
#define PCRE2_ERROR_DFA_UCOND (-40)
#define PCRE2_ERROR_DFA_UITEM (-41)
#define PCRE2_ERROR_DFA_WSSIZE (-42)
#define PCRE2_ERROR_INTERNAL (-43)
#define PCRE2_ERROR_JIT_BADOPTION (-44)
#define PCRE2_ERROR_JIT_STACKLIMIT (-45)
#define PCRE2_ERROR_MATCHLIMIT (-46)
#define PCRE2_ERROR_NOMEMORY (-47)
#define PCRE2_ERROR_NOSUBSTRING (-48)
#define PCRE2_ERROR_NOUNIQUESUBSTRING (-49)
#define PCRE2_ERROR_NULL (-50)
#define PCRE2_ERROR_RECURSELOOP (-51)
#define PCRE2_ERROR_RECURSIONLIMIT (-52)
#define PCRE2_ERROR_UNAVAILABLE (-53)
#define PCRE2_ERROR_UNSET (-54)
#define PCRE2_ERROR_DFA_UFUNC (-41)
#define PCRE2_ERROR_DFA_UITEM (-42)
#define PCRE2_ERROR_DFA_WSSIZE (-43)
#define PCRE2_ERROR_INTERNAL (-44)
#define PCRE2_ERROR_JIT_BADOPTION (-45)
#define PCRE2_ERROR_JIT_STACKLIMIT (-46)
#define PCRE2_ERROR_MATCHLIMIT (-47)
#define PCRE2_ERROR_NOMEMORY (-48)
#define PCRE2_ERROR_NOSUBSTRING (-49)
#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50)
#define PCRE2_ERROR_NULL (-51)
#define PCRE2_ERROR_RECURSELOOP (-52)
#define PCRE2_ERROR_RECURSIONLIMIT (-53)
#define PCRE2_ERROR_UNAVAILABLE (-54)
#define PCRE2_ERROR_UNSET (-55)
/* Request types for pcre2_pattern_info() */
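Inserting PCRE2_ERROR_DFA_UFUNC at -41 shifts every later error code down by one. This is why the expected pcre2grep messages in testdata/grepoutput change from "-45, -46 or -52" to "-46, -47 or -53". The sketch below shows a resource-limit check written against the symbolic names, so that renumbering of this kind does not affect it. It assumes the new values from this hunk. `is_resource_limit()` is a hypothetical helper, and real code would include pcre2.h rather than redefine the macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the new side of the pcre2.h hunk above. Real code should
   include pcre2.h instead of repeating them. */
#define PCRE2_ERROR_DFA_UFUNC      (-41)
#define PCRE2_ERROR_JIT_BADOPTION  (-45)
#define PCRE2_ERROR_JIT_STACKLIMIT (-46)
#define PCRE2_ERROR_MATCHLIMIT     (-47)
#define PCRE2_ERROR_RECURSELOOP    (-52)
#define PCRE2_ERROR_RECURSIONLIMIT (-53)

/* Hypothetical helper: true for the three codes that pcre2grep's message
   describes as resource limits. */
static bool is_resource_limit(int rc) {
  return rc == PCRE2_ERROR_JIT_STACKLIMIT ||
         rc == PCRE2_ERROR_MATCHLIMIT ||
         rc == PCRE2_ERROR_RECURSIONLIMIT;
}
```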


@ -3275,6 +3275,12 @@ if ((re->flags & PCRE2_LASTSET) != 0)
}
}
/* Fill in fields that are always returned in the match data. */
match_data->code = re;
match_data->subject = subject;
match_data->mark = NULL;
match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
/* Call the main matching function, looping for a non-anchored regex after a
failed match. If not restarting, perform certain optimizations at the start of


@ -212,18 +212,19 @@ static const char match_error_texts[] =
"invalid data in workspace for DFA restart\0"
"too much recursion for DFA matching\0"
/* 40 */
"backreference condition or recursion test not supported for DFA matching\0"
"item unsupported for DFA matching\0"
"backreference condition or recursion test is not supported for DFA matching\0"
"function is not supported for DFA matching\0"
"pattern contains an item that is not supported for DFA matching\0"
"workspace size exceeded in DFA matching\0"
"internal error - pattern overwritten?\0"
"bad JIT option\0"
/* 45 */
"bad JIT option\0"
"JIT stack limit reached\0"
"match limit exceeded\0"
"no more memory\0"
"unknown substring\0"
"non-unique substring name\0"
/* 50 */
"non-unique substring name\0"
"NULL argument passed\0"
"nested recursion at the same subject position\0"
"recursion limit exceeded\0"


@ -526,15 +526,16 @@ bytes in a code unit in that mode. */
#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)
/* Values for the matchedby field in a match data block. */
enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */
PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */
/* Magic number to provide a small check against being handed junk. */
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
/* This value is used to detect a loaded regular expression in different
endianness. */
#define REVERSED_MAGIC_NUMBER 0x45524350UL /* 'ERCP' */
/* The maximum remaining length of subject we are prepared to search for a
req_unit match. */


@ -616,12 +616,13 @@ typedef struct pcre2_real_match_data {
pcre2_memctl memctl;
const pcre2_real_code *code; /* The pattern used for the match */
PCRE2_SPTR subject; /* The subject that was matched */
int rc; /* The return code from the match */
PCRE2_SPTR mark; /* Pointer to last mark */
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
PCRE2_SIZE startchar; /* Offset to starting code unit */
PCRE2_SPTR mark; /* Pointer to last mark */
uint16_t matchedby; /* Type of match (normal, JIT, DFA) */
uint16_t oveccount; /* Number of pairs */
int rc; /* The return code from the match */
PCRE2_SIZE ovector[1]; /* The first field */
} pcre2_real_match_data;


@ -180,6 +180,7 @@ match_data->startchar = arguments.startchar_ptr - subject;
match_data->leftchar = 0;
match_data->rightchar = 0;
match_data->mark = arguments.mark_ptr;
match_data->matchedby = PCRE2_MATCHEDBY_JIT;
return match_data->rc;


@ -6995,6 +6995,7 @@ while (mb->ovecsave_chain != NULL)
match_data->code = re;
match_data->subject = subject;
match_data->mark = mb->mark;
match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;
/* Handle a fully successful match. */
@ -7026,14 +7027,15 @@ if (rc == MATCH_MATCH || rc == MATCH_ACCEPT)
match_data->rc = ((mb->capture_last & OVFLBIT) != 0)?
0 : mb->end_offset_top/2;
/* If there is space in the offset vector, set any unused pairs at the end to
PCRE2_UNSET for backwards compatibility. It is documented that this happens.
In earlier versions, the whole set of potential capturing offsets was
initialized each time round the loop, but this is handled differently now.
"Gaps" are set to PCRE2_UNSET dynamically instead (this fixes a bug). Thus,
it is only those at the end that need setting here. We can't just set them
all at the start of the whole thing because they may get set in one branch
that is not the final matching branch. */
/* If there is space in the offset vector, set any pairs that follow the
highest-numbered captured string but are less than the number of capturing
groups in the pattern (and are within the ovector) to PCRE2_UNSET. It is
documented that this happens. In earlier versions, the whole set of potential
capturing offsets was initialized each time round the loop, but this is
handled differently now. "Gaps" are set to PCRE2_UNSET dynamically instead
(this fixed a bug). Thus, it is only those at the end that need setting here.
We can't just mark them all unset at the start of the whole thing because
they may get set in one branch that is not the final matching branch. */
if (mb->end_offset_top/2 <= re->top_bracket)
{
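The rewritten comment describes which ovector pairs are reset to PCRE2_UNSET after a successful match. The diff context is cut off just after the `if`, so the following is a hypothetical model of the rule as the comment states it, not the code that follows. `unset_trailing_pairs()` and `MODEL_UNSET` are illustrative names. `MODEL_UNSET` stands in for PCRE2_UNSET, which PCRE2 defines as a PCRE2_SIZE with all bits set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_UNSET SIZE_MAX   /* stands in for PCRE2_UNSET */

/* pairs_used:  pairs actually set by the match (end_offset_top/2)
   top_bracket: highest capturing group number in the pattern
   oveccount:   number of pairs in the ovector
   Resets every pair after the last one set, up to the number of groups in
   the pattern, but never beyond the end of the ovector. */
static void unset_trailing_pairs(size_t *ovector, size_t oveccount,
                                 size_t pairs_used, size_t top_bracket) {
  assert(ovector != NULL);
  size_t limit = top_bracket + 1;   /* group 0 plus groups 1..top_bracket */
  if (limit > oveccount) limit = oveccount;
  for (size_t i = pairs_used; i < limit; i++)
    ovector[2 * i] = ovector[2 * i + 1] = MODEL_UNSET;
}
```

For example, when (abc)|(def) is matched against "abc", only pairs 0 and 1 are set. Pair 2 is reset if the ovector has room for it, and is left alone if the ovector holds only two pairs.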


@ -65,26 +65,33 @@ Returns: if successful: zero
if not successful, a negative error code:
(1) an error from nametable_scan()
(2) an error from copy_bynumber()
(3) PCRE2_ERROR_UNSET: all named groups are unset
(3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector
(4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
*/
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_substring_copy_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname,
PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr)
{
PCRE2_SPTR first;
PCRE2_SPTR last;
PCRE2_SPTR entry;
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
PCRE2_SPTR first, last, entry;
int failrc, entrysize;
if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
return PCRE2_ERROR_DFA_UFUNC;
entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
&first, &last);
if (entrysize < 0) return entrysize;
failrc = PCRE2_ERROR_UNAVAILABLE;
for (entry = first; entry <= last; entry += entrysize)
{
uint32_t n = GET2(entry, 0);
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
if (n < match_data->oveccount)
{
if (match_data->ovector[n*2] != PCRE2_UNSET)
return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
failrc = PCRE2_ERROR_UNSET;
}
}
return PCRE2_ERROR_UNSET;
return failrc;
}
@ -146,26 +153,33 @@ Returns: if successful: zero
if not successful, a negative value:
(1) an error from nametable_scan()
(2) an error from get_bynumber()
(3) PCRE2_ERROR_UNSET: all named groups are unset
(3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector
(4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
*/
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_substring_get_byname(pcre2_match_data *match_data,
PCRE2_SPTR stringname, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr)
{
PCRE2_SPTR first;
PCRE2_SPTR last;
PCRE2_SPTR entry;
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
PCRE2_SPTR first, last, entry;
int failrc, entrysize;
if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
return PCRE2_ERROR_DFA_UFUNC;
entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
&first, &last);
if (entrysize < 0) return entrysize;
failrc = PCRE2_ERROR_UNAVAILABLE;
for (entry = first; entry <= last; entry += entrysize)
{
uint32_t n = GET2(entry, 0);
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
if (n < match_data->oveccount)
{
if (match_data->ovector[n*2] != PCRE2_UNSET)
return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
failrc = PCRE2_ERROR_UNSET;
}
}
return PCRE2_ERROR_UNSET;
return failrc;
}
@ -251,19 +265,25 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_substring_length_byname(pcre2_match_data *match_data,
PCRE2_SPTR stringname, PCRE2_SIZE *sizeptr)
{
PCRE2_SPTR first;
PCRE2_SPTR last;
PCRE2_SPTR entry;
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
PCRE2_SPTR first, last, entry;
int failrc, entrysize;
if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
return PCRE2_ERROR_DFA_UFUNC;
entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
&first, &last);
if (entrysize < 0) return entrysize;
failrc = PCRE2_ERROR_UNAVAILABLE;
for (entry = first; entry <= last; entry += entrysize)
{
uint32_t n = GET2(entry, 0);
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
return pcre2_substring_length_bynumber(match_data, n, sizeptr);
if (n < match_data->oveccount)
{
if (match_data->ovector[n*2] != PCRE2_UNSET)
return pcre2_substring_length_bynumber(match_data, n, sizeptr);
failrc = PCRE2_ERROR_UNSET;
}
}
return PCRE2_ERROR_UNSET;
return failrc;
}
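After this change the three "byname" functions share one scan rule:

- They refuse to work after a DFA match.
- They skip any group whose number is outside the ovector.
- They return the first group that is set.
- Otherwise they report PCRE2_ERROR_UNSET if at least one same-named group had a slot, or PCRE2_ERROR_UNAVAILABLE if none did.

A self-contained model of that loop follows. `model_match`, `find_named_group()` and the negative codes are hypothetical stand-ins. The real functions get the group list from the name table through pcre2_substring_nametable_scan() and then call the matching "bynumber" function. The test values mirror the new /(?<A>a)|(?<A>b)/dupnames cases in testinput2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_UNSET SIZE_MAX   /* stands in for PCRE2_UNSET */

/* Mirrors the three PCRE2_MATCHEDBY_* values added in this commit. */
enum { MATCHEDBY_INTERPRETER, MATCHEDBY_DFA_INTERPRETER, MATCHEDBY_JIT };

/* Placeholder codes, not the real PCRE2_ERROR_* values. */
enum { M_DFA_UFUNC = -101, M_NOSUBSTRING = -102,
       M_UNAVAILABLE = -103, M_UNSET = -104 };

typedef struct {
  int matchedby;
  uint32_t oveccount;      /* number of offset pairs */
  const size_t *ovector;
} model_match;

/* groups[] lists every group number that carries the name, in name-table
   order. Returns the first group that is set, or a negative code. */
static int find_named_group(const model_match *md,
                            const uint32_t *groups, size_t ngroups) {
  assert(md != NULL);
  if (md->matchedby == MATCHEDBY_DFA_INTERPRETER) return M_DFA_UFUNC;
  if (ngroups == 0) return M_NOSUBSTRING;   /* name not in the pattern */
  int failrc = M_UNAVAILABLE;               /* until a group with a slot is seen */
  for (size_t i = 0; i < ngroups; i++) {
    uint32_t n = groups[i];
    if (n < md->oveccount) {
      if (md->ovector[2 * n] != MODEL_UNSET) return (int)n;
      failrc = M_UNSET;                     /* had a slot, but was not set */
    }
  }
  return failrc;
}
```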
@ -292,13 +312,23 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_substring_length_bynumber(pcre2_match_data *match_data,
uint32_t stringnumber, PCRE2_SIZE *sizeptr)
{
int count;
PCRE2_SIZE left, right;
if (stringnumber > match_data->code->top_bracket)
return PCRE2_ERROR_NOSUBSTRING;
if (stringnumber >= match_data->oveccount)
return PCRE2_ERROR_UNAVAILABLE;
if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
return PCRE2_ERROR_UNSET;
if ((count = match_data->rc) < 0) return count; /* Match failed */
if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER)
{
if (stringnumber > match_data->code->top_bracket)
return PCRE2_ERROR_NOSUBSTRING;
if (stringnumber >= match_data->oveccount)
return PCRE2_ERROR_UNAVAILABLE;
if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
return PCRE2_ERROR_UNSET;
}
else /* Matched using pcre2_dfa_match() */
{
if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE;
if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET;
}
left = match_data->ovector[stringnumber*2];
right = match_data->ovector[stringnumber*2+1];
if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
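After this change, pcre2_substring_length_bynumber() does three things in order. It first passes through the code of a failed match. It then applies different checks depending on which matcher filled the match data block. Finally, it computes the length. The sketch below models those checks. `model_md`, `model_length_bynumber()` and the `E_*` values are hypothetical placeholders for the match data block and the PCRE2_ERROR_* codes. The final length line reproduces the last line of the hunk above: when \K makes the start offset greater than the end, the length is zero. A NULL size pointer just asks whether the substring is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_UNSET SIZE_MAX   /* stands in for PCRE2_UNSET */
enum { BY_INTERPRETER, BY_DFA, BY_JIT };
enum { E_NOSUBSTRING = -102, E_UNAVAILABLE = -103, E_UNSET = -104 };

typedef struct {
  int rc;                  /* yield of the match */
  int matchedby;
  uint32_t top_bracket;    /* highest group number in the pattern */
  uint32_t oveccount;      /* number of offset pairs */
  const size_t *ovector;
} model_md;

static int model_length_bynumber(const model_md *md, uint32_t n,
                                 size_t *sizeptr) {
  assert(md != NULL);
  int count = md->rc;
  if (count < 0) return count;                  /* match failed */
  if (md->matchedby != BY_DFA) {
    if (n > md->top_bracket) return E_NOSUBSTRING;
    if (n >= md->oveccount) return E_UNAVAILABLE;
    if (md->ovector[2 * n] == MODEL_UNSET) return E_UNSET;
  } else {                                      /* DFA: numbers are match ranks */
    if (n >= md->oveccount) return E_UNAVAILABLE;
    if (count != 0 && n >= (uint32_t)count) return E_UNSET;
  }
  size_t left = md->ovector[2 * n], right = md->ovector[2 * n + 1];
  if (sizeptr != NULL) *sizeptr = (left > right) ? 0 : right - left;  /* \K case */
  return 0;
}
```

In the DFA branch a number above the pattern's group count is never rejected as unknown. This matches the documentation's statement that the by-number functions do not return PCRE2_ERROR_NOSUBSTRING after a DFA match.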

testdata/grepoutput

@ -384,15 +384,15 @@ aaaaa2
010203040506
RC=0
======== STDERR ========
pcre2grep: pcre2_match() gave error -46 while matching this text:
pcre2grep: pcre2_match() gave error -47 while matching this text:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
pcre2grep: pcre2_match() gave error -46 while matching this text:
pcre2grep: pcre2_match() gave error -47 while matching this text:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded.
pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded.
pcre2grep: Check your regex for nested unlimited loops.
---------------------------- Test 38 ------------------------------
This line contains a binary zero here >< for testing.
@ -510,23 +510,23 @@ In the middle of a line, PATTERN appears.
Check up on PATTERN near the end.
RC=0
---------------------------- Test 62 -----------------------------
pcre2grep: pcre2_match() gave error -46 while matching text that starts:
pcre2grep: pcre2_match() gave error -47 while matching text that starts:
This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read
pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded.
pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded.
pcre2grep: Check your regex for nested unlimited loops.
RC=1
---------------------------- Test 63 -----------------------------
pcre2grep: pcre2_match() gave error -52 while matching text that starts:
pcre2grep: pcre2_match() gave error -53 while matching text that starts:
This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read
pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded.
pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded.
pcre2grep: Check your regex for nested unlimited loops.
RC=1
---------------------------- Test 64 ------------------------------

testdata/testinput2

@ -4090,5 +4090,11 @@ a random value. /Ix
/x(?=ab\K)/
xab\=get=0
xab\=copy=0
xab\=getall
/(?<A>a)|(?<A>b)/dupnames
a\=ovector=1,copy=A,get=A,get=2
a\=ovector=2,copy=A,get=A,get=2
b\=ovector=2,copy=A,get=A,get=2
# End of testinput2

testdata/testinput6

@ -4797,4 +4797,15 @@
ab
cdab
/(a)(b)|(c)/
XcX\=ovector=2,get=1,get=2,get=3,get=4,getall
/(?<A>aa)/
aa\=get=A
aa\=copy=A
/a+/no_auto_possess
a\=ovector=2,get=1,get=2,getall
aaa\=ovector=2,get=1,get=2,getall
# End of testinput6

testdata/testoutput14

@ -114,11 +114,11 @@ Subject length lower bound = 3
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
/(a+)*zz/
aaaaaaaaaaaaaz\=recursion_limit=10
Failed: error -52: recursion limit exceeded
Failed: error -53: recursion limit exceeded
/(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
@ -127,9 +127,9 @@ Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
aaaaaaaaaaaaaz\=match_limit=60000
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
@ -138,7 +138,7 @@ Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(a+)*zz/I
Capturing subpattern count = 1
@ -149,7 +149,7 @@ Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
/(*LIMIT_RECURSION=10)(a+)*zz/I
Capturing subpattern count = 1
@ -158,9 +158,9 @@ Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -52: recursion limit exceeded
Failed: error -53: recursion limit exceeded
aaaaaaaaaaaaaz\=recursion_limit=1000
Failed: error -52: recursion limit exceeded
Failed: error -53: recursion limit exceeded
/(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I
Capturing subpattern count = 1
@ -180,21 +180,21 @@ Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=recursion_limit=10
Failed: error -52: recursion limit exceeded
Failed: error -53: recursion limit exceeded
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
Failed: error -51: nested recursion at the same subject position
Failed: error -52: nested recursion at the same subject position
/((?(R2)a+|(?1)b))/
aaaabcde
Failed: error -51: nested recursion at the same subject position
Failed: error -52: nested recursion at the same subject position
/(?(R)a*(?1)|((?R))b)/
aaaabcde
Failed: error -51: nested recursion at the same subject position
Failed: error -52: nested recursion at the same subject position
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.

testdata/testoutput16

@ -15,7 +15,7 @@ JIT compilation was not successful
/(?(R)a*(?1)|((?R))b)/
aaaabcde
Failed: error -45: JIT stack limit reached
Failed: error -46: JIT stack limit reached
/abcd/I
Capturing subpattern count = 0
@ -64,13 +64,13 @@ No match
abcd
0: abcd (JIT)
ab\=ps
Failed: error -44: bad JIT option
Failed: error -45: bad JIT option
ab\=ph
Failed: error -44: bad JIT option
Failed: error -45: bad JIT option
xyz
No match (JIT)
xyz\=ps
Failed: error -44: bad JIT option
Failed: error -45: bad JIT option
/abcd/jit=2
abcd
@ -84,13 +84,13 @@ No match
/abcd/jit=2,jitfast
abcd
Failed: error -44: bad JIT option
Failed: error -45: bad JIT option
ab\=ps
Partial match: ab (JIT)
ab\=ph
Failed: error -44: bad JIT option
Failed: error -45: bad JIT option
xyz
Failed: error -44: bad JIT option
Failed: error -45: bad JIT option
/abcd/jit=3
abcd
@ -256,7 +256,7 @@ Minimum match limit = 6
aaaaaaaaaaaaaz
No match (JIT)
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
@ -266,9 +266,9 @@ Last code unit = 'z'
Subject length lower bound = 2
JIT compilation was successful
aaaaaaaaaaaaaz
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
aaaaaaaaaaaaaz\=match_limit=60000
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
@ -278,7 +278,7 @@ Last code unit = 'z'
Subject length lower bound = 2
JIT compilation was successful
aaaaaaaaaaaaaz
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(a+)*zz/I
Capturing subpattern count = 1
@ -290,21 +290,21 @@ JIT compilation was successful
aaaaaaaaaaaaaz
No match (JIT)
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -46: match limit exceeded
Failed: error -47: match limit exceeded
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
Failed: error -45: JIT stack limit reached
Failed: error -46: JIT stack limit reached
/((?(R2)a+|(?1)b))/
aaaabcde
Failed: error -45: JIT stack limit reached
Failed: error -46: JIT stack limit reached
/(?(R)a*(?1)|((?R))b)/
aaaabcde
Failed: error -45: JIT stack limit reached
Failed: error -46: JIT stack limit reached
# Invalid options disable JIT when called via pcre2_match(), causing the
# match to happen via the interpreter, but for fast JIT invalid options are

testdata/testoutput2

@ -993,7 +993,7 @@ Subject length lower bound = 4
0: abcd
1: a
2: d
Copy substring 5 failed (-48): unknown substring
Copy substring 5 failed (-49): unknown substring
/(.{20})/I
Capturing subpattern count = 1
@ -1047,9 +1047,9 @@ Subject length lower bound = 4
2: <unset>
3: f
1G a (1)
Get substring 2 failed (-54): requested value is not set
Get substring 2 failed (-55): requested value is not set
3G f (1)
Get substring 4 failed (-48): unknown substring
Get substring 4 failed (-49): unknown substring
0L adef
1L a
2L
@ -1062,7 +1062,7 @@ Get substring 4 failed (-48): unknown substring
1G bc (2)
2G bc (2)
3G f (1)
Get substring 4 failed (-48): unknown substring
Get substring 4 failed (-49): unknown substring
0L bcdef
1L bc
2L bc
@ -4363,7 +4363,7 @@ Subject length lower bound = 8
1: cd
2: gh
Number not found for group 'three'
Copy substring 'three' failed (-48): unknown substring
Copy substring 'three' failed (-49): unknown substring
/(?P<Tes>)(?P<Test>)/IB
------------------------------------------------------------------
@ -5731,7 +5731,7 @@ No match
1: a1
2: a1
Number not found for group 'Z'
Copy substring 'Z' failed (-48): unknown substring
Copy substring 'Z' failed (-49): unknown substring
C a1 (2) A (non-unique)
/(?|(?<a>)(?<b>)(?<a>)|(?<a>)(?<b>)(?<a>))/I,dupnames
@ -5772,7 +5772,7 @@ Subject length lower bound = 2
C a (1) A (non-unique)
cd\=copy=A
0: cd
Copy substring 'A' failed (-54): requested value is not set
Copy substring 'A' failed (-55): requested value is not set
/^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
Capturing subpattern count = 4
@ -5817,7 +5817,7 @@ No match
1: a1
2: a1
Number not found for group 'Z'
Get substring 'Z' failed (-48): unknown substring
Get substring 'Z' failed (-49): unknown substring
G a1 (2) A (non-unique)
/^(?P<A>a)(?P<A>b)/I,dupnames
@ -5848,7 +5848,7 @@ Subject length lower bound = 2
G a (1) A (non-unique)
cd\=get=A
0: cd
Get substring 'A' failed (-54): requested value is not set
Get substring 'A' failed (-55): requested value is not set
/^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
Capturing subpattern count = 4
@ -13659,11 +13659,11 @@ Failed: error -35: invalid replacement string
/abc/replace=a$bad
123abc
Failed: error -48: unknown substring
Failed: error -49: unknown substring
/abc/replace=a${A234567890123456789_123456789012}z
123abc
Failed: error -48: unknown substring
Failed: error -49: unknown substring
/abc/replace=a${A23456789012345678901234567890123}z
123abc
@ -13683,7 +13683,7 @@ Failed: error -35: invalid replacement string
/abc/replace=[9]XYZ
123abc123
Failed: error -47: no more memory
Failed: error -48: no more memory
/abc/replace=xyz
1abc2\=partial_hard
@ -13720,10 +13720,10 @@ No match
Matched, but too many substrings
0: c
1: <unset>
Get substring 1 failed (-54): requested value is not set
Get substring 2 failed (-53): requested value is not available
Get substring 3 failed (-53): requested value is not available
Get substring 4 failed (-48): unknown substring
Get substring 1 failed (-55): requested value is not set
Get substring 2 failed (-54): requested value is not available
Get substring 3 failed (-54): requested value is not available
Get substring 4 failed (-49): unknown substring
0L c
1L
@ -13736,5 +13736,30 @@ Start of matched string is beyond its end - displaying from end to start.
Start of matched string is beyond its end - displaying from end to start.
0: ab
0C (0)
xab\=getall
Start of matched string is beyond its end - displaying from end to start.
0: ab
0L
/(?<A>a)|(?<A>b)/dupnames
a\=ovector=1,copy=A,get=A,get=2
Matched, but too many substrings
0: a
Copy substring 'A' failed (-54): requested value is not available
Get substring 2 failed (-54): requested value is not available
Get substring 'A' failed (-54): requested value is not available
a\=ovector=2,copy=A,get=A,get=2
0: a
1: a
C a (1) A (non-unique)
Get substring 2 failed (-54): requested value is not available
G a (1) A (non-unique)
b\=ovector=2,copy=A,get=A,get=2
Matched, but too many substrings
0: b
1: <unset>
Copy substring 'A' failed (-55): requested value is not set
Get substring 2 failed (-54): requested value is not available
Get substring 'A' failed (-55): requested value is not set
# End of testinput2

testdata/testoutput6

@ -6133,7 +6133,7 @@ No match
/^(?(2)a|(1)(2))+$/
123a
Failed: error -40: backreference condition or recursion test not supported for DFA matching
Failed: error -40: backreference condition or recursion test is not supported for DFA matching
/(?<=a|bbbb)c/
ac
@ -7087,7 +7087,7 @@ Partial match: dogs
/abc\K123/
xyzabc123pqr
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
/(?<=abc)123/
xyzabc123pqr
@ -7205,29 +7205,29 @@ No match
/^(?!a(*SKIP)b)/
ac
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
/^(?=a(*SKIP)b|ac)/
** Failers
No match
ac
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
/^(?=a(*THEN)b|ac)/
ac
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
/^(?=a(*PRUNE)b)/
ab
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
** Failers
No match
ac
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
/^(?(?!a(*SKIP)b))/
ac
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
/(?<=abc)def/
abc\=ph
@ -7424,7 +7424,7 @@ No match
/((?2))((?1))/
abc
Failed: error -51: nested recursion at the same subject position
Failed: error -52: nested recursion at the same subject position
/(?(R)a+|(?R)b)/
aaaabcde
@ -7440,11 +7440,11 @@ Failed: error -51: nested recursion at the same subject position
/((?(R2)a+|(?1)b))/
aaaabcde
Failed: error -40: backreference condition or recursion test not supported for DFA matching
Failed: error -40: backreference condition or recursion test is not supported for DFA matching
/(?(R)a*(?1)|((?R))b)/
aaaabcde
Failed: error -51: nested recursion at the same subject position
Failed: error -52: nested recursion at the same subject position
/(a+)/no_auto_possess
aaaa\=ovector=3
@ -7734,4 +7734,36 @@ Failed: error -38: invalid data in workspace for DFA restart
0:
0+ dab
/(a)(b)|(c)/
XcX\=ovector=2,get=1,get=2,get=3,get=4,getall
0: c
Get substring 1 failed (-55): requested value is not set
Get substring 2 failed (-54): requested value is not available
Get substring 3 failed (-54): requested value is not available
Get substring 4 failed (-54): requested value is not available
0L c
/(?<A>aa)/
aa\=get=A
0: aa
Get substring 'A' failed (-41): function is not supported for DFA matching
aa\=copy=A
0: aa
Copy substring 'A' failed (-41): function is not supported for DFA matching
/a+/no_auto_possess
a\=ovector=2,get=1,get=2,getall
0: a
Get substring 1 failed (-55): requested value is not set
Get substring 2 failed (-54): requested value is not available
0L a
aaa\=ovector=2,get=1,get=2,getall
Matched, but offsets vector is too small to show all matches
0: aaa
1: aa
1G aa (2)
Get substring 2 failed (-54): requested value is not available
0L aaa
1L aa
# End of testinput6


@ -1218,7 +1218,7 @@ Partial match: the cat
/ab\Cde/utf
abXde
Failed: error -41: item unsupported for DFA matching
Failed: error -42: pattern contains an item that is not supported for DFA matching
/(?<=ab\Cde)X/utf
Failed: error 136 at offset 10: \C is not allowed in a lookbehind assertion