Update and improve substring handling and its documentation.
parent a85d15cbd1
commit cb8865d247
134
doc/pcre2api.3
|
@@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "13 December 2014" "PCRE2 10.00"
|
||||
.TH PCRE2API 3 "14 December 2014" "PCRE2 10.00"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@@ -921,6 +921,16 @@ PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
|
|||
contains the compiled pattern and related data. The caller must free the memory
|
||||
by calling \fBpcre2_code_free()\fP when it is no longer needed.
|
||||
.P
|
||||
NOTE: When one of the matching functions is called, pointers to the compiled
|
||||
pattern and the subject string are set in the match data block so that they can
|
||||
be referenced by the extraction functions. After running a match, you must not
|
||||
free a compiled pattern (or a subject string) until after all operations on the
|
||||
.\" HTML <a href="#matchdatablock">
|
||||
.\" </a>
|
||||
match data block
|
||||
.\"
|
||||
have taken place.
|
||||
.P
|
||||
If the compile context argument \fIccontext\fP is NULL, memory for the compiled
|
||||
pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
|
||||
the same memory function that was used for the compile context.
|
||||
|
@@ -1683,7 +1693,7 @@ pattern with the JIT compiler does not alter the value returned by this option.
|
|||
.B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP);
|
||||
.fi
|
||||
.P
|
||||
Information about successful and unsuccessful matches is placed in a match
|
||||
Information about a successful or unsuccessful match is placed in a match
|
||||
data block, which is an opaque structure that is accessed by function calls. In
|
||||
particular, the match data block contains a vector of offsets into the subject
|
||||
string that define the matched part of the subject and any substrings that were
|
||||
|
@@ -1713,10 +1723,8 @@ memory is obtained using the same allocator that was used for the compiled
|
|||
pattern (custom or default).
|
||||
.P
|
||||
A match data block can be used many times, with the same or different compiled
|
||||
patterns. When it is no longer needed, it should be freed by calling
|
||||
\fBpcre2_match_data_free()\fP. You can extract information from a match data
|
||||
block after a match operation has finished, using functions that are described
|
||||
in the sections on
|
||||
patterns. You can extract information from a match data block after a match
|
||||
operation has finished, using functions that are described in the sections on
|
||||
.\" HTML <a href="#matchedstrings">
|
||||
.\" </a>
|
||||
matched strings
|
||||
|
@@ -1727,6 +1735,15 @@ and
|
|||
other match data
|
||||
.\"
|
||||
below.
|
||||
.P
|
||||
When one of the matching functions is called, pointers to the compiled pattern
|
||||
and the subject string are set in the match data block so that they can be
|
||||
referenced by the extraction functions. After running a match, you must not
|
||||
free a compiled pattern or a subject string until after all operations on the
|
||||
match data block (for that match) have taken place.
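The lifetime rule in this paragraph follows from the match data block holding pointers rather than copies. A minimal sketch of that idea, using a hypothetical `toy_match_data` type and `toy_copy_match()` helper (neither is part of PCRE2; they only model the borrowed subject pointer):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a match data block: it records where the
   subject lives and which offsets matched, but owns no copy of the text. */
typedef struct {
  const char *subject;   /* borrowed pointer, not a copy */
  size_t start;          /* offset of the first matched code unit */
  size_t end;            /* offset just past the last matched code unit */
} toy_match_data;

/* Copy the matched text into buf and add a terminating zero. Returns 0 on
   success, or -1 if buf is too small. The subject must still be alive. */
int toy_copy_match(const toy_match_data *md, char *buf, size_t buflen)
{
  size_t len = md->end - md->start;
  if (len + 1 > buflen) return -1;
  memcpy(buf, md->subject + md->start, len);
  buf[len] = '\0';
  return 0;
}
```

Because `toy_copy_match()` reads through `md->subject`, freeing the subject before the call would leave it reading freed memory, which is the situation the documentation warns against.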
|
||||
.P
|
||||
When a match data block itself is no longer needed, it should be freed by
|
||||
calling \fBpcre2_match_data_free()\fP.
|
||||
.
|
||||
.
|
||||
.SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION"
|
||||
|
@@ -2053,8 +2070,13 @@ returned value is 3. If there are no capturing subpatterns, the return value
|
|||
from a successful match is 1, indicating that just the first pair of offsets
|
||||
has been set.
|
||||
.P
|
||||
If a capturing subpattern is matched repeatedly within a single match
|
||||
operation, it is the last portion of the string that it matched that is
|
||||
If a pattern uses the \eK escape sequence within a positive assertion, the
|
||||
reported start of the match can be greater than the end of the match. For
|
||||
example, if the pattern (?=ab\eK) is matched against "ab", the start and end
|
||||
offset values for the match are 2 and 0.
|
||||
.P
|
||||
If a capturing subpattern group is matched repeatedly within a single match
|
||||
operation, it is the last portion of the subject that it matched that is
|
||||
returned.
|
||||
.P
|
||||
If the ovector is too small to hold all the captured substring offsets, as much
|
||||
|
@@ -2268,23 +2290,31 @@ above.
|
|||
.\"
|
||||
For convenience, auxiliary functions are provided for extracting captured
|
||||
substrings as new, separate, zero-terminated strings. The functions in this
|
||||
section identify substrings by number. The next section describes similar
|
||||
functions for extracting substrings by name. A substring that contains a binary
|
||||
zero is correctly extracted and has a further zero added on the end, but the
|
||||
result is not, of course, a C string.
|
||||
section identify substrings by number. The number zero refers to the entire
|
||||
matched substring, with higher numbers referring to substrings captured by
|
||||
parenthesized groups. The next section describes similar functions for
|
||||
extracting captured substrings by name. A substring that contains a binary zero
|
||||
is correctly extracted and has a further zero added on the end, but the result
|
||||
is not, of course, a C string.
|
||||
.P
|
||||
If a pattern uses the \eK escape sequence within a positive assertion, the
|
||||
reported start of the match can be greater than the end of the match. For
|
||||
example, if the pattern (?=ab\eK) is matched against "ab", the start and end
|
||||
offset values for the match are 2 and 0. In this situation, calling these
|
||||
functions with a zero substring number extracts a zero-length empty string.
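The zero-length result for the \K case can be illustrated with a small sketch of the length calculation. The function name `sketch_substring_length` is an assumption made for this example; the expression mirrors the one used in `pcre2_substring_length_bynumber()` in this commit.

```c
#include <stddef.h>

/* Length of a substring given its start and end offsets. When \K inside a
   positive assertion leaves start > end, the length is reported as zero
   instead of wrapping around. */
size_t sketch_substring_length(size_t left, size_t right)
{
  return (left > right) ? 0 : right - left;
}
```

With the offsets from the example in the text (start 2, end 0), this yields a length of 0.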
|
||||
.P
|
||||
You can find the length in code units of a captured substring without
|
||||
extracting it by calling \fBpcre2_substring_length_bynumber()\fP. The first
|
||||
argument is a pointer to the match data block, the second is the group number,
|
||||
and the third is a pointer to a variable into which the length is placed.
|
||||
and the third is a pointer to a variable into which the length is placed. If
|
||||
you just want to know whether or not the substring has been captured, you can
|
||||
pass the third argument as NULL.
|
||||
.P
|
||||
The \fBpcre2_substring_copy_bynumber()\fP function copies one string into a
|
||||
supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it into
|
||||
new memory, obtained using the same memory allocation function that was used
|
||||
for the match data block. The first two arguments of these functions are a
|
||||
pointer to the match data block and a capturing group number. A group number of
|
||||
zero extracts the substring that matched the entire pattern, and higher values
|
||||
extract the captured substrings.
|
||||
The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring
|
||||
into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it
|
||||
into new memory, obtained using the same memory allocation function that was
|
||||
used for the match data block. The first two arguments of these functions are a
|
||||
pointer to the match data block and a capturing group number.
|
||||
.P
|
||||
The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to
|
||||
the buffer and a pointer to a variable that contains its length in code units.
|
||||
|
@@ -2297,8 +2327,9 @@ of code units that comprise the substring, again excluding the terminating
|
|||
zero. When the substring is no longer needed, the memory should be freed by
|
||||
calling \fBpcre2_substring_free()\fP.
|
||||
.P
|
||||
The return value from these functions is zero for success, or one of these
|
||||
error codes:
|
||||
The return value from all these functions is zero for success, or a negative
|
||||
error code. If the pattern match failed, the match failure code is returned.
|
||||
Other possible error codes are:
|
||||
.sp
|
||||
PCRE2_ERROR_NOMEMORY
|
||||
.sp
|
||||
|
@@ -2319,7 +2350,8 @@ could not be captured.
|
|||
PCRE2_ERROR_UNSET
|
||||
.sp
|
||||
The substring did not participate in the match. For example, if the pattern is
|
||||
(abc)|(def) and the subject is "def", substring number 1 is unset.
|
||||
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
||||
capturing slots, substring number 1 is unset.
|
||||
.
|
||||
.
|
||||
.SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS"
|
||||
|
@@ -2388,15 +2420,20 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
|
|||
compiled pattern, and the second is the name. The yield of the function is the
|
||||
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
||||
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
||||
that name.
|
||||
that name. Given the number, you can extract the substring directly, or use one
|
||||
of the functions described above.
|
||||
.P
|
||||
Given the number, you can extract the substring directly, or use one of the
|
||||
functions described above. For convenience, there are also "byname" functions
|
||||
that correspond to the "bynumber" functions, the only difference being that the
|
||||
second argument is a name instead of a number. If PCRE2_DUPNAMES is
|
||||
set and there are duplicate names, these functions return the first named
|
||||
string that is set. PCRE2_ERROR_UNSET is returned only if all groups of the
|
||||
same name are unset.
|
||||
For convenience, there are also "byname" functions that correspond to the
|
||||
"bynumber" functions, the only difference being that the second argument is a
|
||||
name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
|
||||
names, these functions scan all the groups with the given name, and return the
|
||||
first named string that is set.
|
||||
.P
|
||||
If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
|
||||
returned. If all groups with the name have numbers that are greater than the
|
||||
number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there
|
||||
is at least one group with a slot in the ovector, but no group is found to be
|
||||
set, PCRE2_ERROR_UNSET is returned.
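The three outcomes described here can be modelled without the library. The sketch below assumes the name-table scan has already produced the list of group numbers that share one name; `sketch_resolve_duplicate_name` and the `SKETCH_*` constants are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only: the error numbers mirror the renumbered
   definitions in this commit, and SKETCH_UNSET stands in for PCRE2_UNSET. */
#define SKETCH_ERROR_UNAVAILABLE (-54)
#define SKETCH_ERROR_UNSET       (-55)
#define SKETCH_UNSET             (~(size_t)0)

/* groups[] lists the numbers of all groups sharing one name (assumed
   non-empty). Returns the first group that is set, or a negative error. */
int sketch_resolve_duplicate_name(const uint32_t *groups, size_t ngroups,
  const size_t *ovector, uint32_t oveccount)
{
  int failrc = SKETCH_ERROR_UNAVAILABLE;      /* until a group has a slot */
  for (size_t i = 0; i < ngroups; i++)
    {
    uint32_t n = groups[i];
    if (n < oveccount)
      {
      if (ovector[n*2] != SKETCH_UNSET) return (int)n;
      failrc = SKETCH_ERROR_UNSET;            /* has a slot, but unset */
      }
    }
  return failrc;
}
```

The failure code starts as "unavailable" and is downgraded to "unset" as soon as any group with the name is found to have a slot in the ovector, which matches the order of checks in the "byname" functions changed by this commit.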
|
||||
.P
|
||||
\fBWarning:\fP If the pattern uses the (?| feature to set up multiple
|
||||
subpatterns with the same number, as described in the
|
||||
|
@@ -2660,17 +2697,36 @@ is matched against the string
|
|||
.sp
|
||||
the three matched strings are
|
||||
.sp
|
||||
<something>
|
||||
<something> <something else>
|
||||
<something> <something else> <something further>
|
||||
<something> <something else>
|
||||
<something>
|
||||
.sp
|
||||
On success, the yield of the function is a number greater than zero, which is
|
||||
the number of matched substrings. The offsets of the substrings are returned in
|
||||
the ovector, and can be extracted in the same way as for \fBpcre2_match()\fP.
|
||||
They are returned in reverse order of length; that is, the longest
|
||||
matching string is given first. If there were too many matches to fit into
|
||||
the ovector, the yield of the function is zero, and the vector is filled with
|
||||
the longest matches.
|
||||
the ovector, and can be extracted by number in the same way as for
|
||||
\fBpcre2_match()\fP, but the numbers bear no relation to any capturing groups
|
||||
that may exist in the pattern, because DFA matching does not support group
|
||||
capture.
|
||||
.P
|
||||
Calls to the convenience functions that extract substrings by name
|
||||
return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a
|
||||
DFA match. The convenience functions that extract substrings by number never
|
||||
return PCRE2_ERROR_NOSUBSTRING, and the meanings of some other errors are
|
||||
slightly different:
|
||||
.sp
|
||||
PCRE2_ERROR_UNAVAILABLE
|
||||
.sp
|
||||
The ovector is not big enough to include a slot for the given substring number.
|
||||
.sp
|
||||
PCRE2_ERROR_UNSET
|
||||
.sp
|
||||
There is a slot in the ovector for this substring, but there were insufficient
|
||||
matches to fill it.
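As a rough sketch of how these two errors are distinguished after a DFA match, the checks can be written as a standalone function. The name `sketch_dfa_substring_status` and the `SKETCH_*` constants are assumptions for illustration; the conditions follow the DFA branch of `pcre2_substring_length_bynumber()` in this commit.

```c
#include <stdint.h>

#define SKETCH_ERROR_UNAVAILABLE (-54)
#define SKETCH_ERROR_UNSET       (-55)

/* rc is the value returned by the DFA match: negative for failure, zero if
   there were more matches than ovector slots, else the number of matches. */
int sketch_dfa_substring_status(int rc, uint32_t oveccount, uint32_t number)
{
  if (rc < 0) return rc;                      /* match failed */
  if (number >= oveccount) return SKETCH_ERROR_UNAVAILABLE;
  if (rc != 0 && number >= (uint32_t)rc) return SKETCH_ERROR_UNSET;
  return 0;                                   /* slot exists and is filled */
}
```

When `rc` is zero the ovector is full of the longest matches, so every slot inside the ovector counts as filled and only the "unavailable" check applies.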
|
||||
.P
|
||||
The matched strings are stored in the ovector in reverse order of length; that
|
||||
is, the longest matching string is first. If there were too many matches to fit
|
||||
into the ovector, the yield of the function is zero, and the vector is filled
|
||||
with the longest matches.
|
||||
.P
|
||||
NOTE: PCRE2's "auto-possessification" optimization usually applies to character
|
||||
repeats at the end of a pattern (as well as internally). For example, the
|
||||
|
@@ -2746,6 +2802,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 13 December 2014
|
||||
Last updated: 14 December 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@@ -212,20 +212,21 @@ context functions. */
|
|||
#define PCRE2_ERROR_DFA_BADRESTART (-38)
|
||||
#define PCRE2_ERROR_DFA_RECURSE (-39)
|
||||
#define PCRE2_ERROR_DFA_UCOND (-40)
|
||||
#define PCRE2_ERROR_DFA_UITEM (-41)
|
||||
#define PCRE2_ERROR_DFA_WSSIZE (-42)
|
||||
#define PCRE2_ERROR_INTERNAL (-43)
|
||||
#define PCRE2_ERROR_JIT_BADOPTION (-44)
|
||||
#define PCRE2_ERROR_JIT_STACKLIMIT (-45)
|
||||
#define PCRE2_ERROR_MATCHLIMIT (-46)
|
||||
#define PCRE2_ERROR_NOMEMORY (-47)
|
||||
#define PCRE2_ERROR_NOSUBSTRING (-48)
|
||||
#define PCRE2_ERROR_NOUNIQUESUBSTRING (-49)
|
||||
#define PCRE2_ERROR_NULL (-50)
|
||||
#define PCRE2_ERROR_RECURSELOOP (-51)
|
||||
#define PCRE2_ERROR_RECURSIONLIMIT (-52)
|
||||
#define PCRE2_ERROR_UNAVAILABLE (-53)
|
||||
#define PCRE2_ERROR_UNSET (-54)
|
||||
#define PCRE2_ERROR_DFA_UFUNC (-41)
|
||||
#define PCRE2_ERROR_DFA_UITEM (-42)
|
||||
#define PCRE2_ERROR_DFA_WSSIZE (-43)
|
||||
#define PCRE2_ERROR_INTERNAL (-44)
|
||||
#define PCRE2_ERROR_JIT_BADOPTION (-45)
|
||||
#define PCRE2_ERROR_JIT_STACKLIMIT (-46)
|
||||
#define PCRE2_ERROR_MATCHLIMIT (-47)
|
||||
#define PCRE2_ERROR_NOMEMORY (-48)
|
||||
#define PCRE2_ERROR_NOSUBSTRING (-49)
|
||||
#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50)
|
||||
#define PCRE2_ERROR_NULL (-51)
|
||||
#define PCRE2_ERROR_RECURSELOOP (-52)
|
||||
#define PCRE2_ERROR_RECURSIONLIMIT (-53)
|
||||
#define PCRE2_ERROR_UNAVAILABLE (-54)
|
||||
#define PCRE2_ERROR_UNSET (-55)
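Because inserting PCRE2_ERROR_DFA_UFUNC shifts every later code by one, callers that compare against literal numbers would silently change meaning. A small sketch of testing by name instead is given below; the `SKETCH_*` macros are local copies made for this example, and `sketch_is_resource_limit` is a hypothetical helper.

```c
/* Local copies of three values from the renumbered list above, for
   illustration only; real code would include pcre2.h and use the
   PCRE2_ERROR_* names directly. */
#define SKETCH_ERROR_JIT_STACKLIMIT (-46)
#define SKETCH_ERROR_MATCHLIMIT     (-47)
#define SKETCH_ERROR_RECURSIONLIMIT (-53)

/* Returns 1 if rc is one of the resource-limit errors, else 0. */
int sketch_is_resource_limit(int rc)
{
  switch (rc)
    {
    case SKETCH_ERROR_JIT_STACKLIMIT:
    case SKETCH_ERROR_MATCHLIMIT:
    case SKETCH_ERROR_RECURSIONLIMIT:
      return 1;
    default:
      return 0;
    }
}
```

Written against the symbolic names, a check like this needs only a recompile when the numbers move, whereas text that embeds the numbers has to be edited by hand.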
|
||||
|
||||
/* Request types for pcre2_pattern_info() */
|
||||
|
||||
|
|
|
@@ -3275,6 +3275,12 @@ if ((re->flags & PCRE2_LASTSET) != 0)
|
|||
}
|
||||
}
|
||||
|
||||
/* Fill in fields that are always returned in the match data. */
|
||||
|
||||
match_data->code = re;
|
||||
match_data->subject = subject;
|
||||
match_data->mark = NULL;
|
||||
match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
|
||||
|
||||
/* Call the main matching function, looping for a non-anchored regex after a
|
||||
failed match. If not restarting, perform certain optimizations at the start of
|
||||
|
|
|
@@ -212,18 +212,19 @@ static const char match_error_texts[] =
|
|||
"invalid data in workspace for DFA restart\0"
|
||||
"too much recursion for DFA matching\0"
|
||||
/* 40 */
|
||||
"backreference condition or recursion test not supported for DFA matching\0"
|
||||
"item unsupported for DFA matching\0"
|
||||
"backreference condition or recursion test is not supported for DFA matching\0"
|
||||
"function is not supported for DFA matching\0"
|
||||
"pattern contains an item that is not supported for DFA matching\0"
|
||||
"workspace size exceeded in DFA matching\0"
|
||||
"internal error - pattern overwritten?\0"
|
||||
"bad JIT option\0"
|
||||
/* 45 */
|
||||
"bad JIT option\0"
|
||||
"JIT stack limit reached\0"
|
||||
"match limit exceeded\0"
|
||||
"no more memory\0"
|
||||
"unknown substring\0"
|
||||
"non-unique substring name\0"
|
||||
/* 50 */
|
||||
"non-unique substring name\0"
|
||||
"NULL argument passed\0"
|
||||
"nested recursion at the same subject position\0"
|
||||
"recursion limit exceeded\0"
|
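The message table in this hunk is one long string in which each message ends with an explicit zero, so a new message must be inserted at the position that matches its error number. A minimal sketch of how such a table can be indexed is shown below; `sketch_texts` is a three-entry excerpt and `sketch_find_text` is a hypothetical helper, not the library's own lookup routine.

```c
#include <stddef.h>

/* A three-entry excerpt in the same layout: each message ends with an
   explicit \0, and the literal's own terminator marks the end of the table. */
const char sketch_texts[] =
  "workspace size exceeded in DFA matching\0"
  "internal error - pattern overwritten?\0"
  "bad JIT option\0";

/* Return the n-th message (counting from zero), or NULL if n is out of
   range. */
const char *sketch_find_text(const char *table, int n)
{
  const char *p = table;
  if (n < 0) return NULL;
  while (n-- > 0)
    {
    if (*p == '\0') return NULL;     /* empty entry means end of table */
    while (*p++ != '\0') {}          /* skip one message and its zero */
    }
  return (*p == '\0') ? NULL : p;
}
```

Since lookup is purely positional, adding a message in the middle of the table shifts every later entry, which is why the error numbers and the table have to be changed together.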
||||
|
|
|
@@ -526,15 +526,16 @@ bytes in a code unit in that mode. */
|
|||
|
||||
#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32)
|
||||
|
||||
/* Values for the matchedby field in a match data block. */
|
||||
|
||||
enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */
|
||||
PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
|
||||
PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */
|
||||
|
||||
/* Magic number to provide a small check against being handed junk. */
|
||||
|
||||
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
|
||||
|
||||
/* This value is used to detect a loaded regular expression in different
|
||||
endianness. */
|
||||
|
||||
#define REVERSED_MAGIC_NUMBER 0x45524350UL /* 'ERCP' */
|
||||
|
||||
/* The maximum remaining length of subject we are prepared to search for a
|
||||
req_unit match. */
|
||||
|
||||
|
|
|
@@ -616,12 +616,13 @@ typedef struct pcre2_real_match_data {
|
|||
pcre2_memctl memctl;
|
||||
const pcre2_real_code *code; /* The pattern used for the match */
|
||||
PCRE2_SPTR subject; /* The subject that was matched */
|
||||
int rc; /* The return code from the match */
|
||||
PCRE2_SPTR mark; /* Pointer to last mark */
|
||||
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
|
||||
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
|
||||
PCRE2_SIZE startchar; /* Offset to starting code unit */
|
||||
PCRE2_SPTR mark; /* Pointer to last mark */
|
||||
uint16_t matchedby; /* Type of match (normal, JIT, DFA) */
|
||||
uint16_t oveccount; /* Number of pairs */
|
||||
int rc; /* The return code from the match */
|
||||
PCRE2_SIZE ovector[1]; /* The first field */
|
||||
} pcre2_real_match_data;
|
||||
|
||||
|
|
|
@@ -180,6 +180,7 @@ match_data->startchar = arguments.startchar_ptr - subject;
|
|||
match_data->leftchar = 0;
|
||||
match_data->rightchar = 0;
|
||||
match_data->mark = arguments.mark_ptr;
|
||||
match_data->matchedby = PCRE2_MATCHEDBY_JIT;
|
||||
|
||||
return match_data->rc;
|
||||
|
||||
|
|
|
@@ -6995,6 +6995,7 @@ while (mb->ovecsave_chain != NULL)
|
|||
match_data->code = re;
|
||||
match_data->subject = subject;
|
||||
match_data->mark = mb->mark;
|
||||
match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;
|
||||
|
||||
/* Handle a fully successful match. */
|
||||
|
||||
|
@@ -7026,14 +7027,15 @@ if (rc == MATCH_MATCH || rc == MATCH_ACCEPT)
|
|||
match_data->rc = ((mb->capture_last & OVFLBIT) != 0)?
|
||||
0 : mb->end_offset_top/2;
|
||||
|
||||
/* If there is space in the offset vector, set any unused pairs at the end to
|
||||
PCRE2_UNSET for backwards compatibility. It is documented that this happens.
|
||||
In earlier versions, the whole set of potential capturing offsets was
|
||||
initialized each time round the loop, but this is handled differently now.
|
||||
"Gaps" are set to PCRE2_UNSET dynamically instead (this fixes a bug). Thus,
|
||||
it is only those at the end that need setting here. We can't just set them
|
||||
all at the start of the whole thing because they may get set in one branch
|
||||
that is not the final matching branch. */
|
||||
/* If there is space in the offset vector, set any pairs that follow the
|
||||
highest-numbered captured string but are less than the number of capturing
|
||||
groups in the pattern (and are within the ovector) to PCRE2_UNSET. It is
|
||||
documented that this happens. In earlier versions, the whole set of potential
|
||||
capturing offsets was initialized each time round the loop, but this is
|
||||
handled differently now. "Gaps" are set to PCRE2_UNSET dynamically instead
|
||||
(this fixed a bug). Thus, it is only those at the end that need setting here.
|
||||
We can't just mark them all unset at the start of the whole thing because
|
||||
they may get set in one branch that is not the final matching branch. */
|
||||
|
||||
if (mb->end_offset_top/2 <= re->top_bracket)
|
||||
{
|
||||
|
|
|
@ -64,27 +64,34 @@ Arguments:
|
|||
Returns: if successful: zero
|
||||
if not successful, a negative error code:
|
||||
(1) an error from nametable_scan()
|
||||
(2) an error from copy_bynumber()
|
||||
(3) PCRE2_ERROR_UNSET: all named groups are unset
|
||||
(2) an error from copy_bynumber()
|
||||
(3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector
|
||||
(4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
|
||||
*/
|
||||
|
||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||
pcre2_substring_copy_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname,
|
||||
PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr)
|
||||
{
|
||||
PCRE2_SPTR first;
|
||||
PCRE2_SPTR last;
|
||||
PCRE2_SPTR entry;
|
||||
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
||||
PCRE2_SPTR first, last, entry;
|
||||
int failrc, entrysize;
|
||||
if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
|
||||
return PCRE2_ERROR_DFA_UFUNC;
|
||||
entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
||||
&first, &last);
|
||||
if (entrysize < 0) return entrysize;
|
||||
failrc = PCRE2_ERROR_UNAVAILABLE;
|
||||
for (entry = first; entry <= last; entry += entrysize)
|
||||
{
|
||||
uint32_t n = GET2(entry, 0);
|
||||
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
||||
return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
|
||||
if (n < match_data->oveccount)
|
||||
{
|
||||
if (match_data->ovector[n*2] != PCRE2_UNSET)
|
||||
return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
|
||||
failrc = PCRE2_ERROR_UNSET;
|
||||
}
|
||||
}
|
||||
return PCRE2_ERROR_UNSET;
|
||||
return failrc;
|
||||
}
|
||||
|
||||
|
||||
|
@@ -146,26 +153,33 @@ Returns: if successful: zero
|
|||
if not successful, a negative value:
|
||||
(1) an error from nametable_scan()
|
||||
(2) an error from get_bynumber()
|
||||
(3) PCRE2_ERROR_UNSET: all named groups are unset
|
||||
(3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector
|
||||
(4) PCRE2_ERROR_UNSET: all named groups in ovector are unset
|
||||
*/
|
||||
|
||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||
pcre2_substring_get_byname(pcre2_match_data *match_data,
|
||||
PCRE2_SPTR stringname, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr)
|
||||
{
|
||||
PCRE2_SPTR first;
|
||||
PCRE2_SPTR last;
|
||||
PCRE2_SPTR entry;
|
||||
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
||||
PCRE2_SPTR first, last, entry;
|
||||
int failrc, entrysize;
|
||||
if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
|
||||
return PCRE2_ERROR_DFA_UFUNC;
|
||||
entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
||||
&first, &last);
|
||||
if (entrysize < 0) return entrysize;
|
||||
failrc = PCRE2_ERROR_UNAVAILABLE;
|
||||
for (entry = first; entry <= last; entry += entrysize)
|
||||
{
|
||||
uint32_t n = GET2(entry, 0);
|
||||
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
||||
return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
|
||||
if (n < match_data->oveccount)
|
||||
{
|
||||
if (match_data->ovector[n*2] != PCRE2_UNSET)
|
||||
return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
|
||||
failrc = PCRE2_ERROR_UNSET;
|
||||
}
|
||||
}
|
||||
return PCRE2_ERROR_UNSET;
|
||||
return failrc;
|
||||
}
|
||||
|
||||
|
||||
|
@@ -251,19 +265,25 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
|||
pcre2_substring_length_byname(pcre2_match_data *match_data,
|
||||
PCRE2_SPTR stringname, PCRE2_SIZE *sizeptr)
|
||||
{
|
||||
PCRE2_SPTR first;
|
||||
PCRE2_SPTR last;
|
||||
PCRE2_SPTR entry;
|
||||
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
||||
PCRE2_SPTR first, last, entry;
|
||||
int failrc, entrysize;
|
||||
if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER)
|
||||
return PCRE2_ERROR_DFA_UFUNC;
|
||||
entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
||||
&first, &last);
|
||||
if (entrysize < 0) return entrysize;
|
||||
failrc = PCRE2_ERROR_UNAVAILABLE;
|
||||
for (entry = first; entry <= last; entry += entrysize)
|
||||
{
|
||||
uint32_t n = GET2(entry, 0);
|
||||
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
||||
return pcre2_substring_length_bynumber(match_data, n, sizeptr);
|
||||
if (n < match_data->oveccount)
|
||||
{
|
||||
if (match_data->ovector[n*2] != PCRE2_UNSET)
|
||||
return pcre2_substring_length_bynumber(match_data, n, sizeptr);
|
||||
failrc = PCRE2_ERROR_UNSET;
|
||||
}
|
||||
}
|
||||
return PCRE2_ERROR_UNSET;
|
||||
return failrc;
|
||||
}
|
||||
|
||||
|
||||
|
@@ -292,13 +312,23 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
|||
pcre2_substring_length_bynumber(pcre2_match_data *match_data,
|
||||
uint32_t stringnumber, PCRE2_SIZE *sizeptr)
|
||||
{
|
||||
int count;
|
||||
PCRE2_SIZE left, right;
|
||||
if (stringnumber > match_data->code->top_bracket)
|
||||
return PCRE2_ERROR_NOSUBSTRING;
|
||||
if (stringnumber >= match_data->oveccount)
|
||||
return PCRE2_ERROR_UNAVAILABLE;
|
||||
if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
|
||||
return PCRE2_ERROR_UNSET;
|
||||
if ((count = match_data->rc) < 0) return count; /* Match failed */
|
||||
if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER)
|
||||
{
|
||||
if (stringnumber > match_data->code->top_bracket)
|
||||
return PCRE2_ERROR_NOSUBSTRING;
|
||||
if (stringnumber >= match_data->oveccount)
|
||||
return PCRE2_ERROR_UNAVAILABLE;
|
||||
if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
|
||||
return PCRE2_ERROR_UNSET;
|
||||
}
|
||||
else /* Matched using pcre2_dfa_match() */
|
||||
{
|
||||
if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE;
|
||||
if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET;
|
||||
}
|
||||
left = match_data->ovector[stringnumber*2];
|
||||
right = match_data->ovector[stringnumber*2+1];
|
||||
if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
|
||||
|
|
|
@@ -384,15 +384,15 @@ aaaaa2
|
|||
010203040506
|
||||
RC=0
|
||||
======== STDERR ========
|
||||
pcre2grep: pcre2_match() gave error -46 while matching this text:
|
||||
pcre2grep: pcre2_match() gave error -47 while matching this text:
|
||||
|
||||
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
|
||||
|
||||
pcre2grep: pcre2_match() gave error -46 while matching this text:
|
||||
pcre2grep: pcre2_match() gave error -47 while matching this text:
|
||||
|
||||
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
|
||||
|
||||
pcre2grep: Error -45, -46 or -52 means that a resource limit was exceeded.
|
||||
pcre2grep: Error -46, -47 or -53 means that a resource limit was exceeded.
|
||||
pcre2grep: Check your regex for nested unlimited loops.
|
||||
---------------------------- Test 38 ------------------------------
|
||||
This line contains a binary zero here > |