Improve interfaces to substring functions, and fix bugs.
parent 45b4ec3f8d
commit a85d15cbd1
|
@ -24,8 +24,16 @@ by name, into a given buffer. The arguments are:
|
||||||
.sp
|
.sp
|
||||||
The \fIbufflen\fP variable is updated to contain the length of the extracted
|
The \fIbufflen\fP variable is updated to contain the length of the extracted
|
||||||
string, excluding the trailing zero. The yield of the function is zero for
|
string, excluding the trailing zero. The yield of the function is zero for
|
||||||
success, PCRE2_ERROR_NOMEMORY if the buffer is too small, or
|
success or one of the following error numbers:
|
||||||
PCRE2_ERROR_NOSUBSTRING if the string name is invalid.
|
.sp
|
||||||
|
PCRE2_ERROR_NOSUBSTRING there are no groups of that name
|
||||||
|
PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group
|
||||||
|
PCRE2_ERROR_UNSET the group did not participate in the match
|
||||||
|
PCRE2_ERROR_NOMEMORY the buffer is not big enough
|
||||||
|
.sp
|
||||||
|
If there is more than one group with the given name, the first one that is set
|
||||||
|
is returned. In this situation PCRE2_ERROR_UNSET means that no group with the
|
||||||
|
given name was set.
|
||||||
.P
|
.P
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SUBSTRING_COPY_BYNUMBER 3 "01 December 2014" "PCRE2 10.00"
|
.TH PCRE2_SUBSTRING_COPY_BYNUMBER 3 "13 December 2014" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -24,9 +24,14 @@ buffer. The arguments are:
|
||||||
\fIbufflen\fP Length of buffer
|
\fIbufflen\fP Length of buffer
|
||||||
.sp
|
.sp
|
||||||
The \fIbufflen\fP variable is updated with the length of the extracted string,
|
The \fIbufflen\fP variable is updated with the length of the extracted string,
|
||||||
excluding the terminating zero. The yield of the function is zero for success,
|
excluding the terminating zero. The yield of the function is zero for success
|
||||||
PCRE2_ERROR_NOMEMORY if the buffer was too small, or PCRE2_ERROR_NOSUBSTRING if
|
or one of the following error numbers:
|
||||||
the string number is invalid.
|
.sp
|
||||||
|
PCRE2_ERROR_NOSUBSTRING there are no groups of that number
|
||||||
|
PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group
|
||||||
|
PCRE2_ERROR_UNSET the group did not participate in the match
|
||||||
|
PCRE2_ERROR_NOMEMORY the buffer is too small
|
||||||
|
.sp
|
||||||
.P
|
.P
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -25,9 +25,17 @@ newly acquired memory. The arguments are:
|
||||||
The memory in which the substring is placed is obtained by calling the same
|
The memory in which the substring is placed is obtained by calling the same
|
||||||
memory allocation function that was used for the match data block. The
|
memory allocation function that was used for the match data block. The
|
||||||
convenience function \fBpcre2_substring_free()\fP can be used to free it when
|
convenience function \fBpcre2_substring_free()\fP can be used to free it when
|
||||||
it is no longer needed. The yield of the function is zero for success,
|
it is no longer needed. The yield of the function is zero for success or one of
|
||||||
PCRE2_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
the following error numbers:
|
||||||
PCRE2_ERROR_NOSUBSTRING if the string name is invalid.
|
.sp
|
||||||
|
PCRE2_ERROR_NOSUBSTRING there are no groups of that name
|
||||||
|
PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group
|
||||||
|
PCRE2_ERROR_UNSET the group did not participate in the match
|
||||||
|
PCRE2_ERROR_NOMEMORY memory could not be obtained
|
||||||
|
.sp
|
||||||
|
If there is more than one group with the given name, the first one that is set
|
||||||
|
is returned. In this situation PCRE2_ERROR_UNSET means that no group with the
|
||||||
|
given name was set.
|
||||||
.P
|
.P
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_SUBSTRING_GET_BYNUMBER 3 "01 December 2014" "PCRE2 10.00"
|
.TH PCRE2_SUBSTRING_GET_BYNUMBER 3 "13 December 2014" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -25,9 +25,14 @@ into newly acquired memory. The arguments are:
|
||||||
The memory in which the substring is placed is obtained by calling the same
|
The memory in which the substring is placed is obtained by calling the same
|
||||||
memory allocation function that was used for the match data block. The
|
memory allocation function that was used for the match data block. The
|
||||||
convenience function \fBpcre2_substring_free()\fP can be used to free it when
|
convenience function \fBpcre2_substring_free()\fP can be used to free it when
|
||||||
it is no longer needed. The yield of the function is zero for success,
|
it is no longer needed. The yield of the function is zero for success or one of
|
||||||
PCRE2_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
the following error numbers:
|
||||||
PCRE2_ERROR_NOSUBSTRING if the string number is invalid.
|
.sp
|
||||||
|
PCRE2_ERROR_NOSUBSTRING there are no groups of that number
|
||||||
|
PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group
|
||||||
|
PCRE2_ERROR_UNSET the group did not participate in the match
|
||||||
|
PCRE2_ERROR_NOMEMORY memory could not be obtained
|
||||||
|
.sp
|
||||||
.P
|
.P
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "01 December 2014" "PCRE2 10.00"
|
.TH PCRE2API 3 "13 December 2014" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -2307,10 +2307,19 @@ attempt to get memory failed for \fBpcre2_substring_get_bynumber()\fP.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_NOSUBSTRING
|
PCRE2_ERROR_NOSUBSTRING
|
||||||
.sp
|
.sp
|
||||||
No substring with the given number was captured. This could be because there is
|
There is no substring with that number in the pattern, that is, the number is
|
||||||
no capturing group of that number in the pattern, or because the group with
|
greater than the number of capturing parentheses.
|
||||||
that number did not participate in the match, or because the ovector was too
|
.sp
|
||||||
small to capture that group.
|
PCRE2_ERROR_UNAVAILABLE
|
||||||
|
.sp
|
||||||
|
The substring number, though not greater than the number of captures in the
|
||||||
|
pattern, is greater than the number of slots in the ovector, so the substring
|
||||||
|
could not be captured.
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_UNSET
|
||||||
|
.sp
|
||||||
|
The substring did not participate in the match. For example, if the pattern is
|
||||||
|
(abc)|(def) and the subject is "def", substring number 1 is unset.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS"
|
.SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS"
|
||||||
|
@ -2345,7 +2354,7 @@ capturing subpattern number \fIn+1\fP matches some part of the subject, but
|
||||||
subpattern \fIn\fP has not been used at all, it returns an empty string. This
|
subpattern \fIn\fP has not been used at all, it returns an empty string. This
|
||||||
can be distinguished from a genuine zero-length substring by inspecting the
|
can be distinguished from a genuine zero-length substring by inspecting the
|
||||||
appropriate offset in the ovector, which contain PCRE2_UNSET for unset
|
appropriate offset in the ovector, which contain PCRE2_UNSET for unset
|
||||||
substrings.
|
substrings, or by calling \fBpcre2_substring_length_bynumber()\fP.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="extractbyname"></a>
|
.\" HTML <a name="extractbyname"></a>
|
||||||
|
@ -2384,8 +2393,10 @@ that name.
|
||||||
Given the number, you can extract the substring directly, or use one of the
|
Given the number, you can extract the substring directly, or use one of the
|
||||||
functions described above. For convenience, there are also "byname" functions
|
functions described above. For convenience, there are also "byname" functions
|
||||||
that correspond to the "bynumber" functions, the only difference being that the
|
that correspond to the "bynumber" functions, the only difference being that the
|
||||||
second argument is a name instead of a number. However, if PCRE2_DUPNAMES is
|
second argument is a name instead of a number. If PCRE2_DUPNAMES is
|
||||||
set and there are duplicate names, the behaviour may not be what you want.
|
set and there are duplicate names, these functions return the first named
|
||||||
|
string that is set. PCRE2_ERROR_UNSET is returned only if all groups of the
|
||||||
|
same name are unset.
|
||||||
.P
|
.P
|
||||||
\fBWarning:\fP If the pattern uses the (?| feature to set up multiple
|
\fBWarning:\fP If the pattern uses the (?| feature to set up multiple
|
||||||
subpatterns with the same number, as described in the
|
subpatterns with the same number, as described in the
|
||||||
|
@ -2485,9 +2496,9 @@ documentation.
|
||||||
.P
|
.P
|
||||||
When duplicates are present, \fBpcre2_substring_copy_byname()\fP and
|
When duplicates are present, \fBpcre2_substring_copy_byname()\fP and
|
||||||
\fBpcre2_substring_get_byname()\fP return the first substring corresponding to
|
\fBpcre2_substring_get_byname()\fP return the first substring corresponding to
|
||||||
the given name that is set. If none are set, PCRE2_ERROR_NOSUBSTRING is
|
the given name that is set. Only if none are set is PCRE2_ERROR_UNSET
|
||||||
returned. The \fBpcre2_substring_number_from_name()\fP function returns
|
returned. The \fBpcre2_substring_number_from_name()\fP function returns the
|
||||||
the error PCRE2_ERROR_NOUNIQUESUBSTRING.
|
error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate names.
|
||||||
.P
|
.P
|
||||||
If you want to get full details of all captured substrings for a given name,
|
If you want to get full details of all captured substrings for a given name,
|
||||||
you must use the \fBpcre2_substring_nametable_scan()\fP function. The first
|
you must use the \fBpcre2_substring_nametable_scan()\fP function. The first
|
||||||
|
@ -2735,6 +2746,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 01 December 2014
|
Last updated: 13 December 2014
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2014 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -80,20 +80,20 @@ uint8_t, UCHAR_MAX, etc are defined. */
|
||||||
extern "C" {
|
extern "C" {
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* The following options can be passed to pcre2_compile(), pcre2_match(), or
|
/* The following option bits can be passed to pcre2_compile(), pcre2_match(),
|
||||||
pcre2_dfa_match(). PCRE2_NO_UTF_CHECK affects only the function to which it is
|
or pcre2_dfa_match(). PCRE2_NO_UTF_CHECK affects only the function to which it
|
||||||
passed. Put these bits at the most significant end of the options word so
|
is passed. Put these bits at the most significant end of the options word so
|
||||||
others can be added next to them */
|
others can be added next to them */
|
||||||
|
|
||||||
#define PCRE2_ANCHORED 0x80000000u
|
#define PCRE2_ANCHORED 0x80000000u
|
||||||
#define PCRE2_NO_UTF_CHECK 0x40000000u
|
#define PCRE2_NO_UTF_CHECK 0x40000000u
|
||||||
|
|
||||||
/* Other options that can be passed to pcre2_compile(). They may affect
|
/* The following option bits can be passed only to pcre2_compile(). However,
|
||||||
compilation, JIT compilation, and/or interpretive execution. The following tags
|
they may affect compilation, JIT compilation, and/or interpretive execution.
|
||||||
indicate which:
|
The following tags indicate which:
|
||||||
|
|
||||||
C alters what is compiled
|
C alters what is compiled by pcre2_compile()
|
||||||
J alters what JIT compiles
|
J alters what is compiled by pcre2_jit_compile()
|
||||||
M is inspected during pcre2_match() execution
|
M is inspected during pcre2_match() execution
|
||||||
D is inspected during pcre2_dfa_match() execution
|
D is inspected during pcre2_dfa_match() execution
|
||||||
*/
|
*/
|
||||||
|
@ -224,7 +224,8 @@ context functions. */
|
||||||
#define PCRE2_ERROR_NULL (-50)
|
#define PCRE2_ERROR_NULL (-50)
|
||||||
#define PCRE2_ERROR_RECURSELOOP (-51)
|
#define PCRE2_ERROR_RECURSELOOP (-51)
|
||||||
#define PCRE2_ERROR_RECURSIONLIMIT (-52)
|
#define PCRE2_ERROR_RECURSIONLIMIT (-52)
|
||||||
#define PCRE2_ERROR_UNSET (-53)
|
#define PCRE2_ERROR_UNAVAILABLE (-53)
|
||||||
|
#define PCRE2_ERROR_UNSET (-54)
|
||||||
|
|
||||||
/* Request types for pcre2_pattern_info() */
|
/* Request types for pcre2_pattern_info() */
|
||||||
|
|
||||||
|
|
|
@ -221,12 +221,13 @@ static const char match_error_texts[] =
|
||||||
"JIT stack limit reached\0"
|
"JIT stack limit reached\0"
|
||||||
"match limit exceeded\0"
|
"match limit exceeded\0"
|
||||||
"no more memory\0"
|
"no more memory\0"
|
||||||
"unknown or unset substring\0"
|
"unknown substring\0"
|
||||||
"non-unique substring name\0"
|
"non-unique substring name\0"
|
||||||
/* 50 */
|
/* 50 */
|
||||||
"NULL argument passed\0"
|
"NULL argument passed\0"
|
||||||
"nested recursion at the same subject position\0"
|
"nested recursion at the same subject position\0"
|
||||||
"recursion limit exceeded\0"
|
"recursion limit exceeded\0"
|
||||||
|
"requested value is not available\0"
|
||||||
"requested value is not set\0"
|
"requested value is not set\0"
|
||||||
;
|
;
|
||||||
|
|
||||||
|
|
|
@ -7023,8 +7023,7 @@ if (rc == MATCH_MATCH || rc == MATCH_ACCEPT)
|
||||||
/* Set the return code to the number of captured strings, or 0 if there were
|
/* Set the return code to the number of captured strings, or 0 if there were
|
||||||
too many to fit into the ovector. */
|
too many to fit into the ovector. */
|
||||||
|
|
||||||
match_data->rc = ((mb->capture_last & OVFLBIT) != 0 &&
|
match_data->rc = ((mb->capture_last & OVFLBIT) != 0)?
|
||||||
mb->end_offset_top >= arg_offset_max)?
|
|
||||||
0 : mb->end_offset_top/2;
|
0 : mb->end_offset_top/2;
|
||||||
|
|
||||||
/* If there is space in the offset vector, set any unused pairs at the end to
|
/* If there is space in the offset vector, set any unused pairs at the end to
|
||||||
|
|
|
@ -63,8 +63,9 @@ Arguments:
|
||||||
|
|
||||||
Returns: if successful: zero
|
Returns: if successful: zero
|
||||||
if not successful, a negative error code:
|
if not successful, a negative error code:
|
||||||
PCRE2_ERROR_NOMEMORY: buffer too small
|
(1) an error from nametable_scan()
|
||||||
PCRE2_ERROR_NOSUBSTRING: no such captured substring
|
(2) an error from copy_bynumber()
|
||||||
|
(3) PCRE2_ERROR_UNSET: all named groups are unset
|
||||||
*/
|
*/
|
||||||
|
|
||||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||||
|
@ -83,7 +84,7 @@ for (entry = first; entry <= last; entry += entrysize)
|
||||||
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
||||||
return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
|
return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr);
|
||||||
}
|
}
|
||||||
return PCRE2_ERROR_NOSUBSTRING;
|
return PCRE2_ERROR_UNSET;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@ -104,25 +105,24 @@ Arguments:
|
||||||
Returns: if successful: 0
|
Returns: if successful: 0
|
||||||
if not successful, a negative error code:
|
if not successful, a negative error code:
|
||||||
PCRE2_ERROR_NOMEMORY: buffer too small
|
PCRE2_ERROR_NOMEMORY: buffer too small
|
||||||
PCRE2_ERROR_NOSUBSTRING: no such captured substring
|
PCRE2_ERROR_NOSUBSTRING: no such substring
|
||||||
|
PCRE2_ERROR_UNAVAILABLE: ovector too small
|
||||||
|
PCRE2_ERROR_UNSET: substring is not set
|
||||||
*/
|
*/
|
||||||
|
|
||||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||||
pcre2_substring_copy_bynumber(pcre2_match_data *match_data,
|
pcre2_substring_copy_bynumber(pcre2_match_data *match_data,
|
||||||
uint32_t stringnumber, PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr)
|
uint32_t stringnumber, PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE left, right;
|
int rc;
|
||||||
PCRE2_SIZE p = 0;
|
PCRE2_SIZE size;
|
||||||
PCRE2_SPTR subject = match_data->subject;
|
rc = pcre2_substring_length_bynumber(match_data, stringnumber, &size);
|
||||||
if (stringnumber >= match_data->oveccount ||
|
if (rc < 0) return rc;
|
||||||
stringnumber > match_data->code->top_bracket ||
|
if (size + 1 > *sizeptr) return PCRE2_ERROR_NOMEMORY;
|
||||||
(left = match_data->ovector[stringnumber*2]) == PCRE2_UNSET)
|
memcpy(buffer, match_data->subject + match_data->ovector[stringnumber*2],
|
||||||
return PCRE2_ERROR_NOSUBSTRING;
|
CU2BYTES(size));
|
||||||
right = match_data->ovector[stringnumber*2+1];
|
buffer[size] = 0;
|
||||||
if (right - left + 1 > *sizeptr) return PCRE2_ERROR_NOMEMORY;
|
*sizeptr = size;
|
||||||
while (left < right) buffer[p++] = subject[left++];
|
|
||||||
buffer[p] = 0;
|
|
||||||
*sizeptr = p;
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -144,8 +144,9 @@ Arguments:
|
||||||
|
|
||||||
Returns: if successful: zero
|
Returns: if successful: zero
|
||||||
if not successful, a negative value:
|
if not successful, a negative value:
|
||||||
PCRE2_ERROR_NOMEMORY: couldn't get memory
|
(1) an error from nametable_scan()
|
||||||
PCRE2_ERROR_NOSUBSTRING: no such captured substring
|
(2) an error from get_bynumber()
|
||||||
|
(3) PCRE2_ERROR_UNSET: all named groups are unset
|
||||||
*/
|
*/
|
||||||
|
|
||||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||||
|
@ -164,7 +165,7 @@ for (entry = first; entry <= last; entry += entrysize)
|
||||||
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
||||||
return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
|
return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr);
|
||||||
}
|
}
|
||||||
return PCRE2_ERROR_NOSUBSTRING;
|
return PCRE2_ERROR_UNSET;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@ -182,37 +183,32 @@ Arguments:
|
||||||
stringptr where to put a pointer to the new memory
|
stringptr where to put a pointer to the new memory
|
||||||
sizeptr where to put the size of the substring
|
sizeptr where to put the size of the substring
|
||||||
|
|
||||||
Returns: if successful: zero
|
Returns: if successful: 0
|
||||||
if not successful a negative error code:
|
if not successful, a negative error code:
|
||||||
PCRE2_ERROR_NOMEMORY: failed to get memory
|
PCRE2_ERROR_NOMEMORY: failed to get memory
|
||||||
PCRE2_ERROR_NOSUBSTRING: substring not present
|
PCRE2_ERROR_NOSUBSTRING: no such substring
|
||||||
|
PCRE2_ERROR_UNAVAILABLE: ovector too small
|
||||||
|
PCRE2_ERROR_UNSET: substring is not set
|
||||||
*/
|
*/
|
||||||
|
|
||||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||||
pcre2_substring_get_bynumber(pcre2_match_data *match_data,
|
pcre2_substring_get_bynumber(pcre2_match_data *match_data,
|
||||||
uint32_t stringnumber, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr)
|
uint32_t stringnumber, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE left, right;
|
int rc;
|
||||||
PCRE2_SIZE p = 0;
|
PCRE2_SIZE size;
|
||||||
void *block;
|
|
||||||
PCRE2_UCHAR *yield;
|
PCRE2_UCHAR *yield;
|
||||||
|
rc = pcre2_substring_length_bynumber(match_data, stringnumber, &size);
|
||||||
PCRE2_SPTR subject = match_data->subject;
|
if (rc < 0) return rc;
|
||||||
if (stringnumber >= match_data->oveccount ||
|
yield = PRIV(memctl_malloc)(sizeof(pcre2_memctl) +
|
||||||
stringnumber > match_data->code->top_bracket ||
|
(size + 1)*PCRE2_CODE_UNIT_WIDTH, (pcre2_memctl *)match_data);
|
||||||
(left = match_data->ovector[stringnumber*2]) == PCRE2_UNSET)
|
if (yield == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
return PCRE2_ERROR_NOSUBSTRING;
|
yield = (PCRE2_UCHAR *)(((char *)yield) + sizeof(pcre2_memctl));
|
||||||
right = match_data->ovector[stringnumber*2+1];
|
memcpy(yield, match_data->subject + match_data->ovector[stringnumber*2],
|
||||||
|
CU2BYTES(size));
|
||||||
block = PRIV(memctl_malloc)(sizeof(pcre2_memctl) +
|
yield[size] = 0;
|
||||||
(right-left+1)*PCRE2_CODE_UNIT_WIDTH, (pcre2_memctl *)match_data);
|
|
||||||
if (block == NULL) return PCRE2_ERROR_NOMEMORY;
|
|
||||||
|
|
||||||
yield = (PCRE2_UCHAR *)((char *)block + sizeof(pcre2_memctl));
|
|
||||||
while (left < right) yield[p++] = subject[left++];
|
|
||||||
yield[p] = 0;
|
|
||||||
*stringptr = yield;
|
*stringptr = yield;
|
||||||
*sizeptr = p;
|
*sizeptr = size;
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -260,14 +256,14 @@ PCRE2_SPTR last;
|
||||||
PCRE2_SPTR entry;
|
PCRE2_SPTR entry;
|
||||||
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
int entrysize = pcre2_substring_nametable_scan(match_data->code, stringname,
|
||||||
&first, &last);
|
&first, &last);
|
||||||
if (entrysize <= 0) return entrysize;
|
if (entrysize < 0) return entrysize;
|
||||||
for (entry = first; entry <= last; entry += entrysize)
|
for (entry = first; entry <= last; entry += entrysize)
|
||||||
{
|
{
|
||||||
uint32_t n = GET2(entry, 0);
|
uint32_t n = GET2(entry, 0);
|
||||||
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
if (n < match_data->oveccount && match_data->ovector[n*2] != PCRE2_UNSET)
|
||||||
return pcre2_substring_length_bynumber(match_data, n, sizeptr);
|
return pcre2_substring_length_bynumber(match_data, n, sizeptr);
|
||||||
}
|
}
|
||||||
return PCRE2_ERROR_NOSUBSTRING;
|
return PCRE2_ERROR_UNSET;
|
||||||
}
|
}
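Review note: the duplicate-name rule that the "byname" functions now follow (return the first group of that name that is set, and PCRE2_ERROR_UNSET only when every group of that name is unset) can be sketched in isolation. The block below is a minimal model, not the library code: the `DUP_` constants, `dup_lookup()` and the three demo functions are hypothetical names, the ovector contents are hand-written for hypothetical patterns such as /(?<A>a)|(?<A>b)/ with dupnames, and the unset marker is assumed to be an all-ones offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; -54 mirrors PCRE2_ERROR_UNSET in this diff. */
#define DUP_ERROR_UNSET (-54)
#define DUP_UNSET       (~(size_t)0)   /* assumed "unset" offset marker */

/* Scan the group numbers that share one name, in table order, and return the
first one whose start offset is set and fits in the ovector. Returns the
group number (>= 1), or DUP_ERROR_UNSET when none of them is set. */
int dup_lookup(const size_t *ovector, uint32_t oveccount,
  const uint32_t *groups, size_t ngroups)
{
  size_t i;
  assert(ovector != NULL && groups != NULL);
  for (i = 0; i < ngroups; i++)
    {
    uint32_t n = groups[i];
    if (n < oveccount && ovector[n*2] != DUP_UNSET) return (int)n;
    }
  return DUP_ERROR_UNSET;
}

/* Both groups named A are set: group 1 wins. */
int dup_demo_first_set(void)
{
  static const size_t ov[] = { 0, 2, 0, 1, 1, 2 };
  static const uint32_t groups[] = { 1, 2 };
  return dup_lookup(ov, 3, groups, 2);
}

/* Group 1 is unset, group 2 is set: group 2 is returned. */
int dup_demo_second_set(void)
{
  static const size_t ov[] = { 0, 1, DUP_UNSET, DUP_UNSET, 0, 1 };
  static const uint32_t groups[] = { 1, 2 };
  return dup_lookup(ov, 3, groups, 2);
}

/* The match succeeded but no group named A took part in it. */
int dup_demo_none_set(void)
{
  static const size_t ov[] = { 0, 2, DUP_UNSET, DUP_UNSET, DUP_UNSET, DUP_UNSET };
  static const uint32_t groups[] = { 1, 2 };
  return dup_lookup(ov, 3, groups, 2);
}
```

The last case corresponds to the changed test output in this commit, where copying or getting 'A' after matching "cd" now reports -54 rather than -48.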
|
||||||
|
|
||||||
|
|
||||||
|
@ -276,26 +272,36 @@ return PCRE2_ERROR_NOSUBSTRING;
|
||||||
* Get length of a numbered substring *
|
* Get length of a numbered substring *
|
||||||
*************************************************/
|
*************************************************/
|
||||||
|
|
||||||
/* This function returns the length of a captured substring.
|
/* This function returns the length of a captured substring. If the start is
|
||||||
|
beyond the end (which can happen when \K is used in an assertion), it sets the
|
||||||
|
length to zero.
|
||||||
|
|
||||||
Arguments:
|
Arguments:
|
||||||
match_data pointer to match data
|
match_data pointer to match data
|
||||||
stringnumber the number of the required substring
|
stringnumber the number of the required substring
|
||||||
sizeptr where to put the length
|
sizeptr where to put the length, if not NULL
|
||||||
|
|
||||||
Returns: 0 if successful, else a negative error number
|
Returns: if successful: 0
|
||||||
|
if not successful, a negative error code:
|
||||||
|
PCRE2_ERROR_NOSUBSTRING: no such substring
|
||||||
|
PCRE2_ERROR_UNAVAILABLE: ovector is too small
|
||||||
|
PCRE2_ERROR_UNSET: substring is not set
|
||||||
*/
|
*/
|
||||||
|
|
||||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||||
pcre2_substring_length_bynumber(pcre2_match_data *match_data,
|
pcre2_substring_length_bynumber(pcre2_match_data *match_data,
|
||||||
uint32_t stringnumber, PCRE2_SIZE *sizeptr)
|
uint32_t stringnumber, PCRE2_SIZE *sizeptr)
|
||||||
{
|
{
|
||||||
if (stringnumber >= match_data->oveccount ||
|
PCRE2_SIZE left, right;
|
||||||
stringnumber > match_data->code->top_bracket ||
|
if (stringnumber > match_data->code->top_bracket)
|
||||||
match_data->ovector[stringnumber*2] == PCRE2_UNSET)
|
|
||||||
return PCRE2_ERROR_NOSUBSTRING;
|
return PCRE2_ERROR_NOSUBSTRING;
|
||||||
*sizeptr = match_data->ovector[stringnumber*2 + 1] -
|
if (stringnumber >= match_data->oveccount)
|
||||||
match_data->ovector[stringnumber*2];
|
return PCRE2_ERROR_UNAVAILABLE;
|
||||||
|
if (match_data->ovector[stringnumber*2] == PCRE2_UNSET)
|
||||||
|
return PCRE2_ERROR_UNSET;
|
||||||
|
left = match_data->ovector[stringnumber*2];
|
||||||
|
right = match_data->ovector[stringnumber*2+1];
|
||||||
|
if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
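Review note: the order of the three checks in the new pcre2_substring_length_bynumber() is what separates the error codes. The following is a minimal self-contained sketch of that order, assuming a simplified match structure; `sketch_match`, the `SKETCH_` constants and the demo functions are hypothetical stand-ins rather than PCRE2 definitions, and the numeric values are copied from the #defines and test output shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PCRE2 error numbers in this diff. */
#define SKETCH_ERROR_NOSUBSTRING (-48)
#define SKETCH_ERROR_UNAVAILABLE (-53)
#define SKETCH_ERROR_UNSET       (-54)
#define SKETCH_UNSET             (~(size_t)0)  /* assumed "unset" marker */

/* Simplified model of the fields the function consults. */
typedef struct {
  uint32_t top_bracket;    /* highest capture group number in the pattern */
  uint32_t oveccount;      /* number of offset pairs in the ovector */
  const size_t *ovector;   /* oveccount pairs of (start, end) offsets */
} sketch_match;

int sketch_length_bynumber(const sketch_match *md, uint32_t n, size_t *sizeptr)
{
  size_t left, right;
  assert(md != NULL);
  if (n > md->top_bracket) return SKETCH_ERROR_NOSUBSTRING;   /* no such group */
  if (n >= md->oveccount) return SKETCH_ERROR_UNAVAILABLE;    /* no slot for it */
  if (md->ovector[n*2] == SKETCH_UNSET) return SKETCH_ERROR_UNSET;
  left = md->ovector[n*2];
  right = md->ovector[n*2+1];
  /* A start beyond the end (possible with \K in an assertion) gives length 0 */
  if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
  return 0;
}

/* Models /(a)(b)|(c)/ matched against "XcX" with a two-pair ovector. */
int sketch_demo_small_ovector(uint32_t n)
{
  static const size_t ov[] = { 1, 2, SKETCH_UNSET, SKETCH_UNSET };
  sketch_match md = { 3, 2, ov };
  return sketch_length_bynumber(&md, n, NULL);
}

/* Models /x(?=ab\K)/ matched against "xab": start 3, end 1. */
size_t sketch_demo_backslash_k(void)
{
  static const size_t ov[] = { 3, 1 };
  sketch_match md = { 0, 1, ov };
  size_t len = 99;
  (void)sketch_length_bynumber(&md, 0, &len);
  return len;
}
```

With the small ovector, groups 1 to 4 yield -54, -53, -53 and -48 respectively, which agrees with the new expected test output added by this commit.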
|
||||||
|
|
||||||
|
@ -334,7 +340,8 @@ PCRE2_UCHAR **listp;
|
||||||
PCRE2_UCHAR *sp;
|
PCRE2_UCHAR *sp;
|
||||||
PCRE2_SIZE *ovector;
|
PCRE2_SIZE *ovector;
|
||||||
|
|
||||||
if ((count = match_data->rc) < 0) return count;
|
if ((count = match_data->rc) < 0) return count; /* Match failed */
|
||||||
|
if (count == 0) count = match_data->oveccount; /* Ovector too small */
|
||||||
|
|
||||||
count2 = 2*count;
|
count2 = 2*count;
|
||||||
ovector = match_data->ovector;
|
ovector = match_data->ovector;
|
||||||
|
@ -342,7 +349,11 @@ size = sizeof(pcre2_memctl) + sizeof(PCRE2_UCHAR *); /* For final NULL */
|
||||||
if (lengthsptr != NULL) size += sizeof(PCRE2_SIZE) * count; /* For lengths */
|
if (lengthsptr != NULL) size += sizeof(PCRE2_SIZE) * count; /* For lengths */
|
||||||
|
|
||||||
for (i = 0; i < count2; i += 2)
|
for (i = 0; i < count2; i += 2)
|
||||||
size += sizeof(PCRE2_UCHAR *) + CU2BYTES(ovector[i+1] - ovector[i] + 1);
|
{
|
||||||
|
size += sizeof(PCRE2_UCHAR *) + CU2BYTES(1);
|
||||||
|
if (ovector[i+1] > ovector[i]) size += CU2BYTES(ovector[i+1] - ovector[i]);
|
||||||
|
}
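Review note: the rewritten size loop avoids an unsigned underflow when an end offset is smaller than its start offset. A minimal sketch of the same arithmetic follows, assuming code units of `cu_bytes` bytes and leaving out the memctl header and the optional lengths vector; `list_bytes()` and the demo functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a NULL-terminated list of copied substrings: one pointer
per string plus the final NULL pointer, one terminating code unit per string,
and the string contents. A pair whose end is not beyond its start contributes
no content, so the subtraction can never wrap round. */
size_t list_bytes(const size_t *ovector, size_t count, size_t cu_bytes)
{
  size_t i;
  size_t size = sizeof(void *);              /* for the final NULL */
  assert(ovector != NULL);
  for (i = 0; i < 2*count; i += 2)
    {
    size += sizeof(void *) + cu_bytes;       /* pointer + terminating zero */
    if (ovector[i+1] > ovector[i])
      size += (ovector[i+1] - ovector[i]) * cu_bytes;
    }
  return size;
}

/* Two ordinary substrings of lengths 4 and 1, 8-bit code units. */
size_t list_demo_normal(void)
{
  static const size_t ov[] = { 0, 4, 1, 2 };
  return list_bytes(ov, 2, 1);
}

/* One pair with start 3 and end 1, as can arise from \K in an assertion. */
size_t list_demo_reversed(void)
{
  static const size_t ov[] = { 3, 1 };
  return list_bytes(ov, 1, 1);
}
```

In the reversed case the old expression `ovector[i+1] - ovector[i] + 1` would have wrapped to a huge value; here the pair simply costs one pointer and one terminator.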
|
||||||
|
|
||||||
memp = PRIV(memctl_malloc)(size, (pcre2_memctl *)match_data);
|
memp = PRIV(memctl_malloc)(size, (pcre2_memctl *)match_data);
|
||||||
if (memp == NULL) return PCRE2_ERROR_NOMEMORY;
|
if (memp == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
|
||||||
|
@ -362,7 +373,7 @@ else
|
||||||
|
|
||||||
for (i = 0; i < count2; i += 2)
|
for (i = 0; i < count2; i += 2)
|
||||||
{
|
{
|
||||||
size = ovector[i+1] - ovector[i];
|
size = (ovector[i+1] > ovector[i])? (ovector[i+1] - ovector[i]) : 0;
|
||||||
memcpy(sp, match_data->subject + ovector[i], CU2BYTES(size));
|
memcpy(sp, match_data->subject + ovector[i], CU2BYTES(size));
|
||||||
*listp++ = sp;
|
*listp++ = sp;
|
||||||
if (lensp != NULL) *lensp++ = size;
|
if (lensp != NULL) *lensp++ = size;
|
||||||
|
@ -400,8 +411,8 @@ memctl->free(memctl, memctl->memory_data);
|
||||||
|
|
||||||
/* This function scans the nametable for a given name, using binary chop. It
|
/* This function scans the nametable for a given name, using binary chop. It
|
||||||
returns either two pointers to the entries in the table, or, if no pointers are
|
returns either two pointers to the entries in the table, or, if no pointers are
|
||||||
given, the number of a group with the given name. If duplicate names are
|
given, the number of a unique group with the given name. If duplicate names are
|
||||||
permitted, this may not be unique.
|
permitted, and the name is not unique, an error is generated.
|
||||||
|
|
||||||
Arguments:
|
Arguments:
|
||||||
code the compiled regex
|
code the compiled regex
|
||||||
|
@ -409,10 +420,12 @@ Arguments:
|
||||||
firstptr where to put the pointer to the first entry
|
firstptr where to put the pointer to the first entry
|
||||||
lastptr where to put the pointer to the last entry
|
lastptr where to put the pointer to the last entry
|
||||||
|
|
||||||
Returns: if firstptr and lastptr are NULL, a group number for a
|
Returns: PCRE2_ERROR_NOSUBSTRING if the name is not found
|
||||||
unique substring, or PCRE2_ERROR_NOUNIQUESUBSTRING
|
otherwise, if firstptr and lastptr are NULL:
|
||||||
otherwise, the length of each entry, or a negative number
|
a group number for a unique substring
|
||||||
(PCRE2_ERROR_NOSUBSTRING) if not found
|
else PCRE2_ERROR_NOUNIQUESUBSTRING
|
||||||
|
otherwise:
|
||||||
|
the length of each entry, having set firstptr and lastptr
|
||||||
*/
|
*/
|
||||||
|
|
||||||
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
||||||
|
@ -446,8 +459,8 @@ while (top > bot)
|
||||||
if (PRIV(strcmp)(stringname, (last + entrysize + IMM2_SIZE)) != 0) break;
|
if (PRIV(strcmp)(stringname, (last + entrysize + IMM2_SIZE)) != 0) break;
|
||||||
last += entrysize;
|
last += entrysize;
|
||||||
}
|
}
|
||||||
if (firstptr == NULL)
|
if (firstptr == NULL) return (first == last)?
|
||||||
return (first == last)? (int)GET2(entry, 0) : PCRE2_ERROR_NOUNIQUESUBSTRING;
|
(int)GET2(entry, 0) : PCRE2_ERROR_NOUNIQUESUBSTRING;
|
||||||
*firstptr = first;
|
*firstptr = first;
|
||||||
*lastptr = last;
|
*lastptr = last;
|
||||||
return entrysize;
|
return entrysize;
|
||||||
|
|
|
@ -4084,4 +4084,11 @@ a random value. /Ix
|
||||||
"(?(?=)?==)(((((((((?=)))))))))"
|
"(?(?=)?==)(((((((((?=)))))))))"
|
||||||
a
|
a
|
||||||
|
|
||||||
|
/(a)(b)|(c)/
|
||||||
|
XcX\=ovector=2,get=1,get=2,get=3,get=4,getall
|
||||||
|
|
||||||
|
/x(?=ab\K)/
|
||||||
|
xab\=get=0
|
||||||
|
xab\=copy=0
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -993,7 +993,7 @@ Subject length lower bound = 4
|
||||||
0: abcd
|
0: abcd
|
||||||
1: a
|
1: a
|
||||||
2: d
|
2: d
|
||||||
Copy substring 5 failed (-48): unknown or unset substring
|
Copy substring 5 failed (-48): unknown substring
|
||||||
|
|
||||||
/(.{20})/I
|
/(.{20})/I
|
||||||
Capturing subpattern count = 1
|
Capturing subpattern count = 1
|
||||||
|
@ -1047,9 +1047,9 @@ Subject length lower bound = 4
|
||||||
2: <unset>
|
2: <unset>
|
||||||
3: f
|
3: f
|
||||||
1G a (1)
|
1G a (1)
|
||||||
Get substring 2 failed (-48): unknown or unset substring
|
Get substring 2 failed (-54): requested value is not set
|
||||||
3G f (1)
|
3G f (1)
|
||||||
Get substring 4 failed (-48): unknown or unset substring
|
Get substring 4 failed (-48): unknown substring
|
||||||
0L adef
|
0L adef
|
||||||
1L a
|
1L a
|
||||||
2L
|
2L
|
||||||
|
@ -1062,7 +1062,7 @@ Get substring 4 failed (-48): unknown or unset substring
|
||||||
1G bc (2)
|
1G bc (2)
|
||||||
2G bc (2)
|
2G bc (2)
|
||||||
3G f (1)
|
3G f (1)
|
||||||
Get substring 4 failed (-48): unknown or unset substring
|
Get substring 4 failed (-48): unknown substring
|
||||||
0L bcdef
|
0L bcdef
|
||||||
1L bc
|
1L bc
|
||||||
2L bc
|
2L bc
|
||||||
|
@ -4363,7 +4363,7 @@ Subject length lower bound = 8
|
||||||
1: cd
|
1: cd
|
||||||
2: gh
|
2: gh
|
||||||
Number not found for group 'three'
|
Number not found for group 'three'
|
||||||
Copy substring 'three' failed (-48): unknown or unset substring
|
Copy substring 'three' failed (-48): unknown substring
|
||||||
|
|
||||||
/(?P<Tes>)(?P<Test>)/IB
|
/(?P<Tes>)(?P<Test>)/IB
|
||||||
------------------------------------------------------------------
|
------------------------------------------------------------------
|
||||||
|
@ -5731,7 +5731,7 @@ No match
|
||||||
1: a1
|
1: a1
|
||||||
2: a1
|
2: a1
|
||||||
Number not found for group 'Z'
|
Number not found for group 'Z'
|
||||||
Copy substring 'Z' failed (-48): unknown or unset substring
|
Copy substring 'Z' failed (-48): unknown substring
|
||||||
C a1 (2) A (non-unique)
|
C a1 (2) A (non-unique)
|
||||||
|
|
||||||
/(?|(?<a>)(?<b>)(?<a>)|(?<a>)(?<b>)(?<a>))/I,dupnames
|
/(?|(?<a>)(?<b>)(?<a>)|(?<a>)(?<b>)(?<a>))/I,dupnames
|
||||||
|
@ -5772,7 +5772,7 @@ Subject length lower bound = 2
|
||||||
C a (1) A (non-unique)
|
C a (1) A (non-unique)
|
||||||
cd\=copy=A
|
cd\=copy=A
|
||||||
0: cd
|
0: cd
|
||||||
Copy substring 'A' failed (-48): unknown or unset substring
|
Copy substring 'A' failed (-54): requested value is not set
|
||||||
|
|
||||||
/^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
|
/^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
|
||||||
Capturing subpattern count = 4
|
Capturing subpattern count = 4
|
||||||
|
@ -5817,7 +5817,7 @@ No match
|
||||||
1: a1
|
1: a1
|
||||||
2: a1
|
2: a1
|
||||||
Number not found for group 'Z'
|
Number not found for group 'Z'
|
||||||
Get substring 'Z' failed (-48): unknown or unset substring
|
Get substring 'Z' failed (-48): unknown substring
|
||||||
G a1 (2) A (non-unique)
|
G a1 (2) A (non-unique)
|
||||||
|
|
||||||
/^(?P<A>a)(?P<A>b)/I,dupnames
|
/^(?P<A>a)(?P<A>b)/I,dupnames
|
||||||
|
@ -5848,7 +5848,7 @@ Subject length lower bound = 2
|
||||||
G a (1) A (non-unique)
|
G a (1) A (non-unique)
|
||||||
cd\=get=A
|
cd\=get=A
|
||||||
0: cd
|
0: cd
|
||||||
Get substring 'A' failed (-48): unknown or unset substring
|
Get substring 'A' failed (-54): requested value is not set
|
||||||
|
|
||||||
/^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
|
/^(?P<A>a)(?P<A>b)|cd(?P<A>ef)(?P<A>gh)/I,dupnames
|
||||||
Capturing subpattern count = 4
|
Capturing subpattern count = 4
|
||||||
|
@ -13659,11 +13659,11 @@ Failed: error -35: invalid replacement string
|
||||||
|
|
||||||
/abc/replace=a$bad
|
/abc/replace=a$bad
|
||||||
123abc
|
123abc
|
||||||
Failed: error -48: unknown or unset substring
|
Failed: error -48: unknown substring
|
||||||
|
|
||||||
/abc/replace=a${A234567890123456789_123456789012}z
|
/abc/replace=a${A234567890123456789_123456789012}z
|
||||||
123abc
|
123abc
|
||||||
Failed: error -48: unknown or unset substring
|
Failed: error -48: unknown substring
|
||||||
|
|
||||||
/abc/replace=a${A23456789012345678901234567890123}z
|
/abc/replace=a${A23456789012345678901234567890123}z
|
||||||
123abc
|
123abc
|
||||||
|
@ -13715,4 +13715,26 @@ Failed: error -34: bad option value
|
||||||
a
|
a
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
/(a)(b)|(c)/
|
||||||
|
XcX\=ovector=2,get=1,get=2,get=3,get=4,getall
|
||||||
|
Matched, but too many substrings
|
||||||
|
0: c
|
||||||
|
1: <unset>
|
||||||
|
Get substring 1 failed (-54): requested value is not set
|
||||||
|
Get substring 2 failed (-53): requested value is not available
|
||||||
|
Get substring 3 failed (-53): requested value is not available
|
||||||
|
Get substring 4 failed (-48): unknown substring
|
||||||
|
0L c
|
||||||
|
1L
|
||||||
|
|
||||||
|
/x(?=ab\K)/
|
||||||
|
xab\=get=0
|
||||||
|
Start of matched string is beyond its end - displaying from end to start.
|
||||||
|
0: ab
|
||||||
|
0G (0)
|
||||||
|
xab\=copy=0
|
||||||
|
Start of matched string is beyond its end - displaying from end to start.
|
||||||
|
0: ab
|
||||||
|
0C (0)
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|