Improvements for substring handling with partial matches.
parent
2a5767d757
commit
b8dbae1474
|
@ -31,9 +31,11 @@ The arguments are:
|
|||
<pre>
|
||||
<i>match_data</i> The match data block for the match
|
||||
<i>number</i> The substring number
|
||||
<i>length</i> Where to return the length
|
||||
<i>length</i> Where to return the length, or NULL
|
||||
</pre>
|
||||
The yield is zero on success, or an error code if the substring is not found.
|
||||
The third argument may be NULL if all you want to know is whether or not a
|
||||
substring is set. The yield is zero on success, or a negative error code
|
||||
otherwise. After a partial match, only substring 0 is available.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
|
|
|
@ -1740,6 +1740,12 @@ and
|
|||
below.
|
||||
</P>
|
||||
<P>
|
||||
When a call of <b>pcre2_match()</b> fails, valid data is available in the match
|
||||
block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
|
||||
of the error codes for an invalid UTF string. Exactly what is available depends
|
||||
on the error, and is detailed below.
|
||||
</P>
|
||||
<P>
|
||||
When one of the matching functions is called, pointers to the compiled pattern
|
||||
and the subject string are set in the match data block so that they can be
|
||||
referenced by the extraction functions. After running a match, you must not
|
||||
|
@ -2018,9 +2024,9 @@ function can be used to find out how many capturing subpatterns there are in a
|
|||
compiled pattern.
|
||||
</P>
|
||||
<P>
|
||||
The overall matched string and any captured substrings are returned to the
|
||||
caller via a vector of PCRE2_SIZE values. This is called the <b>ovector</b>, and
|
||||
is contained within the
|
||||
A successful match returns the overall matched string and any captured
|
||||
substrings to the caller via a vector of PCRE2_SIZE values. This is called the
|
||||
<b>ovector</b>, and is contained within the
|
||||
<a href="#matchdatablock">match data block.</a>
|
||||
You can obtain direct access to the ovector by calling
|
||||
<b>pcre2_get_ovector_pointer()</b> to find its address, and
|
||||
|
@ -2041,20 +2047,26 @@ library, 16-bit offsets in the 16-bit library, and 32-bit offsets in the 32-bit
|
|||
library.
|
||||
</P>
|
||||
<P>
|
||||
The first pair of offsets (that is, <i>ovector[0]</i> and <i>ovector[1]</i>)
|
||||
identifies the portion of the subject string that was matched by the entire
|
||||
pattern. The next pair is used for the first capturing subpattern, and so on.
|
||||
The value returned by <b>pcre2_match()</b> is one more than the highest numbered
|
||||
pair that has been set. For example, if two substrings have been captured, the
|
||||
returned value is 3. If there are no capturing subpatterns, the return value
|
||||
from a successful match is 1, indicating that just the first pair of offsets
|
||||
has been set.
|
||||
After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair
|
||||
of offsets (that is, <i>ovector[0]</i> and <i>ovector[1]</i>) are set. They
|
||||
identify the part of the subject that was partially matched. See the
|
||||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||
documentation for details of partial matching.
|
||||
</P>
|
||||
<P>
|
||||
After a successful match, the first pair of offsets identifies the portion of
|
||||
the subject string that was matched by the entire pattern. The next pair is
|
||||
used for the first capturing subpattern, and so on. The value returned by
|
||||
<b>pcre2_match()</b> is one more than the highest numbered pair that has been
|
||||
set. For example, if two substrings have been captured, the returned value is
|
||||
3. If there are no capturing subpatterns, the return value from a successful
|
||||
match is 1, indicating that just the first pair of offsets has been set.
|
||||
</P>
|
||||
<P>
|
||||
If a pattern uses the \K escape sequence within a positive assertion, the
|
||||
reported start of the match can be greater than the end of the match. For
|
||||
example, if the pattern (?=ab\K) is matched against "ab", the start and end
|
||||
offset values for the match are 2 and 0.
|
||||
reported start of a successful match can be greater than the end of the match.
|
||||
For example, if the pattern (?=ab\K) is matched against "ab", the start and
|
||||
end offset values for the match are 2 and 0.
|
||||
</P>
|
||||
<P>
|
||||
If a capturing subpattern group is matched repeatedly within a single match
|
||||
|
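The return-value rules in this hunk can be summarised in a small, library-free C sketch. It does not call PCRE2. The names `MODEL_ERROR_NOMATCH`, `MODEL_ERROR_PARTIAL`, and `readable_pair_count` are hypothetical, introduced only for illustration, and the two constants are assumed to mirror the corresponding `PCRE2_ERROR_*` values in `pcre2.h`.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the PCRE2 error codes named in the text.
   Real code should use PCRE2_ERROR_NOMATCH and PCRE2_ERROR_PARTIAL
   from pcre2.h rather than these illustrative constants. */
enum {
    MODEL_ERROR_NOMATCH = -1,
    MODEL_ERROR_PARTIAL = -2
};

/* Given a pcre2_match()-style return code and the capacity of the
   ovector in pairs, report how many leading pairs the caller may read. */
size_t readable_pair_count(int rc, size_t ovector_pairs)
{
    if (rc > 0)
        return (size_t)rc;      /* success: one more than the highest set pair */
    if (rc == 0)
        return ovector_pairs;   /* ovector too small: filled as far as possible */
    if (rc == MODEL_ERROR_PARTIAL)
        return ovector_pairs > 0 ? 1 : 0;  /* partial match: only pair 0 is set */
    return 0;                   /* no match or other error: nothing to read */
}
```

Under this model, a caller that loops over captured groups after a partial match visits only pair 0, which is the behaviour the revised text describes.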
@ -2104,24 +2116,38 @@ had.
|
|||
</P>
|
||||
<P>
|
||||
As well as the offsets in the ovector, other information about a match is
|
||||
retained in the match data block and can be retrieved by the above functions.
|
||||
retained in the match data block and can be retrieved by the above functions in
|
||||
appropriate circumstances. If they are called at other times, the result is
|
||||
undefined.
|
||||
</P>
|
||||
<P>
|
||||
When a (*MARK) name is to be passed back, <b>pcre2_get_mark()</b> returns a
|
||||
pointer to the zero-terminated name, which is within the compiled pattern.
|
||||
Otherwise NULL is returned. A (*MARK) name may be available after a failed
|
||||
match or a partial match, as well as after a successful one.
|
||||
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
|
||||
to match (PCRE2_ERROR_NOMATCH), a (*MARK) name may be available, and
|
||||
<b>pcre2_get_mark()</b> can be called. It returns a pointer to the
|
||||
zero-terminated name, which is within the compiled pattern. Otherwise NULL is
|
||||
returned. After a successful match, the (*MARK) name that is returned is the
|
||||
last one encountered on the matching path through the pattern. After a "no
|
||||
match" or a partial match, the last encountered (*MARK) name is returned. For
|
||||
example, consider this pattern:
|
||||
<pre>
|
||||
^(*MARK:A)((*MARK:B)a|b)c
|
||||
</pre>
|
||||
When it matches "bc", the returned mark is A. The B mark is "seen" in the first
|
||||
branch of the group, but it is not on the matching path. On the other hand,
|
||||
when this pattern fails to match "bx", the returned mark is B.
|
||||
</P>
|
||||
<P>
|
||||
The code unit offset of the character at which a successful match started is
|
||||
returned by <b>pcre2_get_startchar()</b>. For a non-partial match, this can be
|
||||
After a successful match, a partial match, or one of the invalid UTF errors
|
||||
(for example, PCRE2_ERROR_UTF8_ERR5), <b>pcre2_get_startchar()</b> can be
|
||||
called. After a successful or partial match it returns the code unit offset of
|
||||
the character at which the match started. For a non-partial match, this can be
|
||||
different to the value of <i>ovector[0]</i> if the pattern contains the \K
|
||||
escape sequence. After a partial match, however, this value is always the same
|
||||
as <i>ovector[0]</i> because \K does not affect the result of a partial match.
|
||||
</P>
|
||||
<P>
|
||||
The <b>startchar</b> field is also used to return the offset of an invalid
|
||||
UTF character when UTF checking fails. Details are given in the
|
||||
After a UTF check failure, <b>pcre2_get_startchar()</b> can be used to obtain
|
||||
the code unit offset of the invalid UTF character. Details are given in the
|
||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||
page.
|
||||
<a name="errorlist"></a></P>
|
||||
|
@ -2256,19 +2282,23 @@ The internal recursion limit was reached.
|
|||
Captured substrings can be accessed directly by using the ovector as described
|
||||
<a href="#matchedstrings">above.</a>
|
||||
For convenience, auxiliary functions are provided for extracting captured
|
||||
substrings as new, separate, zero-terminated strings. The functions in this
|
||||
section identify substrings by number. The number zero refers to the entire
|
||||
matched substring, with higher numbers referring to substrings captured by
|
||||
parenthesized groups. The next section describes similar functions for
|
||||
extracting captured substrings by name. A substring that contains a binary zero
|
||||
is correctly extracted and has a further zero added on the end, but the result
|
||||
is not, of course, a C string.
|
||||
substrings as new, separate, zero-terminated strings. A substring that contains
|
||||
a binary zero is correctly extracted and has a further zero added on the end,
|
||||
but the result is not, of course, a C string.
|
||||
</P>
|
||||
<P>
|
||||
The functions in this section identify substrings by number. The number zero
|
||||
refers to the entire matched substring, with higher numbers referring to
|
||||
substrings captured by parenthesized groups. After a partial match, only
|
||||
substring zero is available. An attempt to extract any other substring gives
|
||||
the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for
|
||||
extracting captured substrings by name.
|
||||
</P>
|
||||
<P>
|
||||
If a pattern uses the \K escape sequence within a positive assertion, the
|
||||
reported start of the match can be greater than the end of the match. For
|
||||
example, if the pattern (?=ab\K) is matched against "ab", the start and end
|
||||
offset values for the match are 2 and 0. In this situation, calling these
|
||||
reported start of a successful match can be greater than the end of the match.
|
||||
For example, if the pattern (?=ab\K) is matched against "ab", the start and
|
||||
end offset values for the match are 2 and 0. In this situation, calling these
|
||||
functions with a zero substring number extracts a zero-length empty string.
|
||||
</P>
|
||||
<P>
|
||||
|
@ -2302,7 +2332,8 @@ calling <b>pcre2_substring_free()</b>.
|
|||
<P>
|
||||
The return value from all these functions is zero for success, or a negative
|
||||
error code. If the pattern match failed, the match failure code is returned.
|
||||
Other possible error codes are:
|
||||
If a substring number greater than zero is used after a partial match,
|
||||
PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
|
||||
<pre>
|
||||
PCRE2_ERROR_NOMEMORY
|
||||
</pre>
|
||||
|
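The rules for extracting substrings by number (optional length pointer, substring zero only after a partial match, zero length when \K puts the start after the end) can be modelled without the library. The sketch below is an assumption-laden illustration, not PCRE2's implementation: `model_match`, `model_substring_length`, and the `MODEL_*` codes are hypothetical names, `MODEL_UNSET` merely plays the role of PCRE2_UNSET, and the model collapses the library's separate "no such substring" and "unavailable" errors into one out-of-range code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; the real functions return PCRE2_ERROR_* values. */
enum {
    MODEL_OK = 0,
    MODEL_ERR_PARTIAL = -1,
    MODEL_ERR_OUT_OF_RANGE = -2,
    MODEL_ERR_UNSET = -3
};

/* Stand-in for PCRE2_UNSET (an assumption made for this sketch). */
#define MODEL_UNSET SIZE_MAX

/* Minimal model of the parts of a match data block used here. */
typedef struct {
    int partial;            /* nonzero if the match ended as a partial match */
    size_t pair_count;      /* number of offset pairs held in ovector */
    const size_t *ovector;  /* start/end offsets, two entries per substring */
} model_match;

/* Report whether substring `number` is available and, if `length` is not
   NULL, store its length in code units. */
int model_substring_length(const model_match *m, size_t number, size_t *length)
{
    if (m->partial && number > 0)
        return MODEL_ERR_PARTIAL;        /* only substring 0 after a partial match */
    if (number >= m->pair_count)
        return MODEL_ERR_OUT_OF_RANGE;   /* no pair exists for this number */

    size_t start = m->ovector[2 * number];
    size_t end = m->ovector[2 * number + 1];
    if (start == MODEL_UNSET)
        return MODEL_ERR_UNSET;          /* group did not participate in the match */

    if (length != NULL)
        *length = (start > end) ? 0 : end - start;  /* \K case yields empty string */
    return MODEL_OK;
}
```

Passing NULL as the last argument turns the call into a pure "is this substring set?" query, which is the use the updated `pcre2_substring_length_bynumber()` page describes.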
@ -2343,6 +2374,10 @@ that is obtained using the same memory allocation function that was used to get
|
|||
the match data block.
|
||||
</P>
|
||||
<P>
|
||||
This function must be called only after a successful match. If called after a
|
||||
partial match, the error code PCRE2_ERROR_PARTIAL is returned.
|
||||
</P>
|
||||
<P>
|
||||
The address of the memory block is returned via <i>listptr</i>, which is also
|
||||
the start of the list of string pointers. The end of the list is marked by a
|
||||
NULL pointer. The address of the list of lengths is returned via
|
||||
|
@ -2757,7 +2792,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC37" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 14 December 2014
|
||||
Last updated: 22 December 2014
|
||||
<br>
|
||||
Copyright © 1997-2014 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -89,8 +89,9 @@ empty string at the end of the subject.
|
|||
</P>
|
||||
<P>
|
||||
When a partial match is returned, the first two elements in the ovector point
|
||||
to the portion of the subject that was matched. The appearance of \K in the
|
||||
pattern has no effect for a partial match. Consider this pattern:
|
||||
to the portion of the subject that was matched, but the values in the rest of
|
||||
the ovector are undefined. The appearance of \K in the pattern has no effect
|
||||
for a partial match. Consider this pattern:
|
||||
<pre>
|
||||
/abc\K123/
|
||||
</pre>
|
||||
|
@ -455,7 +456,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 14 October 2014
|
||||
Last updated: 22 December 2014
|
||||
<br>
|
||||
Copyright © 1997-2014 University of Cambridge.
|
||||
<br>
|
||||
|
|
210
doc/pcre2.txt
|
@ -1753,6 +1753,12 @@ THE MATCH DATA BLOCK
|
|||
described in the sections on matched strings and other match data
|
||||
below.
|
||||
|
||||
When a call of pcre2_match() fails, valid data is available in the
|
||||
match block only when the error is PCRE2_ERROR_NOMATCH,
|
||||
PCRE2_ERROR_PARTIAL, or one of the error codes for an invalid UTF
|
||||
string. Exactly what is available depends on the error, and is detailed
|
||||
below.
|
||||
|
||||
When one of the matching functions is called, pointers to the compiled
|
||||
pattern and the subject string are set in the match data block so that
|
||||
they can be referenced by the extraction functions. After running a
|
||||
|
@ -2008,14 +2014,14 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
|
|||
be captured. The pcre2_pattern_info() function can be used to find out
|
||||
how many capturing subpatterns there are in a compiled pattern.
|
||||
|
||||
The overall matched string and any captured substrings are returned to
|
||||
the caller via a vector of PCRE2_SIZE values. This is called the ovec-
|
||||
tor, and is contained within the match data block. You can obtain
|
||||
direct access to the ovector by calling pcre2_get_ovector_pointer() to
|
||||
find its address, and pcre2_get_ovector_count() to find the number of
|
||||
pairs of values it contains. Alternatively, you can use the auxiliary
|
||||
functions for accessing captured substrings by number or by name (see
|
||||
below).
|
||||
A successful match returns the overall matched string and any captured
|
||||
substrings to the caller via a vector of PCRE2_SIZE values. This is
|
||||
called the ovector, and is contained within the match data block. You
|
||||
can obtain direct access to the ovector by calling pcre2_get_ovec-
|
||||
tor_pointer() to find its address, and pcre2_get_ovector_count() to
|
||||
find the number of pairs of values it contains. Alternatively, you can
|
||||
use the auxiliary functions for accessing captured substrings by number
|
||||
or by name (see below).
|
||||
|
||||
Within the ovector, the first in each pair of values is set to the off-
|
||||
set of the first code unit of a substring, and the second is set to the
|
||||
|
@ -2024,53 +2030,58 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
|
|||
are byte offsets in the 8-bit library, 16-bit offsets in the 16-bit
|
||||
library, and 32-bit offsets in the 32-bit library.
|
||||
|
||||
The first pair of offsets (that is, ovector[0] and ovector[1]) identi-
|
||||
fies the portion of the subject string that was matched by the entire
|
||||
pattern. The next pair is used for the first capturing subpattern, and
|
||||
so on. The value returned by pcre2_match() is one more than the high-
|
||||
est numbered pair that has been set. For example, if two substrings
|
||||
have been captured, the returned value is 3. If there are no capturing
|
||||
subpatterns, the return value from a successful match is 1, indicating
|
||||
that just the first pair of offsets has been set.
|
||||
After a partial match (error return PCRE2_ERROR_PARTIAL), only the
|
||||
first pair of offsets (that is, ovector[0] and ovector[1]) are set.
|
||||
They identify the part of the subject that was partially matched. See
|
||||
the pcre2partial documentation for details of partial matching.
|
||||
|
||||
If a pattern uses the \K escape sequence within a positive assertion,
|
||||
the reported start of the match can be greater than the end of the
|
||||
match. For example, if the pattern (?=ab\K) is matched against "ab",
|
||||
the start and end offset values for the match are 2 and 0.
|
||||
After a successful match, the first pair of offsets identifies the por-
|
||||
tion of the subject string that was matched by the entire pattern. The
|
||||
next pair is used for the first capturing subpattern, and so on. The
|
||||
value returned by pcre2_match() is one more than the highest numbered
|
||||
pair that has been set. For example, if two substrings have been cap-
|
||||
tured, the returned value is 3. If there are no capturing subpatterns,
|
||||
the return value from a successful match is 1, indicating that just the
|
||||
first pair of offsets has been set.
|
||||
|
||||
If a pattern uses the \K escape sequence within a positive assertion,
|
||||
the reported start of a successful match can be greater than the end of
|
||||
the match. For example, if the pattern (?=ab\K) is matched against
|
||||
"ab", the start and end offset values for the match are 2 and 0.
|
||||
|
||||
If a capturing subpattern group is matched repeatedly within a single
|
||||
match operation, it is the last portion of the subject that it matched
|
||||
that is returned.
|
||||
|
||||
If the ovector is too small to hold all the captured substring offsets,
|
||||
as much as possible is filled in, and the function returns a value of
|
||||
zero. If captured substrings are not of interest, pcre2_match() may be
|
||||
called with a match data block whose ovector is of minimum length (that
|
||||
is, one pair). However, if the pattern contains back references and the
|
||||
ovector is not big enough to remember the related substrings, PCRE2 has
|
||||
to get additional memory for use during matching. Thus it is usually
|
||||
advisable to set up a match data block containing an ovector of reason-
|
||||
able size.
|
||||
|
||||
It is possible for capturing subpattern number n+1 to match some part
|
||||
of the subject when subpattern n has not been used at all. For example,
|
||||
if the string "abc" is matched against the pattern (a|(z))(bc) the
|
||||
return from the function is 4, and subpatterns 1 and 3 are matched, but
|
||||
2 is not. When this happens, both values in the offset pairs corre-
|
||||
sponding to unused subpatterns are set to PCRE2_UNSET.
|
||||
|
||||
Offset values that correspond to unused subpatterns at the end of the
|
||||
expression are also set to PCRE2_UNSET. For example, if the string
|
||||
"abc" is matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3
|
||||
are not matched. The return from the function is 2, because the high-
|
||||
est used capturing subpattern number is 1. The offsets for the second
|
||||
and third capturing subpatterns (assuming the vector is large
|
||||
enough, of course) are set to PCRE2_UNSET.
|
||||
|
||||
Elements in the ovector that do not correspond to capturing parentheses
|
||||
in the pattern are never changed. That is, if a pattern contains n cap-
|
||||
turing parentheses, no more than ovector[0] to ovector[2n+1] are set by
|
||||
pcre2_match(). The other elements retain whatever values they previ-
|
||||
ously had.
|
||||
|
||||
|
||||
|
@ -2080,26 +2091,39 @@ OTHER INFORMATION ABOUT A MATCH
|
|||
|
||||
PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data);
|
||||
|
||||
As well as the offsets in the ovector, other information about a match
|
||||
is retained in the match data block and can be retrieved by the above
|
||||
functions.
|
||||
As well as the offsets in the ovector, other information about a match
|
||||
is retained in the match data block and can be retrieved by the above
|
||||
functions in appropriate circumstances. If they are called at other
|
||||
times, the result is undefined.
|
||||
|
||||
When a (*MARK) name is to be passed back, pcre2_get_mark() returns a
|
||||
pointer to the zero-terminated name, which is within the compiled pat-
|
||||
tern. Otherwise NULL is returned. A (*MARK) name may be available
|
||||
after a failed match or a partial match, as well as after a successful
|
||||
one.
|
||||
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a
|
||||
failure to match (PCRE2_ERROR_NOMATCH), a (*MARK) name may be avail-
|
||||
able, and pcre2_get_mark() can be called. It returns a pointer to the
|
||||
zero-terminated name, which is within the compiled pattern. Otherwise
|
||||
NULL is returned. After a successful match, the (*MARK) name that is
|
||||
returned is the last one encountered on the matching path through the
|
||||
pattern. After a "no match" or a partial match, the last encountered
|
||||
(*MARK) name is returned. For example, consider this pattern:
|
||||
|
||||
The code unit offset of the character at which a successful match
|
||||
started is returned by pcre2_get_startchar(). For a non-partial match,
|
||||
this can be different to the value of ovector[0] if the pattern con-
|
||||
tains the \K escape sequence. After a partial match, however, this
|
||||
^(*MARK:A)((*MARK:B)a|b)c
|
||||
|
||||
When it matches "bc", the returned mark is A. The B mark is "seen" in
|
||||
the first branch of the group, but it is not on the matching path. On
|
||||
the other hand, when this pattern fails to match "bx", the returned
|
||||
mark is B.
|
||||
|
||||
After a successful match, a partial match, or one of the invalid UTF
|
||||
errors (for example, PCRE2_ERROR_UTF8_ERR5), pcre2_get_startchar() can
|
||||
be called. After a successful or partial match it returns the code unit
|
||||
offset of the character at which the match started. For a non-partial
|
||||
match, this can be different to the value of ovector[0] if the pattern
|
||||
contains the \K escape sequence. After a partial match, however, this
|
||||
value is always the same as ovector[0] because \K does not affect the
|
||||
result of a partial match.
|
||||
|
||||
The startchar field is also used to return the offset of an invalid UTF
|
||||
character when UTF checking fails. Details are given in the pcre2uni-
|
||||
code page.
|
||||
After a UTF check failure, pcre2_get_startchar() can be used to obtain
|
||||
the code unit offset of the invalid UTF character. Details are given in
|
||||
the pcre2unicode page.
|
||||
|
||||
|
||||
ERROR RETURNS FROM pcre2_match()
|
||||
|
@ -2225,33 +2249,36 @@ EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
|
|||
Captured substrings can be accessed directly by using the ovector as
|
||||
described above. For convenience, auxiliary functions are provided for
|
||||
extracting captured substrings as new, separate, zero-terminated
|
||||
strings. The functions in this section identify substrings by number.
|
||||
The number zero refers to the entire matched substring, with higher
|
||||
numbers referring to substrings captured by parenthesized groups. The
|
||||
next section describes similar functions for extracting captured sub-
|
||||
strings by name. A substring that contains a binary zero is correctly
|
||||
extracted and has a further zero added on the end, but the result is
|
||||
not, of course, a C string.
|
||||
strings. A substring that contains a binary zero is correctly extracted
|
||||
and has a further zero added on the end, but the result is not, of
|
||||
course, a C string.
|
||||
|
||||
If a pattern uses the \K escape sequence within a positive assertion,
|
||||
the reported start of the match can be greater than the end of the
|
||||
match. For example, if the pattern (?=ab\K) is matched against "ab",
|
||||
the start and end offset values for the match are 2 and 0. In this sit-
|
||||
uation, calling these functions with a zero substring number extracts a
|
||||
zero-length empty string.
|
||||
The functions in this section identify substrings by number. The number
|
||||
zero refers to the entire matched substring, with higher numbers refer-
|
||||
ring to substrings captured by parenthesized groups. After a partial
|
||||
match, only substring zero is available. An attempt to extract any
|
||||
other substring gives the error PCRE2_ERROR_PARTIAL. The next section
|
||||
describes similar functions for extracting captured substrings by name.
|
||||
|
||||
If a pattern uses the \K escape sequence within a positive assertion,
|
||||
the reported start of a successful match can be greater than the end of
|
||||
the match. For example, if the pattern (?=ab\K) is matched against
|
||||
"ab", the start and end offset values for the match are 2 and 0. In
|
||||
this situation, calling these functions with a zero substring number
|
||||
extracts a zero-length empty string.
|
||||
|
||||
You can find the length in code units of a captured substring without
|
||||
extracting it by calling pcre2_substring_length_bynumber(). The first
|
||||
argument is a pointer to the match data block, the second is the group
|
||||
number, and the third is a pointer to a variable into which the length
|
||||
is placed. If you just want to know whether or not the substring has
|
||||
been captured, you can pass the third argument as NULL.
|
||||
|
||||
The pcre2_substring_copy_bynumber() function copies a captured sub-
|
||||
string into a supplied buffer, whereas pcre2_substring_get_bynumber()
|
||||
copies it into new memory, obtained using the same memory allocation
|
||||
function that was used for the match data block. The first two argu-
|
||||
ments of these functions are a pointer to the match data block and a
|
||||
capturing group number.
|
||||
|
||||
The final arguments of pcre2_substring_copy_bynumber() are a pointer to
|
||||
|
@ -2260,23 +2287,25 @@ EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
|
|||
for the extracted substring, excluding the terminating zero.
|
||||
|
||||
For pcre2_substring_get_bynumber() the third and fourth arguments point
|
||||
to variables that are updated with a pointer to the new memory and the
|
||||
number of code units that comprise the substring, again excluding the
|
||||
terminating zero. When the substring is no longer needed, the memory
|
||||
should be freed by calling pcre2_substring_free().
|
||||
|
||||
The return value from all these functions is zero for success, or a
|
||||
negative error code. If the pattern match failed, the match failure
|
||||
code is returned. Other possible error codes are:
|
||||
The return value from all these functions is zero for success, or a
|
||||
negative error code. If the pattern match failed, the match failure
|
||||
code is returned. If a substring number greater than zero is used
|
||||
after a partial match, PCRE2_ERROR_PARTIAL is returned. Other possible
|
||||
error codes are:
|
||||
|
||||
PCRE2_ERROR_NOMEMORY
|
||||
|
||||
The buffer was too small for pcre2_substring_copy_bynumber(), or the
|
||||
attempt to get memory failed for pcre2_substring_get_bynumber().
|
||||
|
||||
PCRE2_ERROR_NOSUBSTRING
|
||||
|
||||
There is no substring with that number in the pattern, that is, the
|
||||
number is greater than the number of capturing parentheses.
|
||||
|
||||
PCRE2_ERROR_UNAVAILABLE
|
||||
|
@ -2287,8 +2316,8 @@ EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
|
|||
|
||||
PCRE2_ERROR_UNSET
|
||||
|
||||
The substring did not participate in the match. For example, if the
|
||||
pattern is (abc)|(def) and the subject is "def", and the ovector con-
|
||||
tains at least two capturing slots, substring number 1 is unset.
|
||||
|
||||
|
||||
|
@ -2299,13 +2328,16 @@ EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS
|
|||
|
||||
void pcre2_substring_list_free(PCRE2_SPTR *list);
|
||||
|
||||
The pcre2_substring_list_get() function extracts all available sub-
|
||||
strings and builds a list of pointers to them. It also (optionally)
|
||||
builds a second list that contains their lengths (in code units),
|
||||
excluding a terminating zero that is added to each of them. All this is
|
||||
done in a single block of memory that is obtained using the same memory
|
||||
allocation function that was used to get the match data block.
|
||||
|
||||
This function must be called only after a successful match. If called
|
||||
after a partial match, the error code PCRE2_ERROR_PARTIAL is returned.
|
||||
|
||||
The address of the memory block is returned via listptr, which is also
|
||||
the start of the list of string pointers. The end of the list is marked
|
||||
by a NULL pointer. The address of the list of lengths is returned via
|
||||
|
@ -2694,7 +2726,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 14 December 2014
|
||||
Last updated: 22 December 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -4314,9 +4346,9 @@ PARTIAL MATCHING USING pcre2_match()
|
|||
string at the end of the subject.
|
||||
|
||||
When a partial match is returned, the first two elements in the ovector
|
||||
point to the portion of the subject that was matched. The appearance of
|
||||
\K in the pattern has no effect for a partial match. Consider this pat-
|
||||
tern:
|
||||
point to the portion of the subject that was matched, but the values in
|
||||
the rest of the ovector are undefined. The appearance of \K in the pat-
|
||||
tern has no effect for a partial match. Consider this pattern:
|
||||
|
||||
/abc\K123/
|
||||
|
||||
|
@ -4678,7 +4710,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 14 October 2014
|
||||
Last updated: 22 December 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_SUBSTRING_LENGTH_BYNUMBER 3 "01 December 2014" "PCRE2 10.00"
|
||||
.TH PCRE2_SUBSTRING_LENGTH_BYNUMBER 3 "22 December 2014" "PCRE2 10.00"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -19,9 +19,11 @@ The arguments are:
|
|||
.sp
|
||||
\fImatch_data\fP The match data block for the match
|
||||
\fInumber\fP The substring number
|
||||
\fIlength\fP Where to return the length
|
||||
\fIlength\fP Where to return the length, or NULL
|
||||
.sp
|
||||
The yield is zero on success, or an error code if the substring is not found.
|
||||
The third argument may be NULL if all you want to know is whether or not a
|
||||
substring is set. The yield is zero on success, or a negative error code
|
||||
otherwise. After a partial match, only substring 0 is available.
|
||||
.P
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
.\" HREF
|
||||
|
|
105
doc/pcre2api.3
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "14 December 2014" "PCRE2 10.00"
|
||||
.TH PCRE2API 3 "22 December 2014" "PCRE2 10.00"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -1736,6 +1736,11 @@ other match data
|
|||
.\"
|
||||
below.
|
||||
.P
|
||||
When a call of \fBpcre2_match()\fP fails, valid data is available in the match
|
||||
block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
|
||||
of the error codes for an invalid UTF string. Exactly what is available depends
|
||||
on the error, and is detailed below.
|
||||
.P
|
||||
When one of the matching functions is called, pointers to the compiled pattern
|
||||
and the subject string are set in the match data block so that they can be
|
||||
referenced by the extraction functions. After running a match, you must not
|
||||
|
@ -2031,9 +2036,9 @@ that do not cause substrings to be captured. The \fBpcre2_pattern_info()\fP
|
|||
function can be used to find out how many capturing subpatterns there are in a
|
||||
compiled pattern.
|
||||
.P
|
||||
The overall matched string and any captured substrings are returned to the
|
||||
caller via a vector of PCRE2_SIZE values. This is called the \fBovector\fP, and
|
||||
is contained within the
|
||||
A successful match returns the overall matched string and any captured
|
||||
substrings to the caller via a vector of PCRE2_SIZE values. This is called the
|
||||
\fBovector\fP, and is contained within the
|
||||
.\" HTML <a href="#matchdatablock">
|
||||
.\" </a>
|
||||
match data block.
|
||||
|
@ -2061,19 +2066,26 @@ offsets, not character offsets. That is, they are byte offsets in the 8-bit
|
|||
library, 16-bit offsets in the 16-bit library, and 32-bit offsets in the 32-bit
|
||||
library.
|
||||
.P
|
||||
The first pair of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP)
|
||||
identifies the portion of the subject string that was matched by the entire
|
||||
pattern. The next pair is used for the first capturing subpattern, and so on.
|
||||
The value returned by \fBpcre2_match()\fP is one more than the highest numbered
|
||||
pair that has been set. For example, if two substrings have been captured, the
|
||||
returned value is 3. If there are no capturing subpatterns, the return value
|
||||
from a successful match is 1, indicating that just the first pair of offsets
|
||||
has been set.
|
||||
After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair
|
||||
of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP) are set. They
|
||||
identify the part of the subject that was partially matched. See the
|
||||
.\" HREF
|
||||
\fBpcre2partial\fP
|
||||
.\"
|
||||
documentation for details of partial matching.
|
||||
.P
|
||||
After a successful match, the first pair of offsets identifies the portion of
|
||||
the subject string that was matched by the entire pattern. The next pair is
|
||||
used for the first capturing subpattern, and so on. The value returned by
|
||||
\fBpcre2_match()\fP is one more than the highest numbered pair that has been
|
||||
set. For example, if two substrings have been captured, the returned value is
|
||||
3. If there are no capturing subpatterns, the return value from a successful
|
||||
match is 1, indicating that just the first pair of offsets has been set.
|
||||
.P
|
||||
If a pattern uses the \eK escape sequence within a positive assertion, the
|
||||
reported start of the match can be greater than the end of the match. For
|
||||
example, if the pattern (?=ab\eK) is matched against "ab", the start and end
|
||||
offset values for the match are 2 and 0.
|
||||
reported start of a successful match can be greater than the end of the match.
|
||||
For example, if the pattern (?=ab\eK) is matched against "ab", the start and
|
||||
end offset values for the match are 2 and 0.
|
||||
.P
|
||||
If a capturing subpattern group is matched repeatedly within a single match
|
||||
operation, it is the last portion of the subject that it matched that is
|
||||
|
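The return-value rule described in this hunk (one more than the highest-numbered pair that has been set) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `sketch_match_return_value` and `SKETCH_UNSET` are hypothetical names introduced here, not PCRE2 identifiers, and the function reads an already-filled vector rather than reproducing how `pcre2_match()` computes its result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for an unset offset (an assumption of this sketch). */
#define SKETCH_UNSET ((size_t)-1)

/* Model of the documented rule: the return value of a successful match is
   one more than the highest-numbered pair that has been set. pair_count is
   the number of (start, end) pairs in the vector; pair 0 (the whole match)
   is assumed to be set. */
int sketch_match_return_value(const size_t *ovector, int pair_count)
{
  int i;
  assert(ovector != NULL && pair_count >= 1);
  for (i = pair_count - 1; i > 0; i--)
    if (ovector[2*i] != SKETCH_UNSET) break;   /* highest set pair found */
  return i + 1;
}
```

With two captured substrings the model yields 3, and with no capturing groups it yields 1, matching the wording of the paragraph above. A `\K` result such as start 2 and end 0 still counts as a set pair.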
@ -2121,21 +2133,35 @@ had.
|
|||
.fi
|
||||
.P
|
||||
As well as the offsets in the ovector, other information about a match is
|
||||
retained in the match data block and can be retrieved by the above functions.
|
||||
retained in the match data block and can be retrieved by the above functions in
|
||||
appropriate circumstances. If they are called at other times, the result is
|
||||
undefined.
|
||||
.P
|
||||
When a (*MARK) name is to be passed back, \fBpcre2_get_mark()\fP returns a
|
||||
pointer to the zero-terminated name, which is within the compiled pattern.
|
||||
Otherwise NULL is returned. A (*MARK) name may be available after a failed
|
||||
match or a partial match, as well as after a successful one.
|
||||
After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure
|
||||
to match (PCRE2_ERROR_NOMATCH), a (*MARK) name may be available, and
|
||||
\fBpcre2_get_mark()\fP can be called. It returns a pointer to the
|
||||
zero-terminated name, which is within the compiled pattern. Otherwise NULL is
|
||||
returned. After a successful match, the (*MARK) name that is returned is the
|
||||
last one encountered on the matching path through the pattern. After a "no
|
||||
match" or a partial match, the last encountered (*MARK) name is returned. For
|
||||
example, consider this pattern:
|
||||
.sp
|
||||
^(*MARK:A)((*MARK:B)a|b)c
|
||||
.sp
|
||||
When it matches "bc", the returned mark is A. The B mark is "seen" in the first
|
||||
branch of the group, but it is not on the matching path. On the other hand,
|
||||
when this pattern fails to match "bx", the returned mark is B.
|
||||
.P
|
||||
The code unit offset of the character at which a successful match started is
|
||||
returned by \fBpcre2_get_startchar()\fP. For a non-partial match, this can be
|
||||
After a successful match, a partial match, or one of the invalid UTF errors
|
||||
(for example, PCRE2_ERROR_UTF8_ERR5), \fBpcre2_get_startchar()\fP can be
|
||||
called. After a successful or partial match it returns the code unit offset of
|
||||
the character at which the match started. For a non-partial match, this can be
|
||||
different to the value of \fIovector[0]\fP if the pattern contains the \eK
|
||||
escape sequence. After a partial match, however, this value is always the same
|
||||
as \fIovector[0]\fP because \eK does not affect the result of a partial match.
|
||||
.P
|
||||
The \fBstartchar\fP field is also used to return the offset of an invalid
|
||||
UTF character when UTF checking fails. Details are given in the
|
||||
After a UTF check failure, \fBpcre2_get_startchar()\fP can be used to obtain
|
||||
the code unit offset of the invalid UTF character. Details are given in the
|
||||
.\" HREF
|
||||
\fBpcre2unicode\fP
|
||||
.\"
|
||||
|
@ -2289,18 +2315,21 @@ Captured substrings can be accessed directly by using the ovector as described
|
|||
above.
|
||||
.\"
|
||||
For convenience, auxiliary functions are provided for extracting captured
|
||||
substrings as new, separate, zero-terminated strings. The functions in this
|
||||
section identify substrings by number. The number zero refers to the entire
|
||||
matched substring, with higher numbers referring to substrings captured by
|
||||
parenthesized groups. The next section describes similar functions for
|
||||
extracting captured substrings by name. A substring that contains a binary zero
|
||||
is correctly extracted and has a further zero added on the end, but the result
|
||||
is not, of course, a C string.
|
||||
substrings as new, separate, zero-terminated strings. A substring that contains
|
||||
a binary zero is correctly extracted and has a further zero added on the end,
|
||||
but the result is not, of course, a C string.
|
||||
.P
|
||||
The functions in this section identify substrings by number. The number zero
|
||||
refers to the entire matched substring, with higher numbers referring to
|
||||
substrings captured by parenthesized groups. After a partial match, only
|
||||
substring zero is available. An attempt to extract any other substring gives
|
||||
the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for
|
||||
extracting captured substrings by name.
|
||||
.P
|
||||
If a pattern uses the \eK escape sequence within a positive assertion, the
|
||||
reported start of the match can be greater than the end of the match. For
|
||||
example, if the pattern (?=ab\eK) is matched against "ab", the start and end
|
||||
offset values for the match are 2 and 0. In this situation, calling these
|
||||
reported start of a successful match can be greater than the end of the match.
|
||||
For example, if the pattern (?=ab\eK) is matched against "ab", the start and
|
||||
end offset values for the match are 2 and 0. In this situation, calling these
|
||||
functions with a zero substring number extracts a zero-length empty string.
|
||||
.P
|
||||
You can find the length in code units of a captured substring without
|
||||
|
@ -2329,7 +2358,8 @@ calling \fBpcre2_substring_free()\fP.
|
|||
.P
|
||||
The return value from all these functions is zero for success, or a negative
|
||||
error code. If the pattern match failed, the match failure code is returned.
|
||||
Other possible error codes are:
|
||||
If a substring number greater than zero is used after a partial match,
|
||||
PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
|
||||
.sp
|
||||
PCRE2_ERROR_NOMEMORY
|
||||
.sp
|
||||
|
@ -2371,6 +2401,9 @@ that is added to each of them. All this is done in a single block of memory
|
|||
that is obtained using the same memory allocation function that was used to get
|
||||
the match data block.
|
||||
.P
|
||||
This function must be called only after a successful match. If called after a
|
||||
partial match, the error code PCRE2_ERROR_PARTIAL is returned.
|
||||
.P
|
||||
The address of the memory block is returned via \fIlistptr\fP, which is also
|
||||
the start of the list of string pointers. The end of the list is marked by a
|
||||
NULL pointer. The address of the list of lengths is returned via
|
||||
|
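The single-block layout described in this hunk (pointer list first, terminated by NULL, followed by the lengths and the zero-terminated strings) can be sketched as follows. This is a simplified model, not the PCRE2 implementation: `sketch_list_get` and `SKETCH_ERROR_NOMEMORY` are hypothetical names, the subject is a plain `char` string, unset groups are not handled, and plain `malloc()` stands in for the match data block's allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ERROR_NOMEMORY (-48)   /* stand-in error code for this sketch */

/* rc is the value returned by the match. Any negative rc (including a
   partial match) is passed straight back, mirroring the rule that the list
   is available only after a successful match. On success, *listptr is the
   start of the block and must be released with free(). */
int sketch_list_get(const char *subject, const size_t *ovector, int rc,
                    char ***listptr, size_t **lengthsptr)
{
  size_t i, count, total;
  char **list;
  size_t *lengths;
  char *strings;

  assert(subject != NULL && ovector != NULL && listptr != NULL);
  if (rc < 0) return rc;                 /* failed or partial match */
  count = (size_t)rc;

  /* Size of pointers (plus NULL terminator), lengths, and string data. */
  total = (count + 1) * sizeof(char *) + count * sizeof(size_t);
  for (i = 0; i < count; i++)
    {
    size_t left = ovector[2*i], right = ovector[2*i + 1];
    total += ((left > right)? 0 : right - left) + 1;
    }

  list = malloc(total);
  if (list == NULL) return SKETCH_ERROR_NOMEMORY;

  /* Assumes size_t needs no stricter alignment than a pointer. */
  lengths = (size_t *)(list + count + 1);
  strings = (char *)(lengths + count);

  for (i = 0; i < count; i++)
    {
    size_t left = ovector[2*i], right = ovector[2*i + 1];
    size_t len = (left > right)? 0 : right - left;
    memcpy(strings, subject + left, len);
    strings[len] = '\0';                 /* extra terminating zero */
    list[i] = strings;
    lengths[i] = len;
    strings += len + 1;
    }
  list[count] = NULL;                    /* end-of-list marker */

  *listptr = list;
  if (lengthsptr != NULL) *lengthsptr = lengths;
  return 0;
}
```

Because everything lives in one allocation, a single `free()` on the returned list pointer releases the pointers, the lengths, and the strings together.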
@ -2802,6 +2835,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 14 December 2014
|
||||
Last updated: 22 December 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2PARTIAL 3 "14 October 2014" "PCRE2 10.00"
|
||||
.TH PCRE2PARTIAL 3 "22 December 2014" "PCRE2 10.00"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions
|
||||
.SH "PARTIAL MATCHING IN PCRE2"
|
||||
|
@ -64,8 +64,9 @@ matched; without such a restriction there would always be a partial match of an
|
|||
empty string at the end of the subject.
|
||||
.P
|
||||
When a partial match is returned, the first two elements in the ovector point
|
||||
to the portion of the subject that was matched. The appearance of \eK in the
|
||||
pattern has no effect for a partial match. Consider this pattern:
|
||||
to the portion of the subject that was matched, but the values in the rest of
|
||||
the ovector are undefined. The appearance of \eK in the pattern has no effect
|
||||
for a partial match. Consider this pattern:
|
||||
.sp
|
||||
/abc\eK123/
|
||||
.sp
|
||||
|
@ -428,6 +429,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 14 October 2014
|
||||
Last updated: 22 December 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -312,9 +312,15 @@ PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
|
|||
pcre2_substring_length_bynumber(pcre2_match_data *match_data,
|
||||
uint32_t stringnumber, PCRE2_SIZE *sizeptr)
|
||||
{
|
||||
int count;
|
||||
PCRE2_SIZE left, right;
|
||||
if ((count = match_data->rc) < 0) return count; /* Match failed */
|
||||
int count = match_data->rc;
|
||||
if (count == PCRE2_ERROR_PARTIAL)
|
||||
{
|
||||
if (stringnumber > 0) return PCRE2_ERROR_PARTIAL;
|
||||
count = 0;
|
||||
}
|
||||
else if (count < 0) return count; /* Match failed */
|
||||
|
||||
if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER)
|
||||
{
|
||||
if (stringnumber > match_data->code->top_bracket)
|
||||
|
@ -329,6 +335,7 @@ else /* Matched using pcre2_dfa_match() */
|
|||
if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE;
|
||||
if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET;
|
||||
}
|
||||
|
||||
left = match_data->ovector[stringnumber*2];
|
||||
right = match_data->ovector[stringnumber*2+1];
|
||||
if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
|
||||
|
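The partial-match handling added to `pcre2_substring_length_bynumber()` in this hunk can be exercised in isolation with a reduced model. The sketch below is an assumption-laden stand-in, not the library code: `sketch_match_data` and `sketch_length_bynumber` are hypothetical, only the fields needed here are modelled, and the range checks follow the simpler `pcre2_dfa_match()` branch shown in the diff. The error values for partial match, unavailable, and unset are the ones printed in the test output of this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ERROR_NOMATCH      (-1)
#define SKETCH_ERROR_PARTIAL      (-2)
#define SKETCH_ERROR_UNAVAILABLE  (-54)
#define SKETCH_ERROR_UNSET        (-55)

/* Reduced, hypothetical match data block for this sketch. */
typedef struct {
  int rc;                  /* return code of the match */
  uint32_t oveccount;      /* number of pairs the ovector can hold */
  const size_t *ovector;   /* start/end offsets, two per substring */
} sketch_match_data;

int sketch_length_bynumber(const sketch_match_data *md, uint32_t number,
                           size_t *sizeptr)
{
  int count;
  size_t left, right;

  assert(md != NULL && md->ovector != NULL);
  count = md->rc;
  if (count == SKETCH_ERROR_PARTIAL)
    {
    /* After a partial match only substring 0 is available. */
    if (number > 0) return SKETCH_ERROR_PARTIAL;
    count = 0;
    }
  else if (count < 0) return count;      /* match failed */

  if (number >= md->oveccount) return SKETCH_ERROR_UNAVAILABLE;
  if (count != 0 && number >= (uint32_t)count) return SKETCH_ERROR_UNSET;

  left = md->ovector[number*2];
  right = md->ovector[number*2 + 1];
  /* sizeptr may be NULL when the caller only wants to know whether the
     substring is set; a \K result with start > end has length zero. */
  if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left;
  return 0;
}
```

For the new test case in this commit (`/a(b)c(d)/` against `abc` with a hard partial match), the model gives length 3 for substring 0 and the partial-match error for substring 1, which agrees with the expected test output.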
|
435
src/pcre2test.c
|
@ -4233,6 +4233,232 @@ return (cb->callout_number != dat_datctl.cfail[0])? 0 :
|
|||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Handle *MARK and copy/get tests *
|
||||
*************************************************/
|
||||
|
||||
/* This function is called after complete and partial matches. It runs the
|
||||
tests for substring extraction.
|
||||
|
||||
Arguments:
|
||||
utf TRUE for utf
|
||||
capcount return from pcre2_match()
|
||||
|
||||
Returns: nothing
|
||||
*/
|
||||
|
||||
static void
|
||||
copy_and_get(BOOL utf, int capcount)
|
||||
{
|
||||
int i;
|
||||
uint8_t *nptr;
|
||||
|
||||
/* Test copy strings by number */
|
||||
|
||||
for (i = 0; i < MAXCPYGET && dat_datctl.copy_numbers[i] >= 0; i++)
|
||||
{
|
||||
int rc;
|
||||
PCRE2_SIZE length, length2;
|
||||
uint32_t copybuffer[256];
|
||||
uint32_t n = (uint32_t)(dat_datctl.copy_numbers[i]);
|
||||
length = sizeof(copybuffer)/code_unit_size;
|
||||
PCRE2_SUBSTRING_COPY_BYNUMBER(rc, match_data, n, copybuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Copy substring %d failed (%d): ", n, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
PCRE2_SUBSTRING_LENGTH_BYNUMBER(rc, match_data, n, &length2);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring %d length failed (%d): ", n, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else if (length2 != length)
|
||||
{
|
||||
fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
|
||||
length, length2);
|
||||
}
|
||||
fprintf(outfile, "%2dC ", n);
|
||||
PCHARSV(copybuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu)\n", (unsigned long)length);
|
||||
}
|
||||
}
|
||||
|
||||
/* Test copy strings by name */
|
||||
|
||||
nptr = dat_datctl.copy_names;
|
||||
for (;;)
|
||||
{
|
||||
int rc;
|
||||
int groupnumber;
|
||||
PCRE2_SIZE length, length2;
|
||||
uint32_t copybuffer[256];
|
||||
int namelen = strlen((const char *)nptr);
|
||||
#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
|
||||
PCRE2_SIZE cnl = namelen;
|
||||
#endif
|
||||
if (namelen == 0) break;
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_16
|
||||
if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
|
||||
#endif
|
||||
|
||||
PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
|
||||
if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
|
||||
fprintf(outfile, "Number not found for group '%s'\n", nptr);
|
||||
|
||||
length = sizeof(copybuffer)/code_unit_size;
|
||||
PCRE2_SUBSTRING_COPY_BYNAME(rc, match_data, pbuffer, copybuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Copy substring '%s' failed (%d): ", nptr, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
PCRE2_SUBSTRING_LENGTH_BYNAME(rc, match_data, pbuffer, &length2);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring '%s' length failed (%d): ", nptr, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else if (length2 != length)
|
||||
{
|
||||
fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
|
||||
length, length2);
|
||||
}
|
||||
fprintf(outfile, " C ");
|
||||
PCHARSV(copybuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
|
||||
if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
|
||||
else fprintf(outfile, " (non-unique)\n");
|
||||
}
|
||||
nptr += namelen + 1;
|
||||
}
|
||||
|
||||
/* Test get strings by number */
|
||||
|
||||
for (i = 0; i < MAXCPYGET && dat_datctl.get_numbers[i] >= 0; i++)
|
||||
{
|
||||
int rc;
|
||||
PCRE2_SIZE length;
|
||||
void *gotbuffer;
|
||||
uint32_t n = (uint32_t)(dat_datctl.get_numbers[i]);
|
||||
PCRE2_SUBSTRING_GET_BYNUMBER(rc, match_data, n, &gotbuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring %d failed (%d): ", n, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
fprintf(outfile, "%2dG ", n);
|
||||
PCHARSV(gotbuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu)\n", (unsigned long)length);
|
||||
PCRE2_SUBSTRING_FREE(gotbuffer);
|
||||
}
|
||||
}
|
||||
|
||||
/* Test get strings by name */
|
||||
|
||||
nptr = dat_datctl.get_names;
|
||||
for (;;)
|
||||
{
|
||||
PCRE2_SIZE length;
|
||||
void *gotbuffer;
|
||||
int rc;
|
||||
int groupnumber;
|
||||
int namelen = strlen((const char *)nptr);
|
||||
#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
|
||||
PCRE2_SIZE cnl = namelen;
|
||||
#endif
|
||||
if (namelen == 0) break;
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_16
|
||||
if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
|
||||
#endif
|
||||
|
||||
PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
|
||||
if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
|
||||
fprintf(outfile, "Number not found for group '%s'\n", nptr);
|
||||
|
||||
PCRE2_SUBSTRING_GET_BYNAME(rc, match_data, pbuffer, &gotbuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring '%s' failed (%d): ", nptr, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
fprintf(outfile, " G ");
|
||||
PCHARSV(gotbuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
|
||||
if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
|
||||
else fprintf(outfile, " (non-unique)\n");
|
||||
PCRE2_SUBSTRING_FREE(gotbuffer);
|
||||
}
|
||||
nptr += namelen + 1;
|
||||
}
|
||||
|
||||
/* Test getting the complete list of captured strings. */
|
||||
|
||||
if ((dat_datctl.control & CTL_GETALL) != 0)
|
||||
{
|
||||
int rc;
|
||||
void **stringlist;
|
||||
PCRE2_SIZE *lengths;
|
||||
PCRE2_SUBSTRING_LIST_GET(rc, match_data, &stringlist, &lengths);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "get substring list failed (%d): ", rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
for (i = 0; i < capcount; i++)
|
||||
{
|
||||
fprintf(outfile, "%2dL ", i);
|
||||
PCHARSV(stringlist[i], 0, lengths[i], utf, outfile);
|
||||
putc('\n', outfile);
|
||||
}
|
||||
if (stringlist[i] != NULL)
|
||||
fprintf(outfile, "string list not terminated by NULL\n");
|
||||
PCRE2_SUBSTRING_LIST_FREE(stringlist);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*************************************************
|
||||
* Process a data line *
|
||||
*************************************************/
|
||||
|
@ -5074,7 +5300,6 @@ else for (gmatched = 0;; gmatched++)
|
|||
{
|
||||
int i;
|
||||
uint32_t oveccount;
|
||||
uint8_t *nptr;
|
||||
|
||||
/* This is a check against a lunatic return value. */
|
||||
|
||||
|
@ -5239,7 +5464,7 @@ else for (gmatched = 0;; gmatched++)
|
|||
}
|
||||
}
|
||||
|
||||
/* Output mark data if requested. */
|
||||
/* Output (*MARK) data if requested */
|
||||
|
||||
if ((dat_datctl.control & CTL_MARK) != 0 &&
|
||||
TESTFLD(match_data, mark, !=, NULL))
|
||||
|
@ -5249,208 +5474,10 @@ else for (gmatched = 0;; gmatched++)
|
|||
fprintf(outfile, "\n");
|
||||
}
|
||||
|
||||
/* Test copy strings by number */
|
||||
/* Process copy/get strings */
|
||||
|
||||
for (i = 0; i < MAXCPYGET && dat_datctl.copy_numbers[i] >= 0; i++)
|
||||
{
|
||||
int rc;
|
||||
PCRE2_SIZE length, length2;
|
||||
uint32_t copybuffer[256];
|
||||
uint32_t n = (uint32_t)(dat_datctl.copy_numbers[i]);
|
||||
length = sizeof(copybuffer)/code_unit_size;
|
||||
PCRE2_SUBSTRING_COPY_BYNUMBER(rc, match_data, n, copybuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Copy substring %d failed (%d): ", n, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
PCRE2_SUBSTRING_LENGTH_BYNUMBER(rc, match_data, n, &length2);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring %d length failed (%d): ", n, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else if (length2 != length)
|
||||
{
|
||||
fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
|
||||
length, length2);
|
||||
}
|
||||
fprintf(outfile, "%2dC ", n);
|
||||
PCHARSV(copybuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu)\n", (unsigned long)length);
|
||||
}
|
||||
}
|
||||
copy_and_get(utf, capcount);
|
||||
|
||||
/* Test copy strings by name */
|
||||
|
||||
nptr = dat_datctl.copy_names;
|
||||
for (;;)
|
||||
{
|
||||
int rc;
|
||||
int groupnumber;
|
||||
PCRE2_SIZE length, length2;
|
||||
uint32_t copybuffer[256];
|
||||
int namelen = strlen((const char *)nptr);
|
||||
#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
|
||||
PCRE2_SIZE cnl = namelen;
|
||||
#endif
|
||||
if (namelen == 0) break;
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_16
|
||||
if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
|
||||
#endif
|
||||
|
||||
PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
|
||||
if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
|
||||
fprintf(outfile, "Number not found for group '%s'\n", nptr);
|
||||
|
||||
length = sizeof(copybuffer)/code_unit_size;
|
||||
PCRE2_SUBSTRING_COPY_BYNAME(rc, match_data, pbuffer, copybuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Copy substring '%s' failed (%d): ", nptr, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
PCRE2_SUBSTRING_LENGTH_BYNAME(rc, match_data, pbuffer, &length2);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring '%s' length failed (%d): ", nptr, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else if (length2 != length)
|
||||
{
|
||||
fprintf(outfile, "Mismatched substring lengths: %ld %ld\n",
|
||||
length, length2);
|
||||
}
|
||||
fprintf(outfile, " C ");
|
||||
PCHARSV(copybuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
|
||||
if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
|
||||
else fprintf(outfile, " (non-unique)\n");
|
||||
}
|
||||
nptr += namelen + 1;
|
||||
}
|
||||
|
||||
/* Test get strings by number */
|
||||
|
||||
for (i = 0; i < MAXCPYGET && dat_datctl.get_numbers[i] >= 0; i++)
|
||||
{
|
||||
int rc;
|
||||
PCRE2_SIZE length;
|
||||
void *gotbuffer;
|
||||
uint32_t n = (uint32_t)(dat_datctl.get_numbers[i]);
|
||||
PCRE2_SUBSTRING_GET_BYNUMBER(rc, match_data, n, &gotbuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring %d failed (%d): ", n, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
fprintf(outfile, "%2dG ", n);
|
||||
PCHARSV(gotbuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu)\n", (unsigned long)length);
|
||||
PCRE2_SUBSTRING_FREE(gotbuffer);
|
||||
}
|
||||
}
|
||||
|
||||
/* Test get strings by name */
|
||||
|
||||
nptr = dat_datctl.get_names;
|
||||
for (;;)
|
||||
{
|
||||
PCRE2_SIZE length;
|
||||
void *gotbuffer;
|
||||
int rc;
|
||||
int groupnumber;
|
||||
int namelen = strlen((const char *)nptr);
|
||||
#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32
|
||||
PCRE2_SIZE cnl = namelen;
|
||||
#endif
|
||||
if (namelen == 0) break;
|
||||
|
||||
#ifdef SUPPORT_PCRE2_8
|
||||
if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_16
|
||||
if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl);
|
||||
#endif
|
||||
#ifdef SUPPORT_PCRE2_32
|
||||
if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl);
|
||||
#endif
|
||||
|
||||
PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer);
|
||||
if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING)
|
||||
fprintf(outfile, "Number not found for group '%s'\n", nptr);
|
||||
|
||||
PCRE2_SUBSTRING_GET_BYNAME(rc, match_data, pbuffer, &gotbuffer, &length);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "Get substring '%s' failed (%d): ", nptr, rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
fprintf(outfile, " G ");
|
||||
PCHARSV(gotbuffer, 0, length, utf, outfile);
|
||||
fprintf(outfile, " (%lu) %s", (unsigned long)length, nptr);
|
||||
if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber);
|
||||
else fprintf(outfile, " (non-unique)\n");
|
||||
PCRE2_SUBSTRING_FREE(gotbuffer);
|
||||
}
|
||||
nptr += namelen + 1;
|
||||
}
|
||||
|
||||
/* Test getting the complete list of captured strings. */
|
||||
|
||||
if ((dat_datctl.control & CTL_GETALL) != 0)
|
||||
{
|
||||
int rc;
|
||||
void **stringlist;
|
||||
PCRE2_SIZE *lengths;
|
||||
PCRE2_SUBSTRING_LIST_GET(rc, match_data, &stringlist, &lengths);
|
||||
if (rc < 0)
|
||||
{
|
||||
fprintf(outfile, "get substring list failed (%d): ", rc);
|
||||
PCRE2_GET_ERROR_MESSAGE(rc, rc, pbuffer);
|
||||
PCHARSV(CASTVAR(void *, pbuffer), 0, rc, FALSE, outfile);
|
||||
fprintf(outfile, "\n");
|
||||
}
|
||||
else
|
||||
{
|
||||
for (i = 0; i < capcount; i++)
|
||||
{
|
||||
fprintf(outfile, "%2dL ", i);
|
||||
PCHARSV(stringlist[i], 0, lengths[i], utf, outfile);
|
||||
putc('\n', outfile);
|
||||
}
|
||||
if (stringlist[i] != NULL)
|
||||
fprintf(outfile, "string list not terminated by NULL\n");
|
||||
PCRE2_SUBSTRING_LIST_FREE(stringlist);
|
||||
}
|
||||
}
|
||||
} /* End of handling a successful match */
|
||||
|
||||
/* There was a partial match. The value of ovector[0] is the bumpalong point,
|
||||
|
@ -5489,6 +5516,10 @@ else for (gmatched = 0;; gmatched++)
|
|||
fprintf(outfile, "\n");
|
||||
}
|
||||
|
||||
/* Process copy/get strings */
|
||||
|
||||
copy_and_get(utf, 1);
|
||||
|
||||
break; /* Out of the /g loop */
|
||||
} /* End of handling partial match */
|
||||
|
||||
|
|
|
@ -4097,4 +4097,7 @@ a random value. /Ix
|
|||
a\=ovector=2,copy=A,get=A,get=2
|
||||
b\=ovector=2,copy=A,get=A,get=2
|
||||
|
||||
/a(b)c(d)/
|
||||
abc\=ph,copy=0,copy=1,getall
|
||||
|
||||
# End of testinput2
|
||||
|
|
|
@ -4808,4 +4808,7 @@
|
|||
a\=ovector=2,get=1,get=2,getall
|
||||
aaa\=ovector=2,get=1,get=2,getall
|
||||
|
||||
/a(b)c(d)/
|
||||
abc\=ph,copy=0,copy=1,getall
|
||||
|
||||
# End of testinput6
|
||||
|
|
|
@ -13762,4 +13762,11 @@ Copy substring 'A' failed (-55): requested value is not set
|
|||
Get substring 2 failed (-54): requested value is not available
|
||||
Get substring 'A' failed (-55): requested value is not set
|
||||
|
||||
/a(b)c(d)/
|
||||
abc\=ph,copy=0,copy=1,getall
|
||||
Partial match: abc
|
||||
0C abc (3)
|
||||
Copy substring 1 failed (-2): partial match
|
||||
get substring list failed (-2): partial match
|
||||
|
||||
# End of testinput2
|
||||
|
|
|
@ -7766,4 +7766,11 @@ Get substring 2 failed (-54): requested value is not available
|
|||
0L aaa
|
||||
1L aa
|
||||
|
||||
/a(b)c(d)/
|
||||
abc\=ph,copy=0,copy=1,getall
|
||||
Partial match: abc
|
||||
0C abc (3)
|
||||
Copy substring 1 failed (-2): partial match
|
||||
get substring list failed (-2): partial match
|
||||
|
||||
# End of testinput6
|
||||
|
|