Implement PCRE2_COPY_MATCHED_SUBJECT.
parent
971f885277
commit
f90ce1a333
|
@ -37,6 +37,10 @@ src/pcre2_chartables.c.dist are updated.
|
||||||
ranges such as a-z in EBCDIC environments. The original code probably never
|
ranges such as a-z in EBCDIC environments. The original code probably never
|
||||||
worked, though there were no bug reports.
|
worked, though there were no bug reports.
|
||||||
|
|
||||||
|
10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
|
||||||
|
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
|
||||||
|
path.
|
||||||
|
|
||||||
|
|
||||||
Version 10.32 10-September-2018
|
Version 10.32 10-September-2018
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
|
@ -51,6 +51,8 @@ depth limits. The <i>length</i> and <i>startoffset</i> values are code units, no
|
||||||
characters. The options are:
|
characters. The options are:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT
|
||||||
|
On success, make a private subject copy
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
PCRE2_NOTBOL Subject is not the beginning of a line
|
PCRE2_NOTBOL Subject is not the beginning of a line
|
||||||
PCRE2_NOTEOL Subject is not the end of a line
|
PCRE2_NOTEOL Subject is not the end of a line
|
||||||
|
|
|
@ -55,11 +55,13 @@ A match context is needed only if you want to:
|
||||||
Change the backtracking depth limit
|
Change the backtracking depth limit
|
||||||
Set custom memory management specifically for the match
|
Set custom memory management specifically for the match
|
||||||
</pre>
|
</pre>
|
||||||
The <i>length</i> and <i>startoffset</i> values are code
|
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
|
||||||
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
|
||||||
subject that is terminated by a binary zero code unit. The options are:
|
terminated by a binary zero code unit. The options are:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT
|
||||||
|
On success, make a private subject copy
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
PCRE2_NOTBOL Subject string is not the beginning of a line
|
PCRE2_NOTBOL Subject string is not the beginning of a line
|
||||||
PCRE2_NOTEOL Subject string is not the end of a line
|
PCRE2_NOTEOL Subject string is not the end of a line
|
||||||
|
|
|
@ -31,6 +31,11 @@ using the memory freeing function from the general context or compiled pattern
|
||||||
with which it was created, or <b>free()</b> if that was not set.
|
with which it was created, or <b>free()</b> if that was not set.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using
|
||||||
|
this match data block, the copy of the subject that was remembered with the
|
||||||
|
block is also freed.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
page and a description of the POSIX API in the
|
page and a description of the POSIX API in the
|
||||||
|
|
|
@ -1305,10 +1305,13 @@ NULL.
|
||||||
NOTE: When one of the matching functions is called, pointers to the compiled
|
NOTE: When one of the matching functions is called, pointers to the compiled
|
||||||
pattern and the subject string are set in the match data block so that they can
|
pattern and the subject string are set in the match data block so that they can
|
||||||
be referenced by the substring extraction functions. After running a match, you
|
be referenced by the substring extraction functions. After running a match, you
|
||||||
must not free a compiled pattern (or a subject string) until after all
|
must not free a compiled pattern or a subject string until after all
|
||||||
operations on the
|
operations on the
|
||||||
<a href="#matchdatablock">match data block</a>
|
<a href="#matchdatablock">match data block</a>
|
||||||
have taken place.
|
have taken place, unless, in the case of the subject string, you have used the
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
|
||||||
|
"Option bits for <b>pcre2_match()</b>"
|
||||||
|
<a href="#matchoptions">below.</a>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
|
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
|
||||||
|
@ -2419,7 +2422,10 @@ When one of the matching functions is called, pointers to the compiled pattern
|
||||||
and the subject string are set in the match data block so that they can be
|
and the subject string are set in the match data block so that they can be
|
||||||
referenced by the extraction functions. After running a match, you must not
|
referenced by the extraction functions. After running a match, you must not
|
||||||
free a compiled pattern or a subject string until after all operations on the
|
free a compiled pattern or a subject string until after all operations on the
|
||||||
match data block (for that match) have taken place.
|
match data block (for that match) have taken place, unless, in the case of the
|
||||||
|
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
|
||||||
|
described in the section entitled "Option bits for <b>pcre2_match()</b>"
|
||||||
|
<a href="#matchoptions">below.</a>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When a match data block itself is no longer needed, it should be freed by
|
When a match data block itself is no longer needed, it should be freed by
|
||||||
|
@ -2531,10 +2537,10 @@ Option bits for <b>pcre2_match()</b>
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
|
The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
|
||||||
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||||
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
|
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
|
||||||
Their action is described below.
|
PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
|
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
|
||||||
|
@ -2549,6 +2555,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
|
||||||
to be anchored by virtue of its contents, it cannot be made unanchored at
|
to be anchored by virtue of its contents, it cannot be made unanchored at
|
||||||
matching time. Note that setting the option at match time disables JIT
|
matching time. Note that setting the option at match time disables JIT
|
||||||
matching.
|
matching.
|
||||||
|
<pre>
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT
|
||||||
|
</pre>
|
||||||
|
By default, a pointer to the subject is remembered in the match data block so
|
||||||
|
that, after a successful match, it can be referenced by the substring
|
||||||
|
extraction functions. This means that the subject's memory must not be freed
|
||||||
|
until all such operations are complete. For some applications where the
|
||||||
|
lifetime of the subject string is not guaranteed, it may be necessary to make a
|
||||||
|
copy of the subject string, but it is wasteful to do this unless the match is
|
||||||
|
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
|
||||||
|
subject is copied and the new pointer is remembered in the match data block
|
||||||
|
instead of the original subject pointer. The memory allocator that was used for
|
||||||
|
the match block itself is used. The copy is automatically freed when
|
||||||
|
<b>pcre2_match_data_free()</b> is called to free the match data block. It is also
|
||||||
|
automatically freed if the match data block is re-used for another match
|
||||||
|
operation.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ENDANCHORED
|
PCRE2_ENDANCHORED
|
||||||
</pre>
|
</pre>
|
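The lifetime rules in the PCRE2_COPY_MATCHED_SUBJECT paragraph above can be modelled in a few lines of C. This is a sketch of the documented behaviour only, not PCRE2's implementation: toy_match_data, toy_match() and toy_release() are hypothetical names, the match result is passed in by the caller instead of being computed, and plain malloc()/free() stand in for whatever allocator the match data block was created with.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a match data block; not PCRE2's real layout. */
typedef struct {
    const char *subject;   /* what the extraction functions would read */
    size_t      length;
    int         owns_copy; /* nonzero if subject is a private copy */
} toy_match_data;

#define TOY_COPY_MATCHED_SUBJECT 0x00004000u /* same bit as the real option */
#define TOY_ERROR_NOMEMORY       (-2)        /* illustrative value only */

/* Drop any private copy left over from an earlier match. */
static void toy_release(toy_match_data *md)
{
    if (md->owns_copy) free((void *)md->subject);
    md->subject = NULL;
    md->length = 0;
    md->owns_copy = 0;
}

/* Record the outcome of a match. Returns 1 on match, 0 on no match, or
   TOY_ERROR_NOMEMORY if the private copy cannot be allocated. */
static int toy_match(toy_match_data *md, const char *subject, size_t length,
                     unsigned options, int matched)
{
    toy_release(md);        /* re-using the block frees the previous copy */
    md->subject = subject;  /* default: remember the caller's pointer */
    md->length = length;

    /* The copy is made only after a successful match. */
    if (matched && (options & TOY_COPY_MATCHED_SUBJECT) != 0) {
        char *copy = malloc(length ? length : 1);
        if (copy == NULL) return TOY_ERROR_NOMEMORY;
        memcpy(copy, subject, length);
        md->subject = copy;
        md->owns_copy = 1;
    }
    return matched ? 1 : 0;
}
```

With the option set and a successful match, the caller may free its own buffer straight away and still read md.subject. Without the option, or after a failed match, the block holds only the caller's pointer and the original lifetime rule applies.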
||||||
|
@ -2954,7 +2976,8 @@ The backtracking match limit was reached.
|
||||||
If a pattern contains many nested backtracking points, heap memory is used to
|
If a pattern contains many nested backtracking points, heap memory is used to
|
||||||
remember them. This error is given when the memory allocation function (default
|
remember them. This error is given when the memory allocation function (default
|
||||||
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
||||||
if the amount of memory needed exceeds the heap limit.
|
if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
|
||||||
|
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_NULL
|
PCRE2_ERROR_NULL
|
||||||
</pre>
|
</pre>
|
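The added sentence about PCRE2_ERROR_NOMEMORY, together with the note that the match data block's own allocator is used for the copy, suggests the following failure path. The sketch is an illustration under assumptions: demo_memctl, budget_malloc(), budget_free() and demo_copy_subject() are hypothetical names, DEMO_ERROR_NOMEMORY is a placeholder and not the value from pcre2.h, and the callback shapes merely resemble the custom memory functions PCRE2 accepts (a size or block pointer plus a user data pointer).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ERROR_NOMEMORY (-2) /* placeholder, not the pcre2.h value */

typedef void *(*demo_malloc_fn)(size_t size, void *memory_data);
typedef void  (*demo_free_fn)(void *block, void *memory_data);

/* Hypothetical bundle of the allocator a match data block was created with. */
typedef struct {
    demo_malloc_fn alloc;
    demo_free_fn   release;
    void          *memory_data;
} demo_memctl;

/* Test allocator: succeeds only while a byte budget remains. */
static void *budget_malloc(size_t size, void *memory_data)
{
    size_t *budget = memory_data;
    if (size > *budget) return NULL;
    *budget -= size;
    return malloc(size);
}

static void budget_free(void *block, void *memory_data)
{
    (void)memory_data; /* unused in this sketch */
    free(block);
}

/* Copy the subject with the block's allocator. On failure, report an error
   and hand back NULL so the caller never sees a half-made copy. */
static int demo_copy_subject(const demo_memctl *mc, const char *subject,
                             size_t length, char **copy_out)
{
    char *copy = mc->alloc(length ? length : 1, mc->memory_data);
    if (copy == NULL) {
        *copy_out = NULL;
        return DEMO_ERROR_NOMEMORY;
    }
    memcpy(copy, subject, length);
    *copy_out = copy;
    return 0;
}
```

A caller that sets PCRE2_COPY_MATCHED_SUBJECT should therefore treat a no-memory return as a possible outcome of an otherwise successful match, and should not assume that a private copy exists unless the call reported success.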
||||||
|
@ -3584,11 +3607,12 @@ Option bits for <b>pcre_dfa_match()</b>
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
|
The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
|
||||||
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
be zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
|
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
|
||||||
and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
|
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
|
||||||
for <b>pcre2_match()</b>, so their description is not repeated here.
|
four of these are exactly the same as for <b>pcre2_match()</b>, so their
|
||||||
|
description is not repeated here.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
@ -3732,7 +3756,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 21 September 2018
|
Last updated: 16 October 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -147,9 +147,10 @@ pattern.
|
||||||
<br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
|
<br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_match()</b> options that are supported for JIT matching are
|
The <b>pcre2_match()</b> options that are supported for JIT matching are
|
||||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
||||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
|
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
|
||||||
PCRE2_ANCHORED option is not supported at match time.
|
PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
|
||||||
|
supported at match time.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
|
If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
|
||||||
|
@ -402,10 +403,13 @@ processed by <b>pcre2_jit_compile()</b>).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
|
The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
|
||||||
the same arguments as <b>pcre2_match()</b>. The return values are also the same,
|
the same arguments as <b>pcre2_match()</b>. However, the subject string must be
|
||||||
plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
|
specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
|
||||||
requested that was not compiled. Unsupported option bits (for example,
|
option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
|
||||||
PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
|
PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
|
||||||
|
return values are also the same as for <b>pcre2_match()</b>, plus
|
||||||
|
PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
|
||||||
|
that was not compiled.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
|
When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
|
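Because pcre2_jit_match() silently ignores PCRE2_ANCHORED, PCRE2_ENDANCHORED and PCRE2_COPY_MATCHED_SUBJECT, a caller that depends on any of them has to route the call through pcre2_match() instead. A minimal caller-side guard might look like the sketch below. The helper name must_use_pcre2_match() and the DEMO_ prefix are hypothetical; the value for COPY_MATCHED_SUBJECT comes from the header hunk in this commit, while the two anchoring values are not shown in this diff and are given as I understand them from pcre2.h, so check them against the installed header.

```c
#include <stdint.h>

/* Local copies so the sketch compiles without pcre2.h. */
#define DEMO_ANCHORED             0x80000000u /* assumed; not in this diff */
#define DEMO_ENDANCHORED          0x20000000u /* assumed; not in this diff */
#define DEMO_COPY_MATCHED_SUBJECT 0x00004000u /* from the pcre2.h.in hunk */

/* Nonzero if 'options' contains a bit that the JIT fast path would ignore
   but that changes the result or the lifetime rules the caller relies on. */
static int must_use_pcre2_match(uint32_t options)
{
    const uint32_t ignored_by_fast_path =
        DEMO_ANCHORED | DEMO_ENDANCHORED | DEMO_COPY_MATCHED_SUBJECT;
    return (options & ignored_by_fast_path) != 0;
}
```

PCRE2_NO_JIT is left out of the mask on purpose: the text says the fast path ignores it as well, but a caller who sets it presumably does not want the fast path at all and would not reach this check.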
||||||
|
@ -434,7 +438,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC13" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC13" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 28 June 2018
|
Last updated: 16 October 2018
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2018 University of Cambridge.
|
Copyright © 1997-2018 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
699
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
|
.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -39,6 +39,8 @@ depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
|
||||||
characters. The options are:
|
characters. The options are:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT
|
||||||
|
On success, make a private subject copy
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
PCRE2_NOTBOL Subject is not the beginning of a line
|
PCRE2_NOTBOL Subject is not the beginning of a line
|
||||||
PCRE2_NOTEOL Subject is not the end of a line
|
PCRE2_NOTEOL Subject is not the end of a line
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_MATCH 3 "14 November 2017" "PCRE2 10.31"
|
.TH PCRE2_MATCH 3 "16 October 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -43,11 +43,13 @@ A match context is needed only if you want to:
|
||||||
Change the backtracking depth limit
|
Change the backtracking depth limit
|
||||||
Set custom memory management specifically for the match
|
Set custom memory management specifically for the match
|
||||||
.sp
|
.sp
|
||||||
The \fIlength\fP and \fIstartoffset\fP values are code
|
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
|
||||||
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
|
||||||
subject that is terminated by a binary zero code unit. The options are:
|
terminated by a binary zero code unit. The options are:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ANCHORED Match only at the first position
|
PCRE2_ANCHORED Match only at the first position
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT
|
||||||
|
On success, make a private subject copy
|
||||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||||
PCRE2_NOTBOL Subject string is not the beginning of a line
|
PCRE2_NOTBOL Subject string is not the beginning of a line
|
||||||
PCRE2_NOTEOL Subject string is not the end of a line
|
PCRE2_NOTEOL Subject string is not the end of a line
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2_MATCH_DATA_FREE 3 "28 June 2018" "PCRE2 10.32"
|
.TH PCRE2_MATCH_DATA_FREE 3 "16 October 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -18,6 +18,10 @@ If \fImatch_data\fP is NULL, this function does nothing. Otherwise,
|
||||||
using the memory freeing function from the general context or compiled pattern
|
using the memory freeing function from the general context or compiled pattern
|
||||||
with which it was created, or \fBfree()\fP if that was not set.
|
with which it was created, or \fBfree()\fP if that was not set.
|
||||||
.P
|
.P
|
||||||
|
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using
|
||||||
|
this match data block, the copy of the subject that was remembered with the
|
||||||
|
block is also freed.
|
||||||
|
.P
|
||||||
There is a complete description of the PCRE2 native API in the
|
There is a complete description of the PCRE2 native API in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2api\fP
|
\fBpcre2api\fP
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "21 September 2018" "PCRE2 10.33"
|
.TH PCRE2API 3 "16 October 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -1237,13 +1237,19 @@ NULL.
|
||||||
NOTE: When one of the matching functions is called, pointers to the compiled
|
NOTE: When one of the matching functions is called, pointers to the compiled
|
||||||
pattern and the subject string are set in the match data block so that they can
|
pattern and the subject string are set in the match data block so that they can
|
||||||
be referenced by the substring extraction functions. After running a match, you
|
be referenced by the substring extraction functions. After running a match, you
|
||||||
must not free a compiled pattern (or a subject string) until after all
|
must not free a compiled pattern or a subject string until after all
|
||||||
operations on the
|
operations on the
|
||||||
.\" HTML <a href="#matchdatablock">
|
.\" HTML <a href="#matchdatablock">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
match data block
|
match data block
|
||||||
.\"
|
.\"
|
||||||
have taken place.
|
have taken place, unless, in the case of the subject string, you have used the
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
|
||||||
|
"Option bits for \fBpcre2_match()\fP"
|
||||||
|
.\" HTML <a href="#matchoptions">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
.P
|
.P
|
||||||
The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
|
The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
|
||||||
settings that affect the compilation. It should be zero if no options are
|
settings that affect the compilation. It should be zero if no options are
|
||||||
|
@ -2390,7 +2396,13 @@ When one of the matching functions is called, pointers to the compiled pattern
|
||||||
and the subject string are set in the match data block so that they can be
|
and the subject string are set in the match data block so that they can be
|
||||||
referenced by the extraction functions. After running a match, you must not
|
referenced by the extraction functions. After running a match, you must not
|
||||||
free a compiled pattern or a subject string until after all operations on the
|
free a compiled pattern or a subject string until after all operations on the
|
||||||
match data block (for that match) have taken place.
|
match data block (for that match) have taken place, unless, in the case of the
|
||||||
|
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
|
||||||
|
described in the section entitled "Option bits for \fBpcre2_match()\fP"
|
||||||
|
.\" HTML <a href="#matchoptions">
|
||||||
|
.\" </a>
|
||||||
|
below.
|
||||||
|
.\"
|
||||||
.P
|
.P
|
||||||
When a match data block itself is no longer needed, it should be freed by
|
When a match data block itself is no longer needed, it should be freed by
|
||||||
calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
|
calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
|
||||||
|
@ -2507,10 +2519,10 @@ the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
|
The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
|
||||||
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||||
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
|
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
|
||||||
Their action is described below.
|
PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
|
||||||
.P
|
.P
|
||||||
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
|
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
|
||||||
the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
|
the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
|
||||||
|
@ -2524,6 +2536,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
|
||||||
to be anchored by virtue of its contents, it cannot be made unanchored at
|
to be anchored by virtue of its contents, it cannot be made unanchored at
|
||||||
matching time. Note that setting the option at match time disables JIT
|
matching time. Note that setting the option at match time disables JIT
|
||||||
matching.
|
matching.
|
||||||
|
.sp
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT
|
||||||
|
.sp
|
||||||
|
By default, a pointer to the subject is remembered in the match data block so
|
||||||
|
that, after a successful match, it can be referenced by the substring
|
||||||
|
extraction functions. This means that the subject's memory must not be freed
|
||||||
|
until all such operations are complete. For some applications where the
|
||||||
|
lifetime of the subject string is not guaranteed, it may be necessary to make a
|
||||||
|
copy of the subject string, but it is wasteful to do this unless the match is
|
||||||
|
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
|
||||||
|
subject is copied and the new pointer is remembered in the match data block
|
||||||
|
instead of the original subject pointer. The memory allocator that was used for
|
||||||
|
the match block itself is used. The copy is automatically freed when
|
||||||
|
\fBpcre2_match_data_free()\fP is called to free the match data block. It is also
|
||||||
|
automatically freed if the match data block is re-used for another match
|
||||||
|
operation.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ENDANCHORED
|
PCRE2_ENDANCHORED
|
||||||
.sp
|
.sp
|
||||||
|
@ -2961,7 +2989,8 @@ The backtracking match limit was reached.
|
||||||
If a pattern contains many nested backtracking points, heap memory is used to
|
If a pattern contains many nested backtracking points, heap memory is used to
|
||||||
remember them. This error is given when the memory allocation function (default
|
remember them. This error is given when the memory allocation function (default
|
||||||
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
||||||
if the amount of memory needed exceeds the heap limit.
|
if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
|
||||||
|
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_NULL
|
PCRE2_ERROR_NULL
|
||||||
.sp
|
.sp
|
||||||
|
@ -3579,11 +3608,12 @@ Here is an example of a simple call to \fBpcre2_dfa_match()\fP:
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
|
The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
|
||||||
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
be zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
|
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
|
||||||
and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
|
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
|
||||||
for \fBpcre2_match()\fP, so their description is not repeated here.
|
four of these are exactly the same as for \fBpcre2_match()\fP, so their
|
||||||
|
description is not repeated here.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
@ -3737,6 +3767,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 21 September 2018
|
Last updated: 16 October 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2JIT 3 "28 June 2018" "PCRE2 10.32"
|
.TH PCRE2JIT 3 "16 October 2018" "PCRE2 10.33"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
|
.SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
|
||||||
|
@ -124,9 +124,10 @@ pattern.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The \fBpcre2_match()\fP options that are supported for JIT matching are
|
The \fBpcre2_match()\fP options that are supported for JIT matching are
|
||||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
||||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
|
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
|
||||||
PCRE2_ANCHORED option is not supported at match time.
|
PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
|
||||||
|
supported at match time.
|
||||||
.P
|
.P
|
||||||
If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the
|
If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the
|
||||||
use of JIT, forcing matching by the interpreter code.
|
use of JIT, forcing matching by the interpreter code.
|
||||||
|
@ -376,10 +377,13 @@ available, and which need the best possible performance, can instead use a
|
||||||
processed by \fBpcre2_jit_compile()\fP).
|
processed by \fBpcre2_jit_compile()\fP).
|
||||||
.P
|
.P
|
||||||
The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly
|
The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly
|
||||||
the same arguments as \fBpcre2_match()\fP. The return values are also the same,
|
the same arguments as \fBpcre2_match()\fP. However, the subject string must be
|
||||||
plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
|
specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
|
||||||
requested that was not compiled. Unsupported option bits (for example,
|
option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
|
||||||
PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
|
PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
|
||||||
|
return values are also the same as for \fBpcre2_match()\fP, plus
|
||||||
|
PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
|
||||||
|
that was not compiled.
|
||||||
.P
|
.P
|
||||||
When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
|
When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
|
||||||
number of other sanity checks are performed on the arguments. For example, if
|
number of other sanity checks are performed on the arguments. For example, if
|
||||||
|
@ -412,6 +416,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 28 June 2018
|
Last updated: 16 October 2018
|
||||||
Copyright (c) 1997-2018 University of Cambridge.
|
Copyright (c) 1997-2018 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -167,36 +167,27 @@ D is inspected during pcre2_dfa_match() execution
|
||||||
#define PCRE2_JIT_PARTIAL_HARD 0x00000004u
|
#define PCRE2_JIT_PARTIAL_HARD 0x00000004u
|
||||||
#define PCRE2_JIT_INVALID_UTF 0x00000100u
|
#define PCRE2_JIT_INVALID_UTF 0x00000100u
|
||||||
|
|
||||||
/* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
|
/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and
|
||||||
that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
|
pcre2_substitute(). Some are allowed only for one of the functions, and in
|
||||||
functions (though pcre2_jit_match() ignores the latter since it bypasses all
|
these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and
|
||||||
sanity checks). */
|
PCRE2_NO_UTF_CHECK can also be passed to these functions (though
|
||||||
|
pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
|
||||||
|
|
||||||
#define PCRE2_NOTBOL 0x00000001u
|
#define PCRE2_NOTBOL 0x00000001u
|
||||||
#define PCRE2_NOTEOL 0x00000002u
|
#define PCRE2_NOTEOL 0x00000002u
|
||||||
#define PCRE2_NOTEMPTY 0x00000004u /* ) These two must be kept */
|
#define PCRE2_NOTEMPTY 0x00000004u /* ) These two must be kept */
|
||||||
#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. */
|
#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. */
|
||||||
#define PCRE2_PARTIAL_SOFT 0x00000010u
|
#define PCRE2_PARTIAL_SOFT 0x00000010u
|
||||||
#define PCRE2_PARTIAL_HARD 0x00000020u
|
#define PCRE2_PARTIAL_HARD 0x00000020u
|
||||||
|
#define PCRE2_DFA_RESTART 0x00000040u /* pcre2_dfa_match() only */
|
||||||
/* These are additional options for pcre2_dfa_match(). */
|
#define PCRE2_DFA_SHORTEST 0x00000080u /* pcre2_dfa_match() only */
|
||||||
|
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u /* pcre2_substitute() only */
|
||||||
#define PCRE2_DFA_RESTART 0x00000040u
|
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u /* pcre2_substitute() only */
|
||||||
#define PCRE2_DFA_SHORTEST 0x00000080u
|
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* pcre2_substitute() only */
|
||||||
|
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u /* pcre2_substitute() only */
|
||||||
/* These are additional options for pcre2_substitute(), which passes any others
|
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */
|
||||||
through to pcre2_match(). */
|
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
|
||||||
|
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
|
||||||
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
|
|
||||||
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
|
|
||||||
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
|
|
||||||
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u
|
|
||||||
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u
|
|
||||||
|
|
||||||
/* A further option for pcre2_match(), not allowed for pcre2_dfa_match(),
|
|
||||||
ignored for pcre2_jit_match(). */
|
|
||||||
|
|
||||||
#define PCRE2_NO_JIT 0x00002000u
|
|
||||||
|
|
||||||
/* Options for pcre2_pattern_convert(). */
|
/* Options for pcre2_pattern_convert(). */
|
||||||
|
|
||||||
|
|
|
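The hunk above folds the match-time options into one list, which only works if each option has a bit of its own. A minimal sketch of that check follows; the `opt_bits` table and `option_bits_are_distinct()` are hypothetical helpers that copy the values shown in the hunk, not part of pcre2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local copies of the values listed in the hunk above. */
static const uint32_t opt_bits[] = {
  0x00000001u,  /* PCRE2_NOTBOL */
  0x00000002u,  /* PCRE2_NOTEOL */
  0x00000004u,  /* PCRE2_NOTEMPTY */
  0x00000008u,  /* PCRE2_NOTEMPTY_ATSTART */
  0x00000010u,  /* PCRE2_PARTIAL_SOFT */
  0x00000020u,  /* PCRE2_PARTIAL_HARD */
  0x00000040u,  /* PCRE2_DFA_RESTART */
  0x00000080u,  /* PCRE2_DFA_SHORTEST */
  0x00000100u,  /* PCRE2_SUBSTITUTE_GLOBAL */
  0x00000200u,  /* PCRE2_SUBSTITUTE_EXTENDED */
  0x00000400u,  /* PCRE2_SUBSTITUTE_UNSET_EMPTY */
  0x00000800u,  /* PCRE2_SUBSTITUTE_UNKNOWN_UNSET */
  0x00001000u,  /* PCRE2_SUBSTITUTE_OVERFLOW_LENGTH */
  0x00002000u,  /* PCRE2_NO_JIT */
  0x00004000u   /* PCRE2_COPY_MATCHED_SUBJECT */
};

#define OPT_COUNT (sizeof(opt_bits) / sizeof(opt_bits[0]))

/* Return 1 if every entry is a single bit and no two entries share a bit,
otherwise return 0. */
static int option_bits_are_distinct(const uint32_t *bits, size_t count)
{
  uint32_t seen = 0;
  for (size_t i = 0; i < count; i++)
    {
    uint32_t b = bits[i];
    if (b == 0 || (b & (b - 1)) != 0) return 0;  /* Not exactly one bit */
    if ((seen & b) != 0) return 0;               /* Bit already in use */
    seen |= b;
    }
  return 1;
}
```

Calling `option_bits_are_distinct(opt_bits, OPT_COUNT)` on the table above reports whether the new 0x00004000u value collides with any existing option.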
@ -85,7 +85,8 @@ in others, so I abandoned this code. */
|
||||||
#define PUBLIC_DFA_MATCH_OPTIONS \
|
#define PUBLIC_DFA_MATCH_OPTIONS \
|
||||||
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
|
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
|
||||||
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
|
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
|
||||||
PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART)
|
PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT)
|
||||||
|
|
||||||
|
|
||||||
/*************************************************
|
/*************************************************
|
||||||
|
@ -3228,6 +3229,8 @@ pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
|
||||||
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
|
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
|
||||||
{
|
{
|
||||||
int rc;
|
int rc;
|
||||||
|
int was_zero_terminated = 0;
|
||||||
|
|
||||||
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
||||||
|
|
||||||
PCRE2_SPTR start_match;
|
PCRE2_SPTR start_match;
|
||||||
|
@ -3267,7 +3270,11 @@ rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
|
||||||
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
||||||
subject string. */
|
subject string. */
|
||||||
|
|
||||||
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
|
if (length == PCRE2_ZERO_TERMINATED)
|
||||||
|
{
|
||||||
|
length = PRIV(strlen)(subject);
|
||||||
|
was_zero_terminated = 1;
|
||||||
|
}
|
||||||
|
|
||||||
/* Plausibility checks */
|
/* Plausibility checks */
|
||||||
|
|
||||||
|
@ -3520,10 +3527,21 @@ if ((re->flags & PCRE2_LASTSET) != 0)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
|
||||||
|
free the memory that was obtained. */
|
||||||
|
|
||||||
|
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
||||||
|
{
|
||||||
|
match_data->memctl.free((void *)match_data->subject,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
|
match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
|
||||||
|
}
|
||||||
|
|
||||||
/* Fill in fields that are always returned in the match data. */
|
/* Fill in fields that are always returned in the match data. */
|
||||||
|
|
||||||
match_data->code = re;
|
match_data->code = re;
|
||||||
match_data->subject = subject;
|
match_data->subject = subject;
|
||||||
|
match_data->flags = 0;
|
||||||
match_data->mark = NULL;
|
match_data->mark = NULL;
|
||||||
match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
|
match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
|
||||||
|
|
||||||
|
@ -3818,6 +3836,17 @@ for (;;)
|
||||||
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
|
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
|
||||||
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
|
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
|
||||||
match_data->rc = rc;
|
match_data->rc = rc;
|
||||||
|
|
||||||
|
if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
|
||||||
|
{
|
||||||
|
length = CU2BYTES(length + was_zero_terminated);
|
||||||
|
match_data->subject = match_data->memctl.malloc(length,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
|
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
memcpy((void *)match_data->subject, subject, length);
|
||||||
|
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
|
||||||
|
}
|
||||||
|
|
||||||
goto EXIT;
|
goto EXIT;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
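The pcre2_dfa_match() hunks above do two things: free any copy left in the match data block by an earlier call, then make a new copy after a successful match. The sketch below shows that lifecycle in isolation. It assumes a cut-down stand-in structure (`mock_match_data`) and plain `malloc()`/`free()` instead of the block's memctl functions, and it counts lengths in bytes. Unlike the real code, it leaves the caller's pointer in place if allocation fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MD_COPIED_SUBJECT 0x01u  /* Stands in for PCRE2_MD_COPIED_SUBJECT */

/* Hypothetical, cut-down stand-in for the match data block. */
typedef struct {
  const char *subject;
  unsigned char flags;
} mock_match_data;

/* Free a copy left over from a previous match, if there is one. */
static void release_copied_subject(mock_match_data *md)
{
  if ((md->flags & MD_COPIED_SUBJECT) != 0)
    {
    free((void *)md->subject);
    md->subject = NULL;
    md->flags &= (unsigned char)~MD_COPIED_SUBJECT;
    }
}

/* Record the subject after a successful match. If want_copy is non-zero, a
private copy is stored instead of the caller's pointer. The length is in bytes;
was_zero_terminated adds one byte so that the terminator is copied too.
Returns 0 on success or -1 if the allocation fails. */
static int record_subject(mock_match_data *md, const char *subject,
  size_t length, int was_zero_terminated, int want_copy)
{
  release_copied_subject(md);
  md->subject = subject;
  md->flags = 0;

  if (want_copy)
    {
    size_t nbytes = length + (was_zero_terminated ? 1u : 0u);
    char *copy = malloc(nbytes > 0 ? nbytes : 1);  /* Avoid malloc(0) */
    if (copy == NULL) return -1;
    memcpy(copy, subject, nbytes);
    md->subject = copy;
    md->flags |= MD_COPIED_SUBJECT;
    }
  return 0;
}
```

Once the copy exists, the caller may change or free the original subject buffer without invalidating `md->subject`.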
@ -535,6 +535,10 @@ enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */
|
||||||
PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
|
PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
|
||||||
PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */
|
PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */
|
||||||
|
|
||||||
|
/* Values for the flags field in a match data block. */
|
||||||
|
|
||||||
|
#define PCRE2_MD_COPIED_SUBJECT 0x01u
|
||||||
|
|
||||||
/* Magic number to provide a small check against being handed junk. */
|
/* Magic number to provide a small check against being handed junk. */
|
||||||
|
|
||||||
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
|
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
|
||||||
|
|
|
@ -658,7 +658,8 @@ typedef struct pcre2_real_match_data {
|
||||||
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
|
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
|
||||||
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
|
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
|
||||||
PCRE2_SIZE startchar; /* Offset to starting code unit */
|
PCRE2_SIZE startchar; /* Offset to starting code unit */
|
||||||
uint16_t matchedby; /* Type of match (normal, JIT, DFA) */
|
uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
|
||||||
|
uint8_t flags; /* Various flags */
|
||||||
uint16_t oveccount; /* Number of pairs */
|
uint16_t oveccount; /* Number of pairs */
|
||||||
int rc; /* The return code from the match */
|
int rc; /* The return code from the match */
|
||||||
PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
|
PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016 University of Cambridge
|
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -174,6 +174,7 @@ if (rc > (int)oveccount)
|
||||||
rc = 0;
|
rc = 0;
|
||||||
match_data->code = re;
|
match_data->code = re;
|
||||||
match_data->subject = subject;
|
match_data->subject = subject;
|
||||||
|
match_data->flags = 0;
|
||||||
match_data->rc = rc;
|
match_data->rc = rc;
|
||||||
match_data->startchar = arguments.startchar_ptr - subject;
|
match_data->startchar = arguments.startchar_ptr - subject;
|
||||||
match_data->leftchar = 0;
|
match_data->leftchar = 0;
|
||||||
|
|
|
@ -69,11 +69,12 @@ information, and fields within it. */
|
||||||
#define PUBLIC_MATCH_OPTIONS \
|
#define PUBLIC_MATCH_OPTIONS \
|
||||||
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
|
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
|
||||||
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
|
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
|
||||||
PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT)
|
PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT|PCRE2_COPY_MATCHED_SUBJECT)
|
||||||
|
|
||||||
#define PUBLIC_JIT_MATCH_OPTIONS \
|
#define PUBLIC_JIT_MATCH_OPTIONS \
|
||||||
(PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
|
(PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
|
||||||
PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD)
|
PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD|\
|
||||||
|
PCRE2_COPY_MATCHED_SUBJECT)
|
||||||
|
|
||||||
/* Non-error returns from and within the match() function. Error returns are
|
/* Non-error returns from and within the match() function. Error returns are
|
||||||
externally defined PCRE2_ERROR_xxx codes, which are all negative. */
|
externally defined PCRE2_ERROR_xxx codes, which are all negative. */
|
||||||
|
@ -6009,10 +6010,11 @@ pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
|
||||||
pcre2_match_context *mcontext)
|
pcre2_match_context *mcontext)
|
||||||
{
|
{
|
||||||
int rc;
|
int rc;
|
||||||
|
int was_zero_terminated = 0;
|
||||||
const uint8_t *start_bits = NULL;
|
const uint8_t *start_bits = NULL;
|
||||||
|
|
||||||
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
||||||
|
|
||||||
|
|
||||||
BOOL anchored;
|
BOOL anchored;
|
||||||
BOOL firstline;
|
BOOL firstline;
|
||||||
BOOL has_first_cu = FALSE;
|
BOOL has_first_cu = FALSE;
|
||||||
|
@ -6052,7 +6054,11 @@ mb->stack_frames = (heapframe *)stack_frames_vector;
|
||||||
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
||||||
subject string. */
|
subject string. */
|
||||||
|
|
||||||
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
|
if (length == PCRE2_ZERO_TERMINATED)
|
||||||
|
{
|
||||||
|
length = PRIV(strlen)(subject);
|
||||||
|
was_zero_terminated = 1;
|
||||||
|
}
|
||||||
end_subject = subject + length;
|
end_subject = subject + length;
|
||||||
|
|
||||||
/* Plausibility checks */
|
/* Plausibility checks */
|
||||||
|
@ -6167,6 +6173,16 @@ if (mcontext != NULL && mcontext->offset_limit != PCRE2_UNSET &&
|
||||||
(re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0)
|
(re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0)
|
||||||
return PCRE2_ERROR_BADOFFSETLIMIT;
|
return PCRE2_ERROR_BADOFFSETLIMIT;
|
||||||
|
|
||||||
|
/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
|
||||||
|
free the memory that was obtained. */
|
||||||
|
|
||||||
|
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
||||||
|
{
|
||||||
|
match_data->memctl.free((void *)match_data->subject,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
|
match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
|
||||||
|
}
|
||||||
|
|
||||||
/* If the pattern was successfully studied with JIT support, run the JIT
|
/* If the pattern was successfully studied with JIT support, run the JIT
|
||||||
executable instead of the rest of this function. Most options must be set at
|
executable instead of the rest of this function. Most options must be set at
|
||||||
compile time for the JIT code to be usable. Fallback to the normal code path if
|
compile time for the JIT code to be usable. Fallback to the normal code path if
|
||||||
|
@ -6178,7 +6194,19 @@ if (re->executable_jit != NULL && (options & ~PUBLIC_JIT_MATCH_OPTIONS) == 0)
|
||||||
{
|
{
|
||||||
rc = pcre2_jit_match(code, subject, length, start_offset, options,
|
rc = pcre2_jit_match(code, subject, length, start_offset, options,
|
||||||
match_data, mcontext);
|
match_data, mcontext);
|
||||||
if (rc != PCRE2_ERROR_JIT_BADOPTION) return rc;
|
if (rc != PCRE2_ERROR_JIT_BADOPTION)
|
||||||
|
{
|
||||||
|
if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
|
||||||
|
{
|
||||||
|
length = CU2BYTES(length + was_zero_terminated);
|
||||||
|
match_data->subject = match_data->memctl.malloc(length,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
|
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
memcpy((void *)match_data->subject, subject, length);
|
||||||
|
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
|
||||||
|
}
|
||||||
|
return rc;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
@ -6819,12 +6847,14 @@ if (mb->match_frames != mb->stack_frames)
|
||||||
|
|
||||||
match_data->code = re;
|
match_data->code = re;
|
||||||
match_data->subject = subject;
|
match_data->subject = subject;
|
||||||
|
match_data->flags = 0;
|
||||||
match_data->mark = mb->mark;
|
match_data->mark = mb->mark;
|
||||||
match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;
|
match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;
|
||||||
|
|
||||||
/* Handle a fully successful match. Set the return code to the number of
|
/* Handle a fully successful match. Set the return code to the number of
|
||||||
captured strings, or 0 if there were too many to fit into the ovector, and then
|
captured strings, or 0 if there were too many to fit into the ovector, and then
|
||||||
set the remaining returned values before returning. */
|
set the remaining returned values before returning. Make a copy of the subject
|
||||||
|
string if requested. */
|
||||||
|
|
||||||
if (rc == MATCH_MATCH)
|
if (rc == MATCH_MATCH)
|
||||||
{
|
{
|
||||||
|
@ -6834,6 +6864,17 @@ if (rc == MATCH_MATCH)
|
||||||
match_data->leftchar = mb->start_used_ptr - subject;
|
match_data->leftchar = mb->start_used_ptr - subject;
|
||||||
match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)?
|
match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)?
|
||||||
mb->last_used_ptr : mb->end_match_ptr) - subject;
|
mb->last_used_ptr : mb->end_match_ptr) - subject;
|
||||||
|
|
||||||
|
if ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
|
||||||
|
{
|
||||||
|
length = CU2BYTES(length + was_zero_terminated);
|
||||||
|
match_data->subject = match_data->memctl.malloc(length,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
|
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
|
memcpy((void *)match_data->subject, subject, length);
|
||||||
|
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
|
||||||
|
}
|
||||||
|
|
||||||
return match_data->rc;
|
return match_data->rc;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
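All three copy sites above compute the allocation size as `CU2BYTES(length + was_zero_terminated)`: a length in code units, plus one extra unit for the terminator when the caller passed PCRE2_ZERO_TERMINATED, converted to bytes. A rough sketch of that arithmetic follows. It assumes CU2BYTES multiplies by the code unit size (1, 2 or 4 bytes); `copy_size_in_bytes()` is a hypothetical helper, not a PCRE2 function.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes to allocate for the subject copy. The length is in code
units; width_bits is the code unit width (8, 16 or 32). One extra code unit is
included when the subject was zero-terminated, so the terminator is copied. */
static size_t copy_size_in_bytes(size_t length, unsigned width_bits,
  int was_zero_terminated)
{
  size_t units = length + (was_zero_terminated ? 1u : 0u);
  return units * (width_bits / 8u);
}
```

For the 16-code-unit test subject "the foobar thing" with `zero_terminate` set, this gives 17 bytes in the 8-bit library and 34 bytes in the 16-bit library.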
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2017 University of Cambridge
|
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -63,6 +63,7 @@ yield = PRIV(memctl_malloc)(
|
||||||
(pcre2_memctl *)gcontext);
|
(pcre2_memctl *)gcontext);
|
||||||
if (yield == NULL) return NULL;
|
if (yield == NULL) return NULL;
|
||||||
yield->oveccount = oveccount;
|
yield->oveccount = oveccount;
|
||||||
|
yield->flags = 0;
|
||||||
return yield;
|
return yield;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -93,7 +94,12 @@ PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
|
||||||
pcre2_match_data_free(pcre2_match_data *match_data)
|
pcre2_match_data_free(pcre2_match_data *match_data)
|
||||||
{
|
{
|
||||||
if (match_data != NULL)
|
if (match_data != NULL)
|
||||||
|
{
|
||||||
|
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
||||||
|
match_data->memctl.free((void *)match_data->subject,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
match_data->memctl.free(match_data, match_data->memctl.memory_data);
|
match_data->memctl.free(match_data, match_data->memctl.memory_data);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
|
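The pcre2_match_data_free() change above releases the copied subject through the block's own memory functions before releasing the block itself. The sketch below imitates that ordering with a counting allocator so that a leak would be visible. All `demo_` names are hypothetical stand-ins for the real structures.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MD_COPIED_SUBJECT 0x01u  /* Stands in for PCRE2_MD_COPIED_SUBJECT */

/* Hypothetical stand-ins for the memory control and match data structures. */
typedef struct {
  void *(*alloc_fn)(size_t, void *);
  void (*free_fn)(void *, void *);
  void *memory_data;
} demo_memctl;

typedef struct {
  demo_memctl memctl;
  const void *subject;
  unsigned char flags;
} demo_match_data;

/* Allocator pair that counts live blocks in the int that data points to. */
static void *counting_malloc(size_t size, void *data)
{
  void *p = malloc(size);
  if (p != NULL) ++*(int *)data;
  return p;
}

static void counting_free(void *p, void *data)
{
  if (p != NULL)
    {
    --*(int *)data;
    free(p);
    }
}

/* Create a block whose flags start at zero, as the create function now does. */
static demo_match_data *demo_match_data_create(int *live_blocks)
{
  demo_match_data *md = counting_malloc(sizeof(*md), live_blocks);
  if (md == NULL) return NULL;
  md->memctl.alloc_fn = counting_malloc;
  md->memctl.free_fn = counting_free;
  md->memctl.memory_data = live_blocks;
  md->subject = NULL;
  md->flags = 0;
  return md;
}

/* Free the copied subject first (if any), then the block itself. */
static void demo_match_data_free(demo_match_data *md)
{
  if (md != NULL)
    {
    if ((md->flags & DEMO_MD_COPIED_SUBJECT) != 0)
      md->memctl.free_fn((void *)md->subject, md->memctl.memory_data);
    md->memctl.free_fn(md, md->memctl.memory_data);
    }
}
```

With this arrangement the live-block counter returns to zero after the free call, whether or not a subject copy was attached.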
@ -620,6 +620,7 @@ static modstruct modlist[] = {
|
||||||
{ "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) },
|
{ "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) },
|
||||||
{ "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) },
|
{ "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) },
|
||||||
{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
|
{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
|
||||||
|
{ "copy_matched_subject", MOD_DAT, MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
|
||||||
{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
|
{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
|
||||||
{ "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) },
|
{ "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) },
|
||||||
{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
|
{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
|
||||||
|
@ -4196,11 +4197,13 @@ else fprintf(outfile, "%s%s%s%s%s%s%s",
|
||||||
static void
|
static void
|
||||||
show_match_options(uint32_t options)
|
show_match_options(uint32_t options)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
|
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||||
((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
|
((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
|
||||||
|
((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)? " copy_matched_subject" : "",
|
||||||
((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
|
((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
|
||||||
((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
|
((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
|
||||||
((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "",
|
((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "",
|
||||||
|
((options & PCRE2_NO_JIT) != 0)? " no_jit" : "",
|
||||||
((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
|
((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
|
||||||
((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
|
((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
|
||||||
((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",
|
((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",
|
||||||
|
@ -7442,6 +7445,25 @@ for (gmatched = 0;; gmatched++)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* If PCRE2_COPY_MATCHED_SUBJECT was set, check that things are as they
|
||||||
|
should be, but not for fast JIT, where it isn't supported. */
|
||||||
|
|
||||||
|
if ((dat_datctl.options & PCRE2_COPY_MATCHED_SUBJECT) != 0 &&
|
||||||
|
(pat_patctl.control & CTL_JITFAST) == 0)
|
||||||
|
{
|
||||||
|
if ((FLD(match_data, flags) & PCRE2_MD_COPIED_SUBJECT) == 0)
|
||||||
|
fprintf(outfile,
|
||||||
|
"** PCRE2 error: flag not set after copy_matched_subject\n");
|
||||||
|
|
||||||
|
if (CASTFLD(void *, match_data, subject) == pp)
|
||||||
|
fprintf(outfile,
|
||||||
|
"** PCRE2 error: copy_matched_subject has not copied\n");
|
||||||
|
|
||||||
|
if (memcmp(CASTFLD(void *, match_data, subject), pp, ulen) != 0)
|
||||||
|
fprintf(outfile,
|
||||||
|
"** PCRE2 error: copy_matched_subject mismatch\n");
|
||||||
|
}
|
||||||
|
|
||||||
/* If this is not the first time round a global loop, check that the
|
/* If this is not the first time round a global loop, check that the
|
||||||
returned string has changed. If it has not, check for an empty string match
|
returned string has changed. If it has not, check for an empty string match
|
||||||
at different starting offset from the previous match. This is a failed test
|
at different starting offset from the previous match. This is a failed test
|
||||||
|
|
|
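The pcre2test hunk above makes three checks after a match with copy_matched_subject: the flag is set, the stored pointer differs from the caller's buffer, and the copied bytes equal the original. A minimal sketch of the same checks as a single function follows. `check_copied_subject()` and `DEMO_COPIED_FLAG` are hypothetical names, and the sketch stops at the first failure, whereas the test program reports each failed check separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_COPIED_FLAG 0x01u  /* Stands in for PCRE2_MD_COPIED_SUBJECT */

/* Return NULL if the stored subject looks like a correct private copy of the
original, or a message describing the first check that failed. */
static const char *check_copied_subject(const void *stored, unsigned flags,
  const void *original, size_t nbytes)
{
  if ((flags & DEMO_COPIED_FLAG) == 0)
    return "flag not set after copy_matched_subject";
  if (stored == original)
    return "copy_matched_subject has not copied";
  if (memcmp(stored, original, nbytes) != 0)
    return "copy_matched_subject mismatch";
  return NULL;
}
```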
@ -299,9 +299,9 @@
|
||||||
# ----
|
# ----
|
||||||
|
|
||||||
/[aC]/mg,firstline,newline=lf
|
/[aC]/mg,firstline,newline=lf
|
||||||
match\nmatch
|
match\nmatch
|
||||||
|
|
||||||
/[aCz]/mg,firstline,newline=lf
|
/[aCz]/mg,firstline,newline=lf
|
||||||
match\nmatch
|
match\nmatch
|
||||||
|
|
||||||
# End of testinput17
|
# End of testinput17
|
||||||
|
|
|
@ -5531,4 +5531,11 @@ a)"xI
|
||||||
|
|
||||||
/(?(*script_run:xxx)zzz)/
|
/(?(*script_run:xxx)zzz)/
|
||||||
|
|
||||||
|
/foobar/
|
||||||
|
the foobar thing\=copy_matched_subject
|
||||||
|
the foobar thing\=copy_matched_subject,zero_terminate
|
||||||
|
|
||||||
|
/foobar/g
|
||||||
|
the foobar thing foobar again\=copy_matched_subject
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
|
|
|
@ -4955,4 +4955,11 @@
|
||||||
\= Expect no match
|
\= Expect no match
|
||||||
\na
|
\na
|
||||||
|
|
||||||
|
/foobar/
|
||||||
|
the foobar thing\=copy_matched_subject
|
||||||
|
the foobar thing\=copy_matched_subject,zero_terminate
|
||||||
|
|
||||||
|
/foobar/g
|
||||||
|
the foobar thing foobar again\=copy_matched_subject
|
||||||
|
|
||||||
# End of testinput6
|
# End of testinput6
|
||||||
|
|
|
@ -543,11 +543,11 @@ Failed: error -47: match limit exceeded
|
||||||
# ----
|
# ----
|
||||||
|
|
||||||
/[aC]/mg,firstline,newline=lf
|
/[aC]/mg,firstline,newline=lf
|
||||||
match\nmatch
|
match\nmatch
|
||||||
0: a (JIT)
|
0: a (JIT)
|
||||||
|
|
||||||
/[aCz]/mg,firstline,newline=lf
|
/[aCz]/mg,firstline,newline=lf
|
||||||
match\nmatch
|
match\nmatch
|
||||||
0: a (JIT)
|
0: a (JIT)
|
||||||
|
|
||||||
# End of testinput17
|
# End of testinput17
|
||||||
|
|
|
@ -16821,6 +16821,17 @@ Failed: error 128 at offset 10: assertion expected after (?( or (?(?C)
|
||||||
/(?(*script_run:xxx)zzz)/
|
/(?(*script_run:xxx)zzz)/
|
||||||
Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)
|
Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)
|
||||||
|
|
||||||
|
/foobar/
|
||||||
|
the foobar thing\=copy_matched_subject
|
||||||
|
0: foobar
|
||||||
|
the foobar thing\=copy_matched_subject,zero_terminate
|
||||||
|
0: foobar
|
||||||
|
|
||||||
|
/foobar/g
|
||||||
|
the foobar thing foobar again\=copy_matched_subject
|
||||||
|
0: foobar
|
||||||
|
0: foobar
|
||||||
|
|
||||||
# End of testinput2
|
# End of testinput2
|
||||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||||
Error -62: bad serialized data
|
Error -62: bad serialized data
|
||||||
|
|
|
@ -7783,4 +7783,15 @@ No match
|
||||||
\na
|
\na
|
||||||
No match
|
No match
|
||||||
|
|
||||||
|
/foobar/
|
||||||
|
the foobar thing\=copy_matched_subject
|
||||||
|
0: foobar
|
||||||
|
the foobar thing\=copy_matched_subject,zero_terminate
|
||||||
|
0: foobar
|
||||||
|
|
||||||
|
/foobar/g
|
||||||
|
the foobar thing foobar again\=copy_matched_subject
|
||||||
|
0: foobar
|
||||||
|
0: foobar
|
||||||
|
|
||||||
# End of testinput6
|
# End of testinput6
|
||||||
|
|