Implement PCRE2_COPY_MATCHED_SUBJECT.
parent 971f885277
commit f90ce1a333
@ -37,6 +37,10 @@ src/pcre2_chartables.c.dist are updated.
|
|||
ranges such as a-z in EBCDIC environments. The original code probably never
|
||||
worked, though there were no bug reports.
|
||||
|
||||
10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
|
||||
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
|
||||
path.
|
||||
|
||||
|
||||
Version 10.32 10-September-2018
|
||||
-------------------------------
|
||||
|
|
|
@ -51,6 +51,8 @@ depth limits. The <i>length</i> and <i>startoffset</i> values are code units, no
|
|||
characters. The options are:
|
||||
<pre>
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_COPY_MATCHED_SUBJECT
|
||||
On success, make a private subject copy
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
PCRE2_NOTBOL Subject is not the beginning of a line
|
||||
PCRE2_NOTEOL Subject is not the end of a line
|
||||
|
|
|
@ -55,11 +55,13 @@ A match context is needed only if you want to:
|
|||
Change the backtracking depth limit
|
||||
Set custom memory management specifically for the match
|
||||
</pre>
|
||||
The <i>length</i> and <i>startoffset</i> values are code
|
||||
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
||||
subject that is terminated by a binary zero code unit. The options are:
|
||||
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
|
||||
The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
|
||||
terminated by a binary zero code unit. The options are:
|
||||
<pre>
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_COPY_MATCHED_SUBJECT
|
||||
On success, make a private subject copy
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
PCRE2_NOTBOL Subject string is not the beginning of a line
|
||||
PCRE2_NOTEOL Subject string is not the end of a line
|
||||
|
|
|
@ -31,6 +31,11 @@ using the memory freeing function from the general context or compiled pattern
|
|||
with which it was created, or <b>free()</b> if that was not set.
|
||||
</P>
|
||||
<P>
|
||||
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this
|
||||
match data block, the copy of the subject that was remembered with the block is
|
||||
also freed.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
|
|
|
@ -1305,10 +1305,13 @@ NULL.
|
|||
NOTE: When one of the matching functions is called, pointers to the compiled
|
||||
pattern and the subject string are set in the match data block so that they can
|
||||
be referenced by the substring extraction functions. After running a match, you
|
||||
must not free a compiled pattern (or a subject string) until after all
|
||||
must not free a compiled pattern or a subject string until after all
|
||||
operations on the
|
||||
<a href="#matchdatablock">match data block</a>
|
||||
have taken place.
|
||||
have taken place, unless, in the case of the subject string, you have used the
|
||||
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
|
||||
"Option bits for <b>pcre2_match()</b>"
|
||||
<a href="#matchoptions">below.</a>
|
||||
</P>
|
||||
<P>
|
||||
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
|
||||
|
@ -2419,7 +2422,10 @@ When one of the matching functions is called, pointers to the compiled pattern
|
|||
and the subject string are set in the match data block so that they can be
|
||||
referenced by the extraction functions. After running a match, you must not
|
||||
free a compiled pattern or a subject string until after all operations on the
|
||||
match data block (for that match) have taken place.
|
||||
match data block (for that match) have taken place, unless, in the case of the
|
||||
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
|
||||
described in the section entitled "Option bits for <b>pcre2_match()</b>"
|
||||
<a href="#matchoptions">below.</a>
|
||||
</P>
|
||||
<P>
|
||||
When a match data block itself is no longer needed, it should be freed by
|
||||
|
@ -2531,10 +2537,10 @@ Option bits for <b>pcre2_match()</b>
|
|||
</b><br>
|
||||
<P>
|
||||
The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
|
||||
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
|
||||
Their action is described below.
|
||||
zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
|
||||
PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
|
||||
</P>
|
||||
<P>
|
||||
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
|
||||
|
@ -2549,6 +2555,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
|
|||
to be anchored by virtue of its contents, it cannot be made unanchored at
|
||||
matching time. Note that setting the option at match time disables JIT
|
||||
matching.
|
||||
<pre>
|
||||
PCRE2_COPY_MATCHED_SUBJECT
|
||||
</pre>
|
||||
By default, a pointer to the subject is remembered in the match data block so
|
||||
that, after a successful match, it can be referenced by the substring
|
||||
extraction functions. This means that the subject's memory must not be freed
|
||||
until all such operations are complete. For some applications where the
|
||||
lifetime of the subject string is not guaranteed, it may be necessary to make a
|
||||
copy of the subject string, but it is wasteful to do this unless the match is
|
||||
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
|
||||
subject is copied and the new pointer is remembered in the match data block
|
||||
instead of the original subject pointer. The memory allocator that was used for
|
||||
the match block itself is used. The copy is automatically freed when
|
||||
<b>pcre2_match_data_free()</b> is called to free the match data block. It is also
|
||||
automatically freed if the match data block is re-used for another match
|
||||
operation.
|
||||
<pre>
|
||||
PCRE2_ENDANCHORED
|
||||
</pre>
|
||||
|
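The ownership rule described in this hunk can be illustrated with a small stand-alone model. This is only a sketch: it does not call PCRE2, and every name in it (`toy_match_data`, `toy_match`, `TOY_COPY_MATCHED_SUBJECT`, the `TOY_*` return codes) is hypothetical. The "match" is a plain byte search standing in for a real pattern match. What it tries to show is the lifetime behaviour the text describes: the copy is made only on success, is allocated next to the block, and is released when the block is re-used or freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for this sketch; none of them are PCRE2 identifiers. */
#define TOY_COPY_MATCHED_SUBJECT 0x1u
#define TOY_NOMATCH   (-1)
#define TOY_NOMEMORY  (-2)

typedef struct {
  const char *subject;   /* what substring extraction would read from */
  size_t length;
  int owns_subject;      /* non-zero when subject is a private copy */
} toy_match_data;

/* Drop any private copy held by the block and reset its fields. */
static void toy_release_copy(toy_match_data *md)
{
  if (md->owns_subject) free((void *)md->subject);
  md->subject = NULL;
  md->length = 0;
  md->owns_subject = 0;
}

/* Stand-in matcher: succeeds if the byte `wanted` occurs in the subject. */
static int toy_match(toy_match_data *md, const char *subject, size_t length,
                     unsigned options, char wanted)
{
  toy_release_copy(md);                 /* re-using the block frees an old copy */
  if (memchr(subject, wanted, length) == NULL) return TOY_NOMATCH;

  if (options & TOY_COPY_MATCHED_SUBJECT) {
    char *copy = malloc(length + 1);    /* copy only after a successful match */
    if (copy == NULL) return TOY_NOMEMORY;
    memcpy(copy, subject, length);
    copy[length] = '\0';
    md->subject = copy;
    md->owns_subject = 1;
  } else {
    md->subject = subject;              /* default: remember the caller's pointer */
  }
  md->length = length;
  return 1;
}

/* Freeing the block also frees the remembered copy, if there is one. */
static void toy_match_data_free(toy_match_data *md)
{
  toy_release_copy(md);
}
```

With the flag set, the caller may free its own subject buffer straight after a successful `toy_match()` and still read `md.subject`; without the flag, the caller's buffer has to outlive every use of the block.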
@ -2954,7 +2976,8 @@ The backtracking match limit was reached.
|
|||
If a pattern contains many nested backtracking points, heap memory is used to
|
||||
remember them. This error is given when the memory allocation function (default
|
||||
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
||||
if the amount of memory needed exceeds the heap limit.
|
||||
if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
|
||||
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
|
||||
<pre>
|
||||
PCRE2_ERROR_NULL
|
||||
</pre>
|
||||
|
@ -3584,11 +3607,12 @@ Option bits for <b>pcre2_dfa_match()</b>
|
|||
</b><br>
|
||||
<P>
|
||||
The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
|
||||
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
|
||||
and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
|
||||
for <b>pcre2_match()</b>, so their description is not repeated here.
|
||||
be zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
|
||||
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
|
||||
four of these are exactly the same as for <b>pcre2_match()</b>, so their
|
||||
description is not repeated here.
|
||||
<pre>
|
||||
PCRE2_PARTIAL_HARD
|
||||
PCRE2_PARTIAL_SOFT
|
||||
|
@ -3732,7 +3756,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 21 September 2018
|
||||
Last updated: 16 October 2018
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -147,9 +147,10 @@ pattern.
|
|||
<br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
|
||||
<P>
|
||||
The <b>pcre2_match()</b> options that are supported for JIT matching are
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
|
||||
PCRE2_ANCHORED option is not supported at match time.
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
||||
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
|
||||
PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
|
||||
supported at match time.
|
||||
</P>
|
||||
<P>
|
||||
If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
|
||||
|
@ -402,10 +403,13 @@ processed by <b>pcre2_jit_compile()</b>).
|
|||
</P>
|
||||
<P>
|
||||
The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
|
||||
the same arguments as <b>pcre2_match()</b>. The return values are also the same,
|
||||
plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
|
||||
requested that was not compiled. Unsupported option bits (for example,
|
||||
PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
|
||||
the same arguments as <b>pcre2_match()</b>. However, the subject string must be
|
||||
specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
|
||||
option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
|
||||
PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
|
||||
return values are also the same as for <b>pcre2_match()</b>, plus
|
||||
PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
|
||||
that was not compiled.
|
||||
</P>
|
||||
<P>
|
||||
When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
|
||||
|
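Since the fast path ignores PCRE2_COPY_MATCHED_SUBJECT and does not accept PCRE2_ZERO_TERMINATED, a caller that wants either behaviour has to deal with it before choosing an entry point. The following is a minimal sketch of that decision, not PCRE2 code: the `SK_*` names and both helper functions are hypothetical, the values of `SK_ANCHORED`, `SK_ENDANCHORED` and `SK_ZERO_TERMINATED` are assumptions about what `pcre2.h` defines, and only `0x00004000u` is taken from the header change in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins so the sketch compiles without pcre2.h. */
#define SK_COPY_MATCHED_SUBJECT 0x00004000u   /* value from this commit's header hunk */
#define SK_ANCHORED             0x80000000u   /* assumed value */
#define SK_ENDANCHORED          0x20000000u   /* assumed value */
#define SK_ZERO_TERMINATED      (~(size_t)0)  /* assumed sentinel for "use strlen" */

/* Option bits that the documentation says the fast path silently ignores. */
#define SK_IGNORED_BY_FAST_PATH \
  (SK_COPY_MATCHED_SUBJECT | SK_ANCHORED | SK_ENDANCHORED)

typedef enum { SK_USE_MATCH, SK_USE_JIT_FAST_PATH } sk_entry_point;

/* Route to the normal matching function whenever an ignored bit is requested,
   so that the option is not dropped without notice. */
static sk_entry_point sk_choose_entry_point(uint32_t options)
{
  return (options & SK_IGNORED_BY_FAST_PATH) ? SK_USE_MATCH
                                             : SK_USE_JIT_FAST_PATH;
}

/* The fast path needs an explicit length, so resolve the sentinel up front. */
static size_t sk_explicit_length(const char *subject, size_t length)
{
  return (length == SK_ZERO_TERMINATED) ? strlen(subject) : length;
}
```

Routing by option bits keeps the fast path for the common case while making sure a request for a private subject copy still goes through a function that honours it.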
@ -434,7 +438,7 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC13" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 28 June 2018
|
||||
Last updated: 16 October 2018
|
||||
<br>
|
||||
Copyright © 1997-2018 University of Cambridge.
|
||||
<br>
|
||||
|
|
|
@ -1302,9 +1302,11 @@ COMPILING A PATTERN
|
|||
NOTE: When one of the matching functions is called, pointers to the
|
||||
compiled pattern and the subject string are set in the match data block
|
||||
so that they can be referenced by the substring extraction functions.
|
||||
After running a match, you must not free a compiled pattern (or a sub-
|
||||
ject string) until after all operations on the match data block have
|
||||
taken place.
|
||||
After running a match, you must not free a compiled pattern or a sub-
|
||||
ject string until after all operations on the match data block have
|
||||
taken place, unless, in the case of the subject string, you have used
|
||||
the PCRE2_COPY_MATCHED_SUBJECT option, which is described in the sec-
|
||||
tion entitled "Option bits for pcre2_match()" below.
|
||||
|
||||
The options argument for pcre2_compile() contains various bit settings
|
||||
that affect the compilation. It should be zero if no options are
|
||||
|
@ -2388,7 +2390,9 @@ THE MATCH DATA BLOCK
|
|||
they can be referenced by the extraction functions. After running a
|
||||
match, you must not free a compiled pattern or a subject string until
|
||||
after all operations on the match data block (for that match) have
|
||||
taken place.
|
||||
taken place, unless, in the case of the subject string, you have used
|
||||
the PCRE2_COPY_MATCHED_SUBJECT option, which is described in the sec-
|
||||
tion entitled "Option bits for pcre2_match()" below.
|
||||
|
||||
When a match data block itself is no longer needed, it should be freed
|
||||
by calling pcre2_match_data_free(). If this function is called with a
|
||||
|
@ -2488,10 +2492,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
|
|||
Option bits for pcre2_match()
|
||||
|
||||
The unused bits of the options argument for pcre2_match() must be zero.
|
||||
The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PAR-
|
||||
TIAL_SOFT. Their action is described below.
|
||||
The only bits that may be set are PCRE2_ANCHORED,
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL,
|
||||
PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT,
|
||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their
|
||||
action is described below.
|
||||
|
||||
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not sup-
|
||||
ported by the just-in-time (JIT) compiler. If it is set, JIT matching
|
||||
|
@ -2507,6 +2512,23 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
|
|||
unanchored at matching time. Note that setting the option at match time
|
||||
disables JIT matching.
|
||||
|
||||
PCRE2_COPY_MATCHED_SUBJECT
|
||||
|
||||
By default, a pointer to the subject is remembered in the match data
|
||||
block so that, after a successful match, it can be referenced by the
|
||||
substring extraction functions. This means that the subject's memory
|
||||
must not be freed until all such operations are complete. For some
|
||||
applications where the lifetime of the subject string is not guaran-
|
||||
teed, it may be necessary to make a copy of the subject string, but it
|
||||
is wasteful to do this unless the match is successful. After a success-
|
||||
ful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the subject is copied
|
||||
and the new pointer is remembered in the match data block instead of
|
||||
the original subject pointer. The memory allocator that was used for
|
||||
the match block itself is used. The copy is automatically freed when
|
||||
pcre2_match_data_free() is called to free the match data block. It is
|
||||
also automatically freed if the match data block is re-used for another
|
||||
match operation.
|
||||
|
||||
PCRE2_ENDANCHORED
|
||||
|
||||
If the PCRE2_ENDANCHORED option is set, any string that pcre2_match()
|
||||
|
@ -2881,7 +2903,8 @@ ERROR RETURNS FROM pcre2_match()
|
|||
used to remember them. This error is given when the memory allocation
|
||||
function (default or custom) fails. Note that a different error,
|
||||
PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds
|
||||
the heap limit.
|
||||
the heap limit. PCRE2_ERROR_NOMEMORY is also returned if
|
||||
PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
|
||||
|
||||
PCRE2_ERROR_NULL
|
||||
|
||||
|
@ -3467,12 +3490,13 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
|||
Option bits for pcre2_dfa_match()
|
||||
|
||||
The unused bits of the options argument for pcre2_dfa_match() must be
|
||||
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDAN-
|
||||
CHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
||||
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
|
||||
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but
|
||||
the last four of these are exactly the same as for pcre2_match(), so
|
||||
their description is not repeated here.
|
||||
zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL,
|
||||
PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT,
|
||||
PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last four of
|
||||
these are exactly the same as for pcre2_match(), so their description
|
||||
is not repeated here.
|
||||
|
||||
PCRE2_PARTIAL_HARD
|
||||
PCRE2_PARTIAL_SOFT
|
||||
|
@ -3607,7 +3631,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 21 September 2018
|
||||
Last updated: 16 October 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
@ -4924,9 +4948,10 @@ SIMPLE USE OF JIT
|
|||
UNSUPPORTED OPTIONS AND PATTERN ITEMS
|
||||
|
||||
The pcre2_match() options that are supported for JIT matching are
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
|
||||
PCRE2_ANCHORED option is not supported at match time.
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
||||
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
|
||||
PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options
|
||||
are not supported at match time.
|
||||
|
||||
If the PCRE2_NO_JIT option is passed to pcre2_match() it disables the
|
||||
use of JIT, forcing matching by the interpreter code.
|
||||
|
@ -5164,11 +5189,13 @@ JIT FAST PATH API
|
|||
patterns that have been successfully processed by pcre2_jit_compile()).
|
||||
|
||||
The fast path function is called pcre2_jit_match(), and it takes
|
||||
exactly the same arguments as pcre2_match(). The return values are also
|
||||
the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or
|
||||
complete) is requested that was not compiled. Unsupported option bits
|
||||
(for example, PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT
|
||||
option.
|
||||
exactly the same arguments as pcre2_match(). However, the subject
|
||||
string must be specified with a length; PCRE2_ZERO_TERMINATED is not
|
||||
supported. Unsupported option bits (for example, PCRE2_ANCHORED,
|
||||
PCRE2_ENDANCHORED and PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is
|
||||
the PCRE2_NO_JIT option. The return values are also the same as for
|
||||
pcre2_match(), plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (par-
|
||||
tial or complete) is requested that was not compiled.
|
||||
|
||||
When you call pcre2_match(), as well as testing for invalid options, a
|
||||
number of other sanity checks are performed on the arguments. For exam-
|
||||
|
@ -5195,7 +5222,7 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 28 June 2018
|
||||
Last updated: 16 October 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
|
||||
.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -39,6 +39,8 @@ depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
|
|||
characters. The options are:
|
||||
.sp
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_COPY_MATCHED_SUBJECT
|
||||
On success, make a private subject copy
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
PCRE2_NOTBOL Subject is not the beginning of a line
|
||||
PCRE2_NOTEOL Subject is not the end of a line
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_MATCH 3 "14 November 2017" "PCRE2 10.31"
|
||||
.TH PCRE2_MATCH 3 "16 October 2018" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -43,11 +43,13 @@ A match context is needed only if you want to:
|
|||
Change the backtracking depth limit
|
||||
Set custom memory management specifically for the match
|
||||
.sp
|
||||
The \fIlength\fP and \fIstartoffset\fP values are code
|
||||
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
||||
subject that is terminated by a binary zero code unit. The options are:
|
||||
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
|
||||
The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
|
||||
terminated by a binary zero code unit. The options are:
|
||||
.sp
|
||||
PCRE2_ANCHORED Match only at the first position
|
||||
PCRE2_COPY_MATCHED_SUBJECT
|
||||
On success, make a private subject copy
|
||||
PCRE2_ENDANCHORED Pattern can match only at end of subject
|
||||
PCRE2_NOTBOL Subject string is not the beginning of a line
|
||||
PCRE2_NOTEOL Subject string is not the end of a line
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2_MATCH_DATA_FREE 3 "28 June 2018" "PCRE2 10.32"
|
||||
.TH PCRE2_MATCH_DATA_FREE 3 "16 October 2018" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH SYNOPSIS
|
||||
|
@ -18,6 +18,10 @@ If \fImatch_data\fP is NULL, this function does nothing. Otherwise,
|
|||
using the memory freeing function from the general context or compiled pattern
|
||||
with which it was created, or \fBfree()\fP if that was not set.
|
||||
.P
|
||||
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this
|
||||
match data block, the copy of the subject that was remembered with the block is
|
||||
also freed.
|
||||
.P
|
||||
There is a complete description of the PCRE2 native API in the
|
||||
.\" HREF
|
||||
\fBpcre2api\fP
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "21 September 2018" "PCRE2 10.33"
|
||||
.TH PCRE2API 3 "16 October 2018" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -1237,13 +1237,19 @@ NULL.
|
|||
NOTE: When one of the matching functions is called, pointers to the compiled
|
||||
pattern and the subject string are set in the match data block so that they can
|
||||
be referenced by the substring extraction functions. After running a match, you
|
||||
must not free a compiled pattern (or a subject string) until after all
|
||||
must not free a compiled pattern or a subject string until after all
|
||||
operations on the
|
||||
.\" HTML <a href="#matchdatablock">
|
||||
.\" </a>
|
||||
match data block
|
||||
.\"
|
||||
have taken place.
|
||||
have taken place, unless, in the case of the subject string, you have used the
|
||||
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
|
||||
"Option bits for \fBpcre2_match()\fP"
|
||||
.\" HTML <a href="#matchoptions">
|
||||
.\" </a>
|
||||
below.
|
||||
.\"
|
||||
.P
|
||||
The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
|
||||
settings that affect the compilation. It should be zero if no options are
|
||||
|
@ -2390,7 +2396,13 @@ When one of the matching functions is called, pointers to the compiled pattern
|
|||
and the subject string are set in the match data block so that they can be
|
||||
referenced by the extraction functions. After running a match, you must not
|
||||
free a compiled pattern or a subject string until after all operations on the
|
||||
match data block (for that match) have taken place.
|
||||
match data block (for that match) have taken place, unless, in the case of the
|
||||
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
|
||||
described in the section entitled "Option bits for \fBpcre2_match()\fP"
|
||||
.\" HTML <a href="#matchoptions">
|
||||
.\" </a>
|
||||
below.
|
||||
.\"
|
||||
.P
|
||||
When a match data block itself is no longer needed, it should be freed by
|
||||
calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
|
||||
|
@ -2507,10 +2519,10 @@ the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
|
|||
.rs
|
||||
.sp
|
||||
The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
|
||||
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
|
||||
Their action is described below.
|
||||
zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
|
||||
PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
|
||||
.P
|
||||
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
|
||||
the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
|
||||
|
@ -2524,6 +2536,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
|
|||
to be anchored by virtue of its contents, it cannot be made unanchored at
|
||||
matching time. Note that setting the option at match time disables JIT
|
||||
matching.
|
||||
.sp
|
||||
PCRE2_COPY_MATCHED_SUBJECT
|
||||
.sp
|
||||
By default, a pointer to the subject is remembered in the match data block so
|
||||
that, after a successful match, it can be referenced by the substring
|
||||
extraction functions. This means that the subject's memory must not be freed
|
||||
until all such operations are complete. For some applications where the
|
||||
lifetime of the subject string is not guaranteed, it may be necessary to make a
|
||||
copy of the subject string, but it is wasteful to do this unless the match is
|
||||
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
|
||||
subject is copied and the new pointer is remembered in the match data block
|
||||
instead of the original subject pointer. The memory allocator that was used for
|
||||
the match block itself is used. The copy is automatically freed when
|
||||
\fBpcre2_match_data_free()\fP is called to free the match data block. It is also
|
||||
automatically freed if the match data block is re-used for another match
|
||||
operation.
|
||||
.sp
|
||||
PCRE2_ENDANCHORED
|
||||
.sp
|
||||
|
@ -2961,7 +2989,8 @@ The backtracking match limit was reached.
|
|||
If a pattern contains many nested backtracking points, heap memory is used to
|
||||
remember them. This error is given when the memory allocation function (default
|
||||
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
|
||||
if the amount of memory needed exceeds the heap limit.
|
||||
if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
|
||||
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
|
||||
.sp
|
||||
PCRE2_ERROR_NULL
|
||||
.sp
|
||||
|
@ -3579,11 +3608,12 @@ Here is an example of a simple call to \fBpcre2_dfa_match()\fP:
|
|||
.rs
|
||||
.sp
|
||||
The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
|
||||
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
|
||||
and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
|
||||
for \fBpcre2_match()\fP, so their description is not repeated here.
|
||||
be zero. The only bits that may be set are PCRE2_ANCHORED,
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
|
||||
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
|
||||
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
|
||||
four of these are exactly the same as for \fBpcre2_match()\fP, so their
|
||||
description is not repeated here.
|
||||
.sp
|
||||
PCRE2_PARTIAL_HARD
|
||||
PCRE2_PARTIAL_SOFT
|
||||
|
@ -3737,6 +3767,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 21 September 2018
|
||||
Last updated: 16 October 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2JIT 3 "28 June 2018" "PCRE2 10.32"
|
||||
.TH PCRE2JIT 3 "16 October 2018" "PCRE2 10.33"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
|
||||
|
@ -124,9 +124,10 @@ pattern.
|
|||
.rs
|
||||
.sp
|
||||
The \fBpcre2_match()\fP options that are supported for JIT matching are
|
||||
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
|
||||
PCRE2_ANCHORED option is not supported at match time.
|
||||
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
|
||||
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
|
||||
PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
|
||||
supported at match time.
|
||||
.P
|
||||
If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the
|
||||
use of JIT, forcing matching by the interpreter code.
|
||||
|
@ -376,10 +377,13 @@ available, and which need the best possible performance, can instead use a
|
|||
processed by \fBpcre2_jit_compile()\fP).
|
||||
.P
|
||||
The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly
|
||||
the same arguments as \fBpcre2_match()\fP. The return values are also the same,
|
||||
plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
|
||||
requested that was not compiled. Unsupported option bits (for example,
|
||||
PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
|
||||
the same arguments as \fBpcre2_match()\fP. However, the subject string must be
|
||||
specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
|
||||
option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
|
||||
PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
|
||||
return values are also the same as for \fBpcre2_match()\fP, plus
|
||||
PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
|
||||
that was not compiled.
|
||||
.P
|
||||
When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
|
||||
number of other sanity checks are performed on the arguments. For example, if
|
||||
|
@ -412,6 +416,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 28 June 2018
|
||||
Last updated: 16 October 2018
|
||||
Copyright (c) 1997-2018 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -167,10 +167,11 @@ D is inspected during pcre2_dfa_match() execution
|
|||
#define PCRE2_JIT_PARTIAL_HARD 0x00000004u
|
||||
#define PCRE2_JIT_INVALID_UTF 0x00000100u
|
||||
|
||||
/* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
|
||||
that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
|
||||
functions (though pcre2_jit_match() ignores the latter since it bypasses all
|
||||
sanity checks). */
|
||||
/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and
|
||||
pcre2_substitute(). Some are allowed only for one of the functions, and in
|
||||
these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and
|
||||
PCRE2_NO_UTF_CHECK can also be passed to these functions (though
|
||||
pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
|
||||
|
||||
#define PCRE2_NOTBOL 0x00000001u
|
||||
#define PCRE2_NOTEOL 0x00000002u
|
||||
|
@ -178,25 +179,15 @@ sanity checks). */
|
|||
#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. */
|
||||
#define PCRE2_PARTIAL_SOFT 0x00000010u
|
||||
#define PCRE2_PARTIAL_HARD 0x00000020u
|
||||
|
||||
/* These are additional options for pcre2_dfa_match(). */
|
||||
|
||||
#define PCRE2_DFA_RESTART 0x00000040u
|
||||
#define PCRE2_DFA_SHORTEST 0x00000080u
|
||||
|
||||
/* These are additional options for pcre2_substitute(), which passes any others
|
||||
through to pcre2_match(). */
|
||||
|
||||
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
|
||||
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
|
||||
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
|
||||
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u
|
||||
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u
|
||||
|
||||
/* A further option for pcre2_match(), not allowed for pcre2_dfa_match(),
|
||||
ignored for pcre2_jit_match(). */
|
||||
|
||||
#define PCRE2_NO_JIT 0x00002000u
|
||||
#define PCRE2_DFA_RESTART 0x00000040u /* pcre2_dfa_match() only */
|
||||
#define PCRE2_DFA_SHORTEST 0x00000080u /* pcre2_dfa_match() only */
|
||||
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u /* pcre2_substitute() only */
|
||||
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u /* pcre2_substitute() only */
|
||||
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* pcre2_substitute() only */
|
||||
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u /* pcre2_substitute() only */
|
||||
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */
|
||||
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
|
||||
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
|
||||
|
||||
/* Options for pcre2_pattern_convert(). */
|
||||
|
||||
|
|
|
@ -85,7 +85,8 @@ in others, so I abandoned this code. */
|
|||
#define PUBLIC_DFA_MATCH_OPTIONS \
|
||||
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
|
||||
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
|
||||
PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART)
|
||||
PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \
|
||||
PCRE2_COPY_MATCHED_SUBJECT)
|
||||
|
||||
|
||||
/*************************************************
|
||||
|
@ -3228,6 +3229,8 @@ pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
|
|||
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
|
||||
{
|
||||
int rc;
|
||||
int was_zero_terminated = 0;
|
||||
|
||||
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
||||
|
||||
PCRE2_SPTR start_match;
|
||||
|
@ -3267,7 +3270,11 @@ rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
|
|||
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
||||
subject string. */
|
||||
|
||||
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
|
||||
if (length == PCRE2_ZERO_TERMINATED)
|
||||
{
|
||||
length = PRIV(strlen)(subject);
|
||||
was_zero_terminated = 1;
|
||||
}
|
||||
|
||||
/* Plausibility checks */
|
||||
|
||||
|
@ -3520,10 +3527,21 @@ if ((re->flags & PCRE2_LASTSET) != 0)
|
|||
}
|
||||
}
|
||||
|
||||
/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
|
||||
free the memory that was obtained. */
|
||||
|
||||
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
||||
{
|
||||
match_data->memctl.free((void *)match_data->subject,
|
||||
match_data->memctl.memory_data);
|
||||
match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
|
||||
}
|
||||
|
||||
/* Fill in fields that are always returned in the match data. */
|
||||
|
||||
match_data->code = re;
|
||||
match_data->subject = subject;
|
||||
match_data->flags = 0;
|
||||
match_data->mark = NULL;
|
||||
match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
|
||||
|
||||
|
@ -3818,6 +3836,17 @@ for (;;)
|
|||
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
|
||||
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
|
||||
match_data->rc = rc;
|
||||
|
||||
if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
|
||||
{
|
||||
length = CU2BYTES(length + was_zero_terminated);
|
||||
match_data->subject = match_data->memctl.malloc(length,
|
||||
match_data->memctl.memory_data);
|
||||
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||
memcpy((void *)match_data->subject, subject, length);
|
||||
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
|
||||
}
|
||||
|
||||
goto EXIT;
|
||||
}
|
||||
|
||||
|
|
|
@ -535,6 +535,10 @@ enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */
|
|||
PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
|
||||
PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */
|
||||
|
||||
/* Values for the flags field in a match data block. */
|
||||
|
||||
#define PCRE2_MD_COPIED_SUBJECT 0x01u
|
||||
|
||||
/* Magic number to provide a small check against being handed junk. */
|
||||
|
||||
#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */
|
||||
|
|
|
@ -658,7 +658,8 @@ typedef struct pcre2_real_match_data {
|
|||
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
|
||||
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
|
||||
PCRE2_SIZE startchar; /* Offset to starting code unit */
|
||||
uint16_t matchedby; /* Type of match (normal, JIT, DFA) */
|
||||
uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
|
||||
uint8_t flags; /* Various flags */
|
||||
uint16_t oveccount; /* Number of pairs */
|
||||
int rc; /* The return code from the match */
|
||||
PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
|
||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
|||
|
||||
Written by Philip Hazel
|
||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||
New API code Copyright (c) 2016 University of Cambridge
|
||||
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
|
@ -174,6 +174,7 @@ if (rc > (int)oveccount)
|
|||
rc = 0;
|
||||
match_data->code = re;
|
||||
match_data->subject = subject;
|
||||
match_data->flags = 0;
|
||||
match_data->rc = rc;
|
||||
match_data->startchar = arguments.startchar_ptr - subject;
|
||||
match_data->leftchar = 0;
|
||||
|
|
|
@ -69,11 +69,12 @@ information, and fields within it. */
|
|||
#define PUBLIC_MATCH_OPTIONS \
|
||||
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
|
||||
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
|
||||
PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT)
|
||||
PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT|PCRE2_COPY_MATCHED_SUBJECT)
|
||||
|
||||
#define PUBLIC_JIT_MATCH_OPTIONS \
|
||||
(PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
|
||||
PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD)
|
||||
PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD|\
|
||||
PCRE2_COPY_MATCHED_SUBJECT)
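The effect of adding PCRE2_COPY_MATCHED_SUBJECT to PUBLIC_JIT_MATCH_OPTIONS can be illustrated in isolation. The following is a minimal sketch, not the library code: the `SK_` names and both functions are hypothetical stand-ins, the bit values are copied from the defines shown in this diff, and the values for NOTEMPTY and NO_UTF_CHECK are assumptions that should be checked against pcre2.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local stand-ins for the pcre2.h option bits. */
#define SK_NOTBOL               0x00000001u
#define SK_NOTEOL               0x00000002u
#define SK_NOTEMPTY             0x00000004u  /* assumed value */
#define SK_NOTEMPTY_ATSTART     0x00000008u
#define SK_PARTIAL_SOFT         0x00000010u
#define SK_PARTIAL_HARD         0x00000020u
#define SK_NO_JIT               0x00002000u
#define SK_COPY_MATCHED_SUBJECT 0x00004000u
#define SK_NO_UTF_CHECK         0x40000000u  /* assumed value */

/* Mirrors the new PUBLIC_JIT_MATCH_OPTIONS mask from the hunk above. */
#define SK_PUBLIC_JIT_MATCH_OPTIONS \
  (SK_NO_UTF_CHECK|SK_NOTBOL|SK_NOTEOL|SK_NOTEMPTY| \
   SK_NOTEMPTY_ATSTART|SK_PARTIAL_SOFT|SK_PARTIAL_HARD| \
   SK_COPY_MATCHED_SUBJECT)

/* Nonzero when no option bit outside the JIT mask is set, which is the
   second half of the condition pcre2_match() tests before trying JIT. */
static int sk_jit_eligible(uint32_t options)
{
return (options & ~SK_PUBLIC_JIT_MATCH_OPTIONS) == 0;
}

/* A few checks on the mask arithmetic. */
static void sk_self_check(void)
{
assert(sk_jit_eligible(0));
assert(sk_jit_eligible(SK_COPY_MATCHED_SUBJECT | SK_NOTBOL));
assert(!sk_jit_eligible(SK_NO_JIT));
assert(!sk_jit_eligible(SK_COPY_MATCHED_SUBJECT | SK_NO_JIT));
}
```

Because the copy bit is now inside the mask, requesting a copy no longer forces pcre2_match() onto the interpreter path, while PCRE2_NO_JIT still does.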
|
||||
|
||||
/* Non-error returns from and within the match() function. Error returns are
|
||||
externally defined PCRE2_ERROR_xxx codes, which are all negative. */
|
||||
|
@ -6009,10 +6010,11 @@ pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
|
|||
pcre2_match_context *mcontext)
|
||||
{
|
||||
int rc;
|
||||
int was_zero_terminated = 0;
|
||||
const uint8_t *start_bits = NULL;
|
||||
|
||||
const pcre2_real_code *re = (const pcre2_real_code *)code;
|
||||
|
||||
|
||||
BOOL anchored;
|
||||
BOOL firstline;
|
||||
BOOL has_first_cu = FALSE;
|
||||
|
@ -6052,7 +6054,11 @@ mb->stack_frames = (heapframe *)stack_frames_vector;
|
|||
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
|
||||
subject string. */
|
||||
|
||||
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
|
||||
if (length == PCRE2_ZERO_TERMINATED)
|
||||
{
|
||||
length = PRIV(strlen)(subject);
|
||||
was_zero_terminated = 1;
|
||||
}
|
||||
end_subject = subject + length;
|
||||
|
||||
/* Plausibility checks */
|
||||
|
@ -6167,6 +6173,16 @@ if (mcontext != NULL && mcontext->offset_limit != PCRE2_UNSET &&
|
|||
(re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0)
|
||||
return PCRE2_ERROR_BADOFFSETLIMIT;
|
||||
|
||||
/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
|
||||
free the memory that was obtained. */
|
||||
|
||||
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
||||
{
|
||||
match_data->memctl.free((void *)match_data->subject,
|
||||
match_data->memctl.memory_data);
|
||||
match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
|
||||
}
|
||||
|
||||
/* If the pattern was successfully studied with JIT support, run the JIT
|
||||
executable instead of the rest of this function. Most options must be set at
|
||||
compile time for the JIT code to be usable. Fall back to the normal code path if
|
||||
|
@ -6178,7 +6194,19 @@ if (re->executable_jit != NULL && (options & ~PUBLIC_JIT_MATCH_OPTIONS) == 0)
|
|||
{
|
||||
rc = pcre2_jit_match(code, subject, length, start_offset, options,
|
||||
match_data, mcontext);
|
||||
if (rc != PCRE2_ERROR_JIT_BADOPTION) return rc;
|
||||
if (rc != PCRE2_ERROR_JIT_BADOPTION)
|
||||
{
|
||||
if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
|
||||
{
|
||||
length = CU2BYTES(length + was_zero_terminated);
|
||||
match_data->subject = match_data->memctl.malloc(length,
|
||||
match_data->memctl.memory_data);
|
||||
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||
memcpy((void *)match_data->subject, subject, length);
|
||||
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
|
||||
}
|
||||
return rc;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
|
@ -6819,12 +6847,14 @@ if (mb->match_frames != mb->stack_frames)
|
|||
|
||||
match_data->code = re;
|
||||
match_data->subject = subject;
|
||||
match_data->flags = 0;
|
||||
match_data->mark = mb->mark;
|
||||
match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;
|
||||
|
||||
/* Handle a fully successful match. Set the return code to the number of
|
||||
captured strings, or 0 if there were too many to fit into the ovector, and then
|
||||
set the remaining returned values before returning. */
|
||||
set the remaining returned values before returning. Make a copy of the subject
|
||||
string if requested. */
|
||||
|
||||
if (rc == MATCH_MATCH)
|
||||
{
|
||||
|
@ -6834,6 +6864,17 @@ if (rc == MATCH_MATCH)
|
|||
match_data->leftchar = mb->start_used_ptr - subject;
|
||||
match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)?
|
||||
mb->last_used_ptr : mb->end_match_ptr) - subject;
|
||||
|
||||
if ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
|
||||
{
|
||||
length = CU2BYTES(length + was_zero_terminated);
|
||||
match_data->subject = match_data->memctl.malloc(length,
|
||||
match_data->memctl.memory_data);
|
||||
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||
memcpy((void *)match_data->subject, subject, length);
|
||||
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
|
||||
}
|
||||
|
||||
return match_data->rc;
|
||||
}
|
||||
|
||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
|||
|
||||
Written by Philip Hazel
|
||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||
New API code Copyright (c) 2016-2017 University of Cambridge
|
||||
New API code Copyright (c) 2016-2018 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
|
@ -63,6 +63,7 @@ yield = PRIV(memctl_malloc)(
|
|||
(pcre2_memctl *)gcontext);
|
||||
if (yield == NULL) return NULL;
|
||||
yield->oveccount = oveccount;
|
||||
yield->flags = 0;
|
||||
return yield;
|
||||
}
|
||||
|
||||
|
@ -93,7 +94,12 @@ PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
|
|||
pcre2_match_data_free(pcre2_match_data *match_data)
|
||||
{
|
||||
if (match_data != NULL)
|
||||
{
|
||||
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
||||
match_data->memctl.free((void *)match_data->subject,
|
||||
match_data->memctl.memory_data);
|
||||
match_data->memctl.free(match_data, match_data->memctl.memory_data);
|
||||
}
|
||||
}
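The ownership rules that the hunks above spread across pcre2_match(), pcre2_dfa_match() and pcre2_match_data_free() can be summarised in one place. This is a simplified sketch under stated assumptions, not the library implementation: `sketch_match_data`, `sketch_release_copy()` and `sketch_record_match()` are hypothetical names, plain `malloc()`/`free()` replace the memctl callbacks, code units are assumed to be one byte wide (so CU2BYTES is the identity), and -1 stands in for PCRE2_ERROR_NOMEMORY.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MD_COPIED_SUBJECT 0x01u  /* mirrors PCRE2_MD_COPIED_SUBJECT */

/* Hypothetical, cut-down stand-in for pcre2_real_match_data. */
typedef struct {
  const char *subject;  /* caller's buffer, or a private copy */
  uint8_t flags;        /* SKETCH_MD_* bits */
} sketch_match_data;

/* Free a private copy left by an earlier successful match, if any.
   Called before each new match and when the block itself is freed. */
static void sketch_release_copy(sketch_match_data *md)
{
assert(md != NULL);
if ((md->flags & SKETCH_MD_COPIED_SUBJECT) != 0)
  {
  free((void *)md->subject);
  md->subject = NULL;
  md->flags &= (uint8_t)~SKETCH_MD_COPIED_SUBJECT;
  }
}

/* Record a match result. rc >= 0 means success; only then is the subject
   copied. Returns rc, or -1 if the copy cannot be allocated. */
static int sketch_record_match(sketch_match_data *md, const char *subject,
  size_t length, int was_zero_terminated, int rc, int copy_requested)
{
sketch_release_copy(md);
md->subject = subject;
md->flags = 0;
if (rc >= 0 && copy_requested)
  {
  /* Include the terminating zero when the caller relied on one. */
  size_t nbytes = length + (was_zero_terminated? 1 : 0);
  char *copy = malloc(nbytes != 0? nbytes : 1);
  if (copy == NULL) return -1;
  memcpy(copy, subject, nbytes);
  md->subject = copy;
  md->flags |= SKETCH_MD_COPIED_SUBJECT;
  }
return rc;
}
```

A caller would invoke `sketch_record_match()` once per match attempt and `sketch_release_copy()` just before discarding the block, so at most one private copy is alive per match data block at any time.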
|
||||
|
||||
|
||||
|
|
|
@ -620,6 +620,7 @@ static modstruct modlist[] = {
|
|||
{ "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) },
|
||||
{ "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) },
|
||||
{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
|
||||
{ "copy_matched_subject", MOD_DAT, MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
|
||||
{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
|
||||
{ "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) },
|
||||
{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
|
||||
|
@ -4196,11 +4197,13 @@ else fprintf(outfile, "%s%s%s%s%s%s%s",
|
|||
static void
|
||||
show_match_options(uint32_t options)
|
||||
{
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
|
||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||
((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
|
||||
((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)? " copy_matched_subject" : "",
|
||||
((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
|
||||
((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
|
||||
((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "",
|
||||
((options & PCRE2_NO_JIT) != 0)? " no_jit" : "",
|
||||
((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
|
||||
((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
|
||||
((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",
|
||||
|
@ -7442,6 +7445,25 @@ for (gmatched = 0;; gmatched++)
|
|||
}
|
||||
}
|
||||
|
||||
/* If PCRE2_COPY_MATCHED_SUBJECT was set, check that things are as they
|
||||
should be, but not for fast JIT, where it isn't supported. */
|
||||
|
||||
if ((dat_datctl.options & PCRE2_COPY_MATCHED_SUBJECT) != 0 &&
|
||||
(pat_patctl.control & CTL_JITFAST) == 0)
|
||||
{
|
||||
if ((FLD(match_data, flags) & PCRE2_MD_COPIED_SUBJECT) == 0)
|
||||
fprintf(outfile,
|
||||
"** PCRE2 error: flag not set after copy_matched_subject\n");
|
||||
|
||||
if (CASTFLD(void *, match_data, subject) == pp)
|
||||
fprintf(outfile,
|
||||
"** PCRE2 error: copy_matched_subject has not copied\n");
|
||||
|
||||
if (memcmp(CASTFLD(void *, match_data, subject), pp, ulen) != 0)
|
||||
fprintf(outfile,
|
||||
"** PCRE2 error: copy_matched_subject mismatch\n");
|
||||
}
|
||||
|
||||
/* If this is not the first time round a global loop, check that the
|
||||
returned string has changed. If it has not, check for an empty string match
|
||||
at a different starting offset from the previous match. This is a failed test
|
||||
|
|
|
@ -299,9 +299,9 @@
|
|||
# ----
|
||||
|
||||
/[aC]/mg,firstline,newline=lf
|
||||
match\nmatch
|
||||
match\nmatch
|
||||
|
||||
/[aCz]/mg,firstline,newline=lf
|
||||
match\nmatch
|
||||
match\nmatch
|
||||
|
||||
# End of testinput17
|
||||
|
|
|
@ -5531,4 +5531,11 @@ a)"xI
|
|||
|
||||
/(?(*script_run:xxx)zzz)/
|
||||
|
||||
/foobar/
|
||||
the foobar thing\=copy_matched_subject
|
||||
the foobar thing\=copy_matched_subject,zero_terminate
|
||||
|
||||
/foobar/g
|
||||
the foobar thing foobar again\=copy_matched_subject
|
||||
|
||||
# End of testinput2
|
||||
|
|
|
@ -4955,4 +4955,11 @@
|
|||
\= Expect no match
|
||||
\na
|
||||
|
||||
/foobar/
|
||||
the foobar thing\=copy_matched_subject
|
||||
the foobar thing\=copy_matched_subject,zero_terminate
|
||||
|
||||
/foobar/g
|
||||
the foobar thing foobar again\=copy_matched_subject
|
||||
|
||||
# End of testinput6
|
||||
|
|
|
@ -543,11 +543,11 @@ Failed: error -47: match limit exceeded
|
|||
# ----
|
||||
|
||||
/[aC]/mg,firstline,newline=lf
|
||||
match\nmatch
|
||||
match\nmatch
|
||||
0: a (JIT)
|
||||
|
||||
/[aCz]/mg,firstline,newline=lf
|
||||
match\nmatch
|
||||
match\nmatch
|
||||
0: a (JIT)
|
||||
|
||||
# End of testinput17
|
||||
|
|
|
@ -16821,6 +16821,17 @@ Failed: error 128 at offset 10: assertion expected after (?( or (?(?C)
|
|||
/(?(*script_run:xxx)zzz)/
|
||||
Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)
|
||||
|
||||
/foobar/
|
||||
the foobar thing\=copy_matched_subject
|
||||
0: foobar
|
||||
the foobar thing\=copy_matched_subject,zero_terminate
|
||||
0: foobar
|
||||
|
||||
/foobar/g
|
||||
the foobar thing foobar again\=copy_matched_subject
|
||||
0: foobar
|
||||
0: foobar
|
||||
|
||||
# End of testinput2
|
||||
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
|
||||
Error -62: bad serialized data
|
||||
|
|
|
@ -7783,4 +7783,15 @@ No match
|
|||
\na
|
||||
No match
|
||||
|
||||
/foobar/
|
||||
the foobar thing\=copy_matched_subject
|
||||
0: foobar
|
||||
the foobar thing\=copy_matched_subject,zero_terminate
|
||||
0: foobar
|
||||
|
||||
/foobar/g
|
||||
the foobar thing foobar again\=copy_matched_subject
|
||||
0: foobar
|
||||
0: foobar
|
||||
|
||||
# End of testinput6
|
||||
|
|