Implement PCRE2_COPY_MATCHED_SUBJECT.

Philip.Hazel 2018-10-17 08:33:38 +00:00
parent 971f885277
commit f90ce1a333
26 changed files with 684 additions and 443 deletions

View File

@ -37,6 +37,10 @@ src/pcre2_chartables.c.dist are updated.
ranges such as a-z in EBCDIC environments. The original code probably never
worked, though there were no bug reports.
10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
path.
Version 10.32 10-September-2018
-------------------------------

View File

@ -51,6 +51,8 @@ depth limits. The <i>length</i> and <i>startoffset</i> values are code units, no
characters. The options are:
<pre>
PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject is not the beginning of a line
PCRE2_NOTEOL Subject is not the end of a line

View File

@ -55,11 +55,13 @@ A match context is needed only if you want to:
Change the backtracking depth limit
Set custom memory management specifically for the match
</pre>
The <i>length</i> and <i>startoffset</i> values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
subject that is terminated by a binary zero code unit. The options are:
The <i>length</i> and <i>startoffset</i> values are code units, not characters.
The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
terminated by a binary zero code unit. The options are:
<pre>
PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject string is not the beginning of a line
PCRE2_NOTEOL Subject string is not the end of a line

View File

@ -31,6 +31,11 @@ using the memory freeing function from the general context or compiled pattern
with which it was created, or <b>free()</b> if that was not set.
</P>
<P>
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this
match data block, the copy of the subject that was remembered with the block is
also freed.
</P>
<P>
There is a complete description of the PCRE2 native API in the
<a href="pcre2api.html"><b>pcre2api</b></a>
page and a description of the POSIX API in the

View File

@ -1305,10 +1305,13 @@ NULL.
NOTE: When one of the matching functions is called, pointers to the compiled
pattern and the subject string are set in the match data block so that they can
be referenced by the substring extraction functions. After running a match, you
must not free a compiled pattern (or a subject string) until after all
must not free a compiled pattern or a subject string until after all
operations on the
<a href="#matchdatablock">match data block</a>
have taken place.
have taken place, unless, in the case of the subject string, you have used the
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
"Option bits for <b>pcre2_match()</b>"
<a href="#matchoptions">below.</a>
</P>
<P>
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
@ -2419,7 +2422,10 @@ When one of the matching functions is called, pointers to the compiled pattern
and the subject string are set in the match data block so that they can be
referenced by the extraction functions. After running a match, you must not
free a compiled pattern or a subject string until after all operations on the
match data block (for that match) have taken place.
match data block (for that match) have taken place, unless, in the case of the
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
described in the section entitled "Option bits for <b>pcre2_match()</b>"
<a href="#matchoptions">below.</a>
</P>
<P>
When a match data block itself is no longer needed, it should be freed by
@ -2531,10 +2537,10 @@ Option bits for <b>pcre2_match()</b>
</b><br>
<P>
The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
Their action is described below.
zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
</P>
<P>
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
@ -2549,6 +2555,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
to be anchored by virtue of its contents, it cannot be made unanchored at
matching time. Note that setting the option at match time disables JIT
matching.
<pre>
PCRE2_COPY_MATCHED_SUBJECT
</pre>
By default, a pointer to the subject is remembered in the match data block so
that, after a successful match, it can be referenced by the substring
extraction functions. This means that the subject's memory must not be freed
until all such operations are complete. For some applications where the
lifetime of the subject string is not guaranteed, it may be necessary to make a
copy of the subject string, but it is wasteful to do this unless the match is
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
subject is copied and the new pointer is remembered in the match data block
instead of the original subject pointer. The memory allocator that was used for
the match block itself is used. The copy is automatically freed when
<b>pcre2_match_data_free()</b> is called to free the match data block. It is also
automatically freed if the match data block is re-used for another match
operation.
<pre>
PCRE2_ENDANCHORED
</pre>
@ -2954,7 +2976,8 @@ The backtracking match limit was reached.
If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
if the amount of memory needed exceeds the heap limit.
if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
<pre>
PCRE2_ERROR_NULL
</pre>
@ -3584,11 +3607,12 @@ Option bits for <b>pcre_dfa_match()</b>
</b><br>
<P>
The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
for <b>pcre2_match()</b>, so their description is not repeated here.
be zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
four of these are exactly the same as for <b>pcre2_match()</b>, so their
description is not repeated here.
<pre>
PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT
@ -3732,7 +3756,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 21 September 2018
Last updated: 16 October 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>
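
The lifetime rule documented in this file for PCRE2_COPY_MATCHED_SUBJECT (copy only on success, free the copy when the block is re-used or freed) can be illustrated with a small toy model. This is a sketch under stated assumptions, not PCRE2 code: `toy_match_data`, `toy_match()`, `toy_match_data_free()` and `TOY_MD_COPIED_SUBJECT` are hypothetical names, the "match" is a plain `strstr()` search, and the standard `malloc()`/`free()` stand in for the block's own memory-control functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a match data block; NOT the real pcre2_match_data. */
#define TOY_MD_COPIED_SUBJECT 0x01u

typedef struct {
  const char *subject;  /* Caller's subject, or a private copy of it */
  unsigned flags;       /* TOY_MD_COPIED_SUBJECT when the copy is owned */
} toy_match_data;

/* Release a private copy left over from an earlier successful match. */
static void toy_release_copy(toy_match_data *md)
{
  if ((md->flags & TOY_MD_COPIED_SUBJECT) != 0)
    {
    free((void *)md->subject);
    md->flags &= ~TOY_MD_COPIED_SUBJECT;
    }
}

/* Hypothetical "match": succeeds when needle occurs in subject.
   Returns 1 on a match, 0 on no match, -1 if the copy cannot be allocated. */
static int toy_match(toy_match_data *md, const char *subject,
  const char *needle, int copy_on_success)
{
  toy_release_copy(md);            /* Re-using the block frees the old copy */
  md->subject = subject;           /* Default: remember the caller's pointer */
  if (strstr(subject, needle) == NULL) return 0;   /* No copy on failure */

  if (copy_on_success)
    {
    size_t n = strlen(subject) + 1;     /* Include the terminating zero */
    char *copy = malloc(n);
    if (copy == NULL) return -1;
    memcpy(copy, subject, n);
    md->subject = copy;                 /* Block now points at its own copy */
    md->flags |= TOY_MD_COPIED_SUBJECT;
    }
  return 1;
}

/* Freeing the block also frees any remembered copy. */
static void toy_match_data_free(toy_match_data *md)
{
  toy_release_copy(md);
  md->subject = NULL;
}
```

After a successful `toy_match()` with `copy_on_success` set, the caller may overwrite or free its own buffer; the block keeps reading from its private copy until the next `toy_match()` or `toy_match_data_free()` call.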

View File

@ -147,9 +147,10 @@ pattern.
<br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
<P>
The <b>pcre2_match()</b> options that are supported for JIT matching are
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
PCRE2_ANCHORED option is not supported at match time.
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
supported at match time.
</P>
<P>
If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
@ -402,10 +403,13 @@ processed by <b>pcre2_jit_compile()</b>).
</P>
<P>
The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
the same arguments as <b>pcre2_match()</b>. The return values are also the same,
plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
requested that was not compiled. Unsupported option bits (for example,
PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
the same arguments as <b>pcre2_match()</b>. However, the subject string must be
specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
return values are also the same as for <b>pcre2_match()</b>, plus
PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
that was not compiled.
</P>
<P>
When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
@ -434,7 +438,7 @@ Cambridge, England.
</P>
<br><a name="SEC13" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 June 2018
Last updated: 16 October 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

File diff suppressed because it is too large.

View File

@ -1,4 +1,4 @@
.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32"
.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -39,6 +39,8 @@ depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
characters. The options are:
.sp
PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject is not the beginning of a line
PCRE2_NOTEOL Subject is not the end of a line

View File

@ -1,4 +1,4 @@
.TH PCRE2_MATCH 3 "14 November 2017" "PCRE2 10.31"
.TH PCRE2_MATCH 3 "16 October 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -43,11 +43,13 @@ A match context is needed only if you want to:
Change the backtracking depth limit
Set custom memory management specifically for the match
.sp
The \fIlength\fP and \fIstartoffset\fP values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
subject that is terminated by a binary zero code unit. The options are:
The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
terminated by a binary zero code unit. The options are:
.sp
PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject string is not the beginning of a line
PCRE2_NOTEOL Subject string is not the end of a line

View File

@ -1,4 +1,4 @@
.TH PCRE2_MATCH_DATA_FREE 3 "28 June 2018" "PCRE2 10.32"
.TH PCRE2_MATCH_DATA_FREE 3 "16 October 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -18,6 +18,10 @@ If \fImatch_data\fP is NULL, this function does nothing. Otherwise,
using the memory freeing function from the general context or compiled pattern
with which it was created, or \fBfree()\fP if that was not set.
.P
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this
match data block, the copy of the subject that was remembered with the block is
also freed.
.P
There is a complete description of the PCRE2 native API in the
.\" HREF
\fBpcre2api\fP

View File

@ -1,4 +1,4 @@
.TH PCRE2API 3 "21 September 2018" "PCRE2 10.33"
.TH PCRE2API 3 "16 October 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -1237,13 +1237,19 @@ NULL.
NOTE: When one of the matching functions is called, pointers to the compiled
pattern and the subject string are set in the match data block so that they can
be referenced by the substring extraction functions. After running a match, you
must not free a compiled pattern (or a subject string) until after all
must not free a compiled pattern or a subject string until after all
operations on the
.\" HTML <a href="#matchdatablock">
.\" </a>
match data block
.\"
have taken place.
have taken place, unless, in the case of the subject string, you have used the
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
"Option bits for \fBpcre2_match()\fP"
.\" HTML <a href="#matchoptions">
.\" </a>
below.
.\"
.P
The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
settings that affect the compilation. It should be zero if no options are
@ -2390,7 +2396,13 @@ When one of the matching functions is called, pointers to the compiled pattern
and the subject string are set in the match data block so that they can be
referenced by the extraction functions. After running a match, you must not
free a compiled pattern or a subject string until after all operations on the
match data block (for that match) have taken place.
match data block (for that match) have taken place, unless, in the case of the
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
described in the section entitled "Option bits for \fBpcre2_match()\fP"
.\" HTML <a href="#matchoptions">
.\" </a>
below.
.\"
.P
When a match data block itself is no longer needed, it should be freed by
calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
@ -2507,10 +2519,10 @@ the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
.rs
.sp
The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT.
Their action is described below.
zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
.P
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
@ -2524,6 +2536,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
to be anchored by virtue of its contents, it cannot be made unanchored at
matching time. Note that setting the option at match time disables JIT
matching.
.sp
PCRE2_COPY_MATCHED_SUBJECT
.sp
By default, a pointer to the subject is remembered in the match data block so
that, after a successful match, it can be referenced by the substring
extraction functions. This means that the subject's memory must not be freed
until all such operations are complete. For some applications where the
lifetime of the subject string is not guaranteed, it may be necessary to make a
copy of the subject string, but it is wasteful to do this unless the match is
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
subject is copied and the new pointer is remembered in the match data block
instead of the original subject pointer. The memory allocator that was used for
the match block itself is used. The copy is automatically freed when
\fBpcre2_match_data_free()\fP is called to free the match data block. It is also
automatically freed if the match data block is re-used for another match
operation.
.sp
PCRE2_ENDANCHORED
.sp
@ -2961,7 +2989,8 @@ The backtracking match limit was reached.
If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
if the amount of memory needed exceeds the heap limit.
if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
.sp
PCRE2_ERROR_NULL
.sp
@ -3579,11 +3608,12 @@ Here is an example of a simple call to \fBpcre2_dfa_match()\fP:
.rs
.sp
The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
for \fBpcre2_match()\fP, so their description is not repeated here.
be zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
four of these are exactly the same as for \fBpcre2_match()\fP, so their
description is not repeated here.
.sp
PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT
@ -3737,6 +3767,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 21 September 2018
Last updated: 16 October 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2JIT 3 "28 June 2018" "PCRE2 10.32"
.TH PCRE2JIT 3 "16 October 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
@ -124,9 +124,10 @@ pattern.
.rs
.sp
The \fBpcre2_match()\fP options that are supported for JIT matching are
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
PCRE2_ANCHORED option is not supported at match time.
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
supported at match time.
.P
If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the
use of JIT, forcing matching by the interpreter code.
@ -376,10 +377,13 @@ available, and which need the best possible performance, can instead use a
processed by \fBpcre2_jit_compile()\fP).
.P
The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly
the same arguments as \fBpcre2_match()\fP. The return values are also the same,
plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
requested that was not compiled. Unsupported option bits (for example,
PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
the same arguments as \fBpcre2_match()\fP. However, the subject string must be
specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
return values are also the same as for \fBpcre2_match()\fP, plus
PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
that was not compiled.
.P
When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
number of other sanity checks are performed on the arguments. For example, if
@ -412,6 +416,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 28 June 2018
Last updated: 16 October 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -167,36 +167,27 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_JIT_PARTIAL_HARD 0x00000004u
#define PCRE2_JIT_INVALID_UTF 0x00000100u
/* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
functions (though pcre2_jit_match() ignores the latter since it bypasses all
sanity checks). */
/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and
pcre2_substitute(). Some are allowed only for one of the functions, and in
these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and
PCRE2_NO_UTF_CHECK can also be passed to these functions (though
pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
#define PCRE2_NOTBOL 0x00000001u
#define PCRE2_NOTEOL 0x00000002u
#define PCRE2_NOTEMPTY 0x00000004u /* ) These two must be kept */
#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. */
#define PCRE2_PARTIAL_SOFT 0x00000010u
#define PCRE2_PARTIAL_HARD 0x00000020u
/* These are additional options for pcre2_dfa_match(). */
#define PCRE2_DFA_RESTART 0x00000040u
#define PCRE2_DFA_SHORTEST 0x00000080u
/* These are additional options for pcre2_substitute(), which passes any others
through to pcre2_match(). */
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u
/* A further option for pcre2_match(), not allowed for pcre2_dfa_match(),
ignored for pcre2_jit_match(). */
#define PCRE2_NO_JIT 0x00002000u
#define PCRE2_NOTBOL 0x00000001u
#define PCRE2_NOTEOL 0x00000002u
#define PCRE2_NOTEMPTY 0x00000004u /* ) These two must be kept */
#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. */
#define PCRE2_PARTIAL_SOFT 0x00000010u
#define PCRE2_PARTIAL_HARD 0x00000020u
#define PCRE2_DFA_RESTART 0x00000040u /* pcre2_dfa_match() only */
#define PCRE2_DFA_SHORTEST 0x00000080u /* pcre2_dfa_match() only */
#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
/* Options for pcre2_pattern_convert(). */

View File

@ -85,7 +85,8 @@ in others, so I abandoned this code. */
#define PUBLIC_DFA_MATCH_OPTIONS \
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART)
PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \
PCRE2_COPY_MATCHED_SUBJECT)
/*************************************************
@ -3228,6 +3229,8 @@ pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
{
int rc;
int was_zero_terminated = 0;
const pcre2_real_code *re = (const pcre2_real_code *)code;
PCRE2_SPTR start_match;
@ -3267,7 +3270,11 @@ rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
subject string. */
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
if (length == PCRE2_ZERO_TERMINATED)
{
length = PRIV(strlen)(subject);
was_zero_terminated = 1;
}
/* Plausibility checks */
@ -3520,10 +3527,21 @@ if ((re->flags & PCRE2_LASTSET) != 0)
}
}
/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
free the memory that was obtained. */
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
{
match_data->memctl.free((void *)match_data->subject,
match_data->memctl.memory_data);
match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
}
/* Fill in fields that are always returned in the match data. */
match_data->code = re;
match_data->subject = subject;
match_data->flags = 0;
match_data->mark = NULL;
match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
@ -3818,6 +3836,17 @@ for (;;)
match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
match_data->startchar = (PCRE2_SIZE)(start_match - subject);
match_data->rc = rc;
if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
{
length = CU2BYTES(length + was_zero_terminated);
match_data->subject = match_data->memctl.malloc(length,
match_data->memctl.memory_data);
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
memcpy((void *)match_data->subject, subject, length);
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
}
goto EXIT;
}
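
The copy in the hunk above sizes its allocation as `CU2BYTES(length + was_zero_terminated)`, so a subject passed with PCRE2_ZERO_TERMINATED keeps its terminating zero in the private copy. A minimal sketch of that arithmetic for 16-bit code units follows. The helper name `toy_copy_subject16()` and its `bytes_out` parameter are hypothetical, `sizeof(uint16_t)` stands in for the CU2BYTES scaling, and plain `malloc()` replaces the match data block's allocator; the one-byte minimum is an addition of this sketch and is not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2: copy a subject of length_cu 16-bit
   code units, plus one more unit when the caller passed a zero-terminated
   string, and report the number of bytes copied. */
static uint16_t *toy_copy_subject16(const uint16_t *subject, size_t length_cu,
  int was_zero_terminated, size_t *bytes_out)
{
  /* Code units to bytes: the extra unit covers the terminating zero. */
  size_t bytes = (length_cu + (size_t)was_zero_terminated) * sizeof(uint16_t);

  /* Assumption of this sketch: never request zero bytes, because malloc(0)
     may legitimately return NULL. */
  uint16_t *copy = malloc(bytes != 0 ? bytes : 1);
  if (copy == NULL) return NULL;

  memcpy(copy, subject, bytes);
  *bytes_out = bytes;
  return copy;
}
```

For a three-unit subject, the helper copies 8 bytes when the string was zero-terminated and 6 bytes when an explicit length was given.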

View File

@ -534,6 +534,10 @@ bytes in a code unit in that mode. */
enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */
PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */
/* Values for the flags field in a match data block. */
#define PCRE2_MD_COPIED_SUBJECT 0x01u
/* Magic number to provide a small check against being handed junk. */

View File

@ -658,7 +658,8 @@ typedef struct pcre2_real_match_data {
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
PCRE2_SIZE startchar; /* Offset to starting code unit */
uint16_t matchedby; /* Type of match (normal, JIT, DFA) */
uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
uint8_t flags; /* Various flags */
uint16_t oveccount; /* Number of pairs */
int rc; /* The return code from the match */
PCRE2_SIZE ovector[131072]; /* Must be last in the structure */

View File

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -174,6 +174,7 @@ if (rc > (int)oveccount)
rc = 0;
match_data->code = re;
match_data->subject = subject;
match_data->flags = 0;
match_data->rc = rc;
match_data->startchar = arguments.startchar_ptr - subject;
match_data->leftchar = 0;

View File

@ -69,11 +69,12 @@ information, and fields within it. */
#define PUBLIC_MATCH_OPTIONS \
(PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT)
PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT|PCRE2_COPY_MATCHED_SUBJECT)
#define PUBLIC_JIT_MATCH_OPTIONS \
(PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD)
PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD|\
PCRE2_COPY_MATCHED_SUBJECT)
/* Non-error returns from and within the match() function. Error returns are
externally defined PCRE2_ERROR_xxx codes, which are all negative. */
@ -5014,7 +5015,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
must record a backtracking point and also set up a chained frame. */
case OP_ONCE:
case OP_SCRIPT_RUN:
case OP_SCRIPT_RUN:
case OP_SBRA:
Lframe_type = GF_NOCAPTURE | Fop;
@ -5526,14 +5527,14 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
case OP_ASSERT_NOT:
case OP_ASSERTBACK_NOT:
RRETURN(MATCH_MATCH);
/* At the end of a script run, apply the script-checking rules. This code
will never be exercised if Unicode support is not compiled, because in
/* At the end of a script run, apply the script-checking rules. This code
will never be exercised if Unicode support is not compiled, because in
that environment script runs cause an error at compile time. */
case OP_SCRIPT_RUN:
if (!PRIV(script_run)(P->eptr, Feptr, utf)) RRETURN(MATCH_NOMATCH);
break;
break;
/* Whole-pattern recursion is coded as a recurse into group 0, so it
won't be picked up here. Instead, we catch it when the OP_END is reached.
@ -6009,10 +6010,11 @@ pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
pcre2_match_context *mcontext)
{
int rc;
int was_zero_terminated = 0;
const uint8_t *start_bits = NULL;
const pcre2_real_code *re = (const pcre2_real_code *)code;
BOOL anchored;
BOOL firstline;
BOOL has_first_cu = FALSE;
@ -6052,7 +6054,11 @@ mb->stack_frames = (heapframe *)stack_frames_vector;
/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
subject string. */
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
if (length == PCRE2_ZERO_TERMINATED)
{
length = PRIV(strlen)(subject);
was_zero_terminated = 1;
}
end_subject = subject + length;
/* Plausibility checks */
@ -6166,6 +6172,16 @@ time. */
if (mcontext != NULL && mcontext->offset_limit != PCRE2_UNSET &&
(re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0)
return PCRE2_ERROR_BADOFFSETLIMIT;
/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
free the memory that was obtained. */
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
{
match_data->memctl.free((void *)match_data->subject,
match_data->memctl.memory_data);
match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
}
/* If the pattern was successfully studied with JIT support, run the JIT
executable instead of the rest of this function. Most options must be set at
@ -6178,7 +6194,19 @@ if (re->executable_jit != NULL && (options & ~PUBLIC_JIT_MATCH_OPTIONS) == 0)
{
rc = pcre2_jit_match(code, subject, length, start_offset, options,
match_data, mcontext);
if (rc != PCRE2_ERROR_JIT_BADOPTION) return rc;
if (rc != PCRE2_ERROR_JIT_BADOPTION)
{
if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
{
length = CU2BYTES(length + was_zero_terminated);
match_data->subject = match_data->memctl.malloc(length,
match_data->memctl.memory_data);
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
memcpy((void *)match_data->subject, subject, length);
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
}
return rc;
}
}
#endif
@ -6819,12 +6847,14 @@ if (mb->match_frames != mb->stack_frames)
match_data->code = re;
match_data->subject = subject;
match_data->flags = 0;
match_data->mark = mb->mark;
match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;
/* Handle a fully successful match. Set the return code to the number of
captured strings, or 0 if there were too many to fit into the ovector, and then
set the remaining returned values before returning. */
set the remaining returned values before returning. Make a copy of the subject
string if requested. */
if (rc == MATCH_MATCH)
{
@ -6834,6 +6864,17 @@ if (rc == MATCH_MATCH)
match_data->leftchar = mb->start_used_ptr - subject;
match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)?
mb->last_used_ptr : mb->end_match_ptr) - subject;
if ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
{
length = CU2BYTES(length + was_zero_terminated);
match_data->subject = match_data->memctl.malloc(length,
match_data->memctl.memory_data);
if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
memcpy((void *)match_data->subject, subject, length);
match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
}
return match_data->rc;
}
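
In this file, pcre2_match() hands the call to JIT only when `(options & ~PUBLIC_JIT_MATCH_OPTIONS) == 0`, and this commit adds PCRE2_COPY_MATCHED_SUBJECT to that mask so that requesting a copy no longer forces the interpreter. The following sketch reproduces only the mask test. The `TOY_` names and `toy_jit_eligible()` are hypothetical; the low bit values are the ones shown in the pcre2.h hunk of this commit, while the values for PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK are assumed from pcre2.h and do not appear in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed in the pcre2.h hunk of this commit. */
#define TOY_NOTBOL               0x00000001u
#define TOY_NOTEOL               0x00000002u
#define TOY_NOTEMPTY             0x00000004u
#define TOY_NOTEMPTY_ATSTART     0x00000008u
#define TOY_PARTIAL_SOFT         0x00000010u
#define TOY_PARTIAL_HARD         0x00000020u
#define TOY_NO_JIT               0x00002000u
#define TOY_COPY_MATCHED_SUBJECT 0x00004000u

/* Assumed from pcre2.h; these two values are not shown in this diff. */
#define TOY_NO_UTF_CHECK         0x40000000u
#define TOY_ANCHORED             0x80000000u

/* Mirrors the updated PUBLIC_JIT_MATCH_OPTIONS mask. */
#define TOY_JIT_MATCH_OPTIONS \
  (TOY_NO_UTF_CHECK|TOY_NOTBOL|TOY_NOTEOL|TOY_NOTEMPTY| \
   TOY_NOTEMPTY_ATSTART|TOY_PARTIAL_SOFT|TOY_PARTIAL_HARD| \
   TOY_COPY_MATCHED_SUBJECT)

/* Returns nonzero when every requested option bit is inside the JIT mask,
   which is the condition under which the JIT path would be attempted. */
static int toy_jit_eligible(uint32_t options)
{
  return (options & ~TOY_JIT_MATCH_OPTIONS) == 0;
}
```

With this mask, `TOY_COPY_MATCHED_SUBJECT | TOY_NOTBOL` stays eligible for JIT, whereas `TOY_ANCHORED` or `TOY_NO_JIT` falls through to the interpreter.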

View File

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2017 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -63,6 +63,7 @@ yield = PRIV(memctl_malloc)(
(pcre2_memctl *)gcontext);
if (yield == NULL) return NULL;
yield->oveccount = oveccount;
yield->flags = 0;
return yield;
}
@ -93,7 +94,12 @@ PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
pcre2_match_data_free(pcre2_match_data *match_data)
{
if (match_data != NULL)
{
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
match_data->memctl.free((void *)match_data->subject,
match_data->memctl.memory_data);
match_data->memctl.free(match_data, match_data->memctl.memory_data);
}
}
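
The two hunks above share one ownership rule: on a successful match the subject is copied with the match data's allocator and PCRE2_MD_COPIED_SUBJECT is set, and pcre2_match_data_free() releases the copy only when that flag is set. A minimal standalone sketch of that rule follows. It does not use the PCRE2 API: demo_match_data, demo_record_subject, demo_release and MD_COPIED_SUBJECT are hypothetical names, plain malloc/free stand in for the memctl functions, and releasing a previous copy before reuse is an assumption of the sketch rather than something shown in this diff.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_MD_COPIED_SUBJECT. */
#define MD_COPIED_SUBJECT 0x01u

/* Hypothetical, cut-down analogue of the match data block. */
typedef struct {
  const char *subject;  /* borrowed from the caller, or owned if flagged */
  uint32_t flags;
} demo_match_data;

/* Free the subject only if this block owns a private copy. */
static void demo_release(demo_match_data *md)
{
  if ((md->flags & MD_COPIED_SUBJECT) != 0) free((void *)md->subject);
  md->subject = NULL;
  md->flags = 0;
}

/* Record the subject of a successful match. When copy is non-zero, take a
private copy so the caller may free or overwrite the original afterwards.
Returns 0 on success, -1 if the copy cannot be allocated. The block must
have been initialized with flags == 0 before first use. */
static int demo_record_subject(demo_match_data *md, const char *subject,
  size_t length, int copy)
{
  demo_release(md);  /* assumption of this sketch: drop any earlier copy */
  md->subject = subject;
  if (copy)
    {
    char *own = malloc(length + 1);
    if (own == NULL) { md->subject = NULL; return -1; }
    memcpy(own, subject, length);
    own[length] = 0;
    md->subject = own;
    md->flags |= MD_COPIED_SUBJECT;
    }
  return 0;
}
```

Keeping the ownership decision in a flag lets the same free routine serve both borrowed and copied subjects, which is why the allocation hunk must also initialize flags to zero.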

View File

@ -620,6 +620,7 @@ static modstruct modlist[] = {
{ "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) },
{ "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) },
{ "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
{ "copy_matched_subject", MOD_DAT, MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
{ "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
{ "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) },
{ "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
@ -4180,7 +4181,7 @@ else fprintf(outfile, "%s%s%s%s%s%s%s",
((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "",
((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "",
((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "",
((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",
((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "",
after);
}
@ -4196,11 +4197,13 @@ else fprintf(outfile, "%s%s%s%s%s%s%s",
static void
show_match_options(uint32_t options)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s",
((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)? " copy_matched_subject" : "",
((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "",
((options & PCRE2_NO_JIT) != 0)? " no_jit" : "",
((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",
@ -7442,6 +7445,25 @@ for (gmatched = 0;; gmatched++)
}
}
/* If PCRE2_COPY_MATCHED_SUBJECT was set, check that things are as they
should be, but not for fast JIT, where it isn't supported. */
if ((dat_datctl.options & PCRE2_COPY_MATCHED_SUBJECT) != 0 &&
(pat_patctl.control & CTL_JITFAST) == 0)
{
if ((FLD(match_data, flags) & PCRE2_MD_COPIED_SUBJECT) == 0)
fprintf(outfile,
"** PCRE2 error: flag not set after copy_matched_subject\n");
if (CASTFLD(void *, match_data, subject) == pp)
fprintf(outfile,
"** PCRE2 error: copy_matched_subject has not copied\n");
if (memcmp(CASTFLD(void *, match_data, subject), pp, ulen) != 0)
fprintf(outfile,
"** PCRE2 error: copy_matched_subject mismatch\n");
}
/* If this is not the first time round a global loop, check that the
returned string has changed. If it has not, check for an empty string match
at a different starting offset from the previous match. This is a failed test

View File

@ -299,9 +299,9 @@
# ----
/[aC]/mg,firstline,newline=lf
match\nmatch
match\nmatch
/[aCz]/mg,firstline,newline=lf
match\nmatch
match\nmatch
# End of testinput17

7
testdata/testinput2 vendored
View File

@ -5531,4 +5531,11 @@ a)"xI
/(?(*script_run:xxx)zzz)/
/foobar/
the foobar thing\=copy_matched_subject
the foobar thing\=copy_matched_subject,zero_terminate
/foobar/g
the foobar thing foobar again\=copy_matched_subject
# End of testinput2

7
testdata/testinput6 vendored
View File

@ -4955,4 +4955,11 @@
\= Expect no match
\na
/foobar/
the foobar thing\=copy_matched_subject
the foobar thing\=copy_matched_subject,zero_terminate
/foobar/g
the foobar thing foobar again\=copy_matched_subject
# End of testinput6

View File

@ -543,11 +543,11 @@ Failed: error -47: match limit exceeded
# ----
/[aC]/mg,firstline,newline=lf
match\nmatch
match\nmatch
0: a (JIT)
/[aCz]/mg,firstline,newline=lf
match\nmatch
match\nmatch
0: a (JIT)
# End of testinput17

11
testdata/testoutput2 vendored
View File

@ -16821,6 +16821,17 @@ Failed: error 128 at offset 10: assertion expected after (?( or (?(?C)
/(?(*script_run:xxx)zzz)/
Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)
/foobar/
the foobar thing\=copy_matched_subject
0: foobar
the foobar thing\=copy_matched_subject,zero_terminate
0: foobar
/foobar/g
the foobar thing foobar again\=copy_matched_subject
0: foobar
0: foobar
# End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data

11
testdata/testoutput6 vendored
View File

@ -7783,4 +7783,15 @@ No match
\na
No match
/foobar/
the foobar thing\=copy_matched_subject
0: foobar
the foobar thing\=copy_matched_subject,zero_terminate
0: foobar
/foobar/g
the foobar thing foobar again\=copy_matched_subject
0: foobar
0: foobar
# End of testinput6