Implement PCRE2_COPY_MATCHED_SUBJECT.

Philip.Hazel 2018-10-17 08:33:38 +00:00
parent 971f885277
commit f90ce1a333
26 changed files with 684 additions and 443 deletions

@ -37,6 +37,10 @@ src/pcre2_chartables.c.dist are updated.
ranges such as a-z in EBCDIC environments. The original code probably never ranges such as a-z in EBCDIC environments. The original code probably never
worked, though there were no bug reports. worked, though there were no bug reports.
10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
path.
Version 10.32 10-September-2018 Version 10.32 10-September-2018
------------------------------- -------------------------------

@ -51,6 +51,8 @@ depth limits. The <i>length</i> and <i>startoffset</i> values are code units, no
characters. The options are: characters. The options are:
<pre> <pre>
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject is not the beginning of a line PCRE2_NOTBOL Subject is not the beginning of a line
PCRE2_NOTEOL Subject is not the end of a line PCRE2_NOTEOL Subject is not the end of a line

@ -55,11 +55,13 @@ A match context is needed only if you want to:
Change the backtracking depth limit Change the backtracking depth limit
Set custom memory management specifically for the match Set custom memory management specifically for the match
</pre> </pre>
The <i>length</i> and <i>startoffset</i> values are code The <i>length</i> and <i>startoffset</i> values are code units, not characters.
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
subject that is terminated by a binary zero code unit. The options are: terminated by a binary zero code unit. The options are:
<pre> <pre>
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject string is not the beginning of a line PCRE2_NOTBOL Subject string is not the beginning of a line
PCRE2_NOTEOL Subject string is not the end of a line PCRE2_NOTEOL Subject string is not the end of a line

@ -31,6 +31,11 @@ using the memory freeing function from the general context or compiled pattern
with which it was created, or <b>free()</b> if that was not set. with which it was created, or <b>free()</b> if that was not set.
</P> </P>
<P> <P>
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this
match data block, the copy of the subject that was remembered with the block is
also freed.
</P>
<P>
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
<a href="pcre2api.html"><b>pcre2api</b></a> <a href="pcre2api.html"><b>pcre2api</b></a>
page and a description of the POSIX API in the page and a description of the POSIX API in the

@ -1305,10 +1305,13 @@ NULL.
NOTE: When one of the matching functions is called, pointers to the compiled NOTE: When one of the matching functions is called, pointers to the compiled
pattern and the subject string are set in the match data block so that they can pattern and the subject string are set in the match data block so that they can
be referenced by the substring extraction functions. After running a match, you be referenced by the substring extraction functions. After running a match, you
must not free a compiled pattern (or a subject string) until after all must not free a compiled pattern or a subject string until after all
operations on the operations on the
<a href="#matchdatablock">match data block</a> <a href="#matchdatablock">match data block</a>
have taken place. have taken place, unless, in the case of the subject string, you have used the
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
"Option bits for <b>pcre2_match()</b>"
<a href="#matchoptions">below.</a>
</P> </P>
<P> <P>
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
@ -2419,7 +2422,10 @@ When one of the matching functions is called, pointers to the compiled pattern
and the subject string are set in the match data block so that they can be and the subject string are set in the match data block so that they can be
referenced by the extraction functions. After running a match, you must not referenced by the extraction functions. After running a match, you must not
free a compiled pattern or a subject string until after all operations on the free a compiled pattern or a subject string until after all operations on the
match data block (for that match) have taken place. match data block (for that match) have taken place, unless, in the case of the
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
described in the section entitled "Option bits for <b>pcre2_match()</b>"
<a href="#matchoptions">below.</a>
</P> </P>
<P> <P>
When a match data block itself is no longer needed, it should be freed by When a match data block itself is no longer needed, it should be freed by
@ -2531,10 +2537,10 @@ Option bits for <b>pcre2_match()</b>
</b><br> </b><br>
<P> <P>
The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be The unused bits of the <i>options</i> argument for <b>pcre2_match()</b> must be
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED, zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
Their action is described below. PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
</P> </P>
<P> <P>
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
@ -2549,6 +2555,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
to be anchored by virtue of its contents, it cannot be made unanchored at to be anchored by virtue of its contents, it cannot be made unanchored at
matching time. Note that setting the option at match time disables JIT matching time. Note that setting the option at match time disables JIT
matching. matching.
<pre>
PCRE2_COPY_MATCHED_SUBJECT
</pre>
By default, a pointer to the subject is remembered in the match data block so
that, after a successful match, it can be referenced by the substring
extraction functions. This means that the subject's memory must not be freed
until all such operations are complete. For some applications where the
lifetime of the subject string is not guaranteed, it may be necessary to make a
copy of the subject string, but it is wasteful to do this unless the match is
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
subject is copied and the new pointer is remembered in the match data block
instead of the original subject pointer. The memory allocator that was used for
the match block itself is used. The copy is automatically freed when
<b>pcre2_match_data_free()</b> is called to free the match data block. It is also
automatically freed if the match data block is re-used for another match
operation.
<pre> <pre>
PCRE2_ENDANCHORED PCRE2_ENDANCHORED
</pre> </pre>
@ -2954,7 +2976,8 @@ The backtracking match limit was reached.
If a pattern contains many nested backtracking points, heap memory is used to If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default remember them. This error is given when the memory allocation function (default
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
if the amount of memory needed exceeds the heap limit. if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
<pre> <pre>
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
</pre> </pre>
@ -3584,11 +3607,12 @@ Option bits for <b>pcre2_dfa_match()</b>
</b><br> </b><br>
<P> <P>
The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED, be zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
for <b>pcre2_match()</b>, so their description is not repeated here. four of these are exactly the same as for <b>pcre2_match()</b>, so their
description is not repeated here.
<pre> <pre>
PCRE2_PARTIAL_HARD PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT PCRE2_PARTIAL_SOFT
@ -3732,7 +3756,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 September 2018 Last updated: 16 October 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>
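
The ownership rule that the hunks above document for PCRE2_COPY_MATCHED_SUBJECT can be sketched as a toy model. This is not PCRE2's implementation, and none of the `toy_*` or `TOY_*` names exist in PCRE2; they are hypothetical stand-ins. The sketch assumes a literal search with `strstr()` in place of a real matcher and plain `malloc()`/`free()` in place of the match data block's allocator. It shows the three documented behaviours: the copy is made only after a successful match, it is released when the block is re-used, and it is released when the block is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins; these are NOT the PCRE2 types or constants. */
#define TOY_COPY_MATCHED_SUBJECT 0x1u
#define TOY_ERROR_NOMATCH  (-1)
#define TOY_ERROR_NOMEMORY (-2)

typedef struct {
    const char *subject;   /* subject that the offsets refer to */
    int         owns_copy; /* nonzero if subject is a private copy */
    size_t      start;     /* offset of the first matched byte */
    size_t      end;       /* offset just past the match */
} toy_match_data;

/* Release a private copy left behind by an earlier successful match. */
static void toy_drop_copy(toy_match_data *md)
{
    if (md->owns_copy) free((void *)md->subject);
    md->subject = NULL;
    md->owns_copy = 0;
}

/* Search for a literal needle; copy the subject only on success. */
static int toy_match(const char *needle, const char *subject,
                     unsigned options, toy_match_data *md)
{
    const char *hit;

    toy_drop_copy(md);                  /* re-use frees the old copy */
    hit = strstr(subject, needle);
    if (hit == NULL) return TOY_ERROR_NOMATCH;   /* no copy is made */

    md->start = (size_t)(hit - subject);
    md->end = md->start + strlen(needle);

    if (options & TOY_COPY_MATCHED_SUBJECT) {
        size_t n = strlen(subject) + 1;
        char *copy = malloc(n);
        if (copy == NULL) return TOY_ERROR_NOMEMORY;
        memcpy(copy, subject, n);
        md->subject = copy;             /* remember the copy instead */
        md->owns_copy = 1;
    } else {
        md->subject = subject;          /* caller must keep it alive */
    }
    return 1;
}

/* Freeing the block also frees any private subject copy. */
static void toy_match_data_free(toy_match_data *md)
{
    toy_drop_copy(md);
}

/* Returns 0 if the matched text survives freeing the caller's buffer. */
static int toy_demo(void)
{
    toy_match_data md = { NULL, 0, 0, 0 };
    char *buf = malloc(16);
    int ok;

    if (buf == NULL) return -1;
    strcpy(buf, "hello world");
    if (toy_match("world", buf, TOY_COPY_MATCHED_SUBJECT, &md) != 1) {
        free(buf);
        return -1;
    }
    free(buf);                          /* safe: md holds its own copy */
    assert(md.owns_copy);
    ok = (md.end - md.start == 5) &&
         memcmp(md.subject + md.start, "world", 5) == 0;
    toy_match_data_free(&md);
    return ok ? 0 : -1;
}
```

In the model, as in the documented behaviour, an unsuccessful match costs no allocation, and an allocation failure is reported as its own error rather than as a failed match.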

@ -147,9 +147,10 @@ pattern.
<br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br> <br><a name="SEC4" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
<P> <P>
The <b>pcre2_match()</b> options that are supported for JIT matching are The <b>pcre2_match()</b> options that are supported for JIT matching are
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
PCRE2_ANCHORED option is not supported at match time. PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
supported at match time.
</P> </P>
<P> <P>
If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
@ -402,10 +403,13 @@ processed by <b>pcre2_jit_compile()</b>).
</P> </P>
<P> <P>
The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
the same arguments as <b>pcre2_match()</b>. The return values are also the same, the same arguments as <b>pcre2_match()</b>. However, the subject string must be
plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
requested that was not compiled. Unsupported option bits (for example, option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option. PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
return values are also the same as for <b>pcre2_match()</b>, plus
PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
that was not compiled.
</P> </P>
<P> <P>
When you call <b>pcre2_match()</b>, as well as testing for invalid options, a When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
@ -434,7 +438,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC13" href="#TOC1">REVISION</a><br> <br><a name="SEC13" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 28 June 2018 Last updated: 16 October 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>
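
Because the fast path ignores PCRE2_COPY_MATCHED_SUBJECT, a caller of pcre2_jit_match() whose subject may go away has to keep its own copy. A minimal sketch of copy-on-success on the caller's side follows. The `kept_subject` type and both helper functions are hypothetical names, not part of PCRE2, and the sketch assumes 8-bit code units and the default `malloc()`. It relies only on the documented convention that the matching functions return a negative value for no match or an error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a caller-owned copy of a subject. */
typedef struct {
    char  *text;    /* NULL until a successful match is recorded */
    size_t length;  /* length in code units (bytes in this sketch) */
} kept_subject;

/*
 * Copy the subject only if rc reports a match (rc >= 0).
 * Returns 1 if a copy was made, 0 if there was no match,
 * and -1 if the allocation failed.
 */
static int keep_subject_if_matched(int rc, const char *subject,
                                   size_t length, kept_subject *out)
{
    char *copy;

    assert(subject != NULL && out != NULL);
    out->text = NULL;
    out->length = 0;
    if (rc < 0) return 0;               /* no match: nothing to keep */

    copy = malloc(length + 1);
    if (copy == NULL) return -1;
    memcpy(copy, subject, length);
    copy[length] = '\0';                /* convenience terminator */
    out->text = copy;
    out->length = length;
    return 1;
}

/* Release the caller-owned copy. */
static void kept_subject_free(kept_subject *k)
{
    free(k->text);
    k->text = NULL;
    k->length = 0;
}
```

Offsets read from the output vector can be applied to `text`, since the copy has the same layout as the original. The match data block itself would still hold the original subject pointer, so the substring extraction functions should not be used once the original has been freed.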

@ -1302,9 +1302,11 @@ COMPILING A PATTERN
NOTE: When one of the matching functions is called, pointers to the NOTE: When one of the matching functions is called, pointers to the
compiled pattern and the subject string are set in the match data block compiled pattern and the subject string are set in the match data block
so that they can be referenced by the substring extraction functions. so that they can be referenced by the substring extraction functions.
After running a match, you must not free a compiled pattern (or a sub- After running a match, you must not free a compiled pattern or a sub-
ject string) until after all operations on the match data block have ject string until after all operations on the match data block have
taken place. taken place, unless, in the case of the subject string, you have used
the PCRE2_COPY_MATCHED_SUBJECT option, which is described in the sec-
tion entitled "Option bits for pcre2_match()" below.
The options argument for pcre2_compile() contains various bit settings The options argument for pcre2_compile() contains various bit settings
that affect the compilation. It should be zero if no options are that affect the compilation. It should be zero if no options are
@ -2388,7 +2390,9 @@ THE MATCH DATA BLOCK
they can be referenced by the extraction functions. After running a they can be referenced by the extraction functions. After running a
match, you must not free a compiled pattern or a subject string until match, you must not free a compiled pattern or a subject string until
after all operations on the match data block (for that match) have after all operations on the match data block (for that match) have
taken place. taken place, unless, in the case of the subject string, you have used
the PCRE2_COPY_MATCHED_SUBJECT option, which is described in the sec-
tion entitled "Option bits for pcre2_match()" below.
When a match data block itself is no longer needed, it should be freed When a match data block itself is no longer needed, it should be freed
by calling pcre2_match_data_free(). If this function is called with a by calling pcre2_match_data_free(). If this function is called with a
@ -2488,10 +2492,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
Option bits for pcre2_match() Option bits for pcre2_match()
The unused bits of the options argument for pcre2_match() must be zero. The unused bits of the options argument for pcre2_match() must be zero.
The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED, The only bits that may be set are PCRE2_ANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL,
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PAR- PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT,
TIAL_SOFT. Their action is described below. PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their
action is described below.
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not sup- Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not sup-
ported by the just-in-time (JIT) compiler. If it is set, JIT matching ported by the just-in-time (JIT) compiler. If it is set, JIT matching
@ -2507,6 +2512,23 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
unanchored at matching time. Note that setting the option at match time unanchored at matching time. Note that setting the option at match time
disables JIT matching. disables JIT matching.
PCRE2_COPY_MATCHED_SUBJECT
By default, a pointer to the subject is remembered in the match data
block so that, after a successful match, it can be referenced by the
substring extraction functions. This means that the subject's memory
must not be freed until all such operations are complete. For some
applications where the lifetime of the subject string is not guaran-
teed, it may be necessary to make a copy of the subject string, but it
is wasteful to do this unless the match is successful. After a success-
ful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the subject is copied
and the new pointer is remembered in the match data block instead of
the original subject pointer. The memory allocator that was used for
the match block itself is used. The copy is automatically freed when
pcre2_match_data_free() is called to free the match data block. It is
also automatically freed if the match data block is re-used for another
match operation.
PCRE2_ENDANCHORED PCRE2_ENDANCHORED
If the PCRE2_ENDANCHORED option is set, any string that pcre2_match() If the PCRE2_ENDANCHORED option is set, any string that pcre2_match()
@ -2881,7 +2903,8 @@ ERROR RETURNS FROM pcre2_match()
used to remember them. This error is given when the memory allocation used to remember them. This error is given when the memory allocation
function (default or custom) fails. Note that a different error, function (default or custom) fails. Note that a different error,
PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds
the heap limit. the heap limit. PCRE2_ERROR_NOMEMORY is also returned if
PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
@ -3467,12 +3490,13 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
Option bits for pcre2_dfa_match() Option bits for pcre2_dfa_match()
The unused bits of the options argument for pcre2_dfa_match() must be The unused bits of the options argument for pcre2_dfa_match() must be
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDAN- zero. The only bits that may be set are PCRE2_ANCHORED,
CHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL,
PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT,
the last four of these are exactly the same as for pcre2_match(), so PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last four of
their description is not repeated here. these are exactly the same as for pcre2_match(), so their description
is not repeated here.
PCRE2_PARTIAL_HARD PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT PCRE2_PARTIAL_SOFT
@ -3607,7 +3631,7 @@ AUTHOR
REVISION REVISION
Last updated: 21 September 2018 Last updated: 16 October 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -4924,9 +4948,10 @@ SIMPLE USE OF JIT
UNSUPPORTED OPTIONS AND PATTERN ITEMS UNSUPPORTED OPTIONS AND PATTERN ITEMS
The pcre2_match() options that are supported for JIT matching are The pcre2_match() options that are supported for JIT matching are
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
PCRE2_ANCHORED option is not supported at match time. PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options
are not supported at match time.
If the PCRE2_NO_JIT option is passed to pcre2_match() it disables the If the PCRE2_NO_JIT option is passed to pcre2_match() it disables the
use of JIT, forcing matching by the interpreter code. use of JIT, forcing matching by the interpreter code.
@ -5164,11 +5189,13 @@ JIT FAST PATH API
patterns that have been successfully processed by pcre2_jit_compile()). patterns that have been successfully processed by pcre2_jit_compile()).
The fast path function is called pcre2_jit_match(), and it takes The fast path function is called pcre2_jit_match(), and it takes
exactly the same arguments as pcre2_match(). The return values are also exactly the same arguments as pcre2_match(). However, the subject
the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or string must be specified with a length; PCRE2_ZERO_TERMINATED is not
complete) is requested that was not compiled. Unsupported option bits supported. Unsupported option bits (for example, PCRE2_ANCHORED,
(for example, PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT PCRE2_ENDANCHORED and PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is
option. the PCRE2_NO_JIT option. The return values are also the same as for
pcre2_match(), plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (par-
tial or complete) is requested that was not compiled.
When you call pcre2_match(), as well as testing for invalid options, a When you call pcre2_match(), as well as testing for invalid options, a
number of other sanity checks are performed on the arguments. For exam- number of other sanity checks are performed on the arguments. For exam-
@ -5195,7 +5222,7 @@ AUTHOR
REVISION REVISION
Last updated: 28 June 2018 Last updated: 16 October 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2_DFA_MATCH 3 "26 April 2018" "PCRE2 10.32" .TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -39,6 +39,8 @@ depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not
characters. The options are: characters. The options are:
.sp .sp
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject is not the beginning of a line PCRE2_NOTBOL Subject is not the beginning of a line
PCRE2_NOTEOL Subject is not the end of a line PCRE2_NOTEOL Subject is not the end of a line

@ -1,4 +1,4 @@
.TH PCRE2_MATCH 3 "14 November 2017" "PCRE2 10.31" .TH PCRE2_MATCH 3 "16 October 2018" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -43,11 +43,13 @@ A match context is needed only if you want to:
Change the backtracking depth limit Change the backtracking depth limit
Set custom memory management specifically for the match Set custom memory management specifically for the match
.sp .sp
The \fIlength\fP and \fIstartoffset\fP values are code The \fIlength\fP and \fIstartoffset\fP values are code units, not characters.
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a The length may be given as PCRE2_ZERO_TERMINATED for a subject that is
subject that is terminated by a binary zero code unit. The options are: terminated by a binary zero code unit. The options are:
.sp .sp
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_COPY_MATCHED_SUBJECT
On success, make a private subject copy
PCRE2_ENDANCHORED Pattern can match only at end of subject PCRE2_ENDANCHORED Pattern can match only at end of subject
PCRE2_NOTBOL Subject string is not the beginning of a line PCRE2_NOTBOL Subject string is not the beginning of a line
PCRE2_NOTEOL Subject string is not the end of a line PCRE2_NOTEOL Subject string is not the end of a line

@ -1,4 +1,4 @@
.TH PCRE2_MATCH_DATA_FREE 3 "28 June 2018" "PCRE2 10.32" .TH PCRE2_MATCH_DATA_FREE 3 "16 October 2018" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -18,6 +18,10 @@ If \fImatch_data\fP is NULL, this function does nothing. Otherwise,
using the memory freeing function from the general context or compiled pattern using the memory freeing function from the general context or compiled pattern
with which it was created, or \fBfree()\fP if that was not set. with which it was created, or \fBfree()\fP if that was not set.
.P .P
If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this
match data block, the copy of the subject that was remembered with the block is
also freed.
.P
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
.\" HREF .\" HREF
\fBpcre2api\fP \fBpcre2api\fP

@ -1,4 +1,4 @@
.TH PCRE2API 3 "21 September 2018" "PCRE2 10.33" .TH PCRE2API 3 "16 October 2018" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -1237,13 +1237,19 @@ NULL.
NOTE: When one of the matching functions is called, pointers to the compiled NOTE: When one of the matching functions is called, pointers to the compiled
pattern and the subject string are set in the match data block so that they can pattern and the subject string are set in the match data block so that they can
be referenced by the substring extraction functions. After running a match, you be referenced by the substring extraction functions. After running a match, you
must not free a compiled pattern (or a subject string) until after all must not free a compiled pattern or a subject string until after all
operations on the operations on the
.\" HTML <a href="#matchdatablock"> .\" HTML <a href="#matchdatablock">
.\" </a> .\" </a>
match data block match data block
.\" .\"
have taken place. have taken place, unless, in the case of the subject string, you have used the
PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
"Option bits for \fBpcre2_match()\fP"
.\" HTML <a href="#matchoptions">
.\" </a>
below.
.\"
.P .P
The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit
settings that affect the compilation. It should be zero if no options are settings that affect the compilation. It should be zero if no options are
@ -2390,7 +2396,13 @@ When one of the matching functions is called, pointers to the compiled pattern
and the subject string are set in the match data block so that they can be and the subject string are set in the match data block so that they can be
referenced by the extraction functions. After running a match, you must not referenced by the extraction functions. After running a match, you must not
free a compiled pattern or a subject string until after all operations on the free a compiled pattern or a subject string until after all operations on the
match data block (for that match) have taken place. match data block (for that match) have taken place, unless, in the case of the
subject string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is
described in the section entitled "Option bits for \fBpcre2_match()\fP"
.\" HTML <a href="#matchoptions">
.\" </a>
below.
.\"
.P .P
When a match data block itself is no longer needed, it should be freed by When a match data block itself is no longer needed, it should be freed by
calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
@ -2507,10 +2519,10 @@ the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
.rs .rs
.sp .sp
The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED, zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK,
Their action is described below. PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
.P .P
Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
@ -2524,6 +2536,22 @@ matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out
to be anchored by virtue of its contents, it cannot be made unanchored at to be anchored by virtue of its contents, it cannot be made unanchored at
matching time. Note that setting the option at match time disables JIT matching time. Note that setting the option at match time disables JIT
matching. matching.
.sp
PCRE2_COPY_MATCHED_SUBJECT
.sp
By default, a pointer to the subject is remembered in the match data block so
that, after a successful match, it can be referenced by the substring
extraction functions. This means that the subject's memory must not be freed
until all such operations are complete. For some applications where the
lifetime of the subject string is not guaranteed, it may be necessary to make a
copy of the subject string, but it is wasteful to do this unless the match is
successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
subject is copied and the new pointer is remembered in the match data block
instead of the original subject pointer. The memory allocator that was used for
the match block itself is used. The copy is automatically freed when
\fBpcre2_match_data_free()\fP is called to free the match data block. It is also
automatically freed if the match data block is re-used for another match
operation.
.sp .sp
PCRE2_ENDANCHORED PCRE2_ENDANCHORED
.sp .sp
@ -2961,7 +2989,8 @@ The backtracking match limit was reached.
If a pattern contains many nested backtracking points, heap memory is used to If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default remember them. This error is given when the memory allocation function (default
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
if the amount of memory needed exceeds the heap limit. if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
.sp .sp
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
.sp .sp
@@ -3579,11 +3608,12 @@ Here is an example of a simple call to \fBpcre2_dfa_match()\fP:
 .rs
 .sp
 The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
-be zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_ENDANCHORED,
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST,
-and PCRE2_DFA_RESTART. All but the last four of these are exactly the same as
-for \fBpcre2_match()\fP, so their description is not repeated here.
+be zero. The only bits that may be set are PCRE2_ANCHORED,
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL,
+PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD,
+PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last
+four of these are exactly the same as for \fBpcre2_match()\fP, so their
+description is not repeated here.
 .sp
 PCRE2_PARTIAL_HARD
 PCRE2_PARTIAL_SOFT
@@ -3737,6 +3767,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 September 2018
+Last updated: 16 October 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi
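
The lifetime rules described for PCRE2_COPY_MATCHED_SUBJECT can be modelled in a few lines of plain C. This is a simplified sketch, not the library code: model_match_data, model_record() and model_release() are hypothetical names, and the standard malloc() and free() stand in for the allocator stored in the match data block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the match data block. */
#define MD_COPIED_SUBJECT 0x01u

typedef struct {
  const char *subject;   /* caller's pointer, or a private copy */
  unsigned char flags;   /* MD_COPIED_SUBJECT when the copy is owned */
} model_match_data;

/* Free a private copy left behind by an earlier match, if there is one. */
void model_release(model_match_data *md)
{
  assert(md != NULL);
  if ((md->flags & MD_COPIED_SUBJECT) != 0)
    {
    free((void *)md->subject);
    md->flags &= (unsigned char)~MD_COPIED_SUBJECT;
    }
  md->subject = NULL;
}

/* Record the outcome of a match. "matched" stands in for rc >= 0 and "copy"
for the PCRE2_COPY_MATCHED_SUBJECT option bit. Returns 0 on success, or -1 if
the copy cannot be allocated (the model's analogue of PCRE2_ERROR_NOMEMORY). */
int model_record(model_match_data *md, const char *subject, size_t length,
  int matched, int copy)
{
  assert(md != NULL && subject != NULL);
  model_release(md);               /* re-use frees any previous copy */
  md->subject = subject;           /* default: remember caller's pointer */
  if (matched && copy)
    {
    char *dup = malloc(length == 0 ? 1 : length);
    if (dup == NULL) return -1;
    memcpy(dup, subject, length);
    md->subject = dup;             /* only a successful match pays for a copy */
    md->flags |= MD_COPIED_SUBJECT;
    }
  return 0;
}
```

In this model, a caller may overwrite or free its own buffer after a successful model_record() call with copy set, and md->subject stays valid until the next model_record() or model_release() call. A failed match leaves the caller's pointer in place and allocates nothing.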

View File

@@ -1,4 +1,4 @@
-.TH PCRE2JIT 3 "28 June 2018" "PCRE2 10.32"
+.TH PCRE2JIT 3 "16 October 2018" "PCRE2 10.33"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
@@ -124,9 +124,10 @@ pattern.
 .rs
 .sp
 The \fBpcre2_match()\fP options that are supported for JIT matching are
-PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
-PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
-PCRE2_ANCHORED option is not supported at match time.
+PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
+PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and
+PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not
+supported at match time.
 .P
 If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the
 use of JIT, forcing matching by the interpreter code.
@@ -376,10 +377,13 @@ available, and which need the best possible performance, can instead use a
 processed by \fBpcre2_jit_compile()\fP).
 .P
 The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly
-the same arguments as \fBpcre2_match()\fP. The return values are also the same,
-plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
-requested that was not compiled. Unsupported option bits (for example,
-PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
+the same arguments as \fBpcre2_match()\fP. However, the subject string must be
+specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported
+option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and
+PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The
+return values are also the same as for \fBpcre2_match()\fP, plus
+PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested
+that was not compiled.
 .P
 When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
@@ -412,6 +416,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 June 2018
+Last updated: 16 October 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi

View File

@@ -167,10 +167,11 @@ D is inspected during pcre2_dfa_match() execution
 #define PCRE2_JIT_PARTIAL_HARD 0x00000004u
 #define PCRE2_JIT_INVALID_UTF 0x00000100u
-/* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
-that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
-functions (though pcre2_jit_match() ignores the latter since it bypasses all
-sanity checks). */
+/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and
+pcre2_substitute(). Some are allowed only for one of the functions, and in
+these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and
+PCRE2_NO_UTF_CHECK can also be passed to these functions (though
+pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
 #define PCRE2_NOTBOL 0x00000001u
 #define PCRE2_NOTEOL 0x00000002u
@@ -178,25 +179,15 @@ sanity checks). */
 #define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. */
 #define PCRE2_PARTIAL_SOFT 0x00000010u
 #define PCRE2_PARTIAL_HARD 0x00000020u
-/* These are additional options for pcre2_dfa_match(). */
-#define PCRE2_DFA_RESTART 0x00000040u
-#define PCRE2_DFA_SHORTEST 0x00000080u
-/* These are additional options for pcre2_substitute(), which passes any others
-through to pcre2_match(). */
-#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
-#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
-#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u
-#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u
-/* A further option for pcre2_match(), not allowed for pcre2_dfa_match(),
-ignored for pcre2_jit_match(). */
-#define PCRE2_NO_JIT 0x00002000u
+#define PCRE2_DFA_RESTART 0x00000040u /* pcre2_dfa_match() only */
+#define PCRE2_DFA_SHORTEST 0x00000080u /* pcre2_dfa_match() only */
+#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */
+#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
+#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
 /* Options for pcre2_pattern_convert(). */

View File

@@ -85,7 +85,8 @@ in others, so I abandoned this code. */
 #define PUBLIC_DFA_MATCH_OPTIONS \
   (PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
    PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
-   PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART)
+   PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \
+   PCRE2_COPY_MATCHED_SUBJECT)
 /*************************************************
@@ -3228,6 +3229,8 @@ pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
   pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount)
 {
 int rc;
+int was_zero_terminated = 0;
 const pcre2_real_code *re = (const pcre2_real_code *)code;
 PCRE2_SPTR start_match;
@@ -3267,7 +3270,11 @@ rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
 /* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
 subject string. */
-if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
+if (length == PCRE2_ZERO_TERMINATED)
+  {
+  length = PRIV(strlen)(subject);
+  was_zero_terminated = 1;
+  }
 /* Plausibility checks */
@@ -3520,10 +3527,21 @@ if ((re->flags & PCRE2_LASTSET) != 0)
     }
   }
+/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
+free the memory that was obtained. */
+if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
+  {
+  match_data->memctl.free((void *)match_data->subject,
+    match_data->memctl.memory_data);
+  match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
+  }
 /* Fill in fields that are always returned in the match data. */
 match_data->code = re;
 match_data->subject = subject;
+match_data->flags = 0;
 match_data->mark = NULL;
 match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER;
@@ -3818,6 +3836,17 @@ for (;;)
     match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject);
     match_data->startchar = (PCRE2_SIZE)(start_match - subject);
     match_data->rc = rc;
+    if (rc >= 0 &&(options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
+      {
+      length = CU2BYTES(length + was_zero_terminated);
+      match_data->subject = match_data->memctl.malloc(length,
+        match_data->memctl.memory_data);
+      if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
+      memcpy((void *)match_data->subject, subject, length);
+      match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
+      }
     goto EXIT;
     }
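
The size handed to the allocator in the hunk above is a byte count, whereas length is held in code units, and one extra code unit is added when the caller passed PCRE2_ZERO_TERMINATED so that the terminator is copied too. The arithmetic can be sketched as below. copy_size_in_bytes() is a hypothetical helper, and the sketch assumes that CU2BYTES multiplies by the code unit width (1, 2 or 4 bytes for the 8-, 16- and 32-bit libraries).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring
     length = CU2BYTES(length + was_zero_terminated);
"length" is in code units and "cu_size" is the width of one code unit in
bytes. The result is the number of bytes to allocate and copy. */
size_t copy_size_in_bytes(size_t length, size_t cu_size,
  int was_zero_terminated)
{
  assert(cu_size == 1 || cu_size == 2 || cu_size == 4);
  return (length + (was_zero_terminated ? 1u : 0u)) * cu_size;
}
```

For the 16-code-unit subject used in the new tests, this gives 16 bytes in the 8-bit library with an explicit length and 17 bytes when the subject was zero-terminated.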

View File

@@ -535,6 +535,10 @@ enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */
        PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */
        PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */
+/* Values for the flags field in a match data block. */
+#define PCRE2_MD_COPIED_SUBJECT 0x01u
 /* Magic number to provide a small check against being handed junk. */
 #define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */

View File

@@ -658,7 +658,8 @@ typedef struct pcre2_real_match_data {
   PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
   PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
   PCRE2_SIZE startchar; /* Offset to starting code unit */
-  uint16_t matchedby; /* Type of match (normal, JIT, DFA) */
+  uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
+  uint8_t flags; /* Various flags */
   uint16_t oveccount; /* Number of pairs */
   int rc; /* The return code from the match */
   PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
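
One plausible reading of this hunk is that narrowing matchedby from 16 to 8 bits makes room for the new flags byte without moving the fields that follow it. The sketch below checks that on cut-down mirrors of the structure. md_before, md_after and layout_unchanged() are hypothetical names, size_t stands in for PCRE2_SIZE, the ovector is shortened, and the result is what common ABIs are expected to give rather than something the C standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the structure tail before the change. */
typedef struct {
  size_t   startchar;
  uint16_t matchedby;
  uint16_t oveccount;
  int      rc;
  size_t   ovector[2];
} md_before;

/* Hypothetical mirror after the change: two bytes replace one 16-bit field. */
typedef struct {
  size_t   startchar;
  uint8_t  matchedby;
  uint8_t  flags;
  uint16_t oveccount;
  int      rc;
  size_t   ovector[2];
} md_after;

/* Returns 1 if the overall size and the offsets of the later fields are the
same in both layouts, otherwise 0. */
int layout_unchanged(void)
{
  return sizeof(md_before) == sizeof(md_after) &&
         offsetof(md_before, oveccount) == offsetof(md_after, oveccount) &&
         offsetof(md_before, rc) == offsetof(md_after, rc) &&
         offsetof(md_before, ovector) == offsetof(md_after, ovector);
}
```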

View File

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 Written by Philip Hazel
 Original API code Copyright (c) 1997-2012 University of Cambridge
-New API code Copyright (c) 2016 University of Cambridge
+New API code Copyright (c) 2016-2018 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -174,6 +174,7 @@ if (rc > (int)oveccount)
   rc = 0;
 match_data->code = re;
 match_data->subject = subject;
+match_data->flags = 0;
 match_data->rc = rc;
 match_data->startchar = arguments.startchar_ptr - subject;
 match_data->leftchar = 0;

View File

@@ -69,11 +69,12 @@ information, and fields within it. */
 #define PUBLIC_MATCH_OPTIONS \
   (PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \
    PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \
-   PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT)
+   PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT|PCRE2_COPY_MATCHED_SUBJECT)
 #define PUBLIC_JIT_MATCH_OPTIONS \
    (PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\
-    PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD)
+    PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD|\
+    PCRE2_COPY_MATCHED_SUBJECT)
 /* Non-error returns from and within the match() function. Error returns are
 externally defined PCRE2_ERROR_xxx codes, which are all negative. */
@@ -6009,10 +6010,11 @@ pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length,
   pcre2_match_context *mcontext)
 {
 int rc;
+int was_zero_terminated = 0;
 const uint8_t *start_bits = NULL;
 const pcre2_real_code *re = (const pcre2_real_code *)code;
 BOOL anchored;
 BOOL firstline;
 BOOL has_first_cu = FALSE;
@@ -6052,7 +6054,11 @@ mb->stack_frames = (heapframe *)stack_frames_vector;
 /* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated
 subject string. */
-if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
+if (length == PCRE2_ZERO_TERMINATED)
+  {
+  length = PRIV(strlen)(subject);
+  was_zero_terminated = 1;
+  }
 end_subject = subject + length;
 /* Plausibility checks */
@@ -6167,6 +6173,16 @@ if (mcontext != NULL && mcontext->offset_limit != PCRE2_UNSET &&
      (re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0)
   return PCRE2_ERROR_BADOFFSETLIMIT;
+/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT,
+free the memory that was obtained. */
+if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
+  {
+  match_data->memctl.free((void *)match_data->subject,
+    match_data->memctl.memory_data);
+  match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT;
+  }
 /* If the pattern was successfully studied with JIT support, run the JIT
 executable instead of the rest of this function. Most options must be set at
 compile time for the JIT code to be usable. Fallback to the normal code path if
@@ -6178,7 +6194,19 @@ if (re->executable_jit != NULL && (options & ~PUBLIC_JIT_MATCH_OPTIONS) == 0)
   {
   rc = pcre2_jit_match(code, subject, length, start_offset, options,
     match_data, mcontext);
-  if (rc != PCRE2_ERROR_JIT_BADOPTION) return rc;
+  if (rc != PCRE2_ERROR_JIT_BADOPTION)
+    {
+    if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
+      {
+      length = CU2BYTES(length + was_zero_terminated);
+      match_data->subject = match_data->memctl.malloc(length,
+        match_data->memctl.memory_data);
+      if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
+      memcpy((void *)match_data->subject, subject, length);
+      match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
+      }
+    return rc;
+    }
   }
 #endif
@@ -6819,12 +6847,14 @@ if (mb->match_frames != mb->stack_frames)
 match_data->code = re;
 match_data->subject = subject;
+match_data->flags = 0;
 match_data->mark = mb->mark;
 match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER;
 /* Handle a fully successful match. Set the return code to the number of
 captured strings, or 0 if there were too many to fit into the ovector, and then
-set the remaining returned values before returning. */
+set the remaining returned values before returning. Make a copy of the subject
+string if requested. */
 if (rc == MATCH_MATCH)
   {
@@ -6834,6 +6864,17 @@ if (rc == MATCH_MATCH)
   match_data->leftchar = mb->start_used_ptr - subject;
   match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)?
     mb->last_used_ptr : mb->end_match_ptr) - subject;
+  if ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)
+    {
+    length = CU2BYTES(length + was_zero_terminated);
+    match_data->subject = match_data->memctl.malloc(length,
+      match_data->memctl.memory_data);
+    if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY;
+    memcpy((void *)match_data->subject, subject, length);
+    match_data->flags |= PCRE2_MD_COPIED_SUBJECT;
+    }
   return match_data->rc;
   }

View File

@@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
 Written by Philip Hazel
 Original API code Copyright (c) 1997-2012 University of Cambridge
-New API code Copyright (c) 2016-2017 University of Cambridge
+New API code Copyright (c) 2016-2018 University of Cambridge
 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -63,6 +63,7 @@ yield = PRIV(memctl_malloc)(
   (pcre2_memctl *)gcontext);
 if (yield == NULL) return NULL;
 yield->oveccount = oveccount;
+yield->flags = 0;
 return yield;
 }
@@ -93,8 +94,13 @@ PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION
 pcre2_match_data_free(pcre2_match_data *match_data)
 {
 if (match_data != NULL)
+  {
+  if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
+    match_data->memctl.free((void *)match_data->subject,
+      match_data->memctl.memory_data);
   match_data->memctl.free(match_data, match_data->memctl.memory_data);
+  }
 }

View File

@@ -620,6 +620,7 @@ static modstruct modlist[] = {
   { "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) },
   { "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) },
   { "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
+  { "copy_matched_subject", MOD_DAT, MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) },
   { "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
   { "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) },
   { "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
@@ -4196,11 +4197,13 @@ else fprintf(outfile, "%s%s%s%s%s%s%s",
 static void
 show_match_options(uint32_t options)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s",
   ((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
+  ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)? " copy_matched_subject" : "",
   ((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "",
   ((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "",
   ((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "",
+  ((options & PCRE2_NO_JIT) != 0)? " no_jit" : "",
   ((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
   ((options & PCRE2_NOTBOL) != 0)? " notbol" : "",
   ((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "",
         }
       }
+    /* If PCRE2_COPY_MATCHED_SUBJECT was set, check that things are as they
+    should be, but not for fast JIT, where it isn't supported. */
+    if ((dat_datctl.options & PCRE2_COPY_MATCHED_SUBJECT) != 0 &&
+        (pat_patctl.control & CTL_JITFAST) == 0)
+      {
+      if ((FLD(match_data, flags) & PCRE2_MD_COPIED_SUBJECT) == 0)
+        fprintf(outfile,
+          "** PCRE2 error: flag not set after copy_matched_subject\n");
+      if (CASTFLD(void *, match_data, subject) == pp)
+        fprintf(outfile,
+          "** PCRE2 error: copy_matched_subject has not copied\n");
+      if (memcmp(CASTFLD(void *, match_data, subject), pp, ulen) != 0)
+        fprintf(outfile,
+          "** PCRE2 error: copy_matched_subject mismatch\n");
+      }
     /* If this is not the first time round a global loop, check that the
     returned string has changed. If it has not, check for an empty string match
     at different starting offset from the previous match. This is a failed test

7
testdata/testinput2 vendored
View File

@@ -5531,4 +5531,11 @@ a)"xI
 /(?(*script_run:xxx)zzz)/
+/foobar/
+    the foobar thing\=copy_matched_subject
+    the foobar thing\=copy_matched_subject,zero_terminate
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
 # End of testinput2

7
testdata/testinput6 vendored
View File

@@ -4955,4 +4955,11 @@
 \= Expect no match
     \na
+/foobar/
+    the foobar thing\=copy_matched_subject
+    the foobar thing\=copy_matched_subject,zero_terminate
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
 # End of testinput6

11
testdata/testoutput2 vendored
View File

@@ -16821,6 +16821,17 @@ Failed: error 128 at offset 10: assertion expected after (?( or (?(?C)
 /(?(*script_run:xxx)zzz)/
 Failed: error 128 at offset 14: assertion expected after (?( or (?(?C)
+/foobar/
+    the foobar thing\=copy_matched_subject
+ 0: foobar
+    the foobar thing\=copy_matched_subject,zero_terminate
+ 0: foobar
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
+ 0: foobar
+ 0: foobar
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data

11
testdata/testoutput6 vendored
View File

@@ -7783,4 +7783,15 @@ No match
     \na
 No match
+/foobar/
+    the foobar thing\=copy_matched_subject
+ 0: foobar
+    the foobar thing\=copy_matched_subject,zero_terminate
+ 0: foobar
+/foobar/g
+    the foobar thing foobar again\=copy_matched_subject
+ 0: foobar
+ 0: foobar
 # End of testinput6