Implement PCRE2_SUBSTITUTE_MATCHED.

Philip.Hazel 2019-12-27 13:35:17 +00:00
parent 777582d4de
commit d170829b26
11 changed files with 343 additions and 225 deletions


@ -26,6 +26,8 @@ now correctly backtracked, so this unnecessary restriction has been removed.
6. Avoid some VS compiler warnings.
7. Added PCRE2_SUBSTITUTE_MATCHED.
Version 10.34 21-November-2019
------------------------------


@ -48,8 +48,8 @@ Its arguments are:
<i>outlengthptr</i> Points to the length of the output buffer
</pre>
A match data block is needed only if you want to inspect the data from the
match that is returned in that block. A match context is needed only if you
want to:
match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
match context is needed only if you want to:
<pre>
Set up a callout function
Set a matching offset limit
@ -75,16 +75,17 @@ zero-terminated strings. The options are:
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
</pre>
PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED,
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
</P>
<P>
The function returns the number of substitutions, which may be zero if there
were no matches. The result can be greater than one only when
are no matches. The result may be greater than one only when
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
is returned.
</P>


@ -3302,12 +3302,19 @@ same number causes an error at compile time.
<b> PCRE2_SIZE *<i>outlengthptr</i>);</b>
</P>
<P>
This function calls <b>pcre2_match()</b> and then makes a copy of the subject
string in <i>outputbuffer</i>, replacing one or more parts that were matched
with the <i>replacement</i> string, whose length is supplied in <b>rlength</b>.
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
The default is to perform just one replacement, but there is an option that
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
subject string in <i>outputbuffer</i>, replacing parts that were matched with
the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
is to perform just one replacement if the pattern matches, but there is an
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
</P>
<P>
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
that were carried out. This may be zero if no match was found, and is never
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
returned if an error is detected (see below for details).
</P>
<P>
Matches in which a \K item in a lookahead in the pattern causes the match to
@ -3327,14 +3334,31 @@ allocate memory for the compiled code.
<P>
If an external <i>match_data</i> block is provided, its contents afterwards
are those set by the final call to <b>pcre2_match()</b>. For global changes,
this will have ended in a matching error. The contents of the ovector within
this will have ended in a no-match error. The contents of the ovector within
the match data block may or may not have been changed.
</P>
<P>
The <i>outlengthptr</i> argument must point to a variable that contains the
length, in code units, of the output buffer. If the function is successful, the
value is updated to contain the length of the new string, excluding the
trailing zero that is automatically added.
As well as the usual options for <b>pcre2_match()</b>, a number of additional
options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
<i>match_data</i> block must be provided, and it must have been used for an
external call to <b>pcre2_match()</b>. The data in the <i>match_data</i> block
(return code, offset vector) is used for the first substitution instead of
calling <b>pcre2_match()</b> from within <b>pcre2_substitute()</b>. This allows
an application to check for a match before choosing to substitute, without
having to repeat the match.
</P>
<P>
The <i>code</i> argument is not used for the first substitution, but if
PCRE2_SUBSTITUTE_GLOBAL is set, <b>pcre2_match()</b> will be called after the
first substitution to check for further matches, and the contents of the
<i>match_data</i> block will be changed.
</P>
<P>
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
variable that contains the length, in code units, of the output buffer. If the
function is successful, the value is updated to contain the length of the new
string, excluding the trailing zero that is automatically added.
</P>
<P>
If the function is not successful, the value set via <i>outlengthptr</i> depends
@ -3353,7 +3377,7 @@ The replacement string, which is interpreted as a UTF string in UTF mode,
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
default, however, a dollar character is an escape character that can specify
the insertion of characters from capture groups or names from (*MARK) or other
the insertion of characters from capture groups and names from (*MARK) or other
control verbs in the pattern. The following forms are always recognized:
<pre>
$$ insert a dollar character
@ -3378,16 +3402,6 @@ facility can be used to perform simple simultaneous substitutions, as this
apple lemon
2: pear orange
</pre>
As well as the usual options for <b>pcre2_match()</b>, a number of additional
options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
</P>
<P>
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to
be treated as a literal, with no interpretation. If this option is set,
PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and
PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
</P>
<P>
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
replacing every matching substring. If this option is not set, only the first
matching substring is replaced. The search for matches takes place in the
@ -3501,14 +3515,17 @@ substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
groups in the extended syntax forms to be treated as unset.
</P>
<P>
If successful, <b>pcre2_substitute()</b> returns the number of successful
matches. This may be zero if no matches were found, and is never greater than 1
unless PCRE2_SUBSTITUTE_GLOBAL is set.
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
are ignored.
</P>
<br><b>
Substitution errors
</b><br>
<P>
In the event of an error, a negative error code is returned. Except for
PCRE2_ERROR_NOMATCH (which is never returned), errors from <b>pcre2_match()</b>
are passed straight back.
In the event of an error, <b>pcre2_substitute()</b> returns a negative error
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
<b>pcre2_match()</b> are passed straight back.
</P>
<P>
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
@ -3526,6 +3543,10 @@ needed is returned via <i>outlengthptr</i>. Note that this does not happen by
default.
</P>
<P>
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
<i>match_data</i> argument is NULL.
</P>
<P>
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
@ -3876,7 +3897,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 December 2019
Last updated: 27 December 2019
<br>
Copyright &copy; 1997-2019 University of Cambridge.
<br>


@ -3193,13 +3193,18 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
PCRE2_SIZE *outlengthptr);
This function calls pcre2_match() and then makes a copy of the subject
string in outputbuffer, replacing one or more parts that were matched
This function optionally calls pcre2_match() and then makes a copy of
the subject string in outputbuffer, replacing parts that were matched
with the replacement string, whose length is supplied in rlength. This
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
The default is to perform just one replacement, but there is an option
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The
default is to perform just one replacement if the pattern matches, but
there is an option that requests multiple replacements (see PCRE2_SUB-
STITUTE_GLOBAL below for details).
If successful, pcre2_substitute() returns the number of substitutions
that were carried out. This may be zero if no match was found, and is
never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega-
tive value is returned if an error is detected (see below for details).
Matches in which a \K item in a lookahead in the pattern causes the
match to end before it starts are not supported, and give rise to an
@ -3216,13 +3221,29 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
If an external match_data block is provided, its contents afterwards
are those set by the final call to pcre2_match(). For global changes,
this will have ended in a matching error. The contents of the ovector
this will have ended in a no-match error. The contents of the ovector
within the match data block may or may not have been changed.
The outlengthptr argument must point to a variable that contains the
length, in code units, of the output buffer. If the function is suc-
cessful, the value is updated to contain the length of the new string,
excluding the trailing zero that is automatically added.
As well as the usual options for pcre2_match(), a number of additional
options can be set in the options argument of pcre2_substitute(). One
such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
match_data block must be provided, and it must have been used for an
external call to pcre2_match(). The data in the match_data block (re-
turn code, offset vector) is used for the first substitution instead of
calling pcre2_match() from within pcre2_substitute(). This allows an
application to check for a match before choosing to substitute, without
having to repeat the match.
The code argument is not used for the first substitution, but if
PCRE2_SUBSTITUTE_GLOBAL is set, pcre2_match() will be called after the
first substitution to check for further matches, and the contents of
the match_data block will be changed.
The outlengthptr argument of pcre2_substitute() must point to a vari-
able that contains the length, in code units, of the output buffer. If
the function is successful, the value is updated to contain the length
of the new string, excluding the trailing zero that is automatically
added.
If the function is not successful, the value set via outlengthptr de-
pends on the type of error. For syntax errors in the replacement
@ -3241,8 +3262,8 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
preted in any way. By default, however, a dollar character is an escape
character that can specify the insertion of characters from capture
groups or names from (*MARK) or other control verbs in the pattern. The
following forms are always recognized:
groups and names from (*MARK) or other control verbs in the pattern.
The following forms are always recognized:
$$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n>
@ -3266,14 +3287,6 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
apple lemon
2: pear orange
As well as the usual options for pcre2_match(), a number of additional
options can be set in the options argument of pcre2_substitute().
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement
string to be treated as a literal, with no interpretation. If this op-
tion is set, PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
and PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
string, replacing every matching substring. If this option is not set,
only the first matching substring is replaced. The search for matches
@ -3384,13 +3397,15 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
known groups in the extended syntax forms to be treated as unset.
If successful, pcre2_substitute() returns the number of successful
matches. This may be zero if no matches were found, and is never
greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrele-
vant and are ignored.
In the event of an error, a negative error code is returned. Except for
PCRE2_ERROR_NOMATCH (which is never returned), errors from
pcre2_match() are passed straight back.
Substitution errors
In the event of an error, pcre2_substitute() returns a negative error
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors
from pcre2_match() are passed straight back.
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
@ -3405,6 +3420,9 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
of buffer that is needed is returned via outlengthptr. Note that this
does not happen by default.
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
match_data argument is NULL.
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in
the replacement string, with more particular errors being PCRE2_ER-
ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
@ -3727,7 +3745,7 @@ AUTHOR
REVISION
Last updated: 26 December 2019
Last updated: 27 December 2019
Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2_SUBSTITUTE 3 "26 December 2019" "PCRE2 10.35"
.TH PCRE2_SUBSTITUTE 3 "27 December 2019" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -36,8 +36,8 @@ Its arguments are:
\fIoutlengthptr\fP Points to the length of the output buffer
.sp
A match data block is needed only if you want to inspect the data from the
match that is returned in that block. A match context is needed only if you
want to:
match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is set. A
match context is needed only if you want to:
.sp
Set up a callout function
Set a matching offset limit
@ -67,15 +67,16 @@ zero-terminated strings. The options are:
PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing
PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
.sp
PCRE2_SUBSTITUTE_LITERAL overrides PCRE2_SUBSTITUTE_EXTENDED,
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY.
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
.P
The function returns the number of substitutions, which may be zero if there
were no matches. The result can be greater than one only when
are no matches. The result may be greater than one only when
PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
is returned.
.P


@ -1,4 +1,4 @@
.TH PCRE2API 3 "26 December 2019" "PCRE2 10.35"
.TH PCRE2API 3 "27 December 2019" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -3321,12 +3321,18 @@ same number causes an error at compile time.
.B " PCRE2_SIZE *\fIoutlengthptr\fP);"
.fi
.P
This function calls \fBpcre2_match()\fP and then makes a copy of the subject
string in \fIoutputbuffer\fP, replacing one or more parts that were matched
with the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP.
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
The default is to perform just one replacement, but there is an option that
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
subject string in \fIoutputbuffer\fP, replacing parts that were matched with
the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
is to perform just one replacement if the pattern matches, but there is an
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
.P
If successful, \fBpcre2_substitute()\fP returns the number of substitutions
that were carried out. This may be zero if no match was found, and is never
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
returned if an error is detected (see below for details).
.P
Matches in which a \eK item in a lookahead in the pattern causes the match to
end before it starts are not supported, and give rise to an error return. For
@ -3343,13 +3349,28 @@ allocate memory for the compiled code.
.P
If an external \fImatch_data\fP block is provided, its contents afterwards
are those set by the final call to \fBpcre2_match()\fP. For global changes,
this will have ended in a matching error. The contents of the ovector within
this will have ended in a no-match error. The contents of the ovector within
the match data block may or may not have been changed.
.P
The \fIoutlengthptr\fP argument must point to a variable that contains the
length, in code units, of the output buffer. If the function is successful, the
value is updated to contain the length of the new string, excluding the
trailing zero that is automatically added.
As well as the usual options for \fBpcre2_match()\fP, a number of additional
options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
\fImatch_data\fP block must be provided, and it must have been used for an
external call to \fBpcre2_match()\fP. The data in the \fImatch_data\fP block
(return code, offset vector) is used for the first substitution instead of
calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
an application to check for a match before choosing to substitute, without
having to repeat the match.
.P
The \fIcode\fP argument is not used for the first substitution, but if
PCRE2_SUBSTITUTE_GLOBAL is set, \fBpcre2_match()\fP will be called after the
first substitution to check for further matches, and the contents of the
\fImatch_data\fP block will be changed.
.P
The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
variable that contains the length, in code units, of the output buffer. If the
function is successful, the value is updated to contain the length of the new
string, excluding the trailing zero that is automatically added.
.P
If the function is not successful, the value set via \fIoutlengthptr\fP depends
on the type of error. For syntax errors in the replacement string, the value is
@ -3366,7 +3387,7 @@ The replacement string, which is interpreted as a UTF string in UTF mode,
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
default, however, a dollar character is an escape character that can specify
the insertion of characters from capture groups or names from (*MARK) or other
the insertion of characters from capture groups and names from (*MARK) or other
control verbs in the pattern. The following forms are always recognized:
.sp
$$ insert a dollar character
@ -3390,14 +3411,6 @@ facility can be used to perform simple simultaneous substitutions, as this
apple lemon
2: pear orange
.sp
As well as the usual options for \fBpcre2_match()\fP, a number of additional
options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
.P
As mentioned above, PCRE2_SUBSTITUTE_LITERAL causes the replacement string to
be treated as a literal, with no interpretation. If this option is set,
PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and
PCRE2_SUBSTITUTE_UNSET_EMPTY are irrelevant and are ignored.
.P
PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
replacing every matching substring. If this option is not set, only the first
matching substring is replaced. The search for matches takes place in the
@ -3500,13 +3513,17 @@ The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
groups in the extended syntax forms to be treated as unset.
.P
If successful, \fBpcre2_substitute()\fP returns the number of successful
matches. This may be zero if no matches were found, and is never greater than 1
unless PCRE2_SUBSTITUTE_GLOBAL is set.
.P
In the event of an error, a negative error code is returned. Except for
PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
are passed straight back.
If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and
are ignored.
.
.
.SS "Substitution errors"
.rs
.sp
In the event of an error, \fBpcre2_substitute()\fP returns a negative error
code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
\fBpcre2_match()\fP are passed straight back.
.P
PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
@ -3520,6 +3537,9 @@ PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
default.
.P
PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
\fImatch_data\fP argument is NULL.
.P
PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket
@ -3884,6 +3904,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 26 December 2019
Last updated: 27 December 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi


@ -182,6 +182,7 @@ pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
+#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */
/* Options for pcre2_pattern_convert(). */
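A new option only works if its bit does not overlap an existing one. The sketch below is a hypothetical check, not part of the commit: it copies the four values visible in this hunk under a MODEL_ prefix and confirms that each is a single bit and that no two share a bit. The helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the values shown in the hunk; the MODEL_ prefix marks
   them as illustration-only names, not PCRE2 identifiers. */
#define MODEL_NO_JIT               0x00002000u
#define MODEL_COPY_MATCHED_SUBJECT 0x00004000u
#define MODEL_SUBSTITUTE_LITERAL   0x00008000u
#define MODEL_SUBSTITUTE_MATCHED   0x00010000u

/* True when exactly one bit is set in v. */
static int is_single_bit(uint32_t v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

/* Returns 1 if every value is a single bit and no two values share a bit,
   otherwise 0. */
static int bits_are_disjoint(const uint32_t *values, size_t count)
{
  uint32_t seen = 0;
  size_t i;
  assert(values != NULL);
  for (i = 0; i < count; i++)
    {
    if (!is_single_bit(values[i]) || (seen & values[i]) != 0) return 0;
    seen |= values[i];
    }
  return 1;
}
```

Passing the four MODEL_ values to bits_are_disjoint() returns 1, while a list that repeats a value returns 0.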


@ -49,8 +49,9 @@ POSSIBILITY OF SUCH DAMAGE.
#define SUBSTITUTE_OPTIONS \
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
-PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_OVERFLOW_LENGTH| \
-PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY)
+PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
+PCRE2_SUBSTITUTE_UNSET_EMPTY)
@ -229,6 +230,7 @@ uint32_t suboptions;
BOOL match_data_created = FALSE;
BOOL escaped_literal = FALSE;
BOOL overflowed = FALSE;
+BOOL use_existing_match;
#ifdef SUPPORT_UNICODE
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
#endif
@ -254,9 +256,19 @@ PCRE2_UNSET, so as not to imply an offset in the replacement. */
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
return PCRE2_ERROR_BADOPTION;
-/* If no match data block is provided, create one. */
+/* Check for using a match that has already happened. Note that the subject
+pointer in the match data may be NULL after a no-match. */
-if (match_data == NULL)
+use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
+if (use_existing_match)
+{
+if (match_data == NULL) return PCRE2_ERROR_NULL;
+}
+/* Otherwise, if no match data block is provided, create one. */
+else if (match_data == NULL)
{
pcre2_general_context *gcontext = (mcontext == NULL)?
(pcre2_general_context *)code :
@ -310,7 +322,8 @@ if (start_offset > length)
}
CHECKMEMCPY(subject, start_offset);
-/* Loop for global substituting. */
+/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
+match is taken from the match_data that was passed in. */
subs = 0;
do
@ -318,7 +331,12 @@ do
PCRE2_SPTR ptrstack[PTR_STACK_SIZE];
uint32_t ptrstackptr = 0;
-rc = pcre2_match(code, subject, length, start_offset, options|goptions,
+if (use_existing_match)
+{
+rc = match_data->rc;
+use_existing_match = FALSE;
+}
+else rc = pcre2_match(code, subject, length, start_offset, options|goptions,
match_data, mcontext);
#ifdef SUPPORT_UNICODE
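As a reading aid, the control flow these hunks introduce can be sketched in plain C without linking PCRE2. Everything below is hypothetical: the model_ and MODEL_ names, the literal strstr() matcher, and the error values are stand-ins, not PCRE2 identifiers. Only the shape of the loop follows the diff: take rc from the caller's match data on the first pass, clear the flag, and call the matcher on later passes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch only; not PCRE2 names or values. */
#define MODEL_GLOBAL      0x1u
#define MODEL_MATCHED     0x2u
#define MODEL_ERROR_NULL  (-1)
#define MODEL_ERROR_NOMEM (-2)

/* Simplified match data: a return code and a single offset pair. */
typedef struct {
  int rc;            /* 1 for a match, 0 for no match in this model */
  size_t start, end; /* offsets of the match within the subject */
} model_match_data;

/* Stand-in matcher: finds a non-empty literal needle at or after offset. */
static int model_match(const char *subject, const char *needle, size_t offset,
                       model_match_data *md)
{
  const char *hit;
  assert(md != NULL && needle[0] != '\0');
  hit = strstr(subject + offset, needle);
  md->rc = (hit != NULL) ? 1 : 0;
  if (hit != NULL)
    {
    md->start = (size_t)(hit - subject);
    md->end = md->start + strlen(needle);
    }
  return md->rc;
}

/* Returns the number of substitutions, or a negative model error code.
   With MODEL_MATCHED, md must hold the result of an earlier model_match()
   call on the same subject and needle. */
static int model_substitute(const char *subject, const char *needle,
                            const char *replacement, unsigned options,
                            model_match_data *md, char *out, size_t outsize)
{
  model_match_data local;
  size_t pos = 0, used = 0;
  size_t rlen = strlen(replacement), slen = strlen(subject);
  int subs = 0;
  int use_existing_match = (options & MODEL_MATCHED) != 0;

  if (use_existing_match)
    {
    if (md == NULL) return MODEL_ERROR_NULL;
    }
  else if (md == NULL) md = &local;  /* otherwise use a private block */

  do
    {
    int rc;
    if (use_existing_match)
      {
      rc = md->rc;                   /* first pass: reuse caller's result */
      use_existing_match = 0;
      }
    else rc = model_match(subject, needle, pos, md);
    if (rc <= 0) break;

    /* Copy the text before the match, then the replacement. */
    if (used + (md->start - pos) + rlen >= outsize) return MODEL_ERROR_NOMEM;
    memcpy(out + used, subject + pos, md->start - pos);
    used += md->start - pos;
    memcpy(out + used, replacement, rlen);
    used += rlen;
    pos = md->end;
    subs++;
    }
  while ((options & MODEL_GLOBAL) != 0);

  /* Copy the rest of the subject and add the trailing zero. */
  if (used + (slen - pos) + 1 > outsize) return MODEL_ERROR_NOMEM;
  memcpy(out + used, subject + pos, slen - pos);
  out[used + (slen - pos)] = '\0';
  return subs;
}
```

Under these assumptions, calling model_match() on `>abcd1234abcd5678<` with the needle `abcd` and then model_substitute() with MODEL_MATCHED|MODEL_GLOBAL and the replacement `wxyz` performs two substitutions. This agrees with the `2:` result recorded in the test output further down. Passing MODEL_MATCHED with a NULL block returns the model's error code, mirroring the PCRE2_ERROR_NULL check in the diff.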


@ -503,13 +503,14 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_CALLOUT 0x00000001u
#define CTL2_SUBSTITUTE_EXTENDED 0x00000002u
#define CTL2_SUBSTITUTE_LITERAL 0x00000004u
-#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000008u
-#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000010u
-#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000020u
-#define CTL2_SUBJECT_LITERAL 0x00000040u
-#define CTL2_CALLOUT_NO_WHERE 0x00000080u
-#define CTL2_CALLOUT_EXTRA 0x00000100u
-#define CTL2_ALLVECTOR 0x00000200u
+#define CTL2_SUBSTITUTE_MATCHED 0x00000008u
+#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000010u
+#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000020u
+#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000040u
+#define CTL2_SUBJECT_LITERAL 0x00000080u
+#define CTL2_CALLOUT_NO_WHERE 0x00000100u
+#define CTL2_CALLOUT_EXTRA 0x00000200u
+#define CTL2_ALLVECTOR 0x00000400u
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */
@ -532,6 +533,7 @@ different things in the two cases. */
#define CTL2_ALLPD (CTL2_SUBSTITUTE_CALLOUT|\
CTL2_SUBSTITUTE_EXTENDED|\
CTL2_SUBSTITUTE_LITERAL|\
+CTL2_SUBSTITUTE_MATCHED|\
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
CTL2_SUBSTITUTE_UNSET_EMPTY|\
@ -721,6 +723,7 @@ static modstruct modlist[] = {
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
{ "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) },
+{ "substitute_matched", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_MATCHED, PO(control2) },
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
@ -4088,7 +4091,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -4127,6 +4130,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls2 & CTL2_SUBSTITUTE_CALLOUT) != 0)? " substitute_callout" : "",
((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
+((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
@ -7232,6 +7236,7 @@ if (dat_datctl.replacement[0] != 0)
uint8_t rbuffer[REPLACE_BUFFSIZE];
uint8_t nbuffer[REPLACE_BUFFSIZE];
uint32_t xoptions;
+uint32_t emoption; /* External match option */
PCRE2_SIZE j, rlen, nsize, erroroffset;
BOOL badutf = FALSE;
@ -7256,7 +7261,21 @@ if (dat_datctl.replacement[0] != 0)
if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");
-xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
+/* Check for a test that does substitution after an initial external match.
+If this is set, we run the external match, but leave the interpretation of
+its output to pcre2_substitute(). */
+emoption = ((dat_datctl.control2 & CTL2_SUBSTITUTE_MATCHED) == 0)? 0 :
+PCRE2_SUBSTITUTE_MATCHED;
+if (emoption != 0)
+{
+PCRE2_MATCH(rc, compiled_code, pp, arg_ulen, dat_datctl.offset,
+dat_datctl.options, match_data, use_dat_context);
+}
+xoptions = emoption |
+(((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
PCRE2_SUBSTITUTE_GLOBAL) |
(((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 :
PCRE2_SUBSTITUTE_EXTENDED) |

7
testdata/testinput2 vendored

@ -4641,6 +4641,13 @@ B)x/alt_verbnames,mark
/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1
aaBB
/abcd/replace=wxyz,substitute_matched
abcd
pqrs
/abcd/g
>abcd1234abcd5678<\=replace=wxyz,substitute_matched
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
/((p(?'K/

10
testdata/testoutput2 vendored
View File

@ -14860,6 +14860,16 @@ Failed: error -55 at offset 3 in replacement: requested value is not set
aaBB
1: AAbbaa..AAbBaa
/abcd/replace=wxyz,substitute_matched
abcd
1: wxyz
pqrs
0: pqrs
/abcd/g
>abcd1234abcd5678<\=replace=wxyz,substitute_matched
2: >wxyz1234wxyz5678<
/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I
Capture group count = 2
Max back reference = 1