Implement PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.

Philip.Hazel 2020-01-22 17:50:12 +00:00
parent 7171d86587
commit e8d70e2459
14 changed files with 591 additions and 460 deletions

@ -41,6 +41,8 @@ the minimum.
10. Fix *THEN verbs in lookahead assertions in JIT.
11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.
Version 10.34 21-November-2019
------------------------------

@ -82,6 +82,7 @@ zero-terminated strings. The options are:
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s)
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
</pre>

@ -3305,10 +3305,11 @@ same number causes an error at compile time.
This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
subject string in <i>outputbuffer</i>, replacing parts that were matched with
the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
is to perform just one replacement if the pattern matches, but there is an
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
replacement string(s). The default action is to perform just one replacement if
the pattern matches, but there is an option that requests multiple replacements
(see PCRE2_SUBSTITUTE_GLOBAL below for details).
</P>
<P>
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
@ -3349,10 +3350,19 @@ an application to check for a match before choosing to substitute, without
having to repeat the match.
</P>
<P>
The <i>code</i> argument is not used for the first substitution, but if
PCRE2_SUBSTITUTE_GLOBAL is set, <b>pcre2_match()</b> will be called after the
first substitution to check for further matches, and the contents of the
<i>match_data</i> block will be changed.
The <i>code</i> argument is not used for the first substitution when
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
<b>pcre2_match()</b> will be called after the first substitution to check for
further matches, and the contents of the <i>match_data</i> block will be
changed.
</P>
<P>
The default is to return a copy of the subject string with matched substrings
replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
replacement substrings are returned. In the global case, multiple replacements
are concatenated in the output buffer. Substitution callouts (see
<a href="#subcallouts">below)</a>
can be used to separate them if necessary.
</P>
<P>
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
@ -3560,7 +3570,7 @@ As for all PCRE2 errors, a text message that describes the error can be
obtained by calling the <b>pcre2_get_error_message()</b> function (see
"Obtaining a textual error message"
<a href="#geterrormessage">above).</a>
</P>
<a name="subcallouts"></a></P>
<br><b>
Substitution callouts
</b><br>
@ -3897,9 +3907,9 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 27 December 2019
Last updated: 22 January 2020
<br>
Copyright &copy; 1997-2019 University of Cambridge.
Copyright &copy; 1997-2020 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -1064,9 +1064,11 @@ process.
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_skip=&#60;n&#62; skip substitution number n
substitute_stop=&#60;n&#62; skip substitution number n and greater
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=&#60;n&#62; skip substitution &#60;n&#62;
substitute_stop=&#60;n&#62; skip substitution &#60;n&#62; and following
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre>
@ -1235,7 +1237,9 @@ pattern.
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=&#60;n&#62; skip substitution number n
substitute_stop=&#60;n&#62; skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1397,9 +1401,10 @@ Testing the substitution function
</b><br>
<P>
If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
called instead of one of the matching functions. Note that replacement strings
cannot contain commas, because a comma signifies the end of a modifier. This is
not thought to be an issue in a test program.
called instead of one of the matching functions (or after one call of
<b>pcre2_match()</b> in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
replacement strings cannot contain commas, because a comma signifies the end of
a modifier. This is not thought to be an issue in a test program.
</P>
<P>
Unlike subject strings, <b>pcre2test</b> does not process replacement strings
@ -1416,11 +1421,15 @@ for <b>pcre2_substitute()</b>:
global PCRE2_SUBSTITUTE_GLOBAL
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
substitute_literal PCRE2_SUBSTITUTE_LITERAL
substitute_matched PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
</PRE>
</pre>
See the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation for details of these options.
</P>
<P>
After a successful substitution, the modified string is output, preceded by the
@ -2096,9 +2105,9 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 December 2019
Last updated: 22 January 2020
<br>
Copyright &copy; 1997-2019 University of Cambridge.
Copyright &copy; 1997-2020 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.

@ -3196,10 +3196,12 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
This function optionally calls pcre2_match() and then makes a copy of
the subject string in outputbuffer, replacing parts that were matched
with the replacement string, whose length is supplied in rlength. This
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The
default is to perform just one replacement if the pattern matches, but
there is an option that requests multiple replacements (see PCRE2_SUB-
STITUTE_GLOBAL below for details).
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
turn just the replacement string(s). The default action is to perform
just one replacement if the pattern matches, but there is an option
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
If successful, pcre2_substitute() returns the number of substitutions
that were carried out. This may be zero if no match was found, and is
@ -3234,10 +3236,17 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
application to check for a match before choosing to substitute, without
having to repeat the match.
The code argument is not used for the first substitution, but if
PCRE2_SUBSTITUTE_GLOBAL is set, pcre2_match() will be called after the
first substitution to check for further matches, and the contents of
the match_data block will be changed.
The code argument is not used for the first substitution when
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also
set, pcre2_match() will be called after the first substitution to check
for further matches, and the contents of the match_data block will be
changed.
The default is to return a copy of the subject string with matched sub-
strings replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set,
only the replacement substrings are returned. In the global case, mul-
tiple replacements are concatenated in the output buffer. Substitution
callouts (see below) can be used to separate them if necessary.
The outlengthptr argument of pcre2_substitute() must point to a vari-
able that contains the length, in code units, of the output buffer. If
@ -3745,8 +3754,8 @@ AUTHOR
REVISION
Last updated: 27 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 22 January 2020
Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2_SUBSTITUTE 3 "05 January 2020" "PCRE2 10.35"
.TH PCRE2_SUBSTITUTE 3 "22 January 2020" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -73,6 +73,7 @@ zero-terminated strings. The options are:
PCRE2_SUBSTITUTE_LITERAL The replacement string is literal
PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s)
PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset
PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string
.sp

@ -1,4 +1,4 @@
.TH PCRE2API 3 "27 December 2019" "PCRE2 10.35"
.TH PCRE2API 3 "22 January 2020" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -3324,10 +3324,11 @@ same number causes an error at compile time.
This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
subject string in \fIoutputbuffer\fP, replacing parts that were matched with
the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. The default
is to perform just one replacement if the pattern matches, but there is an
option that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
replacement string(s). The default action is to perform just one replacement if
the pattern matches, but there is an option that requests multiple replacements
(see PCRE2_SUBSTITUTE_GLOBAL below for details).
.P
If successful, \fBpcre2_substitute()\fP returns the number of substitutions
that were carried out. This may be zero if no match was found, and is never
@ -3362,10 +3363,21 @@ calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
an application to check for a match before choosing to substitute, without
having to repeat the match.
.P
The \fIcode\fP argument is not used for the first substitution, but if
PCRE2_SUBSTITUTE_GLOBAL is set, \fBpcre2_match()\fP will be called after the
first substitution to check for further matches, and the contents of the
\fImatch_data\fP block will be changed.
The \fIcode\fP argument is not used for the first substitution when
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
\fBpcre2_match()\fP will be called after the first substitution to check for
further matches, and the contents of the \fImatch_data\fP block will be
changed.
.P
The default is to return a copy of the subject string with matched substrings
replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
replacement substrings are returned. In the global case, multiple replacements
are concatenated in the output buffer. Substitution callouts (see
.\" HTML <a href="#subcallouts">
.\" </a>
below)
.\"
can be used to separate them if necessary.
.P
The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
variable that contains the length, in code units, of the output buffer. If the
@ -3557,6 +3569,7 @@ above).
.\"
.
.
.\" HTML <a name="subcallouts"></a>
.SS "Substitution callouts"
.rs
.sp
@ -3904,6 +3917,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 27 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 22 January 2020
Copyright (c) 1997-2020 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "26 December 2019" "PCRE 10.35"
.TH PCRE2TEST 1 "22 January 2020" "PCRE 10.35"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -1025,9 +1025,11 @@ process.
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_skip=<n> skip substitution number n
substitute_stop=<n> skip substitution number n and greater
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution <n>
substitute_stop=<n> skip substitution <n> and following
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
.sp
@ -1203,7 +1205,9 @@ pattern.
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution number n
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1367,9 +1371,10 @@ by name.
.rs
.sp
If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
called instead of one of the matching functions. Note that replacement strings
cannot contain commas, because a comma signifies the end of a modifier. This is
not thought to be an issue in a test program.
called instead of one of the matching functions (or after one call of
\fBpcre2_match()\fP in the case of PCRE2_SUBSTITUTE_MATCHED). Note that
replacement strings cannot contain commas, because a comma signifies the end of
a modifier. This is not thought to be an issue in a test program.
.P
Unlike subject strings, \fBpcre2test\fP does not process replacement strings
for escape sequences. In UTF mode, a replacement string is checked to see if it
@ -1384,10 +1389,17 @@ for \fBpcre2_substitute()\fP:
global PCRE2_SUBSTITUTE_GLOBAL
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
substitute_literal PCRE2_SUBSTITUTE_LITERAL
substitute_matched PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
.sp
See the
.\" HREF
\fBpcre2api\fP
.\"
documentation for details of these options.
.P
After a successful substitution, the modified string is output, preceded by the
number of replacements. This may be zero if there were no matches. Here is a
@ -2076,6 +2088,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 26 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 22 January 2020
Copyright (c) 1997-2020 University of Cambridge.
.fi

@ -950,9 +950,11 @@ PATTERN MODIFIERS
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_skip=<n> skip substitution number n
substitute_stop=<n> skip substitution number n and greater
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution <n>
substitute_stop=<n> skip substitution <n> and following
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
@ -1105,7 +1107,9 @@ SUBJECT MODIFIERS
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_literal use PCRE2_SUBSTITUTE_LITERAL
substitute_matched use PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_skip=<n> skip substitution number n
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1251,9 +1255,11 @@ SUBJECT MODIFIERS
Testing the substitution function
If the replace modifier is set, the pcre2_substitute() function is
called instead of one of the matching functions. Note that replacement
strings cannot contain commas, because a comma signifies the end of a
modifier. This is not thought to be an issue in a test program.
called instead of one of the matching functions (or after one call of
pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that re-
placement strings cannot contain commas, because a comma signifies the
end of a modifier. This is not thought to be an issue in a test pro-
gram.
Unlike subject strings, pcre2test does not process replacement strings
for escape sequences. In UTF mode, a replacement string is checked to
@ -1268,10 +1274,13 @@ SUBJECT MODIFIERS
global PCRE2_SUBSTITUTE_GLOBAL
substitute_extended PCRE2_SUBSTITUTE_EXTENDED
substitute_literal PCRE2_SUBSTITUTE_LITERAL
substitute_matched PCRE2_SUBSTITUTE_MATCHED
substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY
See the pcre2api documentation for details of these options.
After a successful substitution, the modified string is output, pre-
ceded by the number of replacements. This may be zero if there were no
@ -1905,5 +1914,5 @@ AUTHOR
REVISION
Last updated: 26 December 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 22 January 2020
Copyright (c) 1997-2020 University of Cambridge.

@ -5,7 +5,7 @@
/* This is the public header file for the PCRE library, second API, to be
#included by applications that call PCRE2 functions.
Copyright (c) 2016-2019 University of Cambridge
Copyright (c) 2016-2020 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -183,6 +183,7 @@ pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */
#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u
#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */
#define PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u /* pcre2_substitute() only */
/* Options for pcre2_pattern_convert(). */
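
The new bit has to be part of the mask that pcre2_substitute() uses to separate its own options from those passed on to pcre2_match() (suboptions = options & SUBSTITUTE_OPTIONS; options &= ~SUBSTITUTE_OPTIONS). The following is a minimal standalone sketch of that split, not the library code. The three DEMO_SUBSTITUTE_* values are copied from the header definitions shown here. The shortened mask (the real SUBSTITUTE_OPTIONS lists more options), DEMO_OTHER_OPTION, split_options and split_substitute_options are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the three pcre2_substitute()-only definitions. */
#define DEMO_SUBSTITUTE_LITERAL          0x00008000u
#define DEMO_SUBSTITUTE_MATCHED          0x00010000u
#define DEMO_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u

/* Hypothetical stand-in for any option that is meant for the matcher. */
#define DEMO_OTHER_OPTION                0x00000001u

/* Shortened mask for the sketch; the real mask contains more options. */
#define DEMO_SUBSTITUTE_OPTIONS \
  (DEMO_SUBSTITUTE_LITERAL|DEMO_SUBSTITUTE_MATCHED| \
   DEMO_SUBSTITUTE_REPLACEMENT_ONLY)

typedef struct {
  uint32_t suboptions;     /* bits consumed by the substitute logic */
  uint32_t match_options;  /* bits left over for the matching function */
  int replacement_only;    /* 1 if only replacements are to be returned */
} split_options;

/* Separate substitute-only bits from the remaining option bits. */
split_options split_substitute_options(uint32_t options)
{
  split_options s;
  s.replacement_only = (options & DEMO_SUBSTITUTE_REPLACEMENT_ONLY) != 0;
  s.suboptions = options & DEMO_SUBSTITUTE_OPTIONS;
  s.match_options = options & ~DEMO_SUBSTITUTE_OPTIONS;
  /* Every input bit ends up in exactly one of the two words. */
  assert((s.suboptions | s.match_options) == options);
  assert((s.suboptions & s.match_options) == 0);
  return s;
}
```

If a new substitute option were left out of the mask, its bit would stay in the word handed to the matcher, which is presumably why the mask is extended in the same commit as the header.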

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2019 University of Cambridge
New API code Copyright (c) 2016-2020 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -50,8 +50,8 @@ POSSIBILITY OF SUCH DAMAGE.
#define SUBSTITUTE_OPTIONS \
(PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \
PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_UNKNOWN_UNSET| \
PCRE2_SUBSTITUTE_UNSET_EMPTY)
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_REPLACEMENT_ONLY| \
PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY)
@ -195,6 +195,7 @@ overflow, either give an error immediately, or keep on, accumulating the
length. */
#define CHECKMEMCPY(from,length) \
{ \
if (!overflowed && lengthleft < length) \
{ \
if ((suboptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) == 0) goto NOROOM; \
@ -210,6 +211,7 @@ length. */
memcpy(buffer + buff_offset, from, CU2BYTES(length)); \
buff_offset += length; \
lengthleft -= length; \
} \
}
/* Here's the function */
@ -231,6 +233,7 @@ BOOL match_data_created = FALSE;
BOOL escaped_literal = FALSE;
BOOL overflowed = FALSE;
BOOL use_existing_match;
BOOL replacement_only;
#ifdef SUPPORT_UNICODE
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
#endif
@ -260,6 +263,7 @@ if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
pointer in the match data may be NULL after a no-match. */
use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
replacement_only = ((options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) != 0);
if (use_existing_match)
{
@ -312,7 +316,7 @@ if (utf && (options & PCRE2_NO_UTF_CHECK) == 0)
suboptions = options & SUBSTITUTE_OPTIONS;
options &= ~SUBSTITUTE_OPTIONS;
/* Copy up to the start offset */
/* Error if the start match offset is greater than the length of the subject. */
if (start_offset > length)
{
@ -320,7 +324,10 @@ if (start_offset > length)
rc = PCRE2_ERROR_BADOFFSET;
goto EXIT;
}
CHECKMEMCPY(subject, start_offset);
/* Copy up to the start offset, unless only the replacement is required. */
if (!replacement_only) CHECKMEMCPY(subject, start_offset);
/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first
match is taken from the match_data that was passed in. */
@ -382,11 +389,11 @@ do
#endif
}
/* Copy what we have advanced past, reset the special global options, and
continue to the next match. */
/* Copy what we have advanced past (unless not required), reset the special
global options, and continue to the next match. */
fraglength = start_offset - save_start;
CHECKMEMCPY(subject + save_start, fraglength);
if (!replacement_only) CHECKMEMCPY(subject + save_start, fraglength);
goptions = 0;
continue;
}
@ -430,12 +437,12 @@ do
}
subs++;
/* Copy the text leading up to the match, and remember where the insert
begins and how many ovector pairs are set. */
/* Copy the text leading up to the match (unless not required), and remember
where the insert begins and how many ovector pairs are set. */
if (rc == 0) rc = ovector_count;
fraglength = ovector[0] - start_offset;
CHECKMEMCPY(subject + start_offset, fraglength);
if (!replacement_only) CHECKMEMCPY(subject + start_offset, fraglength);
scb.output_offsets[0] = buff_offset;
scb.oveccount = rc;
@ -882,7 +889,7 @@ do
buff_offset -= newlength;
lengthleft += newlength;
CHECKMEMCPY(subject + ovector[0], oldlength);
if (!replacement_only) CHECKMEMCPY(subject + ovector[0], oldlength);
/* A negative return means do not do any more. */
@ -903,12 +910,17 @@ do
start_offset = ovector[1];
} while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */
/* Copy the rest of the subject. */
/* Copy the rest of the subject unless not required, and terminate the output
with a binary zero. */
if (!replacement_only)
{
fraglength = length - start_offset;
CHECKMEMCPY(subject + start_offset, fraglength);
}
fraglength = length - start_offset;
CHECKMEMCPY(subject + start_offset, fraglength);
temp[0] = 0;
CHECKMEMCPY(temp , 1);
CHECKMEMCPY(temp, 1);
/* If overflowed is set it means the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set,
and matching has carried on after a full buffer, in order to compute the length
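
The effect of the if (!replacement_only) guards can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the library code: it does not call PCRE2, it assumes the match positions are already known, and it treats the replacement as a literal string. The names match_range, append_bytes and model_substitute are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model type: one match as a half-open range [start, end)
   of the subject, in the style of an ovector pair. */
typedef struct { size_t start, end; } match_range;

/* Append n bytes to out at *pos. Returns 0 if they do not fit; one byte
   is always kept free for the terminating zero. */
static int append_bytes(char *out, size_t outsize, size_t *pos,
                        const char *from, size_t n)
{
  if (n >= outsize - *pos) return 0;
  memcpy(out + *pos, from, n);
  *pos += n;
  return 1;
}

/* Build the output for sorted, non-overlapping matches. Returns the
   output length excluding the zero, or (size_t)-1 if the buffer is
   too small. */
size_t model_substitute(const char *subject, const match_range *m,
                        size_t nmatches, const char *replacement,
                        int replacement_only, char *out, size_t outsize)
{
  size_t pos = 0, offset = 0;
  size_t len = strlen(subject), rlen = strlen(replacement);

  if (outsize == 0) return (size_t)-1;

  for (size_t i = 0; i < nmatches; i++)
  {
    assert(m[i].start >= offset && m[i].end >= m[i].start && m[i].end <= len);

    /* Text leading up to the match is skipped in replacement-only mode. */
    if (!replacement_only &&
        !append_bytes(out, outsize, &pos, subject + offset,
                      m[i].start - offset))
      return (size_t)-1;

    /* The replacement itself is always copied. */
    if (!append_bytes(out, outsize, &pos, replacement, rlen))
      return (size_t)-1;

    offset = m[i].end;
  }

  /* The rest of the subject is also skipped in replacement-only mode. */
  if (!replacement_only &&
      !append_bytes(out, outsize, &pos, subject + offset, len - offset))
    return (size_t)-1;

  out[pos] = 0;   /* terminate with a binary zero in both modes */
  return pos;
}
```

With the subject 123abc456, one match at offsets 3 to 6 and the replacement xyz, the model produces 123xyz456 by default and xyz when replacement_only is non-zero, which agrees with the first new case added to testinput2.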

@ -11,7 +11,7 @@ hacked-up (non-) design had also run out of steam.
Written by Philip Hazel
Original code Copyright (c) 1997-2012 University of Cambridge
Rewritten code Copyright (c) 2016-2019 University of Cambridge
Rewritten code Copyright (c) 2016-2020 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -505,12 +505,13 @@ so many of them that they are split into two fields. */
#define CTL2_SUBSTITUTE_LITERAL 0x00000004u
#define CTL2_SUBSTITUTE_MATCHED 0x00000008u
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000010u
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000020u
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000040u
#define CTL2_SUBJECT_LITERAL 0x00000080u
#define CTL2_CALLOUT_NO_WHERE 0x00000100u
#define CTL2_CALLOUT_EXTRA 0x00000200u
#define CTL2_ALLVECTOR 0x00000400u
#define CTL2_SUBSTITUTE_REPLACEMENT_ONLY 0x00000020u
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000040u
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000080u
#define CTL2_SUBJECT_LITERAL 0x00000100u
#define CTL2_CALLOUT_NO_WHERE 0x00000200u
#define CTL2_CALLOUT_EXTRA 0x00000400u
#define CTL2_ALLVECTOR 0x00000800u
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */
@ -535,6 +536,7 @@ different things in the two cases. */
CTL2_SUBSTITUTE_LITERAL|\
CTL2_SUBSTITUTE_MATCHED|\
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
CTL2_SUBSTITUTE_REPLACEMENT_ONLY|\
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
CTL2_SUBSTITUTE_UNSET_EMPTY|\
CTL2_ALLVECTOR)
@ -725,6 +727,7 @@ static modstruct modlist[] = {
{ "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) },
{ "substitute_matched", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_MATCHED, PO(control2) },
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
{ "substitute_replacement_only", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_REPLACEMENT_ONLY, PO(control2) },
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
@ -4091,7 +4094,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -4132,6 +4135,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "",
((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "",
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
((controls2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) != 0)? " substitute_replacement_only" : "",
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
((controls & CTL_USE_LENGTH) != 0)? " use_length" : "",
@ -7283,6 +7287,8 @@ if (dat_datctl.replacement[0] != 0)
PCRE2_SUBSTITUTE_LITERAL) |
(((dat_datctl.control2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) == 0)? 0 :
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) |
(((dat_datctl.control2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) == 0)? 0 :
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) |
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) == 0)? 0 :
PCRE2_SUBSTITUTE_UNKNOWN_UNSET) |
(((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :

13
testdata/testinput2 vendored

@ -5793,4 +5793,17 @@ a)"xI
/^((\1+)(?C)|\d)+133X$/
111133X\=callout_capture
/abc/replace=xyz,substitute_replacement_only
123abc456
/a(?<ONE>b)c(?<TWO>d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only
"abcde-abcde-"
/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only
abcdefabcpqr
abxyzpqrabcxyz
12abc34xyz99abc55\=substitute_stop=2
12abc34xyz99abc55\=substitute_skip=1
12abc34xyz99abc55\=substitute_skip=2
# End of testinput2

33
testdata/testoutput2 vendored

@ -17503,6 +17503,39 @@ Callout 0: last capture = 2
1: 11
2: 11
/abc/replace=xyz,substitute_replacement_only
123abc456
1: xyz
/a(?<ONE>b)c(?<TWO>d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only
"abcde-abcde-"
2: Xb+dZXb+dZ
/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only
abcdefabcpqr
1(2) Old 0 3 "abc" New 0 5 "<abc>"
2(2) Old 6 9 "abc" New 5 10 "<abc>"
2: <abc><abc>
abxyzpqrabcxyz
1(1) Old 2 5 "xyz" New 0 5 "<xyz>"
2(2) Old 8 11 "abc" New 5 10 "<abc>"
3(1) Old 11 14 "xyz" New 10 15 "<xyz>"
3: <xyz><abc><xyz>
12abc34xyz99abc55\=substitute_stop=2
1(2) Old 2 5 "abc" New 0 5 "<abc>"
2(1) Old 7 10 "xyz" New 5 10 "<xyz> STOPPED"
2: <abc>
12abc34xyz99abc55\=substitute_skip=1
1(2) Old 2 5 "abc" New 0 5 "<abc> SKIPPED"
2(1) Old 7 10 "xyz" New 0 5 "<xyz>"
3(2) Old 12 15 "abc" New 5 10 "<abc>"
3: <xyz><abc>
12abc34xyz99abc55\=substitute_skip=2
1(2) Old 2 5 "abc" New 0 5 "<abc>"
2(1) Old 7 10 "xyz" New 5 10 "<xyz> SKIPPED"
3(2) Old 12 15 "abc" New 5 10 "<abc>"
3: <abc><abc>
# End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data
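
In the global replacement-only cases above, the replacements arrive concatenated in one buffer, and the "New" start and end values printed for each callout give the position of each piece. The sketch below shows one possible way an application might use such offset pairs to pull the pieces apart after the call. It is an illustration only: out_range and copy_piece are hypothetical names, and how the offsets are collected (for example inside a substitution callout) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of where one replacement sits in the output,
   as a half-open range [start, end). */
typedef struct { size_t start, end; } out_range;

/* Copy one replacement out of the concatenated output into dst and
   zero-terminate it. Returns the piece length, or (size_t)-1 if dst
   is too small. */
size_t copy_piece(const char *output, out_range r, char *dst, size_t dstsize)
{
  size_t n;
  assert(r.end >= r.start);
  n = r.end - r.start;
  if (n >= dstsize) return (size_t)-1;   /* need room for the zero */
  memcpy(dst, output + r.start, n);
  dst[n] = 0;
  return n;
}
```

For the output <xyz><abc><xyz> with the pairs 0 5, 5 10 and 10 15 from the second test subject, this yields the three pieces <xyz>, <abc> and <xyz>.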