Implement callouts from pcre2_substitute().

Philip.Hazel 2018-09-18 16:31:30 +00:00
parent 80adf9d165
commit a69267246f
26 changed files with 956 additions and 433 deletions

View File

@ -12,6 +12,8 @@ partial matches.
2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
a greater than 1 fixed quantifier. This issue was found by Yunho Kim.
3. Added support for callouts from pcre2_substitute().
Version 10.32 10-September-2018
-------------------------------

View File

@ -85,6 +85,7 @@ dist_html_DATA = \
doc/html/pcre2_set_parens_nest_limit.html \
doc/html/pcre2_set_recursion_limit.html \
doc/html/pcre2_set_recursion_memory_management.html \
doc/html/pcre2_set_substitute_callout.html \
doc/html/pcre2_substitute.html \
doc/html/pcre2_substring_copy_byname.html \
doc/html/pcre2_substring_copy_bynumber.html \
@ -178,6 +179,7 @@ dist_man_MANS = \
doc/pcre2_set_parens_nest_limit.3 \
doc/pcre2_set_recursion_limit.3 \
doc/pcre2_set_recursion_memory_management.3 \
doc/pcre2_set_substitute_callout.3 \
doc/pcre2_substitute.3 \
doc/pcre2_substring_copy_byname.3 \
doc/pcre2_substring_copy_bynumber.3 \

View File

@ -162,7 +162,7 @@ listing), and the short pages for individual functions, are concatenated in
pcre2-config show PCRE2 installation configuration information
pcre2api details of PCRE2's native C API
pcre2build building PCRE2
pcre2callout details of the callout feature
pcre2callout details of the pattern callout feature
pcre2compat discussion of Perl compatibility
pcre2convert details of pattern conversion functions
pcre2demo a demonstration C program that uses PCRE2
@ -198,7 +198,7 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
</P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
<P>
Last updated: 11 July 2018
Last updated: 17 September 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

View File

@ -0,0 +1,43 @@
<html>
<head>
<title>pcre2_set_substitute_callout specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2_set_substitute_callout man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre2.h&#62;</b>
</P>
<P>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function sets the substitute callout fields in a match context (the first
argument). The second argument specifies a callout function, and the third
argument is an opaque data item that is passed to it. The result of this
function is always zero.
</P>
<P>
There is a complete description of the PCRE2 native API in the
<a href="pcre2api.html"><b>pcre2api</b></a>
page and a description of the POSIX API in the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>

View File

@ -182,6 +182,11 @@ document for an overview of all the PCRE2 documentation.
<b> void *<i>callout_data</i>);</b>
<br>
<br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> PCRE2_SIZE <i>value</i>);</b>
<br>
@ -912,12 +917,23 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
<b> void *<i>callout_data</i>);</b>
<br>
<br>
This sets up a "callout" function for PCRE2 to call at specified points
This sets up a callout function for PCRE2 to call at specified points
during a matching operation. Details are given in the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
<br>
<br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
This sets up a callout function for PCRE2 to call after each substitution
made by <b>pcre2_substitute()</b>. Details are given in the section entitled
"Creating a new string with substitutions"
<a href="#substitutions">below.</a>
<br>
<br>
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> PCRE2_SIZE <i>value</i>);</b>
<br>
@ -3163,26 +3179,30 @@ page, you cannot use names to distinguish the different subpatterns, because
names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for subpatterns of the
same number causes an error at compile time.
</P>
<a name="substitutions"></a></P>
<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
<P>
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
<b> uint32_t <i>options</i>, pcre2_match_data *<i>match_data</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>, PCRE2_SPTR <i>replacement</i>,</b>
<b> PCRE2_SIZE <i>rlength</i>, PCRE2_UCHAR *\fIoutputbuffer\zfP,</b>
<b> PCRE2_SIZE <i>rlength</i>, PCRE2_UCHAR *<i>outputbuffer</i>,</b>
<b> PCRE2_SIZE *<i>outlengthptr</i>);</b>
</P>
<P>
This function calls <b>pcre2_match()</b> and then makes a copy of the subject
string in <i>outputbuffer</i>, replacing the part that was matched with the
<i>replacement</i> string, whose length is supplied in <b>rlength</b>. This can
be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
which a \K item in a lookahead in the pattern causes the match to end before
it starts are not supported, and give rise to an error return. For global
replacements, matches in which \K in a lookbehind causes the match to start
earlier than the point that was reached in the previous iteration are also not
supported.
string in <i>outputbuffer</i>, replacing one or more parts that were matched
with the <i>replacement</i> string, whose length is supplied in <b>rlength</b>.
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
The default is to perform just one replacement, but there is an option that
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
</P>
<P>
Matches in which a \K item in a lookahead in the pattern causes the match to
end before it starts are not supported, and give rise to an error return. For
global replacements, matches in which \K in a lookbehind causes the match to
start earlier than the point that was reached in the previous iteration are
also not supported.
</P>
<P>
The first seven arguments of <b>pcre2_substitute()</b> are the same as for
@ -3194,9 +3214,9 @@ allocate memory for the compiled code.
</P>
<P>
If an external <i>match_data</i> block is provided, its contents afterwards
are those set by the final call to <b>pcre2_match()</b>, which will have
ended in a matching error. The contents of the ovector within the match data
block may or may not have been changed.
are those set by the final call to <b>pcre2_match()</b>. For global changes,
this will have ended in a matching error. The contents of the ovector within
the match data block may or may not have been changed.
</P>
<P>
The <i>outlengthptr</i> argument must point to a variable that contains the
@ -3220,12 +3240,12 @@ length is in code units, not bytes.
In the replacement string, which is interpreted as a UTF string in UTF mode,
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
dollar character is an escape character that can specify the insertion of
characters from capturing groups or (*MARK), (*PRUNE), or (*THEN) items in the
pattern. The following forms are always recognized:
characters from capturing groups or names from (*MARK) or other control verbs
in the pattern. The following forms are always recognized:
<pre>
$$ insert a dollar character
$&#60;n&#62; or ${&#60;n&#62;} insert the contents of group &#60;n&#62;
$*MARK or ${*MARK} insert a (*MARK), (*PRUNE), or (*THEN) name
$*MARK or ${*MARK} insert a control verb name
</pre>
Either a group number or a group name can be given for &#60;n&#62;. Curly brackets are
required only if the following character would be interpreted as part of the
@ -3234,12 +3254,13 @@ For example, if the pattern a(b)c is matched with "=abc=" and the replacement
string "+$1$0$1+", the result is "=+babcb+=".
</P>
<P>
$*MARK inserts the name from the last encountered (*MARK), (*PRUNE), or (*THEN)
on the matching path that has a name. (*MARK) must always include a name, but
(*PRUNE) and (*THEN) need not. For example, in the case of (*MARK:A)(*PRUNE)
the name inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B".
This facility can be used to perform simple simultaneous substitutions, as this
<b>pcre2test</b> example shows:
$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
must always include a name, but the other verbs need not. For example, in
the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
perform simple simultaneous substitutions, as this <b>pcre2test</b> example
shows:
<pre>
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
apple lemon
@ -3399,6 +3420,44 @@ obtained by calling the <b>pcre2_get_error_message()</b> function (see
"Obtaining a textual error message"
<a href="#geterrormessage">above).</a>
</P>
<br><b>
Substitution callouts
</b><br>
<P>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
The <b>pcre2_set_substitute_callout()</b> function can be used to specify a
callout function for <b>pcre2_substitute()</b>. This information is passed in
a match context. The callout function is called after each substitution. It is
not called for simulated substitutions that happen as a result of the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return
any value.
</P>
<P>
The first argument of the callout function is a pointer to a substitute callout
block structure, which contains the following fields, not necessarily in this
order:
<pre>
uint32_t <i>version</i>;
PCRE2_SIZE <i>input_offsets[2]</i>;
PCRE2_SIZE <i>output_offsets[2]</i>;
</pre>
The <i>version</i> field contains the version number of the block format. The
current version is 0. The version number will increase in future if more fields
are added, but the intention is never to remove any of the existing fields.
</P>
<P>
The <i>input_offsets</i> vector contains the code unit offsets in the input
string of the matched substring, and the <i>output_offsets</i> vector contains
the offsets of the replacement in the output string.
</P>
<P>
The second argument of the callout function is the value passed as
<i>callout_data</i> when the function was registered.
</P>
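The callback shape described above can be sketched without linking against PCRE2. This is an illustration under stated assumptions, not library code: sketch_callout_block is a hypothetical stand-in that mirrors only the three documented fields, and sketch_totals and sketch_callout are invented names. With the real library the function would take a pcre2_substitute_callout_block * and be registered with pcre2_set_substitute_callout().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the documented block; NOT the pcre2.h type.
   It mirrors only the three fields listed in the documentation. */
typedef struct {
  unsigned int version;         /* documented as 0 for the current format */
  size_t input_offsets[2];      /* start and end of the match in the subject */
  size_t output_offsets[2];     /* start and end of the replacement in the output */
} sketch_callout_block;

/* User data reached through the opaque callout_data pointer. */
typedef struct {
  int calls;                    /* number of substitutions seen */
  long growth;                  /* summed (replacement length - match length) */
} sketch_totals;

/* Callout in the documented shape: returns nothing, receives the block
   and the pointer that was supplied at registration time. */
static void sketch_callout(sketch_callout_block *block, void *callout_data)
{
  sketch_totals *totals = callout_data;
  size_t in_len, out_len;

  assert(totals != NULL);

  /* The stated intention is that existing fields are never removed, so
     these two vectors should be readable whatever the version number is. */
  in_len = block->input_offsets[1] - block->input_offsets[0];
  out_len = block->output_offsets[1] - block->output_offsets[0];

  totals->calls++;
  totals->growth += (long)out_len - (long)in_len;
}
```

Because the callout returns nothing, anything it wants to report has to travel through callout_data, which is why the sketch accumulates into a caller-owned structure.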
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
<P>
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
@ -3665,7 +3724,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 07 September 2018
Last updated: 18 September 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

View File

@ -44,6 +44,14 @@ a match context (see <b>pcre2_set_callout()</b> in the
documentation).
</P>
<P>
When using the <b>pcre2_substitute()</b> function, an additional callout feature
is available. This does a callout after each change to the subject string and
is described in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation; the rest of this document is concerned with callouts during
pattern matching.
</P>
<P>
Within a regular expression, (?C&#60;arg&#62;) indicates a point at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
@ -463,7 +471,7 @@ Cambridge, England.
</P>
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 April 2018
Last updated: 17 September 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

View File

@ -1041,6 +1041,7 @@ process.
aftertext show text after match
allaftertext show text after captures
allcaptures show all captures
allvector show the entire ovector
allusedtext show all consulted text
altglobal alternative global matching
/g global global matching
@ -1048,6 +1049,7 @@ process.
mark show mark values
replace=&#60;string&#62; specify a replacement string
startchar show starting character when relevant
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1185,6 +1187,7 @@ pattern.
aftertext show text after match
allaftertext show text after captures
allcaptures show all captures
allvector show the entire ovector
allusedtext show all consulted text (non-JIT only)
altglobal alternative global matching
callout_capture show captures at callout time
@ -1214,6 +1217,7 @@ pattern.
replace=&#60;string&#62; specify a replacement string
startchar show startchar when relevant
startoffset=&#60;n&#62; same as offset=&#60;n&#62;
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1281,10 +1285,28 @@ captured parentheses be output after a match. By default, only those up to the
highest one actually used in the match are output (corresponding to the return
code from <b>pcre2_match()</b>). Groups that did not take part in the match
are output as "&#60;unset&#62;". This modifier is not relevant for DFA matching (which
does no capturing); it is ignored, with a warning message, if present.
does no capturing) and does not apply when <b>replace</b> is specified; it is
ignored, with a warning message, if present.
</P>
<br><b>
Testing callouts
Showing the entire ovector, for all outcomes
</b><br>
<P>
The <b>allvector</b> modifier requests that the entire ovector be shown,
whatever the outcome of the match. Compare <b>allcaptures</b>, which shows only
up to the maximum number of capture groups for the pattern, and then only for a
successful complete non-DFA match. This modifier, which acts after any match
result, and also for DFA matching, provides a means of checking that there are
no unexpected modifications to ovector fields. Before each match attempt, the
ovector is filled with a special value, and if this is found in both elements
of a capturing pair, "&#60;unchanged&#62;" is output. After a successful match, this
applies to all groups after the maximum capture group for the pattern. In other
cases it applies to the entire ovector. After a partial match, the first two
elements are the only ones that should be set. After a DFA match, the amount of
ovector that is used depends on the number of matches that were found.
</P>
<br><b>
Testing pattern callouts
</b><br>
<P>
A callout function is supplied when <b>pcre2test</b> calls the library matching
@ -1292,6 +1314,9 @@ functions, unless <b>callout_none</b> is specified. Its behaviour can be
controlled by various modifiers listed above whose names begin with
<b>callout_</b>. Details are given in the section entitled "Callouts"
<a href="#callouts">below.</a>
Testing callouts from <b>pcre2_substitute()</b> is described separately in
"Testing the substitution function"
<a href="#substitution">below.</a>
</P>
<br><b>
Finding all matches in a string
@ -1343,7 +1368,7 @@ instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each substring, followed by the name when the extraction was
by name.
</P>
<a name="substitution"></a></P>
<br><b>
Testing the substitution function
</b><br>
@ -1384,6 +1409,16 @@ simple example of a substitution test:
=abc=abc=\=global
2: =xxx=xxx=
</pre>
If the <b>substitute_callout</b> modifier is set, a substitution callout
function is set up. When it is called (after each substitution), the offsets in
the input and output strings are output. For example:
<pre>
/abc/g,replace=&#60;$0&#62;,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: &#60;abc&#62;def&#60;abc&#62;pqr
</pre>
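The offsets in that trace can be reproduced by a small standalone simulation. This is a sketch only: it does a literal search-and-replace with strstr() instead of calling PCRE2, and demo_callout_block, demo_log, demo_trace and demo_substitute_global are hypothetical names that exist only for this illustration. It shows why the second "New" pair starts at 8: the first replacement occupies 0 to 5 in the output, and the unmatched "def" is copied to 5 to 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the documented callout block; NOT the pcre2.h type. */
typedef struct {
  unsigned int version;
  size_t input_offsets[2];
  size_t output_offsets[2];
} demo_callout_block;

typedef void (*demo_callout)(demo_callout_block *, void *);

/* Caller-owned log that the callout appends to. */
typedef struct { char text[128]; } demo_log;

/* Callout: append one "Old a b New c d" line per substitution. */
static void demo_trace(demo_callout_block *b, void *data)
{
  demo_log *log = data;
  size_t used = strlen(log->text);
  snprintf(log->text + used, sizeof log->text - used,
    "Old %zu %zu New %zu %zu\n",
    b->input_offsets[0], b->input_offsets[1],
    b->output_offsets[0], b->output_offsets[1]);
}

/* Replace every literal occurrence of needle in subject with repl, writing a
   zero-terminated result to out (capacity outsize). The callout runs after
   each replacement. Returns the replacement count, or -1 if out is too small. */
static int demo_substitute_global(const char *subject, const char *needle,
  const char *repl, char *out, size_t outsize, demo_callout callout, void *data)
{
  size_t nlen = strlen(needle), rlen = strlen(repl);
  size_t in = 0, o = 0, tail;
  int count = 0;
  const char *hit;

  assert(nlen > 0);             /* an empty needle would never advance */

  while ((hit = strstr(subject + in, needle)) != NULL) {
    size_t start = (size_t)(hit - subject);
    size_t gap = start - in;    /* unmatched text before this match */

    if (o + gap + rlen >= outsize) return -1;
    memcpy(out + o, subject + in, gap);
    o += gap;
    memcpy(out + o, repl, rlen);

    if (callout != NULL) {
      demo_callout_block b = {0, {start, start + nlen}, {o, o + rlen}};
      callout(&b, data);
    }

    o += rlen;
    in = start + nlen;
    count++;
  }

  tail = strlen(subject + in);
  if (o + tail >= outsize) return -1;
  memcpy(out + o, subject + in, tail + 1);   /* copies the terminator too */
  return count;
}
```

Calling demo_substitute_global("abcdefabcpqr", "abc", "<abc>", ...) with demo_trace as the callout and a zeroed demo_log as its data yields two log lines with the same four numbers as the pcre2test output above.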
Subject and replacement strings should be kept relatively short (fewer than 256
characters) for substitution tests, as fixed-size buffers are used. To make it
easy to test for buffer overflow, if the replacement string starts with a
@ -1401,10 +1436,10 @@ The default action of <b>pcre2_substitute()</b> is to return
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
<b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues
to go through the motions of matching and substituting, in order to compute the
size of buffer that is required. When this happens, <b>pcre2test</b> shows the
required buffer length (which includes space for the trailing zero) as part of
the error message. For example:
to go through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required. When this
happens, <b>pcre2test</b> shows the required buffer length (which includes space
for the trailing zero) as part of the error message. For example:
<pre>
/abc/substitute_overflow_length
123abc123\=replace=[9]XYZ
@ -2004,7 +2039,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 21 July 2018
Last updated: 17 September 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

View File

@ -1,4 +1,4 @@
.TH PCRE2 3 "11 July 2018" "PCRE2 10.32"
.TH PCRE2 3 "17 September 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH INTRODUCTION
@ -156,7 +156,7 @@ listing), and the short pages for individual functions, are concatenated in
pcre2-config show PCRE2 installation configuration information
pcre2api details of PCRE2's native C API
pcre2build building PCRE2
pcre2callout details of the callout feature
pcre2callout details of the pattern callout feature
pcre2compat discussion of Perl compatibility
pcre2convert details of pattern conversion functions
pcre2demo a demonstration C program that uses PCRE2
@ -197,6 +197,6 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
.rs
.sp
.nf
Last updated: 11 July 2018
Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

View File

@ -141,7 +141,7 @@ USER DOCUMENTATION
pcre2-config show PCRE2 installation configuration information
pcre2api details of PCRE2's native C API
pcre2build building PCRE2
pcre2callout details of the callout feature
pcre2callout details of the pattern callout feature
pcre2compat discussion of Perl compatibility
pcre2convert details of pattern conversion functions
pcre2demo a demonstration C program that uses PCRE2
@ -177,7 +177,7 @@ AUTHOR
REVISION
Last updated: 11 July 2018
Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@ -293,6 +293,10 @@ PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
int (*callout_function)(pcre2_callout_block *, void *),
void *callout_data);
int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data);
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
PCRE2_SIZE value);
@ -933,10 +937,18 @@ PCRE2 CONTEXTS
int (*callout_function)(pcre2_callout_block *, void *),
void *callout_data);
This sets up a "callout" function for PCRE2 to call at specified points
This sets up a callout function for PCRE2 to call at specified points
during a matching operation. Details are given in the pcre2callout doc-
umentation.
int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data);
This sets up a callout function for PCRE2 to call after each substitu-
tion made by pcre2_substitute(). Details are given in the section enti-
tled "Creating a new string with substitutions" below.
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
PCRE2_SIZE value);
@ -3083,18 +3095,22 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
PCRE2_SIZE length, PCRE2_SIZE startoffset,
uint32_t options, pcre2_match_data *match_data,
pcre2_match_context *mcontext, PCRE2_SPTR replacement,
PCRE2_SIZE rlength, PCRE2_UCHAR *outputbufferP,
PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer,
PCRE2_SIZE *outlengthptr);
This function calls pcre2_match() and then makes a copy of the subject
string in outputbuffer, replacing the part that was matched with the
replacement string, whose length is supplied in rlength. This can be
given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
which a \K item in a lookahead in the pattern causes the match to end
before it starts are not supported, and give rise to an error return.
For global replacements, matches in which \K in a lookbehind causes the
match to start earlier than the point that was reached in the previous
iteration are also not supported.
string in outputbuffer, replacing one or more parts that were matched
with the replacement string, whose length is supplied in rlength. This
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
The default is to perform just one replacement, but there is an option
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
Matches in which a \K item in a lookahead in the pattern causes the
match to end before it starts are not supported, and give rise to an
error return. For global replacements, matches in which \K in a lookbe-
hind causes the match to start earlier than the point that was reached
in the previous iteration are also not supported.
The first seven arguments of pcre2_substitute() are the same as for
pcre2_match(), except that the partial matching options are not permit-
@ -3104,9 +3120,9 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
were used to allocate memory for the compiled code.
If an external match_data block is provided, its contents afterwards
are those set by the final call to pcre2_match(), which will have ended
in a matching error. The contents of the ovector within the match data
block may or may not have been changed.
are those set by the final call to pcre2_match(). For global changes,
this will have ended in a matching error. The contents of the ovector
within the match data block may or may not have been changed.
The outlengthptr argument must point to a variable that contains the
length, in code units, of the output buffer. If the function is suc-
@ -3128,13 +3144,13 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
In the replacement string, which is interpreted as a UTF string in UTF
mode, and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK
option is set, a dollar character is an escape character that can spec-
ify the insertion of characters from capturing groups or (*MARK),
(*PRUNE), or (*THEN) items in the pattern. The following forms are
ify the insertion of characters from capturing groups or names from
(*MARK) or other control verbs in the pattern. The following forms are
always recognized:
$$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n>
$*MARK or ${*MARK} insert a (*MARK), (*PRUNE), or (*THEN) name
$*MARK or ${*MARK} insert a control verb name
Either a group number or a group name can be given for <n>. Curly
brackets are required only if the following character would be inter-
@ -3143,11 +3159,11 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
matched with "=abc=" and the replacement string "+$1$0$1+", the result
is "=+babcb+=".
$*MARK inserts the name from the last encountered (*MARK), (*PRUNE), or
(*THEN) on the matching path that has a name. (*MARK) must always
include a name, but (*PRUNE) and (*THEN) need not. For example, in the
case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be
$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name.
(*MARK) must always include a name, but the other verbs need not. For
example, in the case of (*MARK:A)(*PRUNE) the name inserted is "A", but
for (*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be
used to perform simple simultaneous substitutions, as this pcre2test
example shows:
@ -3302,6 +3318,39 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
obtained by calling the pcre2_get_error_message() function (see
"Obtaining a textual error message" above).
Substitution callouts
int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data);
The pcre2_set_substitute_callout() function can be used to specify a
callout function for pcre2_substitute(). This information is passed in
a match context. The callout function is called after each substitu-
tion. It is not called for simulated substitutions that happen as a
result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout func-
tion should not return any value.
The first argument of the callout function is a pointer to a substitute
callout block structure, which contains the following fields, not nec-
essarily in this order:
uint32_t version;
PCRE2_SIZE input_offsets[2];
PCRE2_SIZE output_offsets[2];
The version field contains the version number of the block format. The
current version is 0. The version number will increase in future if
more fields are added, but the intention is never to remove any of the
existing fields.
The input_offsets vector contains the code unit offsets in the input
string of the matched substring, and the output_offsets vector contains
the offsets of the replacement in the output string.
The second argument of the callout function is the value passed as
callout_data when the function was registered.
DUPLICATE SUBPATTERN NAMES
@ -3549,7 +3598,7 @@ AUTHOR
REVISION
Last updated: 07 September 2018
Last updated: 18 September 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------
@ -4135,6 +4184,11 @@ DESCRIPTION
its entry point in a match context (see pcre2_set_callout() in the
pcre2api documentation).
When using the pcre2_substitute() function, an additional callout fea-
ture is available. This does a callout after each change to the subject
string and is described in the pcre2api documentation; the rest of this
document is concerned with callouts during pattern matching.
Within a regular expression, (?C<arg>) indicates a point at which the
external function is to be called. Different callout points can be
identified by putting a number less than 256 after the letter C. The
@ -4530,7 +4584,7 @@ AUTHOR
REVISION
Last updated: 26 April 2018
Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -0,0 +1,31 @@
.TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "17 September 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
.rs
.sp
.B #include <pcre2.h>
.PP
.nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.fi
.
.SH DESCRIPTION
.rs
.sp
This function sets the substitute callout fields in a match context (the first
argument). The second argument specifies a callout function, and the third
argument is an opaque data item that is passed to it. The result of this
function is always zero.
.P
There is a complete description of the PCRE2 native API in the
.\" HREF
\fBpcre2api\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcre2posix\fP
.\"
page.

View File

@ -1,4 +1,4 @@
.TH PCRE2API 3 "07 September 2018" "PCRE2 10.32"
.TH PCRE2API 3 "18 September 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -123,6 +123,10 @@ document for an overview of all the PCRE2 documentation.
.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.sp
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.sp
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
.B " PCRE2_SIZE \fIvalue\fP);"
.sp
@ -847,7 +851,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
.B " void *\fIcallout_data\fP);"
.fi
.sp
This sets up a "callout" function for PCRE2 to call at specified points
This sets up a callout function for PCRE2 to call at specified points
during a matching operation. Details are given in the
.\" HREF
\fBpcre2callout\fP
@ -855,6 +859,20 @@ during a matching operation. Details are given in the
documentation.
.sp
.nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.fi
.sp
This sets up a callout function for PCRE2 to call after each substitution
made by \fBpcre2_substitute()\fP. Details are given in the section entitled
"Creating a new string with substitutions"
.\" HTML <a href="#substitutions">
.\" </a>
below.
.\"
.sp
.nf
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
.B " PCRE2_SIZE \fIvalue\fP);"
.fi
@ -3171,6 +3189,7 @@ numbers. For this reason, the use of different names for subpatterns of the
same number causes an error at compile time.
.
.
.\" HTML <a name="substitutions"></a>
.SH "CREATING A NEW STRING WITH SUBSTITUTIONS"
.rs
.sp
@ -3179,19 +3198,22 @@ same number causes an error at compile time.
.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP,"
.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP,"
.B " pcre2_match_context *\fImcontext\fP, PCRE2_SPTR \fIreplacement\fP,"
.B " PCRE2_SIZE \fIrlength\fP, PCRE2_UCHAR *\fIoutputbuffer\zfP,"
.B " PCRE2_SIZE \fIrlength\fP, PCRE2_UCHAR *\fIoutputbuffer\fP,"
.B " PCRE2_SIZE *\fIoutlengthptr\fP);"
.fi
.P
This function calls \fBpcre2_match()\fP and then makes a copy of the subject
string in \fIoutputbuffer\fP, replacing the part that was matched with the
\fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This can
be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
which a \eK item in a lookahead in the pattern causes the match to end before
it starts are not supported, and give rise to an error return. For global
replacements, matches in which \eK in a lookbehind causes the match to start
earlier than the point that was reached in the previous iteration are also not
supported.
string in \fIoutputbuffer\fP, replacing one or more parts that were matched
with the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP.
This can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
The default is to perform just one replacement, but there is an option that
requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below for details).
.P
Matches in which a \eK item in a lookahead in the pattern causes the match to
end before it starts are not supported, and give rise to an error return. For
global replacements, matches in which \eK in a lookbehind causes the match to
start earlier than the point that was reached in the previous iteration are
also not supported.
.P
The first seven arguments of \fBpcre2_substitute()\fP are the same as for
\fBpcre2_match()\fP, except that the partial matching options are not
@ -3201,9 +3223,9 @@ functions from the match context, if provided, or else those that were used to
allocate memory for the compiled code.
.P
If an external \fImatch_data\fP block is provided, its contents afterwards
are those set by the final call to \fBpcre2_match()\fP, which will have
ended in a matching error. The contents of the ovector within the match data
block may or may not have been changed.
are those set by the final call to \fBpcre2_match()\fP. For global changes,
this will have ended in a matching error. The contents of the ovector within
the match data block may or may not have been changed.
.P
The \fIoutlengthptr\fP argument must point to a variable that contains the
length, in code units, of the output buffer. If the function is successful, the
@ -3224,12 +3246,12 @@ length is in code units, not bytes.
In the replacement string, which is interpreted as a UTF string in UTF mode,
and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
dollar character is an escape character that can specify the insertion of
characters from capturing groups or (*MARK), (*PRUNE), or (*THEN) items in the
pattern. The following forms are always recognized:
characters from capturing groups or names from (*MARK) or other control verbs
in the pattern. The following forms are always recognized:
.sp
$$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n>
$*MARK or ${*MARK} insert a (*MARK), (*PRUNE), or (*THEN) name
$*MARK or ${*MARK} insert a control verb name
.sp
Either a group number or a group name can be given for <n>. Curly brackets are
required only if the following character would be interpreted as part of the
@ -3237,12 +3259,13 @@ number or name. The number may be zero to include the entire matched string.
For example, if the pattern a(b)c is matched with "=abc=" and the replacement
string "+$1$0$1+", the result is "=+babcb+=".
.P
$*MARK inserts the name from the last encountered (*MARK), (*PRUNE), or (*THEN)
on the matching path that has a name. (*MARK) must always include a name, but
(*PRUNE) and (*THEN) need not. For example, in the case of (*MARK:A)(*PRUNE)
the name inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B".
This facility can be used to perform simple simultaneous substitutions, as this
\fBpcre2test\fP example shows:
$*MARK inserts the name from the last encountered (*ACCEPT), (*COMMIT),
(*MARK), (*PRUNE), or (*THEN) on the matching path that has a name. (*MARK)
must always include a name, but the other verbs need not. For example, in
the case of (*MARK:A)(*PRUNE) the name inserted is "A", but for
(*MARK:A)(*PRUNE:B) the relevant name is "B". This facility can be used to
perform simple simultaneous substitutions, as this \fBpcre2test\fP example
shows:
.sp
/(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
apple lemon
@ -3388,6 +3411,42 @@ above).
.\"
.
.
.SS "Substitution callouts"
.rs
.sp
.nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.fi
.sp
The \fBpcre2_set_substitute_callout()\fP function can be used to specify a
callout function for \fBpcre2_substitute()\fP. This information is passed in
a match context. The callout function is called after each substitution. It is
not called for simulated substitutions that happen as a result of the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return
any value.
.P
The first argument of the callout function is a pointer to a substitute callout
block structure, which contains the following fields, not necessarily in this
order:
.sp
uint32_t \fIversion\fP;
PCRE2_SIZE \fIinput_offsets[2]\fP;
PCRE2_SIZE \fIoutput_offsets[2]\fP;
.sp
The \fIversion\fP field contains the version number of the block format. The
current version is 0. The version number will increase in future if more fields
are added, but the intention is never to remove any of the existing fields.
.P
The \fIinput_offsets\fP vector contains the code unit offsets in the input
string of the matched substring, and the \fIoutput_offsets\fP vector contains
the offsets of the replacement in the output string.
.P
The second argument of the callout function is the value passed as
\fIcallout_data\fP when the function was registered.
.
.
.SH "DUPLICATE SUBPATTERN NAMES"
.rs
.sp
@ -3670,6 +3729,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 07 September 2018
Last updated: 18 September 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi
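The callout interface documented above can be sketched without linking against PCRE2. The following is a minimal illustration, not library code: demo_substitute_callout_block is a hypothetical stand-in that mirrors the three version 0 fields (with size_t standing in for PCRE2_SIZE), and demo_stats and demo_callout are invented names. A real program would include pcre2.h and register the function with pcre2_set_substitute_callout(), passing the statistics block as callout_data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for pcre2_substitute_callout_block (version 0 layout as
documented in this commit). Hypothetical: real code uses the type from
pcre2.h, and PCRE2_SIZE rather than size_t. */
typedef struct demo_substitute_callout_block {
  uint32_t version;             /* Identifies version of block */
  size_t   input_offsets[2];    /* Matched portion of the input */
  size_t   output_offsets[2];   /* Changed portion of the output */
} demo_substitute_callout_block;

/* Hypothetical accumulator, passed through the void * callout data. */
typedef struct demo_stats {
  size_t count;      /* number of substitutions seen */
  size_t removed;    /* code units taken out of the input */
  size_t inserted;   /* code units written to the output */
} demo_stats;

/* Same shape as the callout in this commit: two arguments, no result. */
static void demo_callout(demo_substitute_callout_block *scb, void *data)
{
demo_stats *stats = (demo_stats *)data;
if (stats == NULL) return;    /* no callout data was registered */

/* Each pair is a [start, end) range, so end is never below start. */
assert(scb->input_offsets[0] <= scb->input_offsets[1]);
assert(scb->output_offsets[0] <= scb->output_offsets[1]);

stats->count++;
stats->removed += scb->input_offsets[1] - scb->input_offsets[0];
stats->inserted += scb->output_offsets[1] - scb->output_offsets[0];
}
```

The version field is not consulted here because only version 0 fields are read, and the documentation states that existing fields are not intended to be removed. A callout that reads fields added in some later version would need to check version first.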

@ -1,4 +1,4 @@
.TH PCRE2CALLOUT 3 "26 April 2018" "PCRE2 10.32"
.TH PCRE2CALLOUT 3 "17 September 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -27,6 +27,15 @@ a match context (see \fBpcre2_set_callout()\fP in the
.\"
documentation).
.P
When using the \fBpcre2_substitute()\fP function, an additional callout feature
is available. This does a callout after each change to the subject string and
is described in the
.\" HREF
\fBpcre2api\fP
.\"
documentation; the rest of this document is concerned with callouts during
pattern matching.
.P
Within a regular expression, (?C<arg>) indicates a point at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
@ -443,6 +452,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 26 April 2018
Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "15 September 2018" "PCRE 10.33"
.TH PCRE2TEST 1 "17 September 2018" "PCRE 10.33"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -1011,6 +1011,7 @@ process.
mark show mark values
replace=<string> specify a replacement string
startchar show starting character when relevant
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1185,6 +1186,7 @@ pattern.
replace=<string> specify a replacement string
startchar show startchar when relevant
startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1271,7 +1273,7 @@ elements are the only ones that should be set. After a DFA match, the amount of
ovector that is used depends on the number of matches that were found.
.
.
.SS "Testing callouts"
.SS "Testing pattern callouts"
.rs
.sp
A callout function is supplied when \fBpcre2test\fP calls the library matching
@ -1282,6 +1284,12 @@ controlled by various modifiers listed above whose names begin with
.\" </a>
below.
.\"
Testing callouts from \fBpcre2_substitute()\fP is described separately in
"Testing the substitution function"
.\" HTML <a href="#substitution">
.\" </a>
below.
.\"
.
.
.SS "Finding all matches in a string"
@ -1332,6 +1340,7 @@ parentheses after each substring, followed by the name when the extraction was
by name.
.
.
.\" HTML <a name="substitution"></a>
.SS "Testing the substitution function"
.rs
.sp
@ -1367,6 +1376,16 @@ simple example of a substitution test:
=abc=abc=\e=global
2: =xxx=xxx=
.sp
If the \fBsubstitute_callout\fP modifier is set, a substitution callout
function is set up. When it is called (after each substitution), the offsets in
the input and output strings are output. For example:
.sp
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: <abc>def<abc>pqr
.sp
Subject and replacement strings should be kept relatively short (fewer than 256
characters) for substitution tests, as fixed-size buffers are used. To make it
easy to test for buffer overflow, if the replacement string starts with a
@ -1384,10 +1403,10 @@ The default action of \fBpcre2_substitute()\fP is to return
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
\fBsubstitute_overflow_length\fP modifier), \fBpcre2_substitute()\fP continues
to go through the motions of matching and substituting, in order to compute the
size of buffer that is required. When this happens, \fBpcre2test\fP shows the
required buffer length (which includes space for the trailing zero) as part of
the error message. For example:
to go through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required. When this
happens, \fBpcre2test\fP shows the required buffer length (which includes space
for the trailing zero) as part of the error message. For example:
.sp
/abc/substitute_overflow_length
123abc123\e=replace=[9]XYZ
@ -2002,6 +2021,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 15 September 2018
Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@ -929,6 +929,7 @@ PATTERN MODIFIERS
aftertext show text after match
allaftertext show text after captures
allcaptures show all captures
allvector show the entire ovector
allusedtext show all consulted text
altglobal alternative global matching
/g global global matching
@ -936,6 +937,7 @@ PATTERN MODIFIERS
mark show mark values
replace=<string> specify a replacement string
startchar show starting character when relevant
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1057,6 +1059,7 @@ SUBJECT MODIFIERS
aftertext show text after match
allaftertext show text after captures
allcaptures show all captures
allvector show the entire ovector
allusedtext show all consulted text (non-JIT only)
altglobal alternative global matching
callout_capture show captures at callout time
@ -1086,6 +1089,7 @@ SUBJECT MODIFIERS
replace=<string> specify a replacement string
startchar show startchar when relevant
startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
@ -1150,15 +1154,34 @@ SUBJECT MODIFIERS
the highest one actually used in the match are output (corresponding to
the return code from pcre2_match()). Groups that did not take part in
the match are output as "<unset>". This modifier is not relevant for
DFA matching (which does no capturing); it is ignored, with a warning
message, if present.
DFA matching (which does no capturing) and does not apply when replace
is specified; it is ignored, with a warning message, if present.
Testing callouts
Showing the entire ovector, for all outcomes
The allvector modifier requests that the entire ovector be shown, what-
ever the outcome of the match. Compare allcaptures, which shows only up
to the maximum number of capture groups for the pattern, and then only
for a successful complete non-DFA match. This modifier, which acts
after any match result, and also for DFA matching, provides a means of
checking that there are no unexpected modifications to ovector fields.
Before each match attempt, the ovector is filled with a special value,
and if this is found in both elements of a capturing pair,
"<unchanged>" is output. After a successful match, this applies to all
groups after the maximum capture group for the pattern. In other cases
it applies to the entire ovector. After a partial match, the first two
elements are the only ones that should be set. After a DFA match, the
amount of ovector that is used depends on the number of matches that
were found.
Testing pattern callouts
A callout function is supplied when pcre2test calls the library match-
ing functions, unless callout_none is specified. Its behaviour can be
controlled by various modifiers listed above whose names begin with
callout_. Details are given in the section entitled "Callouts" below.
Testing callouts from pcre2_substitute() is described separately in
"Testing the substitution function" below.
Finding all matches in a string
@ -1239,6 +1262,16 @@ SUBJECT MODIFIERS
=abc=abc=\=global
2: =xxx=xxx=
If the substitute_callout modifier is set, a substitution callout func-
tion is set up. When it is called (after each substitution), the off-
sets in the input and output strings are output. For example:
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: <abc>def<abc>pqr
Subject and replacement strings should be kept relatively short (fewer
than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement
@ -1257,10 +1290,11 @@ SUBJECT MODIFIERS
PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the sub-
stitute_overflow_length modifier), pcre2_substitute() continues to go
through the motions of matching and substituting, in order to compute
the size of buffer that is required. When this happens, pcre2test shows
the required buffer length (which includes space for the trailing zero)
as part of the error message. For example:
through the motions of matching and substituting (but not doing any
callouts), in order to compute the size of buffer that is required.
When this happens, pcre2test shows the required buffer length (which
includes space for the trailing zero) as part of the error message. For
example:
/abc/substitute_overflow_length
123abc123\=replace=[9]XYZ
@ -1818,5 +1852,5 @@ AUTHOR
REVISION
Last updated: 21 July 2018
Last updated: 17 September 2018
Copyright (c) 1997-2018 University of Cambridge.

@ -505,10 +505,10 @@ typedef struct pcre2_real_jit_stack pcre2_jit_stack; \
typedef pcre2_jit_stack *(*pcre2_jit_callback)(void *);
/* The structure for passing out data via the pcre_callout_function. We use a
structure so that new fields can be added on the end in future versions,
without changing the API of the function, thereby allowing old clients to work
without modification. Define the generic version in a macro; the width-specific
/* The structures for passing out data via callout functions. We use structures
so that new fields can be added on the end in future versions, without changing
the API of the function, thereby allowing old clients to work without
modification. Define the generic versions in a macro; the width-specific
versions are generated from this macro below. */
/* Flags for the callout_flags field. These are cleared after a callout. */
@ -550,7 +550,15 @@ typedef struct pcre2_callout_enumerate_block { \
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
/* ------------------------------------------------------------------ */ \
} pcre2_callout_enumerate_block;
} pcre2_callout_enumerate_block; \
\
typedef struct pcre2_substitute_callout_block { \
uint32_t version; /* Identifies version of block */ \
/* ------------------------ Version 0 ------------------------------- */ \
PCRE2_SIZE input_offsets[2]; /* Matched portion of the input */ \
PCRE2_SIZE output_offsets[2]; /* Changed portion of the output */ \
/* ------------------------------------------------------------------ */ \
} pcre2_substitute_callout_block;
/* List the generic forms of all other functions in macros, which will be
@ -605,6 +613,9 @@ PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_callout(pcre2_match_context *, \
int (*)(pcre2_callout_block *, void *), void *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_substitute_callout(pcre2_match_context *, \
void (*)(pcre2_substitute_callout_block *, void *), void *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
@ -808,6 +819,7 @@ pcre2_compile are called by application code. */
#define pcre2_callout_block PCRE2_SUFFIX(pcre2_callout_block_)
#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_)
#define pcre2_substitute_callout_block PCRE2_SUFFIX(pcre2_substitute_callout_block_)
#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_)
#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_)
#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_)
@ -873,6 +885,7 @@ pcre2_compile are called by application code. */
#define pcre2_set_newline PCRE2_SUFFIX(pcre2_set_newline_)
#define pcre2_set_parens_nest_limit PCRE2_SUFFIX(pcre2_set_parens_nest_limit_)
#define pcre2_set_offset_limit PCRE2_SUFFIX(pcre2_set_offset_limit_)
#define pcre2_set_substitute_callout PCRE2_SUFFIX(pcre2_set_substitute_callout_)
#define pcre2_substitute PCRE2_SUFFIX(pcre2_substitute_)
#define pcre2_substring_copy_byname PCRE2_SUFFIX(pcre2_substring_copy_byname_)
#define pcre2_substring_copy_bynumber PCRE2_SUFFIX(pcre2_substring_copy_bynumber_)

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2017 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -163,11 +163,13 @@ when no context is supplied to a match function. */
const pcre2_match_context PRIV(default_match_context) = {
{ default_malloc, default_free, NULL },
#ifdef SUPPORT_JIT
NULL,
NULL,
NULL, /* JIT callback */
NULL, /* JIT callback data */
#endif
NULL,
NULL,
NULL, /* Callout function */
NULL, /* Callout data */
NULL, /* Substitute callout function */
NULL, /* Substitute callout data */
PCRE2_UNSET, /* Offset limit */
HEAP_LIMIT,
MATCH_LIMIT,
@ -403,6 +405,16 @@ mcontext->callout_data = callout_data;
return 0;
}
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*substitute_callout)(pcre2_substitute_callout_block *, void *),
void *substitute_callout_data)
{
mcontext->substitute_callout = substitute_callout;
mcontext->substitute_callout_data = substitute_callout_data;
return 0;
}
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_set_heap_limit(pcre2_match_context *mcontext, uint32_t limit)
{

@ -585,6 +585,8 @@ typedef struct pcre2_real_match_context {
#endif
int (*callout)(pcre2_callout_block *, void *);
void *callout_data;
void (*substitute_callout)(pcre2_substitute_callout_block *, void *);
void *substitute_callout_data;
PCRE2_SIZE offset_limit;
uint32_t heap_limit;
uint32_t match_limit;

@ -239,7 +239,9 @@ PCRE2_SIZE extra_needed = 0;
PCRE2_SIZE buff_offset, buff_length, lengthleft, fraglength;
PCRE2_SIZE *ovector;
PCRE2_SIZE ovecsave[3];
pcre2_substitute_callout_block scb;
scb.version = 0;
buff_offset = 0;
lengthleft = buff_length = *blength;
*blength = PCRE2_UNSET;
@ -391,6 +393,11 @@ do
goto EXIT;
}
/* Save the match point for a possible callout */
scb.input_offsets[0] = ovector[0];
scb.input_offsets[1] = ovector[1];
/* Count substitutions with a paranoid check for integer overflow; surely no
real call to this function would ever hit this! */
@ -401,11 +408,13 @@ do
}
subs++;
/* Copy the text leading up to the match. */
/* Copy the text leading up to the match, and remember where the insert
begins. */
if (rc == 0) rc = ovector_count;
fraglength = ovector[0] - start_offset;
CHECKMEMCPY(subject + start_offset, fraglength);
scb.output_offsets[0] = buff_offset;
/* Process the replacement string. Literal mode is set by \Q, but only in
extended mode when backslashes are being interpreted. In extended mode we
@ -821,10 +830,19 @@ do
} /* End handling a literal code unit */
} /* End of loop for scanning the replacement. */
/* The replacement has been copied to the output. Save the details of this
match. See above for how this data is used. If we matched an empty string, do
the magic for global matches. Finally, update the start offset to point to
the rest of the subject string. */
/* The replacement has been copied to the output, or its size has been
remembered. Do the callout if there is one and we have done an actual
replacement. */
if (!overflowed && mcontext->substitute_callout != NULL)
{
scb.output_offsets[1] = buff_offset;
mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
}
/* Save the details of this match. See above for how this data is used. If we
matched an empty string, do the magic for global matches. Finally, update the
start offset to point to the rest of the subject string. */
ovecsave[0] = ovector[0];
ovecsave[1] = ovector[1];
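The ordering in the hunks above (save the match offsets, copy the leading text, note where the insert begins, copy the replacement, then call out only if the output has not overflowed) can be sketched in a self-contained form. This is an illustration under stated assumptions, not the pcre2_substitute() implementation: it matches a literal string with strstr() instead of a pattern, it always keeps counting after an overflow (as if PCRE2_SUBSTITUTE_OVERFLOW_LENGTH were set), and every demo_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the substitute callout block. */
typedef struct demo_block {
  size_t input_offsets[2];
  size_t output_offsets[2];
} demo_block;

typedef void (*demo_callout_fn)(demo_block *, void *);

/* Hypothetical recorder so that the callouts can be inspected afterwards. */
typedef struct demo_log { size_t n; demo_block seen[8]; } demo_log;

static void demo_record(demo_block *scb, void *data)
{
demo_log *log = (demo_log *)data;
if (log->n < 8) log->seen[log->n++] = *scb;
}

/* Copy a fragment if it fits; once anything has failed to fit, only count. */
static void demo_append(char *out, size_t cap, size_t *used, int *overflowed,
  const char *from, size_t len)
{
if (!*overflowed && len <= cap - *used) memcpy(out + *used, from, len);
  else *overflowed = 1;
*used += len;
}

/* Replace every literal occurrence of needle. Returns the output length that
is needed (no terminator is written); *overflowed_out is set non-zero if the
buffer was too small. */
static size_t demo_substitute(const char *subject, const char *needle,
  const char *replacement, char *out, size_t cap,
  demo_callout_fn callout, void *data, int *overflowed_out)
{
size_t used = 0, start = 0;
size_t slen = strlen(subject), nlen = strlen(needle);
size_t rlen = strlen(replacement);
int overflowed = 0;
const char *hit;

assert(nlen > 0);   /* an empty needle would never advance */
while ((hit = strstr(subject + start, needle)) != NULL)
  {
  demo_block scb;

  /* Save the match point for a possible callout. */
  scb.input_offsets[0] = (size_t)(hit - subject);
  scb.input_offsets[1] = scb.input_offsets[0] + nlen;

  /* Copy the text leading up to the match; the insert begins here. */
  demo_append(out, cap, &used, &overflowed, subject + start,
    scb.input_offsets[0] - start);
  scb.output_offsets[0] = used;

  demo_append(out, cap, &used, &overflowed, replacement, rlen);

  /* Call out only for an actual replacement, not a simulated one. */
  if (!overflowed && callout != NULL)
    {
    scb.output_offsets[1] = used;
    callout(&scb, data);
    }
  start = scb.input_offsets[1];
  }

demo_append(out, cap, &used, &overflowed, subject + start, slen - start);
*overflowed_out = overflowed;
return used;
}
```

Calling demo_substitute() with the subject "abcdefabcpqr", the needle "abc", and the replacement "<abc>" into a large enough buffer records the same two offset pairs as the documented pcre2test example (0 3 / 0 5 and 6 9 / 8 13). With a buffer that is too small, the callouts stop at the first fragment that does not fit, while the returned length still covers the whole result.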

@ -484,14 +484,15 @@ so many of them that they are split into two fields. */
/* Second control word */
#define CTL2_SUBSTITUTE_EXTENDED 0x00000001u
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000002u
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000004u
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000008u
#define CTL2_SUBJECT_LITERAL 0x00000010u
#define CTL2_CALLOUT_NO_WHERE 0x00000020u
#define CTL2_CALLOUT_EXTRA 0x00000040u
#define CTL2_ALLVECTOR 0x00000080u
#define CTL2_SUBSTITUTE_CALLOUT 0x00000001u
#define CTL2_SUBSTITUTE_EXTENDED 0x00000002u
#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000004u
#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000008u
#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000010u
#define CTL2_SUBJECT_LITERAL 0x00000020u
#define CTL2_CALLOUT_NO_WHERE 0x00000040u
#define CTL2_CALLOUT_EXTRA 0x00000080u
#define CTL2_ALLVECTOR 0x00000100u
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */
@ -511,7 +512,8 @@ different things in the two cases. */
CTL_STARTCHAR|\
CTL_UTF8_INPUT)
#define CTL2_ALLPD (CTL2_SUBSTITUTE_EXTENDED|\
#define CTL2_ALLPD (CTL2_SUBSTITUTE_CALLOUT|\
CTL2_SUBSTITUTE_EXTENDED|\
CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\
CTL2_SUBSTITUTE_UNKNOWN_UNSET|\
CTL2_SUBSTITUTE_UNSET_EMPTY|\
@ -690,6 +692,7 @@ static modstruct modlist[] = {
{ "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
{ "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
{ "subject_literal", MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL, PO(control2) },
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
@ -1355,6 +1358,17 @@ are supported. */
else \
pcre2_set_parens_nest_limit_32(G(a,32),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
if (test_mode == PCRE8_MODE) \
pcre2_set_substitute_callout_8(G(a,8), \
(void (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \
else if (test_mode == PCRE16_MODE) \
pcre2_set_substitute_callout_16(G(a,16), \
(void (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \
else \
pcre2_set_substitute_callout_32(G(a,32), \
(void (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
if (test_mode == PCRE8_MODE) \
a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),G(h,8), \
@ -1824,6 +1838,14 @@ the three different cases. */
else \
G(pcre2_set_parens_nest_limit_,BITTWO)(G(a,BITTWO),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
G(pcre2_set_substitute_callout_,BITONE)(G(a,BITONE), \
(void (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \
else \
G(pcre2_set_substitute_callout_,BITTWO)(G(a,BITTWO), \
(void (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
a = G(pcre2_substitute_,BITONE)(G(b,BITONE),(G(PCRE2_SPTR,BITONE))c,d,e,f, \
@ -2025,6 +2047,9 @@ the three different cases. */
#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) pcre2_set_max_pattern_length_8(G(a,8),b)
#define PCRE2_SET_OFFSET_LIMIT(a,b) pcre2_set_offset_limit_8(G(a,8),b)
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_8(G(a,8),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_8(G(a,8), \
(void (*)(pcre2_substitute_callout_block_8 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),G(h,8), \
(PCRE2_SPTR8)i,j,(PCRE2_UCHAR8 *)k,l)
@ -2129,6 +2154,9 @@ the three different cases. */
#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) pcre2_set_max_pattern_length_16(G(a,16),b)
#define PCRE2_SET_OFFSET_LIMIT(a,b) pcre2_set_offset_limit_16(G(a,16),b)
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_16(G(a,16),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_16(G(a,16), \
(void (*)(pcre2_substitute_callout_block_16 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),G(h,16), \
(PCRE2_SPTR16)i,j,(PCRE2_UCHAR16 *)k,l)
@ -2221,7 +2249,7 @@ the three different cases. */
#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \
r = pcre2_serialize_get_number_of_codes_32(a)
#define PCRE2_SET_CALLOUT(a,b,c) \
pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c);
pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c)
#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_32(G(a,32),b)
#define PCRE2_SET_COMPILE_RECURSION_GUARD(a,b,c) \
pcre2_set_compile_recursion_guard_32(G(a,32),b,c)
@ -2233,6 +2261,9 @@ the three different cases. */
#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) pcre2_set_max_pattern_length_32(G(a,32),b)
#define PCRE2_SET_OFFSET_LIMIT(a,b) pcre2_set_offset_limit_32(G(a,32),b)
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_32(G(a,32),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_32(G(a,32), \
(void (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),G(h,32), \
(PCRE2_SPTR32)i,j,(PCRE2_UCHAR32 *)k,l)
@ -4022,7 +4053,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -4058,6 +4089,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_PUSHCOPY) != 0)? " pushcopy" : "",
((controls & CTL_PUSHTABLESCOPY) != 0)? " pushtablescopy" : "",
((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
((controls2 & CTL2_SUBSTITUTE_CALLOUT) != 0)? " substitute_callout" : "",
((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "",
((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "",
@ -5896,6 +5928,35 @@ return capcount;
/*************************************************
* Substitute callout function *
*************************************************/
/* Called from pcre2_substitute() when the substitute_callout modifier is set.
Print out the data that is passed back. The substitute callout block is
identical for all code unit widths, so we just pick one.
Arguments:
scb pointer to substitute callout block
data_ptr callout data
Returns: nothing
*/
static void
substitute_callout_function(pcre2_substitute_callout_block_8 *scb,
void *data_ptr)
{
(void)data_ptr; /* Not used */
fprintf(outfile, "Old %" SIZ_FORM " %" SIZ_FORM " New %" SIZ_FORM
" %" SIZ_FORM "\n",
SIZ_CAST scb->input_offsets[0],
SIZ_CAST scb->input_offsets[1],
SIZ_CAST scb->output_offsets[0],
SIZ_CAST scb->output_offsets[1]);
}
/*************************************************
* Callout function *
*************************************************/
@ -5907,8 +5968,11 @@ callout block for different code unit widths are that the pointers to the
subject, the most recent MARK, and a callout argument string point to strings
of the appropriate width. Casts can be used to deal with this.
Argument: a pointer to a callout block
Return:
Arguments:
cb a pointer to a callout block
callout_data_ptr the provided callout data
Returns: 0 or 1 or an error, as determined by settings
*/
static int
@ -7158,6 +7222,16 @@ if (dat_datctl.replacement[0] != 0)
rlen = PCRE2_ZERO_TERMINATED;
else
rlen = (CASTVAR(uint8_t *, r) - rbuffer)/code_unit_size;
if ((dat_datctl.control2 & CTL2_SUBSTITUTE_CALLOUT) != 0)
{
PCRE2_SET_SUBSTITUTE_CALLOUT(dat_context, substitute_callout_function, NULL);
}
else
{
PCRE2_SET_SUBSTITUTE_CALLOUT(dat_context, NULL, NULL); /* No callout */
}
PCRE2_SUBSTITUTE(rc, compiled_code, pp, arg_ulen, dat_datctl.offset,
dat_datctl.options|xoptions, match_data, dat_context,
rbuffer, rlen, nbuffer, &nsize);

@ -476,4 +476,9 @@
\= Expect no match
aaa
# Offsets are different in 8-bit mode.
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
# End of testinput10

@ -382,4 +382,9 @@
\= Expect no match
aaa
# Offsets are different in 8-bit mode.
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
# End of testinput12

testdata/testinput2

@ -5514,4 +5514,7 @@ a)"xI
abcdef\=ovector=4
abxyz\=ovector=4
/a(b)c|xyz/g,replace=<$0>,substitute_callout
abcdefabcpqr
# End of testinput2

testdata/testoutput10

@ -1626,4 +1626,14 @@ Subject length lower bound = 1
aaa
No match
# Offsets are different in 8-bit mode.
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8
Old 13 13 New 15 17
Old 13 16 New 17 22
Old 22 22 New 28 30
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# End of testinput10

@ -1471,4 +1471,14 @@ Subject length lower bound = 1
aaa
No match
# Offsets are different in 8-bit mode.
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8
Old 12 12 New 14 16
Old 12 15 New 16 21
Old 21 21 New 27 29
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# End of testinput12

@ -1468,4 +1468,14 @@ Subject length lower bound = 1
aaa
No match
# Offsets are different in 8-bit mode.
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8
Old 12 12 New 14 16
Old 12 15 New 16 21
Old 21 21 New 27 29
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# End of testinput12
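The comment "Offsets are different in 8-bit mode" follows from the offsets being counted in code units. A small sketch (the demo_ function names are hypothetical and this is not PCRE2 code) converts a character index into a code unit offset for each width. In the test subject, U+00E1 takes two UTF-8 code units and U+1234 takes three, which accounts for 13 rather than 12, and 22 rather than 21, in the 8-bit expected output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-8 code units (bytes) used by one code point. */
static size_t demo_utf8_units(uint32_t cp)
{
if (cp < 0x80) return 1;
if (cp < 0x800) return 2;
if (cp < 0x10000) return 3;
return 4;
}

/* Number of UTF-16 code units used by one code point. */
static size_t demo_utf16_units(uint32_t cp)
{
return (cp < 0x10000)? 1 : 2;
}

/* Code unit offset of the character at char_index, for a code unit width
of 8, 16, or 32 bits. */
static size_t demo_unit_offset(const uint32_t *cps, size_t char_index,
  unsigned width)
{
size_t i, offset = 0;
assert(width == 8 || width == 16 || width == 32);
for (i = 0; i < char_index; i++)
  {
  if (width == 8) offset += demo_utf8_units(cps[i]);
  else if (width == 16) offset += demo_utf16_units(cps[i]);
  else offset += 1;   /* one code unit per code point in 32-bit mode */
  }
return offset;
}
```

For the subject 123abc\x{e1}yzabcdef789abc\x{1234}qr, character index 12 (just after the second "abc") maps to offset 13 at width 8 and to offset 12 at widths 16 and 32, matching the "Old 13 13" and "Old 12 12" lines in the respective expected outputs.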

@ -16795,6 +16795,12 @@ Subject length lower bound = 1
2: <unchanged>
3: <unchanged>
/a(b)c|xyz/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: <abc>def<abc>pqr
# End of testinput2
Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data