Upgrade the as yet unreleased substitute callout facility.

This commit is contained in:
Philip.Hazel 2018-11-12 16:02:01 +00:00
parent 900f457222
commit 9bc81d5229
18 changed files with 599 additions and 303 deletions

@ -20,7 +20,7 @@ SYNOPSIS
</P>
<P>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *),</b>
<b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *),</b>
<b> void *<i>callout_data</i>);</b>
</P>
<br><b>

@ -183,7 +183,7 @@ document for an overview of all the PCRE2 documentation.
<br>
<br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
@ -924,7 +924,7 @@ documentation.
<br>
<br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
@ -3413,9 +3413,9 @@ substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
groups in the extended syntax forms to be treated as unset.
</P>
<P>
If successful, <b>pcre2_substitute()</b> returns the number of replacements that
were made. This may be zero if no matches were found, and is never greater than
1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
If successful, <b>pcre2_substitute()</b> returns the number of successful
matches. This may be zero if no matches were found, and is never greater than 1
unless PCRE2_SUBSTITUTE_GLOBAL is set.
</P>
<P>
In the event of an error, a negative error code is returned. Except for
@ -3457,16 +3457,16 @@ Substitution callouts
</b><br>
<P>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b>
<br>
<br>
The <b>pcre2_set_substitute_callout()</b> function can be used to specify a
callout function for <b>pcre2_substitute()</b>. This information is passed in
a match context. The callout function is called after each substitution. It is
not called for simulated substitutions that happen as a result of the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return
any value.
a match context. The callout function is called after each substitution has
been processed, but it can cause the replacement not to happen. The callout
function is not called for simulated substitutions that happen as a result of
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
</P>
<P>
The first argument of the callout function is a pointer to a substitute callout
@ -3474,7 +3474,11 @@ block structure, which contains the following fields, not necessarily in this
order:
<pre>
uint32_t <i>version</i>;
PCRE2_SIZE <i>input_offsets[2]</i>;
uint32_t <i>subscount</i>;
PCRE2_SPTR <i>input</i>;
PCRE2_SPTR <i>output</i>;
PCRE2_SIZE <i>*ovector</i>;
uint32_t <i>oveccount</i>;
PCRE2_SIZE <i>output_offsets[2]</i>;
</pre>
The <i>version</i> field contains the version number of the block format. The
@ -3482,13 +3486,34 @@ current version is 0. The version number will increase in future if more fields
are added, but the intention is never to remove any of the existing fields.
</P>
<P>
The <i>input_offsets</i> vector contains the code unit offsets in the input
string of the matched substring, and the <i>output_offsets</i> vector contains
the offsets of the replacement in the output string.
The <i>subscount</i> field is the number of the current match. It is 1 for the
first callout, 2 for the second, and so on. The <i>input</i> and <i>output</i>
pointers are copies of the values passed to <b>pcre2_substitute()</b>.
</P>
<P>
The <i>ovector</i> field points to the ovector, which contains the result of the
most recent match. The <i>oveccount</i> field contains the number of pairs that
are set in the ovector, and is always greater than zero.
</P>
<P>
The <i>output_offsets</i> vector contains the offsets of the replacement in the
output string. This has already been processed for dollar and (if requested)
backslash substitutions as described above.
</P>
<P>
The second argument of the callout function is the value passed as
<i>callout_data</i> when the function was registered.
<i>callout_data</i> when the function was registered. The value returned by the
callout function is interpreted as follows:
</P>
<P>
If the value is zero, the replacement is accepted, and, if
PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
match. If the value is not zero, the current replacement is not accepted. If
the value is greater than zero, processing continues when
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
output and the call to <b>pcre2_substitute()</b> exits, returning the number of
matches so far.
</P>
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
<P>
@ -3757,7 +3782,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 19 October 2018
Last updated: 12 November 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

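The return-value rules for substitution callouts can be illustrated without a PCRE2 build. The following is a minimal, self-contained sketch, not PCRE2 code: it replaces a literal string found with strstr() rather than a regular-expression match, and the names mock_callout_block, mock_substitute(), accept_all(), skip_n() and stop_n() are hypothetical. It follows the contract described above: zero accepts a replacement, a non-zero value cancels it and copies the matched text instead, and a negative value (or a non-global call) ends processing after the rest of the subject is copied. The return value counts matches, not accepted replacements.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for two fields of pcre2_substitute_callout_block.
   This is not the real PCRE2 type. */
typedef struct {
  unsigned int subscount;    /* 1 for the first callout, 2 for the second... */
  size_t output_offsets[2];  /* where the replacement sits in the output */
} mock_callout_block;

typedef int (*mock_callout)(mock_callout_block *, void *);

/* Replace occurrences of the literal, non-empty string pat in subject with
   rep, writing a zero-terminated result into out (assumed large enough).
   Only the first occurrence is considered unless global is non-zero.
   Returns the number of matches found, including rejected ones. */
static int mock_substitute(const char *subject, const char *pat,
                           const char *rep, int global, char *out,
                           mock_callout callout, void *data)
{
  size_t patlen = strlen(pat), replen = strlen(rep), out_off = 0;
  const char *start = subject, *hit;
  int subs = 0;

  while ((hit = strstr(start, pat)) != NULL) {
    mock_callout_block scb;
    size_t lead = (size_t)(hit - start);

    memcpy(out + out_off, start, lead);       /* text before the match */
    out_off += lead;
    scb.subscount = (unsigned int)++subs;
    scb.output_offsets[0] = out_off;
    memcpy(out + out_off, rep, replen);       /* tentative replacement */
    out_off += replen;
    scb.output_offsets[1] = out_off;
    start = hit + patlen;

    if (callout != NULL) {
      int rc = callout(&scb, data);
      if (rc != 0) {
        /* Non-zero: undo the replacement and copy the matched text. */
        out_off = scb.output_offsets[0];
        memcpy(out + out_off, hit, patlen);
        out_off += patlen;
        if (rc < 0) global = 0;               /* negative: stop matching */
      }
    }
    if (!global) break;
  }

  strcpy(out + out_off, start);               /* copy the rest of the input */
  return subs;
}

/* Example callouts. skip_n() and stop_n() expect data to point to an
   unsigned int holding the match number to act on. */
static int accept_all(mock_callout_block *scb, void *data)
{
  (void)scb;
  (void)data;
  return 0;
}

static int skip_n(mock_callout_block *scb, void *data)
{
  return scb->subscount == *(unsigned int *)data ? 1 : 0;
}

static int stop_n(mock_callout_block *scb, void *data)
{
  return scb->subscount == *(unsigned int *)data ? -1 : 0;
}
```

Run on "abcdefabcpqr" with pattern "abc" and replacement "<abc>", accept_all() gives "<abc>def<abc>pqr" (2 matches), skip_n() with match 1 gives "abcdef<abc>pqr" (2 matches), and stop_n() with match 1 gives "abcdefabcpqr" (1 match), which are the same results as the pcre2test examples in this commit.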
@ -1052,7 +1052,9 @@ process.
startchar show starting character when relevant
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=&#60;n&#62; skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=&#60;n&#62; skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre>
@ -1220,7 +1222,9 @@ pattern.
startoffset=&#60;n&#62; same as offset=&#60;n&#62;
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=&#60;n&#62; skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=&#60;n&#62; skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated
@ -1410,16 +1414,6 @@ simple example of a substitution test:
=abc=abc=\=global
2: =xxx=xxx=
</pre>
If the <b>substitute_callout</b> modifier is set, a substitution callout
function is set up. When it is called (after each substitution), the offsets in
the input and output strings are output. For example:
<pre>
/abc/g,replace=&#60;$0&#62;,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: &#60;abc&#62;def&#60;abc&#62;pqr
</pre>
Subject and replacement strings should be kept relatively short (fewer than 256
characters) for substitution tests, as fixed-size buffers are used. To make it
easy to test for buffer overflow, if the replacement string starts with a
@ -1451,6 +1445,47 @@ matching provokes an error return ("bad option value") from
<b>pcre2_substitute()</b>.
</P>
<br><b>
Testing substitute callouts
</b><br>
<P>
If the <b>substitute_callout</b> modifier is set, a substitution callout
function is set up. When it is called (after each substitution), details of the
input and output strings are shown. For example:
<pre>
/abc/g,replace=&#60;$0&#62;,substitute_callout
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62;"
2(1) Old 6 9 "abc" New 8 13 "&#60;abc&#62;"
2: &#60;abc&#62;def&#60;abc&#62;pqr
</pre>
The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector (that
is, one more than the number of capturing groups that were set). Then come
the offsets of the old substring, its contents, and the same two items for the
replacement.
</P>
<P>
By default, the substitution callout function returns zero, which accepts the
replacement and causes matching to continue if /g was used. Two further
modifiers can be used to test other return values. If <b>substitute_skip</b> is
set to a value greater than zero, the callout function returns +1 for the match
of that number; similarly, <b>substitute_stop</b> makes it return -1. Either
value causes the replacement to be rejected, and -1 also prevents any further
matching. If either of them is set, <b>substitute_callout</b> is assumed. For
example:
<pre>
/abc/g,replace=&#60;$0&#62;,substitute_skip=1
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; SKIPPED"
2(1) Old 6 9 "abc" New 6 11 "&#60;abc&#62;"
2: abcdef&#60;abc&#62;pqr
abcdefabcpqr\=substitute_stop=1
1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; STOPPED"
1: abcdefabcpqr
</pre>
If both are set for the same number, stop takes precedence. Only a single skip
or stop is supported, which is sufficient for testing that the feature works.
</P>
<br><b>
Setting the JIT stack size
</b><br>
<P>
@ -2040,7 +2075,7 @@ Cambridge, England.
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
Last updated: 21 September 2018
Last updated: 12 November 2018
<br>
Copyright &copy; 1997-2018 University of Cambridge.
<br>

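The skip and stop rules above reduce to a small decision per callout. The sketch below is not taken from the pcre2test source; test_callout_result() is a hypothetical name, and treating 0 as "modifier not set" is an assumption based on the requirement that the values be greater than zero.

```c
#include <stdint.h>

/* Hypothetical helper restating the documented rules for the pcre2test
   substitute_skip and substitute_stop modifiers (0 means "not set").
   Returns the value the test callout would give for match subscount. */
static int test_callout_result(uint32_t subscount, uint32_t skip,
                               uint32_t stop)
{
  if (stop > 0 && subscount == stop) return -1;  /* stop takes precedence */
  if (skip > 0 && subscount == skip) return 1;   /* reject, keep matching */
  return 0;                                      /* accept the replacement */
}
```

Because -1 also halts matching, a stop at match n means no callout happens for any later match, which is why the modifier lists describe substitute_stop as skipping "number n and greater".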
@ -294,7 +294,7 @@ PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
void *callout_data);
int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *),
int (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data);
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
@ -942,7 +942,7 @@ PCRE2 CONTEXTS
umentation.
int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *),
int (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data);
This sets up a callout function for PCRE2 to call after each substitu-
@ -3318,8 +3318,8 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause
unknown groups in the extended syntax forms to be treated as unset.
If successful, pcre2_substitute() returns the number of replacements
that were made. This may be zero if no matches were found, and is never
If successful, pcre2_substitute() returns the number of successful
matches. This may be zero if no matches were found, and is never
greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
In the event of an error, a negative error code is returned. Except for
@ -3355,22 +3355,26 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
Substitution callouts
int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *),
int (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data);
The pcre2_set_substitute_callout() function can be used to specify a
callout function for pcre2_substitute(). This information is passed in
a match context. The callout function is called after each substitu-
tion. It is not called for simulated substitutions that happen as a
result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout func-
tion should not return any value.
a match context. The callout function is called after each substitution
has been processed, but it can cause the replacement not to happen. The
callout function is not called for simulated substitutions that happen
as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
The first argument of the callout function is a pointer to a substitute
callout block structure, which contains the following fields, not nec-
essarily in this order:
uint32_t version;
PCRE2_SIZE input_offsets[2];
uint32_t subscount;
PCRE2_SPTR input;
PCRE2_SPTR output;
PCRE2_SIZE *ovector;
uint32_t oveccount;
PCRE2_SIZE output_offsets[2];
The version field contains the version number of the block format. The
@ -3378,12 +3382,30 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
more fields are added, but the intention is never to remove any of the
existing fields.
The input_offsets vector contains the code unit offsets in the input
string of the matched substring, and the output_offsets vector contains
the offsets of the replacement in the output string.
The subscount field is the number of the current match. It is 1 for the
first callout, 2 for the second, and so on. The input and output point-
ers are copies of the values passed to pcre2_substitute().
The ovector field points to the ovector, which contains the result of
the most recent match. The oveccount field contains the number of pairs
that are set in the ovector, and is always greater than zero.
The output_offsets vector contains the offsets of the replacement in
the output string. This has already been processed for dollar and (if
requested) backslash substitutions as described above.
The second argument of the callout function is the value passed as
callout_data when the function was registered.
callout_data when the function was registered. The value returned by
the callout function is interpreted as follows:
If the value is zero, the replacement is accepted, and, if PCRE2_SUB-
STITUTE_GLOBAL is set, processing continues with a search for the next
match. If the value is not zero, the current replacement is not
accepted. If the value is greater than zero, processing continues when
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero
or PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is
copied to the output and the call to pcre2_substitute() exits, return-
ing the number of matches so far.
DUPLICATE SUBPATTERN NAMES
@ -3633,7 +3655,7 @@ AUTHOR
REVISION
Last updated: 19 October 2018
Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------

@ -1,4 +1,4 @@
.TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "17 September 2018" "PCRE2 10.33"
.TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "12 November 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
@ -8,7 +8,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.PP
.nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *),"
.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *),"
.B " void *\fIcallout_data\fP);"
.fi
.

@ -1,4 +1,4 @@
.TH PCRE2API 3 "19 October 2018" "PCRE2 10.33"
.TH PCRE2API 3 "12 November 2018" "PCRE2 10.33"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -124,7 +124,7 @@ document for an overview of all the PCRE2 documentation.
.B " void *\fIcallout_data\fP);"
.sp
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.sp
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
@ -860,7 +860,7 @@ documentation.
.sp
.nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.fi
.sp
@ -3412,9 +3412,9 @@ The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
groups in the extended syntax forms to be treated as unset.
.P
If successful, \fBpcre2_substitute()\fP returns the number of replacements that
were made. This may be zero if no matches were found, and is never greater than
1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
If successful, \fBpcre2_substitute()\fP returns the number of successful
matches. This may be zero if no matches were found, and is never greater than 1
unless PCRE2_SUBSTITUTE_GLOBAL is set.
.P
In the event of an error, a negative error code is returned. Except for
PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
@ -3454,35 +3454,57 @@ above).
.sp
.nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.fi
.sp
The \fBpcre2_set_substitute_callout()\fP function can be used to specify a
callout function for \fBpcre2_substitute()\fP. This information is passed in
a match context. The callout function is called after each substitution. It is
not called for simulated substitutions that happen as a result of the
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return
any value.
a match context. The callout function is called after each substitution has
been processed, but it can cause the replacement not to happen. The callout
function is not called for simulated substitutions that happen as a result of
the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
.P
The first argument of the callout function is a pointer to a substitute callout
block structure, which contains the following fields, not necessarily in this
order:
.sp
uint32_t \fIversion\fP;
PCRE2_SIZE \fIinput_offsets[2]\fP;
uint32_t \fIsubscount\fP;
PCRE2_SPTR \fIinput\fP;
PCRE2_SPTR \fIoutput\fP;
PCRE2_SIZE \fI*ovector\fP;
uint32_t \fIoveccount\fP;
PCRE2_SIZE \fIoutput_offsets[2]\fP;
.sp
The \fIversion\fP field contains the version number of the block format. The
current version is 0. The version number will increase in future if more fields
are added, but the intention is never to remove any of the existing fields.
.P
The \fIinput_offsets\fP vector contains the code unit offsets in the input
string of the matched substring, and the \fIoutput_offsets\fP vector contains
the offsets of the replacement in the output string.
The \fIsubscount\fP field is the number of the current match. It is 1 for the
first callout, 2 for the second, and so on. The \fIinput\fP and \fIoutput\fP
pointers are copies of the values passed to \fBpcre2_substitute()\fP.
.P
The \fIovector\fP field points to the ovector, which contains the result of the
most recent match. The \fIoveccount\fP field contains the number of pairs that
are set in the ovector, and is always greater than zero.
.P
The \fIoutput_offsets\fP vector contains the offsets of the replacement in the
output string. This has already been processed for dollar and (if requested)
backslash substitutions as described above.
.P
The second argument of the callout function is the value passed as
\fIcallout_data\fP when the function was registered.
\fIcallout_data\fP when the function was registered. The value returned by the
callout function is interpreted as follows:
.P
If the value is zero, the replacement is accepted, and, if
PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
match. If the value is not zero, the current replacement is not accepted. If
the value is greater than zero, processing continues when
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
output and the call to \fBpcre2_substitute()\fP exits, returning the number of
matches so far.
.
.
.SH "DUPLICATE SUBPATTERN NAMES"
@ -3768,6 +3790,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 19 October 2018
Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "21 September 2018" "PCRE 10.33"
.TH PCRE2TEST 1 "12 November 2018" "PCRE 10.33"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -1014,7 +1014,9 @@ process.
startchar show starting character when relevant
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
.sp
@ -1189,7 +1191,9 @@ pattern.
startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated
@ -1377,16 +1381,6 @@ simple example of a substitution test:
=abc=abc=\e=global
2: =xxx=xxx=
.sp
If the \fBsubstitute_callout\fP modifier is set, a substitution callout
function is set up. When it is called (after each substitution), the offsets in
the input and output strings are output. For example:
.sp
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: <abc>def<abc>pqr
.sp
Subject and replacement strings should be kept relatively short (fewer than 256
characters) for substitution tests, as fixed-size buffers are used. To make it
easy to test for buffer overflow, if the replacement string starts with a
@ -1418,6 +1412,46 @@ matching provokes an error return ("bad option value") from
\fBpcre2_substitute()\fP.
.
.
.SS "Testing substitute callouts"
.rs
.sp
If the \fBsubstitute_callout\fP modifier is set, a substitution callout
function is set up. When it is called (after each substitution), details of the
input and output strings are shown. For example:
.sp
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc>"
2(1) Old 6 9 "abc" New 8 13 "<abc>"
2: <abc>def<abc>pqr
.sp
The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector (that
is, one more than the number of capturing groups that were set). Then come
the offsets of the old substring, its contents, and the same two items for the
replacement.
.P
By default, the substitution callout function returns zero, which accepts the
replacement and causes matching to continue if /g was used. Two further
modifiers can be used to test other return values. If \fBsubstitute_skip\fP is
set to a value greater than zero, the callout function returns +1 for the match
of that number; similarly, \fBsubstitute_stop\fP makes it return -1. Either
value causes the replacement to be rejected, and -1 also prevents any further
matching. If either of them is set, \fBsubstitute_callout\fP is assumed. For
example:
.sp
/abc/g,replace=<$0>,substitute_skip=1
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
2(1) Old 6 9 "abc" New 6 11 "<abc>"
2: abcdef<abc>pqr
abcdefabcpqr\e=substitute_stop=1
1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
1: abcdefabcpqr
.sp
If both are set for the same number, stop takes precedence. Only a single skip
or stop is supported, which is sufficient for testing that the feature works.
.
.
.SS "Setting the JIT stack size"
.rs
.sp
@ -2022,6 +2056,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 21 September 2018
Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

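The callout line format described in the "Testing substitute callouts" section can be rebuilt from the fields of the callout block. This is a hypothetical sketch, not the pcre2test implementation: format_callout_line() is an invented name, plain char strings stand in for PCRE2_SPTR, the old substring is taken from the first ovector pair, and the new one from output_offsets. The %.*s conversions limit each substring by length, so neither buffer needs to be zero-terminated at the substring's end.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical formatter for a line such as
     1(1) Old 0 3 "abc" New 0 5 "<abc>"
   The arguments stand in for the subscount, oveccount, input, ovector,
   output and output_offsets fields of the callout block. Returns the
   snprintf() result. */
static int format_callout_line(char *buf, size_t size, unsigned int subscount,
                               unsigned int oveccount, const char *input,
                               const size_t *ovector, const char *output,
                               const size_t *output_offsets)
{
  return snprintf(buf, size,
                  "%u(%u) Old %zu %zu \"%.*s\" New %zu %zu \"%.*s\"",
                  subscount, oveccount,
                  ovector[0], ovector[1],
                  (int)(ovector[1] - ovector[0]), input + ovector[0],
                  output_offsets[0], output_offsets[1],
                  (int)(output_offsets[1] - output_offsets[0]),
                  output + output_offsets[0]);
}
```

For the second callout of the /abc/g example (ovector {6, 9}, output_offsets {8, 13}, output "<abc>def<abc>pqr") it produces 2(1) Old 6 9 "abc" New 8 13 "<abc>".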
@ -940,7 +940,9 @@ PATTERN MODIFIERS
startchar show starting character when relevant
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
@ -1092,7 +1094,9 @@ SUBJECT MODIFIERS
startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated
@ -1263,16 +1267,6 @@ SUBJECT MODIFIERS
=abc=abc=\=global
2: =xxx=xxx=
If the substitute_callout modifier is set, a substitution callout func-
tion is set up. When it is called (after each substitution), the off-
sets in the input and output strings are output. For example:
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: <abc>def<abc>pqr
Subject and replacement strings should be kept relatively short (fewer
than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement
@ -1305,6 +1299,46 @@ SUBJECT MODIFIERS
partial matching provokes an error return ("bad option value") from
pcre2_substitute().
Testing substitute callouts
If the substitute_callout modifier is set, a substitution callout
function is set up. When it is called (after each substitution), details
of the input and output strings are shown. For example:
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc>"
2(1) Old 6 9 "abc" New 8 13 "<abc>"
2: <abc>def<abc>pqr
The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector
(that is, one more than the number of capturing groups that were set).
Then come the offsets of the old substring, its contents, and the
same two items for the replacement.
By default, the substitution callout function returns zero, which
accepts the replacement and causes matching to continue if /g was used.
Two further modifiers can be used to test other return values. If
substitute_skip is set to a value greater than zero, the callout
function returns +1 for the match of that number; similarly,
substitute_stop makes it return -1. Either value causes the
replacement to be rejected, and -1 also prevents any further matching.
If either of them is set, substitute_callout is assumed. For example:
/abc/g,replace=<$0>,substitute_skip=1
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
2(1) Old 6 9 "abc" New 6 11 "<abc>"
2: abcdef<abc>pqr
abcdefabcpqr\=substitute_stop=1
1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
1: abcdefabcpqr
If both are set for the same number, stop takes precedence. Only a sin-
gle skip or stop is supported, which is sufficient for testing that the
feature works.
Setting the JIT stack size
The jitstack modifier provides a way of setting the maximum stack size
@ -1853,5 +1887,5 @@ AUTHOR
REVISION
Last updated: 21 September 2018
Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge.

@ -549,8 +549,12 @@ typedef struct pcre2_callout_enumerate_block { \
typedef struct pcre2_substitute_callout_block { \
uint32_t version; /* Identifies version of block */ \
/* ------------------------ Version 0 ------------------------------- */ \
PCRE2_SIZE input_offsets[2]; /* Matched portion of the input */ \
PCRE2_SPTR input; /* Pointer to input subject string */ \
PCRE2_SPTR output; /* Pointer to output buffer */ \
PCRE2_SIZE output_offsets[2]; /* Changed portion of the output */ \
PCRE2_SIZE *ovector; /* Pointer to current ovector */ \
uint32_t oveccount; /* Count of pairs set in ovector */ \
uint32_t subscount; /* Substitution number */ \
/* ------------------------------------------------------------------ */ \
} pcre2_substitute_callout_block;
@ -609,7 +613,7 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
int (*)(pcre2_callout_block *, void *), void *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_substitute_callout(pcre2_match_context *, \
void (*)(pcre2_substitute_callout_block *, void *), void *); \
int (*)(pcre2_substitute_callout_block *, void *), void *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \

@ -407,7 +407,7 @@ return 0;
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*substitute_callout)(pcre2_substitute_callout_block *, void *),
int (*substitute_callout)(pcre2_substitute_callout_block *, void *),
void *substitute_callout_data)
{
mcontext->substitute_callout = substitute_callout;

@ -585,7 +585,7 @@ typedef struct pcre2_real_match_context {
#endif
int (*callout)(pcre2_callout_block *, void *);
void *callout_data;
void (*substitute_callout)(pcre2_substitute_callout_block *, void *);
int (*substitute_callout)(pcre2_substitute_callout_block *, void *);
void *substitute_callout_data;
PCRE2_SIZE offset_limit;
uint32_t heap_limit;

@ -241,13 +241,15 @@ PCRE2_SIZE *ovector;
PCRE2_SIZE ovecsave[3];
pcre2_substitute_callout_block scb;
scb.version = 0;
/* General initialization */
buff_offset = 0;
lengthleft = buff_length = *blength;
*blength = PCRE2_UNSET;
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
/* Partial matching is not valid. */
/* Partial matching is not valid. This must come after setting *blength to
PCRE2_UNSET, so as not to imply an offset in the replacement. */
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
return PCRE2_ERROR_BADOPTION;
@ -266,6 +268,13 @@ if (match_data == NULL)
ovector = pcre2_get_ovector_pointer(match_data);
ovector_count = pcre2_get_ovector_count(match_data);
/* Fixed things in the callout block */
scb.version = 0;
scb.input = subject;
scb.output = (PCRE2_SPTR)buffer;
scb.ovector = ovector;
/* Find lengths of zero-terminated strings and the end of the replacement. */
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
@ -393,11 +402,6 @@ do
goto EXIT;
}
/* Save the match point for a possible callout */
scb.input_offsets[0] = ovector[0];
scb.input_offsets[1] = ovector[1];
/* Count substitutions with a paranoid check for integer overflow; surely no
real call to this function would ever hit this! */
@ -409,12 +413,13 @@ do
subs++;
/* Copy the text leading up to the match, and remember where the insert
begins. */
begins and how many ovector pairs are set. */
if (rc == 0) rc = ovector_count;
fraglength = ovector[0] - start_offset;
CHECKMEMCPY(subject + start_offset, fraglength);
scb.output_offsets[0] = buff_offset;
scb.oveccount = rc;
/* Process the replacement string. Literal mode is set by \Q, but only in
extended mode when backslashes are being interpreted. In extended mode we
@ -836,8 +841,26 @@ do
if (!overflowed && mcontext->substitute_callout != NULL)
{
scb.subscount = subs;
scb.output_offsets[1] = buff_offset;
mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
/* A non-zero return means cancel this substitution. Instead, copy the
matched string fragment. */
if (rc != 0)
{
PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0];
PCRE2_SIZE oldlength = ovector[1] - ovector[0];
buff_offset -= newlength;
lengthleft += newlength;
CHECKMEMCPY(subject + ovector[0], oldlength);
/* A negative return means do not do any more. */
if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL);
}
}
/* Save the details of this match. See above for how this data is used. If we

@ -537,6 +537,8 @@ typedef struct patctl { /* Structure for pattern modifiers. */
uint32_t control2; /* Must be in same position as datctl */
uint32_t jitstack; /* Must be in same position as datctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as datctl */
uint32_t substitute_stop; /* Must be in same position as datctl */
uint32_t jit;
uint32_t stackguard_test;
uint32_t tables_id;
@@ -557,6 +559,8 @@ typedef struct datctl { /* Structure for data line modifiers. */
uint32_t control2; /* Must be in same position as patctl */
uint32_t jitstack; /* Must be in same position as patctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as patctl */
uint32_t substitute_stop; /* Must be in same position as patctl */
uint32_t startend[2];
uint32_t cerror[2];
uint32_t cfail[2];
@@ -704,6 +708,8 @@ static modstruct modlist[] = {
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
{ "substitute_unset_empty", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
@@ -1370,13 +1376,13 @@ are supported. */
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
if (test_mode == PCRE8_MODE) \
pcre2_set_substitute_callout_8(G(a,8), \
(void (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \
(int (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \
else if (test_mode == PCRE16_MODE) \
pcre2_set_substitute_callout_16(G(a,16), \
(void (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \
(int (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \
else \
pcre2_set_substitute_callout_32(G(a,32), \
(void (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
(int (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
if (test_mode == PCRE8_MODE) \
@@ -1850,10 +1856,10 @@ the three different cases. */
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
G(pcre2_set_substitute_callout_,BITONE)(G(a,BITONE), \
(void (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \
(int (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \
else \
G(pcre2_set_substitute_callout_,BITTWO)(G(a,BITTWO), \
(void (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c)
(int (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
if (test_mode == G(G(PCRE,BITONE),_MODE)) \
@@ -2058,7 +2064,7 @@ the three different cases. */
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_8(G(a,8),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_8(G(a,8), \
(void (*)(pcre2_substitute_callout_block_8 *, void *))b,c)
(int (*)(pcre2_substitute_callout_block_8 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),G(h,8), \
(PCRE2_SPTR8)i,j,(PCRE2_UCHAR8 *)k,l)
@@ -2165,7 +2171,7 @@ the three different cases. */
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_16(G(a,16),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_16(G(a,16), \
(void (*)(pcre2_substitute_callout_block_16 *, void *))b,c)
(int (*)(pcre2_substitute_callout_block_16 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),G(h,16), \
(PCRE2_SPTR16)i,j,(PCRE2_UCHAR16 *)k,l)
@@ -2272,7 +2278,7 @@ the three different cases. */
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_32(G(a,32),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_32(G(a,32), \
(void (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
(int (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),G(h,32), \
(PCRE2_SPTR32)i,j,(PCRE2_UCHAR32 *)k,l)
@@ -5955,17 +5961,40 @@ Arguments:
Returns: 0 to do the substitution, +1 to skip it, -1 to skip it and stop
*/
static void
static int
substitute_callout_function(pcre2_substitute_callout_block_8 *scb,
void *data_ptr)
{
int yield = 0;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
(void)data_ptr; /* Not used */
fprintf(outfile, "Old %" SIZ_FORM " %" SIZ_FORM " New %" SIZ_FORM
" %" SIZ_FORM "\n",
SIZ_CAST scb->input_offsets[0],
SIZ_CAST scb->input_offsets[1],
SIZ_CAST scb->output_offsets[0],
SIZ_CAST scb->output_offsets[1]);
fprintf(outfile, "%2d(%d) Old %" SIZ_FORM " %" SIZ_FORM " \"",
scb->subscount, scb->oveccount,
SIZ_CAST scb->ovector[0], SIZ_CAST scb->ovector[1]);
PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0],
utf, outfile);
fprintf(outfile, "\" New %" SIZ_FORM " %" SIZ_FORM " \"",
SIZ_CAST scb->output_offsets[0], SIZ_CAST scb->output_offsets[1]);
PCHARSV(scb->output, scb->output_offsets[0],
scb->output_offsets[1] - scb->output_offsets[0], utf, outfile);
if (scb->subscount == dat_datctl.substitute_stop)
{
yield = -1;
fprintf(outfile, " STOPPED");
}
else if (scb->subscount == dat_datctl.substitute_skip)
{
yield = +1;
fprintf(outfile, " SKIPPED");
}
fprintf(outfile, "\"\n");
return yield;
}
@@ -6494,6 +6523,11 @@ dat_datctl.control2 |= (pat_patctl.control2 & CTL2_ALLPD);
strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement);
if (dat_datctl.jitstack == 0) dat_datctl.jitstack = pat_patctl.jitstack;
if (dat_datctl.substitute_skip == 0)
dat_datctl.substitute_skip = pat_patctl.substitute_skip;
if (dat_datctl.substitute_stop == 0)
dat_datctl.substitute_stop = pat_patctl.substitute_stop;
/* Initialize for scanning the data line. */
#ifdef SUPPORT_PCRE2_8
@@ -6833,6 +6867,11 @@ arg_ulen = ulen; /* Value to use in match arg */
if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl))
return PR_OK;
/* Setting substitute_{skip,stop} implies a substitute callout. */
if (dat_datctl.substitute_skip != 0 || dat_datctl.substitute_stop != 0)
dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT;
/* Check for mutually exclusive modifiers. At present, these are all in the
first control word. */

testdata/testinput2

@@ -5516,6 +5516,21 @@ a)"xI
/a(b)c|xyz/g,replace=<$0>,substitute_callout
abcdefabcpqr
abxyzpqrabcxyz
12abc34xyz99abc55\=substitute_stop=2
12abc34xyz99abc55\=substitute_skip=1
12abc34xyz99abc55\=substitute_skip=2
/a(b)c|xyz/g,replace=<$0>
abcdefabcpqr
abxyzpqrabcxyz
12abc34xyz\=substitute_stop=2
12abc34xyz\=substitute_skip=1
/a(b)c|xyz/replace=<$0>
abcdefabcpqr
12abc34xyz\=substitute_skip=1
12abc34xyz\=substitute_stop=1
/abc\rdef/
abc\ndef

@@ -1630,10 +1630,10 @@ No match
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8
Old 13 13 New 15 17
Old 13 16 New 17 22
Old 22 22 New 28 30
1(2) Old 6 6 "" New 6 8 "<>"
2(2) Old 13 13 "" New 15 17 "<>"
3(2) Old 13 16 "def" New 17 22 "<def>"
4(2) Old 22 22 "" New 28 30 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# End of testinput10

@@ -1475,10 +1475,10 @@ No match
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8
Old 12 12 New 14 16
Old 12 15 New 16 21
Old 21 21 New 27 29
1(2) Old 6 6 "" New 6 8 "<>"
2(2) Old 12 12 "" New 14 16 "<>"
3(2) Old 12 15 "def" New 16 21 "<def>"
4(2) Old 21 21 "" New 27 29 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# A few script run tests in non-UTF mode (but they need Unicode support)

@@ -1472,10 +1472,10 @@ No match
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8
Old 12 12 New 14 16
Old 12 15 New 16 21
Old 21 21 New 27 29
1(2) Old 6 6 "" New 6 8 "<>"
2(2) Old 12 12 "" New 14 16 "<>"
3(2) Old 12 15 "def" New 16 21 "<def>"
4(2) Old 21 21 "" New 27 29 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# A few script run tests in non-UTF mode (but they need Unicode support)

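The three UTF blocks of expected output above differ only in the unit used for offsets. Assuming the first block comes from the 8-bit (UTF-8) run and the two later blocks from the 16-bit and 32-bit runs, `á` (U+00E1) occupies two code units in UTF-8 but one in UTF-16 or UTF-32, so every offset after it is one smaller in the later blocks. A small sketch reproduces the reported numbers; the helper names and `sample_subject` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code units needed to encode one code point at each code unit width. */
static size_t utf8_units(uint32_t cp)
{
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

static size_t utf16_units(uint32_t cp)
{
  return (cp < 0x10000) ? 1 : 2;   /* surrogate pair above the BMP */
}

static size_t utf32_units(uint32_t cp)
{
  (void)cp;
  return 1;
}

/* Offset, in code units, of the character at index n. */
static size_t offset_in_units(const uint32_t *cps, size_t n,
                              size_t (*units)(uint32_t))
{
  size_t off = 0;
  assert(cps != NULL || n == 0);
  for (size_t i = 0; i < n; i++) off += units(cps[i]);
  return off;
}

/* The test subject "123abcáyzabcdef789abcሴqr" as code points. */
static const uint32_t sample_subject[] = {
  '1', '2', '3', 'a', 'b', 'c', 0xE1, 'y', 'z', 'a', 'b', 'c',
  'd', 'e', 'f', '7', '8', '9', 'a', 'b', 'c', 0x1234, 'q', 'r'
};
```

Character index 12 is the empty match that follows the second `abc`: `offset_in_units(sample_subject, 12, utf8_units)` gives 13 and the UTF-16 variant gives 12, matching the `Old 13 13` and `Old 12 12` lines. The `New` offsets shift by one for the same reason, because the copied `á` takes one code unit fewer in the wider output buffers.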
testdata/testoutput2

@@ -16797,9 +16797,52 @@ Subject length lower bound = 1
/a(b)c|xyz/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
1(2) Old 0 3 "abc" New 0 5 "<abc>"
2(2) Old 6 9 "abc" New 8 13 "<abc>"
2: <abc>def<abc>pqr
abxyzpqrabcxyz
1(1) Old 2 5 "xyz" New 2 7 "<xyz>"
2(2) Old 8 11 "abc" New 10 15 "<abc>"
3(1) Old 11 14 "xyz" New 15 20 "<xyz>"
3: ab<xyz>pqr<abc><xyz>
12abc34xyz99abc55\=substitute_stop=2
1(2) Old 2 5 "abc" New 2 7 "<abc>"
2(1) Old 7 10 "xyz" New 9 14 "<xyz> STOPPED"
2: 12<abc>34xyz99abc55
12abc34xyz99abc55\=substitute_skip=1
1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
2(1) Old 7 10 "xyz" New 7 12 "<xyz>"
3(2) Old 12 15 "abc" New 14 19 "<abc>"
3: 12abc34<xyz>99<abc>55
12abc34xyz99abc55\=substitute_skip=2
1(2) Old 2 5 "abc" New 2 7 "<abc>"
2(1) Old 7 10 "xyz" New 9 14 "<xyz> SKIPPED"
3(2) Old 12 15 "abc" New 14 19 "<abc>"
3: 12<abc>34xyz99<abc>55
/a(b)c|xyz/g,replace=<$0>
abcdefabcpqr
2: <abc>def<abc>pqr
abxyzpqrabcxyz
3: ab<xyz>pqr<abc><xyz>
12abc34xyz\=substitute_stop=2
1(2) Old 2 5 "abc" New 2 7 "<abc>"
2(1) Old 7 10 "xyz" New 9 14 "<xyz> STOPPED"
2: 12<abc>34xyz
12abc34xyz\=substitute_skip=1
1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
2(1) Old 7 10 "xyz" New 7 12 "<xyz>"
2: 12abc34<xyz>
/a(b)c|xyz/replace=<$0>
abcdefabcpqr
1: <abc>defabcpqr
12abc34xyz\=substitute_skip=1
1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
1: 12abc34xyz
12abc34xyz\=substitute_stop=1
1(2) Old 2 5 "abc" New 2 7 "<abc> STOPPED"
1: 12abc34xyz
/abc\rdef/
abc\ndef