Upgrade the as yet unreleased substitute callout facility.

Philip.Hazel 2018-11-12 16:02:01 +00:00
parent 900f457222
commit 9bc81d5229
18 changed files with 599 additions and 303 deletions


@ -20,7 +20,7 @@ SYNOPSIS
</P> </P>
<P> <P>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *),</b> <b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *),</b>
<b> void *<i>callout_data</i>);</b> <b> void *<i>callout_data</i>);</b>
</P> </P>
<br><b> <br><b>


@ -183,7 +183,7 @@ document for an overview of all the PCRE2 documentation.
<br> <br>
<br> <br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b> <b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b> <b> void *<i>callout_data</i>);</b>
<br> <br>
<br> <br>
@ -924,7 +924,7 @@ documentation.
<br> <br>
<br> <br>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b> <b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b> <b> void *<i>callout_data</i>);</b>
<br> <br>
<br> <br>
@ -3413,9 +3413,9 @@ substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
groups in the extended syntax forms to be treated as unset. groups in the extended syntax forms to be treated as unset.
</P> </P>
<P> <P>
If successful, <b>pcre2_substitute()</b> returns the number of replacements that If successful, <b>pcre2_substitute()</b> returns the number of successful
were made. This may be zero if no matches were found, and is never greater than matches. This may be zero if no matches were found, and is never greater than 1
1 unless PCRE2_SUBSTITUTE_GLOBAL is set. unless PCRE2_SUBSTITUTE_GLOBAL is set.
</P> </P>
<P> <P>
In the event of an error, a negative error code is returned. Except for In the event of an error, a negative error code is returned. Except for
@ -3457,16 +3457,16 @@ Substitution callouts
</b><br> </b><br>
<P> <P>
<b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_substitute_callout(pcre2_match_context *<i>mcontext</i>,</b>
<b> void (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b> <b> int (*<i>callout_function</i>)(pcre2_substitute_callout_block *, void *),</b>
<b> void *<i>callout_data</i>);</b> <b> void *<i>callout_data</i>);</b>
<br> <br>
<br> <br>
The <b>pcre2_set_substitute_callout()</b> function can be used to specify a The <b>pcre2_set_substitute_callout()</b> function can be used to specify a
callout function for <b>pcre2_substitute()</b>. This information is passed in callout function for <b>pcre2_substitute()</b>. This information is passed in
a match context. The callout function is called after each substitution. It is a match context. The callout function is called after each substitution has
not called for simulated substitutions that happen as a result of the been processed, but it can cause the replacement not to happen. The callout
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return function is not called for simulated substitutions that happen as a result of
any value. the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
</P> </P>
<P> <P>
The first argument of the callout function is a pointer to a substitute callout The first argument of the callout function is a pointer to a substitute callout
@ -3474,7 +3474,11 @@ block structure, which contains the following fields, not necessarily in this
order: order:
<pre> <pre>
uint32_t <i>version</i>; uint32_t <i>version</i>;
PCRE2_SIZE <i>input_offsets[2]</i>; uint32_t <i>subscount</i>;
PCRE2_SPTR <i>input</i>;
PCRE2_SPTR <i>output</i>;
PCRE2_SIZE <i>*ovector</i>;
uint32_t <i>oveccount</i>;
PCRE2_SIZE <i>output_offsets[2]</i>; PCRE2_SIZE <i>output_offsets[2]</i>;
</pre> </pre>
The <i>version</i> field contains the version number of the block format. The The <i>version</i> field contains the version number of the block format. The
@ -3482,13 +3486,34 @@ current version is 0. The version number will increase in future if more fields
are added, but the intention is never to remove any of the existing fields. are added, but the intention is never to remove any of the existing fields.
</P> </P>
<P> <P>
The <i>input_offsets</i> vector contains the code unit offsets in the input The <i>subscount</i> field is the number of the current match. It is 1 for the
string of the matched substring, and the <i>output_offsets</i> vector contains first callout, 2 for the second, and so on. The <i>input</i> and <i>output</i>
the offsets of the replacement in the output string. pointers are copies of the values passed to <b>pcre2_substitute()</b>.
</P>
<P>
The <i>ovector</i> field points to the ovector, which contains the result of the
most recent match. The <i>oveccount</i> field contains the number of pairs that
are set in the ovector, and is always greater than zero.
</P>
<P>
The <i>output_offsets</i> vector contains the offsets of the replacement in the
output string. This has already been processed for dollar and (if requested)
backslash substitutions as described above.
</P> </P>
<P> <P>
The second argument of the callout function is the value passed as The second argument of the callout function is the value passed as
<i>callout_data</i> when the function was registered. <i>callout_data</i> when the function was registered. The value returned by the
callout function is interpreted as follows:
</P>
<P>
If the value is zero, the replacement is accepted, and, if
PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
match. If the value is not zero, the current replacement is not accepted. If
the value is greater than zero, processing continues when
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
output and the call to <b>pcre2_substitute()</b> exits, returning the number of
matches so far.
</P> </P>
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br> <br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
<P> <P>
@ -3757,7 +3782,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 19 October 2018 Last updated: 12 November 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>
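The return-value rules described in the substitution callouts text above can be illustrated with a minimal sketch. It does not use the real PCRE2 headers: `demo_callout_block` is a hypothetical stand-in that only mirrors the documented fields with plain C types, and `demo_policy` and `demo_callout` are names invented for this illustration. In real code the function would take a `pcre2_substitute_callout_block *` and be registered with `pcre2_set_substitute_callout()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pcre2_substitute_callout_block. It mirrors the
   documented fields only, with size_t in place of PCRE2_SIZE and
   const unsigned char * in place of PCRE2_SPTR. */
typedef struct {
  uint32_t version;
  uint32_t subscount;          /* 1 for the first match, 2 for the second */
  const unsigned char *input;
  const unsigned char *output;
  size_t *ovector;
  uint32_t oveccount;          /* documented as always greater than zero */
  size_t output_offsets[2];
} demo_callout_block;

/* Hypothetical callout data: the match number to skip and the match number
   to stop at. Zero means "none". */
typedef struct {
  uint32_t skip;
  uint32_t stop;
} demo_policy;

/* Returns 0 to accept the replacement, +1 to reject it and keep going,
   -1 to reject it and end the substitution. Stop wins over skip. */
int demo_callout(demo_callout_block *block, void *callout_data)
{
  const demo_policy *policy = callout_data;
  assert(block->oveccount > 0);
  if (policy->stop != 0 && block->subscount == policy->stop) return -1;
  if (policy->skip != 0 && block->subscount == policy->skip) return +1;
  return 0;
}
```

Passing the policy through `callout_data` keeps the callout free of global state, which is presumably why the API hands that pointer back as the second argument.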


@ -1052,7 +1052,9 @@ process.
startchar show starting character when relevant startchar show starting character when relevant
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=&#60;n&#62; skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=&#60;n&#62; skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
</pre> </pre>
@ -1220,7 +1222,9 @@ pattern.
startoffset=&#60;n&#62; same as offset=&#60;n&#62; startoffset=&#60;n&#62; same as offset=&#60;n&#62;
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=&#60;n&#62; skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=&#60;n&#62; skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated zero_terminate pass the subject as zero-terminated
@ -1410,16 +1414,6 @@ simple example of a substitution test:
=abc=abc=\=global =abc=abc=\=global
2: =xxx=xxx= 2: =xxx=xxx=
</pre> </pre>
If the <b>substitute_callout</b> modifier is set, a substitution callout
function is set up. When it is called (after each substitution), the offsets in
the input and output strings are output. For example:
<pre>
/abc/g,replace=&#60;$0&#62;,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: &#60;abc&#62;def&#60;abc&#62;pqr
</pre>
Subject and replacement strings should be kept relatively short (fewer than 256 Subject and replacement strings should be kept relatively short (fewer than 256
characters) for substitution tests, as fixed-size buffers are used. To make it characters) for substitution tests, as fixed-size buffers are used. To make it
easy to test for buffer overflow, if the replacement string starts with a easy to test for buffer overflow, if the replacement string starts with a
@ -1451,6 +1445,47 @@ matching provokes an error return ("bad option value") from
<b>pcre2_substitute()</b>. <b>pcre2_substitute()</b>.
</P> </P>
<br><b> <br><b>
Testing substitute callouts
</b><br>
<P>
If the <b>substitute_callout</b> modifier is set, a substitution callout
function is set up. When it is called (after each substitution), details of the
input and output strings are output. For example:
<pre>
/abc/g,replace=&#60;$0&#62;,substitute_callout
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62;"
2(1) Old 6 9 "abc" New 8 13 "&#60;abc&#62;"
2: &#60;abc&#62;def&#60;abc&#62;pqr
</pre>
The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector (that
is, one more than the number of capturing groups that were set). Then are
listed the offsets of the old substring, its contents, and the same for the
replacement.
</P>
<P>
By default, the substitution callout function returns zero, which accepts the
replacement and causes matching to continue if /g was used. Two further
modifiers can be used to test other return values. If <b>substitute_skip</b> is
set to a value greater than zero the callout function returns +1 for the match
of that number, and similarly <b>substitute_stop</b> returns -1. These cause the
replacement to be rejected, and -1 causes no further matching to take place. If
either of them is set, <b>substitute_callout</b> is assumed. For example:
<pre>
/abc/g,replace=&#60;$0&#62;,substitute_skip=1
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; SKIPPED"
2(1) Old 6 9 "abc" New 6 11 "&#60;abc&#62;"
2: abcdef&#60;abc&#62;pqr
abcdefabcpqr\=substitute_stop=1
1(1) Old 0 3 "abc" New 0 5 "&#60;abc&#62; STOPPED"
1: abcdefabcpqr
</pre>
If both are set for the same number, stop takes precedence. Only a single skip
or stop is supported, which is sufficient for testing that the feature works.
</P>
<br><b>
Setting the JIT stack size Setting the JIT stack size
</b><br> </b><br>
<P> <P>
@ -2040,7 +2075,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 September 2018 Last updated: 12 November 2018
<br> <br>
Copyright &copy; 1997-2018 University of Cambridge. Copyright &copy; 1997-2018 University of Cambridge.
<br> <br>
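The skip and stop outputs shown in the pcre2test example above can be reproduced with a small model of the substitution loop. This is a sketch under loud assumptions, not PCRE2 code: it matches a literal string with `strstr()` instead of a compiled pattern, the callout receives only the match number rather than a full callout block, and every `demo_` name is hypothetical. It calls the callout before writing to the output, whereas the documentation says the real callout sees the replacement already placed in the output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical callout type: match number plus user data. */
typedef int (*demo_callout_fn)(uint32_t subscount, void *data);

/* Literal-string model of the documented loop. Returns the number of
   matches. The caller must supply a large enough output buffer. */
int demo_substitute(const char *subject, const char *needle,
                    const char *replacement, int global,
                    demo_callout_fn callout, void *data, char *out)
{
  size_t nlen = strlen(needle);
  int count = 0;
  const char *p = subject;
  const char *hit;

  assert(nlen > 0);
  *out = '\0';
  while ((hit = strstr(p, needle)) != NULL) {
    int rc;
    strncat(out, p, (size_t)(hit - p));     /* text before the match */
    count++;
    rc = (callout == NULL) ? 0 : callout((uint32_t)count, data);
    /* Zero accepts the replacement; any other value keeps the old text. */
    strcat(out, rc == 0 ? replacement : needle);
    p = hit + nlen;
    if (rc < 0 || !global) break;           /* no further matching */
  }
  strcat(out, p);                           /* copy the rest of the input */
  return count;
}

/* Model of substitute_skip=1: reject the first replacement, carry on. */
int demo_skip_first(uint32_t n, void *data) { (void)data; return n == 1 ? 1 : 0; }

/* Model of substitute_stop=1: reject the first replacement and stop. */
int demo_stop_first(uint32_t n, void *data) { (void)data; return n == 1 ? -1 : 0; }
```

With the subject `abcdefabcpqr`, needle `abc`, and replacement `<abc>` in global mode, `demo_skip_first` yields `abcdef<abc>pqr` with a count of 2, and `demo_stop_first` yields the unchanged subject with a count of 1, in line with the pcre2test listing.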


@ -294,7 +294,7 @@ PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
void *callout_data); void *callout_data);
int pcre2_set_substitute_callout(pcre2_match_context *mcontext, int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *), int (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data); void *callout_data);
int pcre2_set_offset_limit(pcre2_match_context *mcontext, int pcre2_set_offset_limit(pcre2_match_context *mcontext,
@ -942,7 +942,7 @@ PCRE2 CONTEXTS
umentation. umentation.
int pcre2_set_substitute_callout(pcre2_match_context *mcontext, int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *), int (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data); void *callout_data);
This sets up a callout function for PCRE2 to call after each substitu- This sets up a callout function for PCRE2 to call after each substitu-
@ -3318,8 +3318,8 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause
unknown groups in the extended syntax forms to be treated as unset. unknown groups in the extended syntax forms to be treated as unset.
If successful, pcre2_substitute() returns the number of replacements If successful, pcre2_substitute() returns the number of successful
that were made. This may be zero if no matches were found, and is never matches. This may be zero if no matches were found, and is never
greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set. greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.
In the event of an error, a negative error code is returned. Except for In the event of an error, a negative error code is returned. Except for
@ -3355,22 +3355,26 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
Substitution callouts Substitution callouts
int pcre2_set_substitute_callout(pcre2_match_context *mcontext, int pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*callout_function)(pcre2_substitute_callout_block *, void *), int (*callout_function)(pcre2_substitute_callout_block *, void *),
void *callout_data); void *callout_data);
The pcre2_set_substitute_callout() function can be used to specify a The pcre2_set_substitute_callout() function can be used to specify a
callout function for pcre2_substitute(). This information is passed in callout function for pcre2_substitute(). This information is passed in
a match context. The callout function is called after each substitu- a match context. The callout function is called after each substitution
tion. It is not called for simulated substitutions that happen as a has been processed, but it can cause the replacement not to happen. The
result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout func- callout function is not called for simulated substitutions that happen
tion should not return any value. as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
The first argument of the callout function is a pointer to a substitute The first argument of the callout function is a pointer to a substitute
callout block structure, which contains the following fields, not nec- callout block structure, which contains the following fields, not nec-
essarily in this order: essarily in this order:
uint32_t version; uint32_t version;
PCRE2_SIZE input_offsets[2]; uint32_t subscount;
PCRE2_SPTR input;
PCRE2_SPTR output;
PCRE2_SIZE *ovector;
uint32_t oveccount;
PCRE2_SIZE output_offsets[2]; PCRE2_SIZE output_offsets[2];
The version field contains the version number of the block format. The The version field contains the version number of the block format. The
@ -3378,12 +3382,30 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
more fields are added, but the intention is never to remove any of the more fields are added, but the intention is never to remove any of the
existing fields. existing fields.
The input_offsets vector contains the code unit offsets in the input The subscount field is the number of the current match. It is 1 for the
string of the matched substring, and the output_offsets vector contains first callout, 2 for the second, and so on. The input and output point-
the offsets of the replacement in the output string. ers are copies of the values passed to pcre2_substitute().
The ovector field points to the ovector, which contains the result of
the most recent match. The oveccount field contains the number of pairs
that are set in the ovector, and is always greater than zero.
The output_offsets vector contains the offsets of the replacement in
the output string. This has already been processed for dollar and (if
requested) backslash substitutions as described above.
The second argument of the callout function is the value passed as The second argument of the callout function is the value passed as
callout_data when the function was registered. callout_data when the function was registered. The value returned by
the callout function is interpreted as follows:
If the value is zero, the replacement is accepted, and, if PCRE2_SUB-
STITUTE_GLOBAL is set, processing continues with a search for the next
match. If the value is not zero, the current replacement is not
accepted. If the value is greater than zero, processing continues when
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero
or PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is
copied to the output and the call to pcre2_substitute() exits, return-
ing the number of matches so far.
DUPLICATE SUBPATTERN NAMES DUPLICATE SUBPATTERN NAMES
@ -3633,7 +3655,7 @@ AUTHOR
REVISION REVISION
Last updated: 19 October 2018 Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "17 September 2018" "PCRE2 10.33" .TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "12 November 2018" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -8,7 +8,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.PP .PP
.nf .nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *)," .B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *),"
.B " void *\fIcallout_data\fP);" .B " void *\fIcallout_data\fP);"
.fi .fi
. .


@ -1,4 +1,4 @@
.TH PCRE2API 3 "19 October 2018" "PCRE2 10.33" .TH PCRE2API 3 "12 November 2018" "PCRE2 10.33"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -124,7 +124,7 @@ document for an overview of all the PCRE2 documentation.
.B " void *\fIcallout_data\fP);" .B " void *\fIcallout_data\fP);"
.sp .sp
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *)," .B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);" .B " void *\fIcallout_data\fP);"
.sp .sp
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
@ -860,7 +860,7 @@ documentation.
.sp .sp
.nf .nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *)," .B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);" .B " void *\fIcallout_data\fP);"
.fi .fi
.sp .sp
@ -3412,9 +3412,9 @@ The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown
groups in the extended syntax forms to be treated as unset. groups in the extended syntax forms to be treated as unset.
.P .P
If successful, \fBpcre2_substitute()\fP returns the number of replacements that If successful, \fBpcre2_substitute()\fP returns the number of successful
were made. This may be zero if no matches were found, and is never greater than matches. This may be zero if no matches were found, and is never greater than 1
1 unless PCRE2_SUBSTITUTE_GLOBAL is set. unless PCRE2_SUBSTITUTE_GLOBAL is set.
.P .P
In the event of an error, a negative error code is returned. Except for In the event of an error, a negative error code is returned. Except for
PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
@ -3454,35 +3454,57 @@ above).
.sp .sp
.nf .nf
.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP,
.B " void (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *)," .B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *),"
.B " void *\fIcallout_data\fP);" .B " void *\fIcallout_data\fP);"
.fi .fi
.sp .sp
The \fBpcre2_set_substitute_callout()\fP function can be used to specify a The \fBpcre2_set_substitute_callout()\fP function can be used to specify a
callout function for \fBpcre2_substitute()\fP. This information is passed in callout function for \fBpcre2_substitute()\fP. This information is passed in
a match context. The callout function is called after each substitution. It is a match context. The callout function is called after each substitution has
not called for simulated substitutions that happen as a result of the been processed, but it can cause the replacement not to happen. The callout
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. A callout function should not return function is not called for simulated substitutions that happen as a result of
any value. the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
.P .P
The first argument of the callout function is a pointer to a substitute callout The first argument of the callout function is a pointer to a substitute callout
block structure, which contains the following fields, not necessarily in this block structure, which contains the following fields, not necessarily in this
order: order:
.sp .sp
uint32_t \fIversion\fP; uint32_t \fIversion\fP;
PCRE2_SIZE \fIinput_offsets[2]\fP; uint32_t \fIsubscount\fP;
PCRE2_SPTR \fIinput\fP;
PCRE2_SPTR \fIoutput\fP;
PCRE2_SIZE \fI*ovector\fP;
uint32_t \fIoveccount\fP;
PCRE2_SIZE \fIoutput_offsets[2]\fP; PCRE2_SIZE \fIoutput_offsets[2]\fP;
.sp .sp
The \fIversion\fP field contains the version number of the block format. The The \fIversion\fP field contains the version number of the block format. The
current version is 0. The version number will increase in future if more fields current version is 0. The version number will increase in future if more fields
are added, but the intention is never to remove any of the existing fields. are added, but the intention is never to remove any of the existing fields.
.P .P
The \fIinput_offsets\fP vector contains the code unit offsets in the input The \fIsubscount\fP field is the number of the current match. It is 1 for the
string of the matched substring, and the \fIoutput_offsets\fP vector contains first callout, 2 for the second, and so on. The \fIinput\fP and \fIoutput\fP
the offsets of the replacement in the output string. pointers are copies of the values passed to \fBpcre2_substitute()\fP.
.P
The \fIovector\fP field points to the ovector, which contains the result of the
most recent match. The \fIoveccount\fP field contains the number of pairs that
are set in the ovector, and is always greater than zero.
.P
The \fIoutput_offsets\fP vector contains the offsets of the replacement in the
output string. This has already been processed for dollar and (if requested)
backslash substitutions as described above.
.P .P
The second argument of the callout function is the value passed as The second argument of the callout function is the value passed as
\fIcallout_data\fP when the function was registered. \fIcallout_data\fP when the function was registered. The value returned by the
callout function is interpreted as follows:
.P
If the value is zero, the replacement is accepted, and, if
PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
match. If the value is not zero, the current replacement is not accepted. If
the value is greater than zero, processing continues when
PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
output and the call to \fBpcre2_substitute()\fP exits, returning the number of
matches so far.
. .
. .
.SH "DUPLICATE SUBPATTERN NAMES" .SH "DUPLICATE SUBPATTERN NAMES"
@ -3768,6 +3790,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 19 October 2018 Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi


@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "21 September 2018" "PCRE 10.33" .TH PCRE2TEST 1 "12 November 2018" "PCRE 10.33"
.SH NAME .SH NAME
pcre2test - a program for testing Perl-compatible regular expressions. pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS .SH SYNOPSIS
@ -1014,7 +1014,9 @@ process.
startchar show starting character when relevant startchar show starting character when relevant
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
.sp .sp
@ -1189,7 +1191,9 @@ pattern.
startoffset=<n> same as offset=<n> startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated zero_terminate pass the subject as zero-terminated
@ -1377,16 +1381,6 @@ simple example of a substitution test:
=abc=abc=\e=global =abc=abc=\e=global
2: =xxx=xxx= 2: =xxx=xxx=
.sp .sp
If the \fBsubstitute_callout\fP modifier is set, a substitution callout
function is set up. When it is called (after each substitution), the offsets in
the input and output strings are output. For example:
.sp
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: <abc>def<abc>pqr
.sp
Subject and replacement strings should be kept relatively short (fewer than 256 Subject and replacement strings should be kept relatively short (fewer than 256
characters) for substitution tests, as fixed-size buffers are used. To make it characters) for substitution tests, as fixed-size buffers are used. To make it
easy to test for buffer overflow, if the replacement string starts with a easy to test for buffer overflow, if the replacement string starts with a
@ -1418,6 +1412,46 @@ matching provokes an error return ("bad option value") from
\fBpcre2_substitute()\fP. \fBpcre2_substitute()\fP.
. .
. .
.SS "Testing substitute callouts"
.rs
.sp
If the \fBsubstitute_callout\fP modifier is set, a substitution callout
function is set up. When it is called (after each substitution), details of the
input and output strings are output. For example:
.sp
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc>"
2(1) Old 6 9 "abc" New 8 13 "<abc>"
2: <abc>def<abc>pqr
.sp
The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector (that
is, one more than the number of capturing groups that were set). Then are
listed the offsets of the old substring, its contents, and the same for the
replacement.
.P
By default, the substitution callout function returns zero, which accepts the
replacement and causes matching to continue if /g was used. Two further
modifiers can be used to test other return values. If \fBsubstitute_skip\fP is
set to a value greater than zero the callout function returns +1 for the match
of that number, and similarly \fBsubstitute_stop\fP returns -1. These cause the
replacement to be rejected, and -1 causes no further matching to take place. If
either of them is set, \fBsubstitute_callout\fP is assumed. For example:
.sp
/abc/g,replace=<$0>,substitute_skip=1
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
2(1) Old 6 9 "abc" New 6 11 "<abc>"
2: abcdef<abc>pqr
abcdefabcpqr\e=substitute_stop=1
1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
1: abcdefabcpqr
.sp
If both are set for the same number, stop takes precedence. Only a single skip
or stop is supported, which is sufficient for testing that the feature works.
.
.
.SS "Setting the JIT stack size" .SS "Setting the JIT stack size"
.rs .rs
.sp .sp
@ -2022,6 +2056,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 21 September 2018 Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.
.fi .fi


@ -940,7 +940,9 @@ PATTERN MODIFIERS
startchar show starting character when relevant startchar show starting character when relevant
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
@ -1092,7 +1094,9 @@ SUBJECT MODIFIERS
startoffset=<n> same as offset=<n> startoffset=<n> same as offset=<n>
substitute_callout use substitution callouts substitute_callout use substitution callouts
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
substitute_skip=<n> skip substitution number n
substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
substitute_stop=<n> skip substitution number n and greater
substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated zero_terminate pass the subject as zero-terminated
@ -1263,16 +1267,6 @@ SUBJECT MODIFIERS
=abc=abc=\=global =abc=abc=\=global
2: =xxx=xxx= 2: =xxx=xxx=
If the substitute_callout modifier is set, a substitution callout func-
tion is set up. When it is called (after each substitution), the off-
sets in the input and output strings are output. For example:
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
Old 0 3 New 0 5
Old 6 9 New 8 13
2: <abc>def<abc>pqr
Subject and replacement strings should be kept relatively short (fewer Subject and replacement strings should be kept relatively short (fewer
than 256 characters) for substitution tests, as fixed-size buffers are than 256 characters) for substitution tests, as fixed-size buffers are
used. To make it easy to test for buffer overflow, if the replacement used. To make it easy to test for buffer overflow, if the replacement
@ -1305,162 +1299,202 @@ SUBJECT MODIFIERS
partial matching provokes an error return ("bad option value") from partial matching provokes an error return ("bad option value") from
pcre2_substitute(). pcre2_substitute().
Testing substitute callouts
If the substitute_callout modifier is set, a substitution callout func-
tion is set up. When it is called (after each substitution), details of
the input and output strings are output. For example:
/abc/g,replace=<$0>,substitute_callout
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc>"
2(1) Old 6 9 "abc" New 8 13 "<abc>"
2: <abc>def<abc>pqr
The first number on each callout line is the count of matches. The
parenthesized number is the number of pairs that are set in the ovector
(that is, one more than the number of capturing groups that were set).
The offsets of the old substring and its contents are listed next,
followed by the same for the replacement.
By default, the substitution callout function returns zero, which
accepts the replacement and causes matching to continue if /g was used.
Two further modifiers can be used to test other return values. If sub-
stitute_skip is set to a value greater than zero the callout function
returns +1 for the match of that number, and similarly substitute_stop
returns -1. These cause the replacement to be rejected, and -1 causes
no further matching to take place. If either of them is set,
substitute_callout is assumed. For example:
/abc/g,replace=<$0>,substitute_skip=1
abcdefabcpqr
1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
2(1) Old 6 9 "abc" New 6 11 "<abc>"
2: abcdef<abc>pqr
abcdefabcpqr\=substitute_stop=1
1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
1: abcdefabcpqr
If both are set for the same number, stop takes precedence. Only a sin-
gle skip or stop is supported, which is sufficient for testing that the
feature works.
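
The skip and stop behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not how pcre2test or pcre2_substitute() is implemented: the function name and its parameters are hypothetical, Python's re module stands in for PCRE2, and the replacement uses Python's \g<0> syntax in place of $0.

```python
import re

def substitute_with_callout(pattern, replacement, subject, skip=0, stop=0):
    """Model a global substitution with a callout after each match.

    The modelled callout returns 0 (accept), +1 (reject this
    replacement, keep going) or -1 (reject and stop matching).
    """
    out = []       # pieces of the output string
    out_len = 0    # length of the output built so far
    last = 0       # end of the previous match in the subject
    log = []       # lines in the style of the callout output shown above
    for count, m in enumerate(re.finditer(pattern, subject), start=1):
        # Copy the unmatched text that precedes this match.
        out.append(subject[last:m.start()])
        out_len += m.start() - last
        new = m.expand(replacement)
        # Stop takes precedence if both are set for the same number.
        rc = -1 if count == stop else (1 if count == skip else 0)
        tag = {0: "", 1: " SKIPPED", -1: " STOPPED"}[rc]
        # Number of ovector pairs: highest set group number plus one.
        set_groups = [i for i, g in enumerate(m.groups(), start=1)
                      if g is not None]
        pairs = (max(set_groups) if set_groups else 0) + 1
        log.append('%d(%d) Old %d %d "%s" New %d %d "%s%s"' % (
            count, pairs, m.start(), m.end(), m.group(0),
            out_len, out_len + len(new), new, tag))
        # A rejected replacement leaves the matched text unchanged.
        kept = new if rc == 0 else m.group(0)
        out.append(kept)
        out_len += len(kept)
        last = m.end()
        if rc < 0:
            break
    out.append(subject[last:])
    return "".join(out), log
```

Called with the pattern abc, the subject abcdefabcpqr and the replacement <\g<0>>, this model gives the same result strings and the same Old and New offsets as the three examples above (no modifier, skip=1, and stop=1).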
Setting the JIT stack size Setting the JIT stack size
The jitstack modifier provides a way of setting the maximum stack size The jitstack modifier provides a way of setting the maximum stack size
that is used by the just-in-time optimization code. It is ignored if that is used by the just-in-time optimization code. It is ignored if
JIT optimization is not being used. The value is a number of kibibytes JIT optimization is not being used. The value is a number of kibibytes
(units of 1024 bytes). Setting zero reverts to the default of 32KiB. (units of 1024 bytes). Setting zero reverts to the default of 32KiB.
Providing a stack that is larger than the default is necessary only for Providing a stack that is larger than the default is necessary only for
very complicated patterns. If jitstack is set non-zero on a subject very complicated patterns. If jitstack is set non-zero on a subject
line it overrides any value that was set on the pattern. line it overrides any value that was set on the pattern.
Setting heap, match, and depth limits Setting heap, match, and depth limits
The heap_limit, match_limit, and depth_limit modifiers set the appro- The heap_limit, match_limit, and depth_limit modifiers set the appro-
priate limits in the match context. These values are ignored when the priate limits in the match context. These values are ignored when the
find_limits modifier is specified. find_limits modifier is specified.
Finding minimum limits Finding minimum limits
If the find_limits modifier is present on a subject line, pcre2test If the find_limits modifier is present on a subject line, pcre2test
calls the relevant matching function several times, setting different calls the relevant matching function several times, setting different
values in the match context via pcre2_set_heap_limit(), values in the match context via pcre2_set_heap_limit(),
pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the
minimum values for each parameter that allows the match to complete minimum values for each parameter that allows the match to complete
without error. If JIT is being used, only the match limit is relevant. without error. If JIT is being used, only the match limit is relevant.
When using this modifier, the pattern should not contain any limit set- When using this modifier, the pattern should not contain any limit set-
tings such as (*LIMIT_MATCH=...) within it. If such a setting is tings such as (*LIMIT_MATCH=...) within it. If such a setting is
present and is lower than the minimum matching value, the minimum value present and is lower than the minimum matching value, the minimum value
cannot be found because pcre2_set_match_limit() etc. are only able to cannot be found because pcre2_set_match_limit() etc. are only able to
reduce the value of an in-pattern limit; they cannot increase it. reduce the value of an in-pattern limit; they cannot increase it.
For non-DFA matching, the minimum depth_limit number is a measure of For non-DFA matching, the minimum depth_limit number is a measure of
how much nested backtracking happens (that is, how deeply the pattern's how much nested backtracking happens (that is, how deeply the pattern's
tree is searched). In the case of DFA matching, depth_limit controls tree is searched). In the case of DFA matching, depth_limit controls
the depth of recursive calls of the internal function that is used for the depth of recursive calls of the internal function that is used for
handling pattern recursion, lookaround assertions, and atomic groups. handling pattern recursion, lookaround assertions, and atomic groups.
For non-DFA matching, the match_limit number is a measure of the amount For non-DFA matching, the match_limit number is a measure of the amount
of backtracking that takes place, and learning the minimum value can be of backtracking that takes place, and learning the minimum value can be
instructive. For most simple matches, the number is quite small, but instructive. For most simple matches, the number is quite small, but
for patterns with very large numbers of matching possibilities, it can for patterns with very large numbers of matching possibilities, it can
become large very quickly with increasing length of subject string. In become large very quickly with increasing length of subject string. In
the case of DFA matching, match_limit controls the total number of the case of DFA matching, match_limit controls the total number of
calls, both recursive and non-recursive, to the internal matching func- calls, both recursive and non-recursive, to the internal matching func-
tion, thus controlling the overall amount of computing resource that is tion, thus controlling the overall amount of computing resource that is
used. used.
For both kinds of matching, the heap_limit number, which is in For both kinds of matching, the heap_limit number, which is in
kibibytes (units of 1024 bytes), limits the amount of heap memory used kibibytes (units of 1024 bytes), limits the amount of heap memory used
for matching. A value of zero disables the use of any heap memory; many for matching. A value of zero disables the use of any heap memory; many
simple pattern matches can be done without using the heap, so zero is simple pattern matches can be done without using the heap, so zero is
not an unreasonable setting. not an unreasonable setting.
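
The idea of searching for a minimum limit can be sketched as a bisection. This is a hypothetical model, not pcre2test's actual search code: succeeds is an assumed stand-in for "run the match with this limit and report whether it completed without a limit error", and the search assumes that any limit above a working one also works.

```python
def find_minimum_limit(succeeds, maximum=2**32 - 1):
    """Return the smallest limit for which succeeds(limit) is true.

    Returns None if even the maximum limit fails, which models the
    case where a lower in-pattern limit makes the minimum unfindable.
    """
    if not succeeds(maximum):
        return None
    lo, hi = 0, maximum          # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid             # mid works, so the minimum is <= mid
        else:
            lo = mid + 1         # mid fails, so the minimum is > mid
    return lo


def make_matcher(needed, in_pattern_limit=None):
    """Build a hypothetical matcher that needs `needed` units of a resource.

    An in-pattern limit can only be reduced by the caller, never raised,
    so the effective limit is the smaller of the two.
    """
    def succeeds(limit):
        effective = limit if in_pattern_limit is None else min(limit, in_pattern_limit)
        return effective >= needed
    return succeeds
```

With make_matcher(37) the search settles on 37. With make_matcher(37, in_pattern_limit=10) no limit ever succeeds, which is why the pattern should not contain its own limit settings when find_limits is used.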
Showing MARK names Showing MARK names
The mark modifier causes the names from backtracking control verbs that The mark modifier causes the names from backtracking control verbs that
are returned from calls to pcre2_match() to be displayed. If a mark is are returned from calls to pcre2_match() to be displayed. If a mark is
returned for a match, non-match, or partial match, pcre2test shows it. returned for a match, non-match, or partial match, pcre2test shows it.
For a match, it is on a line by itself, tagged with "MK:". Otherwise, For a match, it is on a line by itself, tagged with "MK:". Otherwise,
it is added to the non-match message. it is added to the non-match message.
Showing memory usage Showing memory usage
The memory modifier causes pcre2test to log the sizes of all heap mem- The memory modifier causes pcre2test to log the sizes of all heap mem-
ory allocation and freeing calls that occur during a call to ory allocation and freeing calls that occur during a call to
pcre2_match() or pcre2_dfa_match(). These occur only when a match pcre2_match() or pcre2_dfa_match(). These occur only when a match
requires a bigger vector than the default for remembering backtracking requires a bigger vector than the default for remembering backtracking
points (pcre2_match()) or for internal workspace (pcre2_dfa_match()). points (pcre2_match()) or for internal workspace (pcre2_dfa_match()).
In many cases there will be no heap memory used and therefore no addi- In many cases there will be no heap memory used and therefore no addi-
tional output. No heap memory is allocated during matching with JIT, so tional output. No heap memory is allocated during matching with JIT, so
in that case the memory modifier never has any effect. For this modi- in that case the memory modifier never has any effect. For this modi-
fier to work, the null_context modifier must not be set on both the fier to work, the null_context modifier must not be set on both the
pattern and the subject, though it can be set on one or the other. pattern and the subject, though it can be set on one or the other.
Setting a starting offset Setting a starting offset
The offset modifier sets an offset in the subject string at which The offset modifier sets an offset in the subject string at which
matching starts. Its value is a number of code units, not characters. matching starts. Its value is a number of code units, not characters.
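
The difference between code units and characters only matters when the subject contains characters that occupy more than one code unit, as can happen in UTF mode. The following sketch (the helper name is hypothetical) converts a character index into a code unit offset for the 8-bit, 16-bit, and 32-bit code unit widths, assuming UTF-8, UTF-16, and UTF-32 encoding respectively.

```python
def code_unit_offset(subject, char_index, width=8):
    """Count the code units that precede character number char_index."""
    encoding = {8: "utf-8", 16: "utf-16-le", 32: "utf-32-le"}[width]
    unit_bytes = width // 8
    # Encode only the prefix and divide its byte length by the unit size.
    return len(subject[:char_index].encode(encoding)) // unit_bytes
```

For the subject "café au lait", the word "au" starts at character 5, but at code unit 6 in UTF-8 because "é" takes two 8-bit code units; in UTF-16 and UTF-32 the offset is 5.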
Setting an offset limit Setting an offset limit
The offset_limit modifier sets a limit for unanchored matches. If a The offset_limit modifier sets a limit for unanchored matches. If a
match cannot be found starting at or before this offset in the subject, match cannot be found starting at or before this offset in the subject,
a "no match" return is given. The data value is a number of code units, a "no match" return is given. The data value is a number of code units,
not characters. When this modifier is used, the use_offset_limit modi- not characters. When this modifier is used, the use_offset_limit modi-
fier must have been set for the pattern; if not, an error is generated. fier must have been set for the pattern; if not, an error is generated.
Setting the size of the output vector Setting the size of the output vector
The ovector modifier applies only to the subject line in which it The ovector modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a appears, though of course it can also be used to set a default in a
#subject command. It specifies the number of pairs of offsets that are #subject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15. available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes A value of zero is useful when testing the POSIX API because it causes
regexec() to be called with a NULL capture vector. When not testing the regexec() to be called with a NULL capture vector. When not testing the
POSIX API, a value of zero is used to cause pcre2_match_data_cre- POSIX API, a value of zero is used to cause pcre2_match_data_cre-
ate_from_pattern() to be called, in order to create a match block of ate_from_pattern() to be called, in order to create a match block of
exactly the right size for the pattern. (It is not possible to create a exactly the right size for the pattern. (It is not possible to create a
match block with a zero-length ovector; there is always at least one match block with a zero-length ovector; there is always at least one
pair of offsets.) pair of offsets.)
Passing the subject as zero-terminated Passing the subject as zero-terminated
By default, the subject string is passed to a native API matching func- By default, the subject string is passed to a native API matching func-
tion with its correct length. In order to test the facility for passing tion with its correct length. In order to test the facility for passing
a zero-terminated string, the zero_terminate modifier is provided. It a zero-terminated string, the zero_terminate modifier is provided. It
causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching
via the POSIX interface, this modifier is ignored, with a warning. via the POSIX interface, this modifier is ignored, with a warning.
When testing pcre2_substitute(), this modifier also has the effect of When testing pcre2_substitute(), this modifier also has the effect of
passing the replacement string as zero-terminated. passing the replacement string as zero-terminated.
Passing a NULL context Passing a NULL context
Normally, pcre2test passes a context block to pcre2_match(), Normally, pcre2test passes a context block to pcre2_match(),
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
set, however, NULL is passed. This is for testing that the matching set, however, NULL is passed. This is for testing that the matching
functions behave correctly in this case (they use default values). This functions behave correctly in this case (they use default values). This
modifier cannot be used with the find_limits modifier or when testing modifier cannot be used with the find_limits modifier or when testing
the substitution function. the substitution function.
THE ALTERNATIVE MATCHING FUNCTION THE ALTERNATIVE MATCHING FUNCTION
By default, pcre2test uses the standard PCRE2 matching function, By default, pcre2test uses the standard PCRE2 matching function,
pcre2_match() to match each subject line. PCRE2 also supports an alter- pcre2_match() to match each subject line. PCRE2 also supports an alter-
native matching function, pcre2_dfa_match(), which operates in a dif- native matching function, pcre2_dfa_match(), which operates in a dif-
ferent way, and has some restrictions. The differences between the two ferent way, and has some restrictions. The differences between the two
functions are described in the pcre2matching documentation. functions are described in the pcre2matching documentation.
If the dfa modifier is set, the alternative matching function is used. If the dfa modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the sub- This function finds all possible matches at a given point in the sub-
ject. If, however, the dfa_shortest modifier is set, processing stops ject. If, however, the dfa_shortest modifier is set, processing stops
after the first match is found. This is always the shortest possible after the first match is found. This is always the shortest possible
match. match.
DEFAULT OUTPUT FROM pcre2test DEFAULT OUTPUT FROM pcre2test
This section describes the output when the normal matching function, This section describes the output when the normal matching function,
pcre2_match(), is being used. pcre2_match(), is being used.
When a match succeeds, pcre2test outputs the list of captured sub- When a match succeeds, pcre2test outputs the list of captured sub-
strings, starting with number 0 for the string that matched the whole strings, starting with number 0 for the string that matched the whole
pattern. Otherwise, it outputs "No match" when the return is pattern. Otherwise, it outputs "No match" when the return is
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial this is the entire substring that was inspected during the partial
match; it may include characters before the actual match start if a match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.) lookbehind assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string and a short descriptive phrase. If the error is a failed UTF string
check, the code unit offset of the start of the failing character is check, the code unit offset of the start of the failing character is
also output. Here is an example of an interactive pcre2test run. also output. Here is an example of an interactive pcre2test run.
$ pcre2test $ pcre2test
@ -1476,8 +1510,8 @@ DEFAULT OUTPUT FROM pcre2test
Unset capturing substrings that are not followed by one that is set are Unset capturing substrings that are not followed by one that is set are
not shown by pcre2test unless the allcaptures modifier is specified. In not shown by pcre2test unless the allcaptures modifier is specified. In
the following example, there are two capturing substrings, but when the the following example, there are two capturing substrings, but when the
first data line is matched, the second, unset substring is not shown. first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second An "internal" unset substring is shown as "<unset>", as for the second
data line. data line.
re> /(a)|(b)/ re> /(a)|(b)/
@ -1489,11 +1523,11 @@ DEFAULT OUTPUT FROM pcre2test
1: <unset> 1: <unset>
2: b 2: b
If the strings contain any non-printing characters, they are output as If the strings contain any non-printing characters, they are output as
\xhh escapes if the value is less than 256 and UTF mode is not set. \xhh escapes if the value is less than 256 and UTF mode is not set.
Otherwise they are output as \x{hh...} escapes. See below for the defi- Otherwise they are output as \x{hh...} escapes. See below for the defi-
nition of non-printing characters. If the aftertext modifier is set, nition of non-printing characters. If the aftertext modifier is set,
the output for substring 0 is followed by the the rest of the subject the output for substring 0 is followed by the the rest of the subject

string, identified by "0+" like this: string, identified by "0+" like this:
re> /cat/aftertext re> /cat/aftertext
@ -1501,7 +1535,7 @@ DEFAULT OUTPUT FROM pcre2test
0: cat 0: cat
0+ aract 0+ aract
If global matching is requested, the results of successive matching If global matching is requested, the results of successive matching
attempts are output in sequence, like this: attempts are output in sequence, like this:
re> /\Bi(\w\w)/g re> /\Bi(\w\w)/g
@ -1513,8 +1547,8 @@ DEFAULT OUTPUT FROM pcre2test
0: ipp 0: ipp
1: pp 1: pp
"No match" is output only if the first match attempt fails. Here is an "No match" is output only if the first match attempt fails. Here is an
example of a failure message (the offset 4 that is specified by the example of a failure message (the offset 4 that is specified by the
offset modifier is past the end of the subject string): offset modifier is past the end of the subject string):
re> /xyz/ re> /xyz/
@ -1522,7 +1556,7 @@ DEFAULT OUTPUT FROM pcre2test
Error -24 (bad offset value) Error -24 (bad offset value)
Note that whereas patterns can be continued over several lines (a plain Note that whereas patterns can be continued over several lines (a plain
">" prompt is used for continuations), subject lines may not. However ">" prompt is used for continuations), subject lines may not. However
newlines can be included in a subject by means of the \n escape (or \r, newlines can be included in a subject by means of the \n escape (or \r,
\r\n, etc., depending on the newline sequence setting). \r\n, etc., depending on the newline sequence setting).
@ -1530,7 +1564,7 @@ DEFAULT OUTPUT FROM pcre2test
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
When the alternative matching function, pcre2_dfa_match(), is used, the When the alternative matching function, pcre2_dfa_match(), is used, the
output consists of a list of all the matches that start at the first output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example: point in the subject where there is at least one match. For example:
re> /(tang|tangerine|tan)/ re> /(tang|tangerine|tan)/
@ -1539,11 +1573,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tang 1: tang
2: tan 2: tan
Using the normal matching function on this data finds only "tang". The Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero). longest matching string is always given first (and numbered zero).
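
The contrast between the two functions can be modelled for a pattern that is just a list of literal alternatives. This is a simplified sketch, not the DFA algorithm itself: the function name and the subject used in the note below are illustrative, and Python's backtracking re module stands in for the normal matching function.

```python
import re

def dfa_style_matches(alternatives, subject):
    """Return every alternative that matches at the first position where
    at least one of them matches, longest first."""
    for pos in range(len(subject) + 1):
        found = {alt for alt in alternatives if subject.startswith(alt, pos)}
        if found:
            return sorted(found, key=len, reverse=True)
    return []


def first_alternative_match(alternatives, subject):
    """Return what a backtracking matcher reports: the first alternative
    that succeeds at the earliest matching position."""
    m = re.search("|".join(re.escape(alt) for alt in alternatives), subject)
    return m.group(0) if m else None
```

For the alternatives tang, tangerine, and tan and the illustrative subject "yellow tangerine", the first function returns tangerine, tang, tan in that order, while the second returns only tang.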
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. Note that this is the followed by the partially matching substring. Note that this is the
entire substring that was inspected during the partial match; it may entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind asser- include characters before the actual match start if a lookbehind asser-
tion, \b, or \B was involved. (\K is not supported for DFA matching.) tion, \b, or \B was involved. (\K is not supported for DFA matching.)
@ -1559,16 +1593,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tan 1: tan
0: tan 0: tan
The alternative matching function does not support substring capture, The alternative matching function does not support substring capture,
so the modifiers that are concerned with captured substrings are not so the modifiers that are concerned with captured substrings are not
relevant. relevant.
RESTARTING AFTER A PARTIAL MATCH RESTARTING AFTER A PARTIAL MATCH
When the alternative matching function has given the PCRE2_ERROR_PAR- When the alternative matching function has given the PCRE2_ERROR_PAR-
TIAL return, indicating that the subject partially matched the pattern, TIAL return, indicating that the subject partially matched the pattern,
you can restart the match with additional subject data by means of the you can restart the match with additional subject data by means of the
dfa_restart modifier. For example: dfa_restart modifier. For example:
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@ -1577,37 +1611,37 @@ RESTARTING AFTER A PARTIAL MATCH
data> n05\=dfa,dfa_restart data> n05\=dfa,dfa_restart
0: n05 0: n05
For further information about partial matching, see the pcre2partial For further information about partial matching, see the pcre2partial
documentation. documentation.
CALLOUTS CALLOUTS
If the pattern contains any callout requests, pcre2test's callout func- If the pattern contains any callout requests, pcre2test's callout func-
tion is called during matching unless callout_none is specified. This tion is called during matching unless callout_none is specified. This
works with both matching functions, and with JIT, though there are some works with both matching functions, and with JIT, though there are some
differences in behaviour. The output for callouts with numerical argu- differences in behaviour. The output for callouts with numerical argu-
ments and those with string arguments is slightly different. ments and those with string arguments is slightly different.
Callouts with numerical arguments Callouts with numerical arguments
By default, the callout function displays the callout number, the start By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example: next pattern item to be tested. For example:
--->pqrabcdef --->pqrabcdef
0 ^ ^ \d 0 ^ ^ \d
This output indicates that callout number 0 occurred for a match This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion. position, which can happen if the callout is in a lookbehind assertion.
Callouts numbered 255 are assumed to be automatic callouts, inserted as Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the auto_callout pattern modifier. In this case, instead of a result of the auto_callout pattern modifier. In this case, instead of
showing the callout number, the offset in the pattern, preceded by a showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example: plus, is output. For example:
re> /\d?[A-E]\*/auto_callout re> /\d?[A-E]\*/auto_callout
@ -1620,7 +1654,7 @@ CALLOUTS
0: E* 0: E*
If a pattern contains (*MARK) items, an additional line is output when- If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For ever a change of latest mark is passed to the callout function. For
example: example:
re> /a(*MARK:X)bc/auto_callout re> /a(*MARK:X)bc/auto_callout
@ -1634,17 +1668,17 @@ CALLOUTS
+12 ^ ^ +12 ^ ^
0: abc 0: abc
The mark changes between matching "a" and "b", but stays the same for The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is backtracking, the mark reverts to being unset, the text "<unset>" is
output. output.
Callouts with string arguments Callouts with string arguments
The output for a callout with a string argument is similar, except that The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators, instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is before the reflection of the subject string, and the subject string is
reflected for each callout. For example: reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/ re> /^ab(?C'first')cd(?C"second")ef/
@ -1660,26 +1694,26 @@ CALLOUTS
Callout modifiers Callout modifiers
The callout function in pcre2test returns zero (carry on matching) by The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line to default, but you can use a callout_fail modifier in a subject line to
change this and other parameters of the callout (see below). change this and other parameters of the callout (see below).
If the callout_capture modifier is set, the current captured groups are If the callout_capture modifier is set, the current captured groups are
output when a callout occurs. This is useful only for non-DFA matching, output when a callout occurs. This is useful only for non-DFA matching,
as pcre2_dfa_match() does not support capturing, so no captures are as pcre2_dfa_match() does not support capturing, so no captures are
ever shown. ever shown.
The normal callout output, showing the callout number or pattern offset The normal callout output, showing the callout number or pattern offset
(as described above) is suppressed if the callout_no_where modifier is (as described above) is suppressed if the callout_no_where modifier is
set. set.
When using the interpretive matching function pcre2_match() without When using the interpretive matching function pcre2_match() without
JIT, setting the callout_extra modifier causes additional output from JIT, setting the callout_extra modifier causes additional output from
pcre2test's callout function to be generated. For the first callout in pcre2test's callout function to be generated. For the first callout in
a match attempt at a new starting position in the subject, "New match a match attempt at a new starting position in the subject, "New match
attempt" is output. If there has been a backtrack since the last call- attempt" is output. If there has been a backtrack since the last call-
out (or start of matching if this is the first callout), "Backtrack" is out (or start of matching if this is the first callout), "Backtrack" is
output, followed by "No other matching paths" if the backtrack ended output, followed by "No other matching paths" if the backtrack ended
the previous match attempt. For example: the previous match attempt. For example:
re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
@ -1716,86 +1750,86 @@ CALLOUTS
+1 ^ a+ +1 ^ a+
No match No match
Notice that various optimizations must be turned off if you want all Notice that various optimizations must be turned off if you want all
possible matching paths to be scanned. If no_start_optimize is not possible matching paths to be scanned. If no_start_optimize is not
used, there is an immediate "no match", without any callouts, because used, there is an immediate "no match", without any callouts, because
the starting optimization fails to find "b" in the subject, which it the starting optimization fails to find "b" in the subject, which it
knows must be present for any match. If no_auto_possess is not used, knows must be present for any match. If no_auto_possess is not used,
the "a+" item is turned into "a++", which reduces the number of back- the "a+" item is turned into "a++", which reduces the number of back-
tracks. tracks.
The callout_extra modifier has no effect if used with the DFA matching The callout_extra modifier has no effect if used with the DFA matching
function, or with JIT. function, or with JIT.
Return values from callouts Return values from callouts
The default return from the callout function is zero, which allows The default return from the callout function is zero, which allows
matching to continue. The callout_fail modifier can be given one or two matching to continue. The callout_fail modifier can be given one or two
numbers. If there is only one number, 1 is returned instead of 0 (caus- numbers. If there is only one number, 1 is returned instead of 0 (caus-
ing matching to backtrack) when a callout of that number is reached. If ing matching to backtrack) when a callout of that number is reached. If
two numbers (<n>:<m>) are given, 1 is returned when callout <n> is two numbers (<n>:<m>) are given, 1 is returned when callout <n> is
reached and there have been at least <m> callouts. The callout_error reached and there have been at least <m> callouts. The callout_error
modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus- modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
ing the entire matching process to be aborted. If both these modifiers ing the entire matching process to be aborted. If both these modifiers
are set for the same callout number, callout_error takes precedence. are set for the same callout number, callout_error takes precedence.
Note that callouts with string arguments are always given the number Note that callouts with string arguments are always given the number
zero. zero.
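The return-value rules described above can be sketched as a small decision function. This is only a model of the documented behaviour, not pcre2test's actual code: the names `model_trigger`, `model_callout_yield`, `MODEL_UNSET` and `MODEL_ERROR_CALLOUT` are all hypothetical, and the error value is a stand-in for PCRE2_ERROR_CALLOUT from pcre2.h.

```c
#include <limits.h>

#define MODEL_UNSET UINT_MAX        /* hypothetical "modifier not given" marker */
#define MODEL_ERROR_CALLOUT (-37)   /* stand-in; the real constant lives in pcre2.h */

/* Hypothetical holder for one callout_fail or callout_error setting. */
typedef struct {
  unsigned number;      /* callout number <n>, or MODEL_UNSET */
  unsigned min_count;   /* <m>; 0 when only one number was given */
} model_trigger;

/* Decide what the callout should return for callout `number`, given that
   `count` callouts have happened so far (including this one). */
static int model_callout_yield(unsigned number, unsigned count,
                               model_trigger fail, model_trigger error)
{
  /* callout_error takes precedence when both name the same callout. */
  if (error.number == number && count >= error.min_count)
    return MODEL_ERROR_CALLOUT;     /* abort the whole match */
  if (fail.number == number && count >= fail.min_count)
    return 1;                       /* force a backtrack */
  return 0;                         /* let matching continue */
}
```

With `fail = {5, 3}` and no error trigger, the sketch returns 0 for the first two callouts numbered 5 and 1 from the third onwards.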
The callout_data modifier can be given an unsigned or a negative num- The callout_data modifier can be given an unsigned or a negative num-
ber. This is set as the "user data" that is passed to the matching ber. This is set as the "user data" that is passed to the matching
function, and passed back when the callout function is invoked. Any function, and passed back when the callout function is invoked. Any
value other than zero is used as a return from pcre2test's callout value other than zero is used as a return from pcre2test's callout
function. function.
Inserting callouts can be helpful when using pcre2test to check compli- Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see cated regular expressions. For further information about callouts, see
the pcre2callout documentation. the pcre2callout documentation.
NON-PRINTING CHARACTERS NON-PRINTING CHARACTERS
When pcre2test is outputting text in the compiled version of a pattern, When pcre2test is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters bytes other than 32-126 are always treated as non-printing characters
and are therefore shown as hex escapes. and are therefore shown as hex escapes.
When pcre2test is outputting text that is a matched part of a subject When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing isprint() function is used to distinguish printing and non-printing
characters. characters.
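The fixed 32-126 rule for compiled-pattern text can be illustrated with a short helper. This is a sketch under assumptions: `model_escape` is a hypothetical name, and the `\xhh` escape format is chosen for illustration and may not match pcre2test's exact output in every mode.

```c
#include <stdio.h>

/* Copy `in` to `out`, showing every byte outside 32-126 as a hex escape.
   Returns the number of characters written (excluding the final NUL).
   Stops early, leaving a valid string, if `out` is too small. */
static size_t model_escape(const unsigned char *in, size_t inlen,
                           char *out, size_t outsize)
{
  size_t used = 0;
  if (outsize == 0) return 0;
  out[0] = '\0';
  for (size_t i = 0; i < inlen; i++) {
    int n;
    if (in[i] >= 32 && in[i] <= 126)
      n = snprintf(out + used, outsize - used, "%c", in[i]);
    else
      n = snprintf(out + used, outsize - used, "\\x%02x", in[i]);
    if (n < 0 || (size_t)n >= outsize - used) {
      out[used] = '\0';             /* discard the partial item */
      break;
    }
    used += (size_t)n;
  }
  return used;
}
```

A locale-aware variant for matched subject text would replace the range test with a call to isprint(), as the paragraph above describes.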
SAVING AND RESTORING COMPILED PATTERNS SAVING AND RESTORING COMPILED PATTERNS
It is possible to save compiled patterns on disc or elsewhere, and It is possible to save compiled patterns on disc or elsewhere, and
reload them later, subject to a number of restrictions. JIT data cannot reload them later, subject to a number of restrictions. JIT data cannot
be saved. The host on which the patterns are reloaded must be running be saved. The host on which the patterns are reloaded must be running
the same version of PCRE2, with the same code unit width, and must also the same version of PCRE2, with the same code unit width, and must also
have the same endianness, pointer width and PCRE2_SIZE type. Before have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is, con- compiled patterns can be saved they must be serialized, that is, con-
verted to a stream of bytes. A single byte stream may contain any num- verted to a stream of bytes. A single byte stream may contain any num-
ber of compiled patterns, but they must all use the same character ber of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes). size is 1088 bytes).
The functions whose names begin with pcre2_serialize_ are used for The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the pcre2serial- serializing and de-serializing. They are described in the pcre2serial-
ize documentation. In this section we describe the features of ize documentation. In this section we describe the features of
pcre2test that can be used to test these functions. pcre2test that can be used to test these functions.
Note that "serialization" in PCRE2 does not convert compiled patterns Note that "serialization" in PCRE2 does not convert compiled patterns
to an abstract format like Java or .NET. It just makes a reloadable to an abstract format like Java or .NET. It just makes a reloadable
byte code stream. Hence the restrictions on reloading mentioned above. byte code stream. Hence the restrictions on reloading mentioned above.
In pcre2test, when a pattern with push modifier is successfully com- In pcre2test, when a pattern with push modifier is successfully com-
piled, it is pushed onto a stack of compiled patterns, and pcre2test piled, it is pushed onto a stack of compiled patterns, and pcre2test
expects the next line to contain a new pattern (or command) instead of expects the next line to contain a new pattern (or command) instead of
a subject line. By contrast, the pushcopy modifier causes a copy of the a subject line. By contrast, the pushcopy modifier causes a copy of the
compiled pattern to be stacked, leaving the original available for compiled pattern to be stacked, leaving the original available for
immediate matching. By using push and/or pushcopy, a number of patterns immediate matching. By using push and/or pushcopy, a number of patterns
can be compiled and retained. These modifiers are incompatible with can be compiled and retained. These modifiers are incompatible with
posix, and control modifiers that act at match time are ignored (with a posix, and control modifiers that act at match time are ignored (with a
message) for the stacked patterns. The jitverify modifier applies only message) for the stacked patterns. The jitverify modifier applies only
at compile time. at compile time.
The command The command
@ -1803,21 +1837,21 @@ SAVING AND RESTORING COMPILED PATTERNS
#save <filename> #save <filename>
causes all the stacked patterns to be serialized and the result written causes all the stacked patterns to be serialized and the result written
to the named file. Afterwards, all the stacked patterns are freed. The to the named file. Afterwards, all the stacked patterns are freed. The
command command
#load <filename> #load <filename>
reads the data in the file, and then arranges for it to be de-serial- reads the data in the file, and then arranges for it to be de-serial-
ized, with the resulting compiled patterns added to the pattern stack. ized, with the resulting compiled patterns added to the pattern stack.
The pattern on the top of the stack can be retrieved by the #pop com- The pattern on the top of the stack can be retrieved by the #pop com-
mand, which must be followed by lines of subjects that are to be mand, which must be followed by lines of subjects that are to be
matched with the pattern, terminated as usual by an empty line or end matched with the pattern, terminated as usual by an empty line or end
of file. This command may be followed by a modifier list containing of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are, allowed, nor are any option-setting modifiers. The JIT modifiers are,
however, permitted. Here is an example that saves and reloads two however, permitted. Here is an example that saves and reloads two
patterns. patterns.
/abc/push /abc/push
@ -1830,10 +1864,10 @@ SAVING AND RESTORING COMPILED PATTERNS
#pop jit,bincode #pop jit,bincode
abc abc
If jitverify is used with #pop, it does not automatically imply jit, If jitverify is used with #pop, it does not automatically imply jit,
which is different behaviour from when it is used on a pattern. which is different behaviour from when it is used on a pattern.
The #popcopy command is analogous to the pushcopy modifier in that it The #popcopy command is analogous to the pushcopy modifier in that it
makes current a copy of the topmost stack pattern, leaving the original makes current a copy of the topmost stack pattern, leaving the original
still on the stack. still on the stack.
@ -1853,5 +1887,5 @@ AUTHOR
REVISION REVISION
Last updated: 21 September 2018 Last updated: 12 November 2018
Copyright (c) 1997-2018 University of Cambridge. Copyright (c) 1997-2018 University of Cambridge.


@ -549,8 +549,12 @@ typedef struct pcre2_callout_enumerate_block { \
typedef struct pcre2_substitute_callout_block { \ typedef struct pcre2_substitute_callout_block { \
uint32_t version; /* Identifies version of block */ \ uint32_t version; /* Identifies version of block */ \
/* ------------------------ Version 0 ------------------------------- */ \ /* ------------------------ Version 0 ------------------------------- */ \
PCRE2_SIZE input_offsets[2]; /* Matched portion of the input */ \ PCRE2_SPTR input; /* Pointer to input subject string */ \
PCRE2_SPTR output; /* Pointer to output buffer */ \
PCRE2_SIZE output_offsets[2]; /* Changed portion of the output */ \ PCRE2_SIZE output_offsets[2]; /* Changed portion of the output */ \
PCRE2_SIZE *ovector; /* Pointer to current ovector */ \
uint32_t oveccount; /* Count of pairs set in ovector */ \
uint32_t subscount; /* Substitution number */ \
/* ------------------------------------------------------------------ */ \ /* ------------------------------------------------------------------ */ \
} pcre2_substitute_callout_block; } pcre2_substitute_callout_block;
@ -609,7 +613,7 @@ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
int (*)(pcre2_callout_block *, void *), void *); \ int (*)(pcre2_callout_block *, void *), void *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_substitute_callout(pcre2_match_context *, \ pcre2_set_substitute_callout(pcre2_match_context *, \
void (*)(pcre2_substitute_callout_block *, void *), void *); \ int (*)(pcre2_substitute_callout_block *, void *), void *); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \
pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \ pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \
PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \


@ -407,7 +407,7 @@ return 0;
PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
pcre2_set_substitute_callout(pcre2_match_context *mcontext, pcre2_set_substitute_callout(pcre2_match_context *mcontext,
void (*substitute_callout)(pcre2_substitute_callout_block *, void *), int (*substitute_callout)(pcre2_substitute_callout_block *, void *),
void *substitute_callout_data) void *substitute_callout_data)
{ {
mcontext->substitute_callout = substitute_callout; mcontext->substitute_callout = substitute_callout;


@ -585,7 +585,7 @@ typedef struct pcre2_real_match_context {
#endif #endif
int (*callout)(pcre2_callout_block *, void *); int (*callout)(pcre2_callout_block *, void *);
void *callout_data; void *callout_data;
void (*substitute_callout)(pcre2_substitute_callout_block *, void *); int (*substitute_callout)(pcre2_substitute_callout_block *, void *);
void *substitute_callout_data; void *substitute_callout_data;
PCRE2_SIZE offset_limit; PCRE2_SIZE offset_limit;
uint32_t heap_limit; uint32_t heap_limit;


@ -241,13 +241,15 @@ PCRE2_SIZE *ovector;
PCRE2_SIZE ovecsave[3]; PCRE2_SIZE ovecsave[3];
pcre2_substitute_callout_block scb; pcre2_substitute_callout_block scb;
scb.version = 0; /* General initialization */
buff_offset = 0; buff_offset = 0;
lengthleft = buff_length = *blength; lengthleft = buff_length = *blength;
*blength = PCRE2_UNSET; *blength = PCRE2_UNSET;
ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET; ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
/* Partial matching is not valid. */ /* Partial matching is not valid. This must come after setting *blength to
PCRE2_UNSET, so as not to imply an offset in the replacement. */
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0) if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
return PCRE2_ERROR_BADOPTION; return PCRE2_ERROR_BADOPTION;
@ -266,6 +268,13 @@ if (match_data == NULL)
ovector = pcre2_get_ovector_pointer(match_data); ovector = pcre2_get_ovector_pointer(match_data);
ovector_count = pcre2_get_ovector_count(match_data); ovector_count = pcre2_get_ovector_count(match_data);
/* Fixed things in the callout block */
scb.version = 0;
scb.input = subject;
scb.output = (PCRE2_SPTR)buffer;
scb.ovector = ovector;
/* Find lengths of zero-terminated strings and the end of the replacement. */ /* Find lengths of zero-terminated strings and the end of the replacement. */
if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject); if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject);
@ -393,11 +402,6 @@ do
goto EXIT; goto EXIT;
} }
/* Save the match point for a possible callout */
scb.input_offsets[0] = ovector[0];
scb.input_offsets[1] = ovector[1];
/* Count substitutions with a paranoid check for integer overflow; surely no /* Count substitutions with a paranoid check for integer overflow; surely no
real call to this function would ever hit this! */ real call to this function would ever hit this! */
@ -409,12 +413,13 @@ do
subs++; subs++;
/* Copy the text leading up to the match, and remember where the insert /* Copy the text leading up to the match, and remember where the insert
begins. */ begins and how many ovector pairs are set. */
if (rc == 0) rc = ovector_count; if (rc == 0) rc = ovector_count;
fraglength = ovector[0] - start_offset; fraglength = ovector[0] - start_offset;
CHECKMEMCPY(subject + start_offset, fraglength); CHECKMEMCPY(subject + start_offset, fraglength);
scb.output_offsets[0] = buff_offset; scb.output_offsets[0] = buff_offset;
scb.oveccount = rc;
/* Process the replacement string. Literal mode is set by \Q, but only in /* Process the replacement string. Literal mode is set by \Q, but only in
extended mode when backslashes are being interpreted. In extended mode we extended mode when backslashes are being interpreted. In extended mode we
@ -836,8 +841,26 @@ do
if (!overflowed && mcontext->substitute_callout != NULL) if (!overflowed && mcontext->substitute_callout != NULL)
{ {
scb.subscount = subs;
scb.output_offsets[1] = buff_offset; scb.output_offsets[1] = buff_offset;
mcontext->substitute_callout(&scb, mcontext->substitute_callout_data); rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data);
/* A non-zero return means cancel this substitution. Instead, copy the
matched string fragment. */
if (rc != 0)
{
PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0];
PCRE2_SIZE oldlength = ovector[1] - ovector[0];
buff_offset -= newlength;
lengthleft += newlength;
CHECKMEMCPY(subject + ovector[0], oldlength);
/* A negative return means do not do any more. */
if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL);
}
} }
/* Save the details of this match. See above for how this data is used. If we /* Save the details of this match. See above for how this data is used. If we
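The new return-value handling (zero keeps the substitution, non-zero restores the matched text, negative also stops further substitutions) can be modelled without PCRE2. The sketch below is an assumption-laden stand-in, not the library code: every `model_` name is hypothetical, matching is reduced to finding the literal strings "abc" or "xyz", the replacement is fixed as `<$0>`, and the callout is consulted before writing rather than rolling back the buffer afterwards as the real code does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for pcre2_substitute_callout_block. */
typedef struct { unsigned subscount; } model_callout_block;
typedef int (*model_callout)(model_callout_block *, void *);

/* Mirrors the idea of pcre2test's substitute_skip / substitute_stop. */
typedef struct { unsigned skip, stop; } model_policy;

static int policy_callout(model_callout_block *scb, void *data)
{
  const model_policy *p = data;
  if (scb->subscount == p->stop) return -1;   /* cancel and stop */
  if (scb->subscount == p->skip) return +1;   /* cancel this one only */
  return 0;
}

/* Leftmost occurrence of "abc" or "xyz" in s, or NULL. */
static const char *find_match(const char *s)
{
  const char *a = strstr(s, "abc");
  const char *x = strstr(s, "xyz");
  if (a == NULL) return x;
  if (x == NULL) return a;
  return (a < x) ? a : x;
}

static void model_substitute(const char *subject, int global,
                             model_callout cb, void *data,
                             char *out, size_t outsize)
{
  model_callout_block scb = { 0 };
  const char *pos = subject;
  const char *m;
  size_t used = 0;

  assert(outsize > 2 * strlen(subject));  /* 3 chars grow to at most 5 */
  out[0] = '\0';
  while ((m = find_match(pos)) != NULL) {
    int rc = 0;
    scb.subscount++;
    /* Copy the text leading up to the match. */
    used += (size_t)sprintf(out + used, "%.*s", (int)(m - pos), pos);
    if (cb != NULL) rc = cb(&scb, data);
    if (rc == 0)
      used += (size_t)sprintf(out + used, "<%.3s>", m);  /* substituted */
    else
      used += (size_t)sprintf(out + used, "%.3s", m);    /* original kept */
    pos = m + 3;
    if (rc < 0 || !global) break;   /* negative return ends global mode */
  }
  strcpy(out + used, pos);          /* copy the remaining subject */
}
```

Run against the subject `12abc34xyz99abc55` in global mode, a policy with `stop = 2` should reproduce the shape of the corresponding testoutput2 result, where the second match and everything after it are left unchanged.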


@ -531,12 +531,14 @@ different things in the two cases. */
subject must be at the start and in the same order in both cases so that the subject must be at the start and in the same order in both cases so that the
same offset in the big table below works for both. */ same offset in the big table below works for both. */
typedef struct patctl { /* Structure for pattern modifiers. */ typedef struct patctl { /* Structure for pattern modifiers. */
uint32_t options; /* Must be in same position as datctl */ uint32_t options; /* Must be in same position as datctl */
uint32_t control; /* Must be in same position as datctl */ uint32_t control; /* Must be in same position as datctl */
uint32_t control2; /* Must be in same position as datctl */ uint32_t control2; /* Must be in same position as datctl */
uint32_t jitstack; /* Must be in same position as datctl */ uint32_t jitstack; /* Must be in same position as datctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */ uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as datctl */
uint32_t substitute_stop; /* Must be in same position as datctl */
uint32_t jit; uint32_t jit;
uint32_t stackguard_test; uint32_t stackguard_test;
uint32_t tables_id; uint32_t tables_id;
@ -551,12 +553,14 @@ typedef struct patctl { /* Structure for pattern modifiers. */
#define MAXCPYGET 10 #define MAXCPYGET 10
#define LENCPYGET 64 #define LENCPYGET 64
typedef struct datctl { /* Structure for data line modifiers. */ typedef struct datctl { /* Structure for data line modifiers. */
uint32_t options; /* Must be in same position as patctl */ uint32_t options; /* Must be in same position as patctl */
uint32_t control; /* Must be in same position as patctl */ uint32_t control; /* Must be in same position as patctl */
uint32_t control2; /* Must be in same position as patctl */ uint32_t control2; /* Must be in same position as patctl */
uint32_t jitstack; /* Must be in same position as patctl */ uint32_t jitstack; /* Must be in same position as patctl */
uint8_t replacement[REPLACE_MODSIZE]; /* So must this */ uint8_t replacement[REPLACE_MODSIZE]; /* So must this */
uint32_t substitute_skip; /* Must be in same position as patctl */
uint32_t substitute_stop; /* Must be in same position as patctl */
uint32_t startend[2]; uint32_t startend[2];
uint32_t cerror[2]; uint32_t cerror[2];
uint32_t cfail[2]; uint32_t cfail[2];
@ -704,6 +708,8 @@ static modstruct modlist[] = {
{ "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) }, { "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) },
{ "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) }, { "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) },
{ "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) }, { "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) },
{ "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) },
{ "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) },
{ "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) }, { "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) },
{ "substitute_unset_empty", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) }, { "substitute_unset_empty", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) },
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) }, { "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
@ -1370,13 +1376,13 @@ are supported. */
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
if (test_mode == PCRE8_MODE) \ if (test_mode == PCRE8_MODE) \
pcre2_set_substitute_callout_8(G(a,8), \ pcre2_set_substitute_callout_8(G(a,8), \
(void (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \ (int (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \
else if (test_mode == PCRE16_MODE) \ else if (test_mode == PCRE16_MODE) \
pcre2_set_substitute_callout_16(G(a,16), \ pcre2_set_substitute_callout_16(G(a,16), \
(void (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \ (int (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \
else \ else \
pcre2_set_substitute_callout_32(G(a,32), \ pcre2_set_substitute_callout_32(G(a,32), \
(void (*)(pcre2_substitute_callout_block_32 *, void *))b,c) (int (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
if (test_mode == PCRE8_MODE) \ if (test_mode == PCRE8_MODE) \
@ -1850,10 +1856,10 @@ the three different cases. */
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
if (test_mode == G(G(PCRE,BITONE),_MODE)) \ if (test_mode == G(G(PCRE,BITONE),_MODE)) \
G(pcre2_set_substitute_callout_,BITONE)(G(a,BITONE), \ G(pcre2_set_substitute_callout_,BITONE)(G(a,BITONE), \
(void (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \ (int (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \
else \ else \
G(pcre2_set_substitute_callout_,BITTWO)(G(a,BITTWO), \ G(pcre2_set_substitute_callout_,BITTWO)(G(a,BITTWO), \
(void (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c) (int (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
if (test_mode == G(G(PCRE,BITONE),_MODE)) \ if (test_mode == G(G(PCRE,BITONE),_MODE)) \
@ -2058,7 +2064,7 @@ the three different cases. */
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_8(G(a,8),b) #define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_8(G(a,8),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_8(G(a,8), \ pcre2_set_substitute_callout_8(G(a,8), \
(void (*)(pcre2_substitute_callout_block_8 *, void *))b,c) (int (*)(pcre2_substitute_callout_block_8 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),G(h,8), \ a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),G(h,8), \
(PCRE2_SPTR8)i,j,(PCRE2_UCHAR8 *)k,l) (PCRE2_SPTR8)i,j,(PCRE2_UCHAR8 *)k,l)
@ -2165,7 +2171,7 @@ the three different cases. */
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_16(G(a,16),b) #define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_16(G(a,16),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_16(G(a,16), \ pcre2_set_substitute_callout_16(G(a,16), \
(void (*)(pcre2_substitute_callout_block_16 *, void *))b,c) (int (*)(pcre2_substitute_callout_block_16 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),G(h,16), \ a = pcre2_substitute_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),G(h,16), \
(PCRE2_SPTR16)i,j,(PCRE2_UCHAR16 *)k,l) (PCRE2_SPTR16)i,j,(PCRE2_UCHAR16 *)k,l)
@ -2272,7 +2278,7 @@ the three different cases. */
#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_32(G(a,32),b) #define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_32(G(a,32),b)
#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ #define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \
pcre2_set_substitute_callout_32(G(a,32), \ pcre2_set_substitute_callout_32(G(a,32), \
(void (*)(pcre2_substitute_callout_block_32 *, void *))b,c) (int (*)(pcre2_substitute_callout_block_32 *, void *))b,c)
#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ #define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \
a = pcre2_substitute_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),G(h,32), \ a = pcre2_substitute_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),G(h,32), \
(PCRE2_SPTR32)i,j,(PCRE2_UCHAR32 *)k,l) (PCRE2_SPTR32)i,j,(PCRE2_UCHAR32 *)k,l)
@ -5955,17 +5961,40 @@ Arguments:
Returns: nothing Returns: nothing
*/ */
static void static int
substitute_callout_function(pcre2_substitute_callout_block_8 *scb, substitute_callout_function(pcre2_substitute_callout_block_8 *scb,
void *data_ptr) void *data_ptr)
{ {
int yield = 0;
BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0;
(void)data_ptr; /* Not used */ (void)data_ptr; /* Not used */
fprintf(outfile, "Old %" SIZ_FORM " %" SIZ_FORM " New %" SIZ_FORM
" %" SIZ_FORM "\n", fprintf(outfile, "%2d(%d) Old %" SIZ_FORM " %" SIZ_FORM " \"",
SIZ_CAST scb->input_offsets[0], scb->subscount, scb->oveccount,
SIZ_CAST scb->input_offsets[1], SIZ_CAST scb->ovector[0], SIZ_CAST scb->ovector[1]);
SIZ_CAST scb->output_offsets[0],
SIZ_CAST scb->output_offsets[1]); PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0],
utf, outfile);
fprintf(outfile, "\" New %" SIZ_FORM " %" SIZ_FORM " \"",
SIZ_CAST scb->output_offsets[0], SIZ_CAST scb->output_offsets[1]);
PCHARSV(scb->output, scb->output_offsets[0],
scb->output_offsets[1] - scb->output_offsets[0], utf, outfile);
if (scb->subscount == dat_datctl.substitute_stop)
{
yield = -1;
fprintf(outfile, " STOPPED");
}
else if (scb->subscount == dat_datctl.substitute_skip)
{
yield = +1;
fprintf(outfile, " SKIPPED");
}
fprintf(outfile, "\"\n");
return yield;
} }
@ -6494,6 +6523,11 @@ dat_datctl.control2 |= (pat_patctl.control2 & CTL2_ALLPD);
strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement); strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement);
if (dat_datctl.jitstack == 0) dat_datctl.jitstack = pat_patctl.jitstack; if (dat_datctl.jitstack == 0) dat_datctl.jitstack = pat_patctl.jitstack;
if (dat_datctl.substitute_skip == 0)
dat_datctl.substitute_skip = pat_patctl.substitute_skip;
if (dat_datctl.substitute_stop == 0)
dat_datctl.substitute_stop = pat_patctl.substitute_stop;
/* Initialize for scanning the data line. */ /* Initialize for scanning the data line. */
#ifdef SUPPORT_PCRE2_8 #ifdef SUPPORT_PCRE2_8
@ -6833,6 +6867,11 @@ arg_ulen = ulen; /* Value to use in match arg */
if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl)) if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl))
return PR_OK; return PR_OK;
/* Setting substitute_{skip,stop} implies a substitute callout. */
if (dat_datctl.substitute_skip != 0 || dat_datctl.substitute_stop != 0)
dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT;
/* Check for mutually exclusive modifiers. At present, these are all in the /* Check for mutually exclusive modifiers. At present, these are all in the
first control word. */ first control word. */

testdata/testinput2

@ -5516,6 +5516,21 @@ a)"xI
/a(b)c|xyz/g,replace=<$0>,substitute_callout /a(b)c|xyz/g,replace=<$0>,substitute_callout
abcdefabcpqr abcdefabcpqr
abxyzpqrabcxyz
12abc34xyz99abc55\=substitute_stop=2
12abc34xyz99abc55\=substitute_skip=1
12abc34xyz99abc55\=substitute_skip=2
/a(b)c|xyz/g,replace=<$0>
abcdefabcpqr
abxyzpqrabcxyz
12abc34xyz\=substitute_stop=2
12abc34xyz\=substitute_skip=1
/a(b)c|xyz/replace=<$0>
abcdefabcpqr
12abc34xyz\=substitute_skip=1
12abc34xyz\=substitute_stop=1
/abc\rdef/ /abc\rdef/
abc\ndef abc\ndef


@ -1630,10 +1630,10 @@ No match
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr 123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8 1(2) Old 6 6 "" New 6 8 "<>"
Old 13 13 New 15 17 2(2) Old 13 13 "" New 15 17 "<>"
Old 13 16 New 17 22 3(2) Old 13 16 "def" New 17 22 "<def>"
Old 22 22 New 28 30 4(2) Old 22 22 "" New 28 30 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# End of testinput10 # End of testinput10


@ -1475,10 +1475,10 @@ No match
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr 123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8 1(2) Old 6 6 "" New 6 8 "<>"
Old 12 12 New 14 16 2(2) Old 12 12 "" New 14 16 "<>"
Old 12 15 New 16 21 3(2) Old 12 15 "def" New 16 21 "<def>"
Old 21 21 New 27 29 4(2) Old 21 21 "" New 27 29 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# A few script run tests in non-UTF mode (but they need Unicode support) # A few script run tests in non-UTF mode (but they need Unicode support)


@ -1472,10 +1472,10 @@ No match
/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
123abcáyzabcdef789abcሴqr 123abcáyzabcdef789abcሴqr
Old 6 6 New 6 8 1(2) Old 6 6 "" New 6 8 "<>"
Old 12 12 New 14 16 2(2) Old 12 12 "" New 14 16 "<>"
Old 12 15 New 16 21 3(2) Old 12 15 "def" New 16 21 "<def>"
Old 21 21 New 27 29 4(2) Old 21 21 "" New 27 29 "<>"
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
# A few script run tests in non-UTF mode (but they need Unicode support) # A few script run tests in non-UTF mode (but they need Unicode support)

testdata/testoutput2

@ -16797,9 +16797,52 @@ Subject length lower bound = 1
/a(b)c|xyz/g,replace=<$0>,substitute_callout /a(b)c|xyz/g,replace=<$0>,substitute_callout
abcdefabcpqr abcdefabcpqr
Old 0 3 New 0 5 1(2) Old 0 3 "abc" New 0 5 "<abc>"
Old 6 9 New 8 13 2(2) Old 6 9 "abc" New 8 13 "<abc>"
2: <abc>def<abc>pqr 2: <abc>def<abc>pqr
abxyzpqrabcxyz
1(1) Old 2 5 "xyz" New 2 7 "<xyz>"
2(2) Old 8 11 "abc" New 10 15 "<abc>"
3(1) Old 11 14 "xyz" New 15 20 "<xyz>"
3: ab<xyz>pqr<abc><xyz>
12abc34xyz99abc55\=substitute_stop=2
1(2) Old 2 5 "abc" New 2 7 "<abc>"
2(1) Old 7 10 "xyz" New 9 14 "<xyz> STOPPED"
2: 12<abc>34xyz99abc55
12abc34xyz99abc55\=substitute_skip=1
1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
2(1) Old 7 10 "xyz" New 7 12 "<xyz>"
3(2) Old 12 15 "abc" New 14 19 "<abc>"
3: 12abc34<xyz>99<abc>55
12abc34xyz99abc55\=substitute_skip=2
1(2) Old 2 5 "abc" New 2 7 "<abc>"
2(1) Old 7 10 "xyz" New 9 14 "<xyz> SKIPPED"
3(2) Old 12 15 "abc" New 14 19 "<abc>"
3: 12<abc>34xyz99<abc>55
/a(b)c|xyz/g,replace=<$0>
abcdefabcpqr
2: <abc>def<abc>pqr
abxyzpqrabcxyz
3: ab<xyz>pqr<abc><xyz>
12abc34xyz\=substitute_stop=2
1(2) Old 2 5 "abc" New 2 7 "<abc>"
2(1) Old 7 10 "xyz" New 9 14 "<xyz> STOPPED"
2: 12<abc>34xyz
12abc34xyz\=substitute_skip=1
1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
2(1) Old 7 10 "xyz" New 7 12 "<xyz>"
2: 12abc34<xyz>
/a(b)c|xyz/replace=<$0>
abcdefabcpqr
1: <abc>defabcpqr
12abc34xyz\=substitute_skip=1
1(2) Old 2 5 "abc" New 2 7 "<abc> SKIPPED"
1: 12abc34xyz
12abc34xyz\=substitute_stop=1
1(2) Old 2 5 "abc" New 2 7 "<abc> STOPPED"
1: 12abc34xyz
/abc\rdef/ /abc\rdef/
abc\ndef abc\ndef