Documentation update.
parent a57787b7cd
commit eedd9d8e55
|
@ -3309,13 +3309,13 @@ can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
|
||||||
option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
|
option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
|
||||||
replacement string(s). The default action is to perform just one replacement if
|
replacement string(s). The default action is to perform just one replacement if
|
||||||
the pattern matches, but there is an option that requests multiple replacements
|
the pattern matches, but there is an option that requests multiple replacements
|
||||||
(see PCRE2_SUBSTITUTE_GLOBAL below for details).
|
(see PCRE2_SUBSTITUTE_GLOBAL below).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
|
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
|
||||||
that were carried out. This may be zero if no match was found, and is never
|
that were carried out. This may be zero if no match was found, and is never
|
||||||
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
|
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
|
||||||
returned if an error is detected (see below for details).
|
returned if an error is detected.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Matches in which a \K item in a lookahead in the pattern causes the match to
|
Matches in which a \K item in a lookahead in the pattern causes the match to
|
||||||
|
@ -3333,10 +3333,11 @@ functions from the match context, if provided, or else those that were used to
|
||||||
allocate memory for the compiled code.
|
allocate memory for the compiled code.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If an external <i>match_data</i> block is provided, its contents afterwards
|
If <i>match_data</i> is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
|
||||||
are those set by the final call to <b>pcre2_match()</b>. For global changes,
|
provided block is used for all calls to <b>pcre2_match()</b>, and its contents
|
||||||
this will have ended in a no-match error. The contents of the ovector within
|
afterwards are the result of the final call. For global changes, this will
|
||||||
the match data block may or may not have been changed.
|
always be a no-match error. The contents of the ovector within the match data
|
||||||
|
block may or may not have been changed.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
As well as the usual options for <b>pcre2_match()</b>, a number of additional
|
As well as the usual options for <b>pcre2_match()</b>, a number of additional
|
||||||
|
@ -3350,45 +3351,68 @@ an application to check for a match before choosing to substitute, without
|
||||||
having to repeat the match.
|
having to repeat the match.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>code</i> argument is not used for the first substitution when
|
The contents of the externally supplied match data block are not changed when
|
||||||
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
|
PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
|
||||||
<b>pcre2_match()</b> will be called after the first substitution to check for
|
<b>pcre2_match()</b> is called after the first substitution to check for further
|
||||||
further matches, and the contents of the <i>match_data</i> block will be
|
matches, but this is done using an internally obtained match data block, thus
|
||||||
changed.
|
always leaving the external block unchanged.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The default is to return a copy of the subject string with matched substrings
|
The <i>code</i> argument is not used for matching before the first substitution
|
||||||
replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
|
when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
|
||||||
replacement substrings are returned. In the global case, multiple replacements
|
PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
|
||||||
are concatenated in the output buffer. Substitution callouts (see
|
UTF setting and the number of capturing parentheses in the pattern.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The default action of <b>pcre2_substitute()</b> is to return a copy of the
|
||||||
|
subject string with matched substrings replaced. However, if
|
||||||
|
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
|
||||||
|
returned. In the global case, multiple replacements are concatenated in the
|
||||||
|
output buffer. Substitution callouts (see
|
||||||
<a href="#subcallouts">below)</a>
|
<a href="#subcallouts">below)</a>
|
||||||
can be used to separate them if necessary.
|
can be used to separate them if necessary.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
|
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
|
||||||
variable that contains the length, in code units, of the output buffer. If the
|
variable that contains the length, in code units, of the output buffer. If the
|
||||||
function is successful, the value is updated to contain the length of the new
|
function is successful, the value is updated to contain the length in code
|
||||||
string, excluding the trailing zero that is automatically added.
|
units of the new string, excluding the trailing zero that is automatically
|
||||||
|
added.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the function is not successful, the value set via <i>outlengthptr</i> depends
|
If the function is not successful, the value set via <i>outlengthptr</i> depends
|
||||||
on the type of error. For syntax errors in the replacement string, the value is
|
on the type of error. For syntax errors in the replacement string, the value is
|
||||||
the offset in the replacement string where the error was detected. For other
|
the offset in the replacement string where the error was detected. For other
|
||||||
errors, the value is PCRE2_UNSET by default. This includes the case of the
|
errors, the value is PCRE2_UNSET by default. This includes the case of the
|
||||||
output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set
|
output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
|
||||||
(see below), in which case the value is the minimum length needed, including
|
|
||||||
space for the trailing zero. Note that in order to compute the required length,
|
|
||||||
<b>pcre2_substitute()</b> has to simulate all the matching and copying, instead
|
|
||||||
of giving an error return as soon as the buffer overflows. Note also that the
|
|
||||||
length is in code units, not bytes.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The replacement string, which is interpreted as a UTF string in UTF mode,
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
|
||||||
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
|
too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
|
||||||
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
|
this option is set, however, <b>pcre2_substitute()</b> continues to go through
|
||||||
default, however, a dollar character is an escape character that can specify
|
the motions of matching and substituting (without, of course, writing anything)
|
||||||
the insertion of characters from capture groups and names from (*MARK) or other
|
in order to compute the size of buffer that is needed. This value is passed
|
||||||
control verbs in the pattern. The following forms are always recognized:
|
back via the <i>outlengthptr</i> variable, with the result of the function still
|
||||||
|
being PCRE2_ERROR_NOMEMORY.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Passing a buffer size of zero is a permitted way of finding out how much memory
|
||||||
|
is needed for given substitution. However, this does mean that the entire
|
||||||
|
operation is carried out twice. Depending on the application, it may be more
|
||||||
|
efficient to allocate a large buffer and free the excess afterwards, instead of
|
||||||
|
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The replacement string, which is interpreted as a UTF string in UTF mode, is
|
||||||
|
checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
|
||||||
|
replacement string causes an immediate return with the relevant UTF error code.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
|
||||||
|
in any way. By default, however, a dollar character is an escape character that
|
||||||
|
can specify the insertion of characters from capture groups and names from
|
||||||
|
(*MARK) or other control verbs in the pattern. The following forms are always
|
||||||
|
recognized:
|
||||||
<pre>
|
<pre>
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
$<n> or ${<n>} insert the contents of group <n>
|
$<n> or ${<n>} insert the contents of group <n>
|
||||||
|
@ -3436,22 +3460,6 @@ CRLF is a valid newline sequence and the next two characters are CR, LF. In
|
||||||
this case, the offset is advanced by two characters.
|
this case, the offset is advanced by two characters.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
|
|
||||||
too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
|
|
||||||
this option is set, however, <b>pcre2_substitute()</b> continues to go through
|
|
||||||
the motions of matching and substituting (without, of course, writing anything)
|
|
||||||
in order to compute the size of buffer that is needed. This value is passed
|
|
||||||
back via the <i>outlengthptr</i> variable, with the result of the function still
|
|
||||||
being PCRE2_ERROR_NOMEMORY.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
Passing a buffer size of zero is a permitted way of finding out how much memory
|
|
||||||
is needed for given substitution. However, this does mean that the entire
|
|
||||||
operation is carried out twice. Depending on the application, it may be more
|
|
||||||
efficient to allocate a large buffer and free the excess afterwards, instead of
|
|
||||||
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
|
||||||
not appear in the pattern to be treated as unset groups. This option should be
|
not appear in the pattern to be treated as unset groups. This option should be
|
||||||
used with care, because it means that a typo in a group name or number no
|
used with care, because it means that a typo in a group name or number no
|
||||||
|
@ -3907,7 +3915,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 22 January 2020
|
Last updated: 16 February 2020
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2020 University of Cambridge.
|
Copyright © 1997-2020 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
103
doc/pcre2.txt
|
@ -3200,13 +3200,13 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
|
There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
|
||||||
turn just the replacement string(s). The default action is to perform
|
turn just the replacement string(s). The default action is to perform
|
||||||
just one replacement if the pattern matches, but there is an option
|
just one replacement if the pattern matches, but there is an option
|
||||||
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
|
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL be-
|
||||||
for details).
|
low).
|
||||||
|
|
||||||
If successful, pcre2_substitute() returns the number of substitutions
|
If successful, pcre2_substitute() returns the number of substitutions
|
||||||
that were carried out. This may be zero if no match was found, and is
|
that were carried out. This may be zero if no match was found, and is
|
||||||
never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega-
|
never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega-
|
||||||
tive value is returned if an error is detected (see below for details).
|
tive value is returned if an error is detected.
|
||||||
|
|
||||||
Matches in which a \K item in a lookahead in the pattern causes the
|
Matches in which a \K item in a lookahead in the pattern causes the
|
||||||
match to end before it starts are not supported, and give rise to an
|
match to end before it starts are not supported, and give rise to an
|
||||||
|
@ -3221,10 +3221,11 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
ment functions from the match context, if provided, or else those that
|
ment functions from the match context, if provided, or else those that
|
||||||
were used to allocate memory for the compiled code.
|
were used to allocate memory for the compiled code.
|
||||||
|
|
||||||
If an external match_data block is provided, its contents afterwards
|
If match_data is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
|
||||||
are those set by the final call to pcre2_match(). For global changes,
|
provided block is used for all calls to pcre2_match(), and its contents
|
||||||
this will have ended in a no-match error. The contents of the ovector
|
afterwards are the result of the final call. For global changes, this
|
||||||
within the match data block may or may not have been changed.
|
will always be a no-match error. The contents of the ovector within the
|
||||||
|
match data block may or may not have been changed.
|
||||||
|
|
||||||
As well as the usual options for pcre2_match(), a number of additional
|
As well as the usual options for pcre2_match(), a number of additional
|
||||||
options can be set in the options argument of pcre2_substitute(). One
|
options can be set in the options argument of pcre2_substitute(). One
|
||||||
|
@ -3236,43 +3237,65 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
application to check for a match before choosing to substitute, without
|
application to check for a match before choosing to substitute, without
|
||||||
having to repeat the match.
|
having to repeat the match.
|
||||||
|
|
||||||
The code argument is not used for the first substitution when
|
The contents of the externally supplied match data block are not
|
||||||
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also
|
changed when PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTI-
|
||||||
set, pcre2_match() will be called after the first substitution to check
|
TUTE_GLOBAL is also set, pcre2_match() is called after the first sub-
|
||||||
for further matches, and the contents of the match_data block will be
|
stitution to check for further matches, but this is done using an in-
|
||||||
changed.
|
ternally obtained match data block, thus always leaving the external
|
||||||
|
block unchanged.
|
||||||
|
|
||||||
The default is to return a copy of the subject string with matched sub-
|
The code argument is not used for matching before the first substitu-
|
||||||
strings replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set,
|
tion when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided,
|
||||||
only the replacement substrings are returned. In the global case, mul-
|
even when PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains in-
|
||||||
tiple replacements are concatenated in the output buffer. Substitution
|
formation such as the UTF setting and the number of capturing parenthe-
|
||||||
callouts (see below) can be used to separate them if necessary.
|
ses in the pattern.
|
||||||
|
|
||||||
|
The default action of pcre2_substitute() is to return a copy of the
|
||||||
|
subject string with matched substrings replaced. However, if PCRE2_SUB-
|
||||||
|
STITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
|
||||||
|
returned. In the global case, multiple replacements are concatenated in
|
||||||
|
the output buffer. Substitution callouts (see below) can be used to
|
||||||
|
separate them if necessary.
|
||||||
|
|
||||||
The outlengthptr argument of pcre2_substitute() must point to a vari-
|
The outlengthptr argument of pcre2_substitute() must point to a vari-
|
||||||
able that contains the length, in code units, of the output buffer. If
|
able that contains the length, in code units, of the output buffer. If
|
||||||
the function is successful, the value is updated to contain the length
|
the function is successful, the value is updated to contain the length
|
||||||
of the new string, excluding the trailing zero that is automatically
|
in code units of the new string, excluding the trailing zero that is
|
||||||
added.
|
automatically added.
|
||||||
|
|
||||||
If the function is not successful, the value set via outlengthptr de-
|
If the function is not successful, the value set via outlengthptr de-
|
||||||
pends on the type of error. For syntax errors in the replacement
|
pends on the type of error. For syntax errors in the replacement
|
||||||
string, the value is the offset in the replacement string where the er-
|
string, the value is the offset in the replacement string where the er-
|
||||||
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
ror was detected. For other errors, the value is PCRE2_UNSET by de-
|
||||||
fault. This includes the case of the output buffer being too small, un-
|
fault. This includes the case of the output buffer being too small, un-
|
||||||
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
|
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
|
||||||
the value is the minimum length needed, including space for the trail-
|
|
||||||
ing zero. Note that in order to compute the required length, pcre2_sub-
|
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
||||||
stitute() has to simulate all the matching and copying, instead of giv-
|
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
||||||
ing an error return as soon as the buffer overflows. Note also that the
|
ORY immediately. If this option is set, however, pcre2_substitute()
|
||||||
length is in code units, not bytes.
|
continues to go through the motions of matching and substituting (with-
|
||||||
|
out, of course, writing anything) in order to compute the size of buf-
|
||||||
|
fer that is needed. This value is passed back via the outlengthptr
|
||||||
|
variable, with the result of the function still being PCRE2_ER-
|
||||||
|
ROR_NOMEMORY.
|
||||||
|
|
||||||
|
Passing a buffer size of zero is a permitted way of finding out how
|
||||||
|
much memory is needed for given substitution. However, this does mean
|
||||||
|
that the entire operation is carried out twice. Depending on the appli-
|
||||||
|
cation, it may be more efficient to allocate a large buffer and free
|
||||||
|
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
||||||
|
FLOW_LENGTH.
|
||||||
|
|
||||||
The replacement string, which is interpreted as a UTF string in UTF
|
The replacement string, which is interpreted as a UTF string in UTF
|
||||||
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
|
mode, is checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An
|
||||||
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
|
invalid UTF replacement string causes an immediate return with the rel-
|
||||||
preted in any way. By default, however, a dollar character is an escape
|
evant UTF error code.
|
||||||
character that can specify the insertion of characters from capture
|
|
||||||
groups and names from (*MARK) or other control verbs in the pattern.
|
If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not in-
|
||||||
The following forms are always recognized:
|
terpreted in any way. By default, however, a dollar character is an es-
|
||||||
|
cape character that can specify the insertion of characters from cap-
|
||||||
|
ture groups and names from (*MARK) or other control verbs in the pat-
|
||||||
|
tern. The following forms are always recognized:
|
||||||
|
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
$<n> or ${<n>} insert the contents of group <n>
|
$<n> or ${<n>} insert the contents of group <n>
|
||||||
|
@ -3320,22 +3343,6 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
two characters are CR, LF. In this case, the offset is advanced by two
|
two characters are CR, LF. In this case, the offset is advanced by two
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
|
|
||||||
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
|
|
||||||
ORY immediately. If this option is set, however, pcre2_substitute()
|
|
||||||
continues to go through the motions of matching and substituting (with-
|
|
||||||
out, of course, writing anything) in order to compute the size of buf-
|
|
||||||
fer that is needed. This value is passed back via the outlengthptr
|
|
||||||
variable, with the result of the function still being PCRE2_ER-
|
|
||||||
ROR_NOMEMORY.
|
|
||||||
|
|
||||||
Passing a buffer size of zero is a permitted way of finding out how
|
|
||||||
much memory is needed for given substitution. However, this does mean
|
|
||||||
that the entire operation is carried out twice. Depending on the appli-
|
|
||||||
cation, it may be more efficient to allocate a large buffer and free
|
|
||||||
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
|
|
||||||
FLOW_LENGTH.
|
|
||||||
|
|
||||||
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
|
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
|
||||||
do not appear in the pattern to be treated as unset groups. This option
|
do not appear in the pattern to be treated as unset groups. This option
|
||||||
should be used with care, because it means that a typo in a group name
|
should be used with care, because it means that a typo in a group name
|
||||||
|
@ -3754,7 +3761,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 22 January 2020
|
Last updated: 16 February 2020
|
||||||
Copyright (c) 1997-2020 University of Cambridge.
|
Copyright (c) 1997-2020 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|