Documentation update.
parent a57787b7cd
commit eedd9d8e55
@@ -3309,13 +3309,13 @@ can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
 option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
 replacement string(s). The default action is to perform just one replacement if
 the pattern matches, but there is an option that requests multiple replacements
-(see PCRE2_SUBSTITUTE_GLOBAL below for details).
+(see PCRE2_SUBSTITUTE_GLOBAL below).
 </P>
 <P>
 If successful, <b>pcre2_substitute()</b> returns the number of substitutions
 that were carried out. This may be zero if no match was found, and is never
 greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
-returned if an error is detected (see below for details).
+returned if an error is detected.
 </P>
 <P>
 Matches in which a \K item in a lookahead in the pattern causes the match to
@@ -3333,10 +3333,11 @@ functions from the match context, if provided, or else those that were used to
 allocate memory for the compiled code.
 </P>
 <P>
-If an external <i>match_data</i> block is provided, its contents afterwards
-are those set by the final call to <b>pcre2_match()</b>. For global changes,
-this will have ended in a no-match error. The contents of the ovector within
-the match data block may or may not have been changed.
+If <i>match_data</i> is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
+provided block is used for all calls to <b>pcre2_match()</b>, and its contents
+afterwards are the result of the final call. For global changes, this will
+always be a no-match error. The contents of the ovector within the match data
+block may or may not have been changed.
 </P>
 <P>
 As well as the usual options for <b>pcre2_match()</b>, a number of additional
@@ -3350,45 +3351,68 @@ an application to check for a match before choosing to substitute, without
 having to repeat the match.
 </P>
 <P>
-The <i>code</i> argument is not used for the first substitution when
-PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
-<b>pcre2_match()</b> will be called after the first substitution to check for
-further matches, and the contents of the <i>match_data</i> block will be
-changed.
+The contents of the externally supplied match data block are not changed when
+PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
+<b>pcre2_match()</b> is called after the first substitution to check for further
+matches, but this is done using an internally obtained match data block, thus
+always leaving the external block unchanged.
 </P>
 <P>
-The default is to return a copy of the subject string with matched substrings
-replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
-replacement substrings are returned. In the global case, multiple replacements
-are concatenated in the output buffer. Substitution callouts (see
+The <i>code</i> argument is not used for matching before the first substitution
+when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
+PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
+UTF setting and the number of capturing parentheses in the pattern.
+</P>
+<P>
+The default action of <b>pcre2_substitute()</b> is to return a copy of the
+subject string with matched substrings replaced. However, if
+PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
+returned. In the global case, multiple replacements are concatenated in the
+output buffer. Substitution callouts (see
 <a href="#subcallouts">below)</a>
 can be used to separate them if necessary.
 </P>
 <P>
 The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
 variable that contains the length, in code units, of the output buffer. If the
-function is successful, the value is updated to contain the length of the new
-string, excluding the trailing zero that is automatically added.
+function is successful, the value is updated to contain the length in code
+units of the new string, excluding the trailing zero that is automatically
+added.
 </P>
 <P>
 If the function is not successful, the value set via <i>outlengthptr</i> depends
 on the type of error. For syntax errors in the replacement string, the value is
 the offset in the replacement string where the error was detected. For other
 errors, the value is PCRE2_UNSET by default. This includes the case of the
-output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set
-(see below), in which case the value is the minimum length needed, including
-space for the trailing zero. Note that in order to compute the required length,
-<b>pcre2_substitute()</b> has to simulate all the matching and copying, instead
-of giving an error return as soon as the buffer overflows. Note also that the
-length is in code units, not bytes.
+output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
 </P>
 <P>
-The replacement string, which is interpreted as a UTF string in UTF mode,
-is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
-PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
-default, however, a dollar character is an escape character that can specify
-the insertion of characters from capture groups and names from (*MARK) or other
-control verbs in the pattern. The following forms are always recognized:
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
+too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
+this option is set, however, <b>pcre2_substitute()</b> continues to go through
+the motions of matching and substituting (without, of course, writing anything)
+in order to compute the size of buffer that is needed. This value is passed
+back via the <i>outlengthptr</i> variable, with the result of the function still
+being PCRE2_ERROR_NOMEMORY.
+</P>
+<P>
+Passing a buffer size of zero is a permitted way of finding out how much memory
+is needed for given substitution. However, this does mean that the entire
+operation is carried out twice. Depending on the application, it may be more
+efficient to allocate a large buffer and free the excess afterwards, instead of
+using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
+</P>
+<P>
+The replacement string, which is interpreted as a UTF string in UTF mode, is
+checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
+replacement string causes an immediate return with the relevant UTF error code.
+</P>
+<P>
+If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
+in any way. By default, however, a dollar character is an escape character that
+can specify the insertion of characters from capture groups and names from
+(*MARK) or other control verbs in the pattern. The following forms are always
+recognized:
 <pre>
 $$ insert a dollar character
 $<n> or ${<n>} insert the contents of group <n>
@@ -3436,22 +3460,6 @@ CRLF is a valid newline sequence and the next two characters are CR, LF. In
 this case, the offset is advanced by two characters.
 </P>
 <P>
-PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
-too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
-this option is set, however, <b>pcre2_substitute()</b> continues to go through
-the motions of matching and substituting (without, of course, writing anything)
-in order to compute the size of buffer that is needed. This value is passed
-back via the <i>outlengthptr</i> variable, with the result of the function still
-being PCRE2_ERROR_NOMEMORY.
-</P>
-<P>
-Passing a buffer size of zero is a permitted way of finding out how much memory
-is needed for given substitution. However, this does mean that the entire
-operation is carried out twice. Depending on the application, it may be more
-efficient to allocate a large buffer and free the excess afterwards, instead of
-using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
-</P>
-<P>
 PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
 not appear in the pattern to be treated as unset groups. This option should be
 used with care, because it means that a typo in a group name or number no
@@ -3907,7 +3915,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 January 2020
+Last updated: 16 February 2020
 <br>
 Copyright © 1997-2020 University of Cambridge.
 <br>
|
103
doc/pcre2.txt
@@ -3200,13 +3200,13 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
 There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
 turn just the replacement string(s). The default action is to perform
 just one replacement if the pattern matches, but there is an option
-that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
-for details).
+that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL be-
+low).
 
 If successful, pcre2_substitute() returns the number of substitutions
 that were carried out. This may be zero if no match was found, and is
 never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega-
-tive value is returned if an error is detected (see below for details).
+tive value is returned if an error is detected.
 
 Matches in which a \K item in a lookahead in the pattern causes the
 match to end before it starts are not supported, and give rise to an
@@ -3221,10 +3221,11 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
 ment functions from the match context, if provided, or else those that
 were used to allocate memory for the compiled code.
 
-If an external match_data block is provided, its contents afterwards
-are those set by the final call to pcre2_match(). For global changes,
-this will have ended in a no-match error. The contents of the ovector
-within the match data block may or may not have been changed.
+If match_data is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
+provided block is used for all calls to pcre2_match(), and its contents
+afterwards are the result of the final call. For global changes, this
+will always be a no-match error. The contents of the ovector within the
+match data block may or may not have been changed.
 
 As well as the usual options for pcre2_match(), a number of additional
 options can be set in the options argument of pcre2_substitute(). One
@@ -3236,43 +3237,65 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
 application to check for a match before choosing to substitute, without
 having to repeat the match.
 
-The code argument is not used for the first substitution when
-PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also
-set, pcre2_match() will be called after the first substitution to check
-for further matches, and the contents of the match_data block will be
-changed.
+The contents of the externally supplied match data block are not
+changed when PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTI-
+TUTE_GLOBAL is also set, pcre2_match() is called after the first sub-
+stitution to check for further matches, but this is done using an in-
+ternally obtained match data block, thus always leaving the external
+block unchanged.
 
-The default is to return a copy of the subject string with matched sub-
-strings replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set,
-only the replacement substrings are returned. In the global case, mul-
-tiple replacements are concatenated in the output buffer. Substitution
-callouts (see below) can be used to separate them if necessary.
+The code argument is not used for matching before the first substitu-
+tion when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided,
+even when PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains in-
+formation such as the UTF setting and the number of capturing parenthe-
+ses in the pattern.
+
+The default action of pcre2_substitute() is to return a copy of the
+subject string with matched substrings replaced. However, if PCRE2_SUB-
+STITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
+returned. In the global case, multiple replacements are concatenated in
+the output buffer. Substitution callouts (see below) can be used to
+separate them if necessary.
 
 The outlengthptr argument of pcre2_substitute() must point to a vari-
 able that contains the length, in code units, of the output buffer. If
 the function is successful, the value is updated to contain the length
-of the new string, excluding the trailing zero that is automatically
-added.
+in code units of the new string, excluding the trailing zero that is
+automatically added.
 
 If the function is not successful, the value set via outlengthptr de-
 pends on the type of error. For syntax errors in the replacement
 string, the value is the offset in the replacement string where the er-
 ror was detected. For other errors, the value is PCRE2_UNSET by de-
 fault. This includes the case of the output buffer being too small, un-
-less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
-the value is the minimum length needed, including space for the trail-
-ing zero. Note that in order to compute the required length, pcre2_sub-
-stitute() has to simulate all the matching and copying, instead of giv-
-ing an error return as soon as the buffer overflows. Note also that the
-length is in code units, not bytes.
+less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
 
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
+buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
+ORY immediately. If this option is set, however, pcre2_substitute()
+continues to go through the motions of matching and substituting (with-
+out, of course, writing anything) in order to compute the size of buf-
+fer that is needed. This value is passed back via the outlengthptr
+variable, with the result of the function still being PCRE2_ER-
+ROR_NOMEMORY.
+
+Passing a buffer size of zero is a permitted way of finding out how
+much memory is needed for given substitution. However, this does mean
+that the entire operation is carried out twice. Depending on the appli-
+cation, it may be more efficient to allocate a large buffer and free
+the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
+FLOW_LENGTH.
+
 The replacement string, which is interpreted as a UTF string in UTF
-mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
-is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
-preted in any way. By default, however, a dollar character is an escape
-character that can specify the insertion of characters from capture
-groups and names from (*MARK) or other control verbs in the pattern.
-The following forms are always recognized:
+mode, is checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An
+invalid UTF replacement string causes an immediate return with the rel-
+evant UTF error code.
+
+If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not in-
+terpreted in any way. By default, however, a dollar character is an es-
+cape character that can specify the insertion of characters from cap-
+ture groups and names from (*MARK) or other control verbs in the pat-
+tern. The following forms are always recognized:
 
 $$ insert a dollar character
 $<n> or ${<n>} insert the contents of group <n>
@@ -3320,22 +3343,6 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
 two characters are CR, LF. In this case, the offset is advanced by two
 characters.
 
-PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
-buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
-ORY immediately. If this option is set, however, pcre2_substitute()
-continues to go through the motions of matching and substituting (with-
-out, of course, writing anything) in order to compute the size of buf-
-fer that is needed. This value is passed back via the outlengthptr
-variable, with the result of the function still being PCRE2_ER-
-ROR_NOMEMORY.
-
-Passing a buffer size of zero is a permitted way of finding out how
-much memory is needed for given substitution. However, this does mean
-that the entire operation is carried out twice. Depending on the appli-
-cation, it may be more efficient to allocate a large buffer and free
-the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
-FLOW_LENGTH.
-
 PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
 do not appear in the pattern to be treated as unset groups. This option
 should be used with care, because it means that a typo in a group name
@@ -3754,7 +3761,7 @@ AUTHOR
 
 REVISION
 
-Last updated: 22 January 2020
+Last updated: 16 February 2020
 Copyright (c) 1997-2020 University of Cambridge.
 ------------------------------------------------------------------------------
 
|