Documentation update.

Philip.Hazel 2020-02-16 17:47:14 +00:00
parent a57787b7cd
commit eedd9d8e55
2 changed files with 300 additions and 285 deletions


@@ -3309,13 +3309,13 @@ can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
replacement string(s). The default action is to perform just one replacement if
the pattern matches, but there is an option that requests multiple replacements
(see PCRE2_SUBSTITUTE_GLOBAL below for details).
(see PCRE2_SUBSTITUTE_GLOBAL below).
</P>
<P>
If successful, <b>pcre2_substitute()</b> returns the number of substitutions
that were carried out. This may be zero if no match was found, and is never
greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
returned if an error is detected (see below for details).
returned if an error is detected.
</P>
<P>
Matches in which a \K item in a lookahead in the pattern causes the match to
@@ -3333,10 +3333,11 @@ functions from the match context, if provided, or else those that were used to
allocate memory for the compiled code.
</P>
<P>
If an external <i>match_data</i> block is provided, its contents afterwards
are those set by the final call to <b>pcre2_match()</b>. For global changes,
this will have ended in a no-match error. The contents of the ovector within
the match data block may or may not have been changed.
If <i>match_data</i> is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
provided block is used for all calls to <b>pcre2_match()</b>, and its contents
afterwards are the result of the final call. For global changes, this will
always be a no-match error. The contents of the ovector within the match data
block may or may not have been changed.
</P>
<P>
As well as the usual options for <b>pcre2_match()</b>, a number of additional
@@ -3350,45 +3351,68 @@ an application to check for a match before choosing to substitute, without
having to repeat the match.
</P>
<P>
The <i>code</i> argument is not used for the first substitution when
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
<b>pcre2_match()</b> will be called after the first substitution to check for
further matches, and the contents of the <i>match_data</i> block will be
changed.
The contents of the externally supplied match data block are not changed when
PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
<b>pcre2_match()</b> is called after the first substitution to check for further
matches, but this is done using an internally obtained match data block, thus
always leaving the external block unchanged.
</P>
<P>
The default is to return a copy of the subject string with matched substrings
replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
replacement substrings are returned. In the global case, multiple replacements
are concatenated in the output buffer. Substitution callouts (see
The <i>code</i> argument is not used for matching before the first substitution
when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
UTF setting and the number of capturing parentheses in the pattern.
</P>
<P>
The default action of <b>pcre2_substitute()</b> is to return a copy of the
subject string with matched substrings replaced. However, if
PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
returned. In the global case, multiple replacements are concatenated in the
output buffer. Substitution callouts (see
<a href="#subcallouts">below)</a>
can be used to separate them if necessary.
</P>
<P>
The <i>outlengthptr</i> argument of <b>pcre2_substitute()</b> must point to a
variable that contains the length, in code units, of the output buffer. If the
function is successful, the value is updated to contain the length of the new
string, excluding the trailing zero that is automatically added.
function is successful, the value is updated to contain the length in code
units of the new string, excluding the trailing zero that is automatically
added.
</P>
<P>
If the function is not successful, the value set via <i>outlengthptr</i> depends
on the type of error. For syntax errors in the replacement string, the value is
the offset in the replacement string where the error was detected. For other
errors, the value is PCRE2_UNSET by default. This includes the case of the
output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set
(see below), in which case the value is the minimum length needed, including
space for the trailing zero. Note that in order to compute the required length,
<b>pcre2_substitute()</b> has to simulate all the matching and copying, instead
of giving an error return as soon as the buffer overflows. Note also that the
length is in code units, not bytes.
output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
</P>
<P>
The replacement string, which is interpreted as a UTF string in UTF mode,
is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
default, however, a dollar character is an escape character that can specify
the insertion of characters from capture groups and names from (*MARK) or other
control verbs in the pattern. The following forms are always recognized:
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
this option is set, however, <b>pcre2_substitute()</b> continues to go through
the motions of matching and substituting (without, of course, writing anything)
in order to compute the size of buffer that is needed. This value is passed
back via the <i>outlengthptr</i> variable, with the result of the function still
being PCRE2_ERROR_NOMEMORY.
</P>
<P>
Passing a buffer size of zero is a permitted way of finding out how much memory
is needed for given substitution. However, this does mean that the entire
operation is carried out twice. Depending on the application, it may be more
efficient to allocate a large buffer and free the excess afterwards, instead of
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
</P>
<P>
The replacement string, which is interpreted as a UTF string in UTF mode, is
checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
replacement string causes an immediate return with the relevant UTF error code.
</P>
<P>
If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
in any way. By default, however, a dollar character is an escape character that
can specify the insertion of characters from capture groups and names from
(*MARK) or other control verbs in the pattern. The following forms are always
recognized:
<pre>
$$ insert a dollar character
$&#60;n&#62; or ${&#60;n&#62;} insert the contents of group &#60;n&#62;
@@ -3436,22 +3460,6 @@ CRLF is a valid newline sequence and the next two characters are CR, LF. In
this case, the offset is advanced by two characters.
</P>
<P>
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
this option is set, however, <b>pcre2_substitute()</b> continues to go through
the motions of matching and substituting (without, of course, writing anything)
in order to compute the size of buffer that is needed. This value is passed
back via the <i>outlengthptr</i> variable, with the result of the function still
being PCRE2_ERROR_NOMEMORY.
</P>
<P>
Passing a buffer size of zero is a permitted way of finding out how much memory
is needed for given substitution. However, this does mean that the entire
operation is carried out twice. Depending on the application, it may be more
efficient to allocate a large buffer and free the excess afterwards, instead of
using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
</P>
<P>
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
not appear in the pattern to be treated as unset groups. This option should be
used with care, because it means that a typo in a group name or number no
@@ -3907,7 +3915,7 @@ Cambridge, England.
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 22 January 2020
Last updated: 16 February 2020
<br>
Copyright &copy; 1997-2020 University of Cambridge.
<br>


@@ -3200,13 +3200,13 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
turn just the replacement string(s). The default action is to perform
just one replacement if the pattern matches, but there is an option
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below
for details).
that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL be-
low).
If successful, pcre2_substitute() returns the number of substitutions
that were carried out. This may be zero if no match was found, and is
never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega-
tive value is returned if an error is detected (see below for details).
tive value is returned if an error is detected.
Matches in which a \K item in a lookahead in the pattern causes the
match to end before it starts are not supported, and give rise to an
@@ -3221,10 +3221,11 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
ment functions from the match context, if provided, or else those that
were used to allocate memory for the compiled code.
If an external match_data block is provided, its contents afterwards
are those set by the final call to pcre2_match(). For global changes,
this will have ended in a no-match error. The contents of the ovector
within the match data block may or may not have been changed.
If match_data is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
provided block is used for all calls to pcre2_match(), and its contents
afterwards are the result of the final call. For global changes, this
will always be a no-match error. The contents of the ovector within the
match data block may or may not have been changed.
As well as the usual options for pcre2_match(), a number of additional
options can be set in the options argument of pcre2_substitute(). One
@@ -3236,43 +3237,65 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
application to check for a match before choosing to substitute, without
having to repeat the match.
The code argument is not used for the first substitution when
PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also
set, pcre2_match() will be called after the first substitution to check
for further matches, and the contents of the match_data block will be
changed.
The contents of the externally supplied match data block are not
changed when PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTI-
TUTE_GLOBAL is also set, pcre2_match() is called after the first sub-
stitution to check for further matches, but this is done using an in-
ternally obtained match data block, thus always leaving the external
block unchanged.
The default is to return a copy of the subject string with matched sub-
strings replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set,
only the replacement substrings are returned. In the global case, mul-
tiple replacements are concatenated in the output buffer. Substitution
callouts (see below) can be used to separate them if necessary.
The code argument is not used for matching before the first substitu-
tion when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided,
even when PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains in-
formation such as the UTF setting and the number of capturing parenthe-
ses in the pattern.
The default action of pcre2_substitute() is to return a copy of the
subject string with matched substrings replaced. However, if PCRE2_SUB-
STITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
returned. In the global case, multiple replacements are concatenated in
the output buffer. Substitution callouts (see below) can be used to
separate them if necessary.
The outlengthptr argument of pcre2_substitute() must point to a vari-
able that contains the length, in code units, of the output buffer. If
the function is successful, the value is updated to contain the length
of the new string, excluding the trailing zero that is automatically
added.
in code units of the new string, excluding the trailing zero that is
automatically added.
If the function is not successful, the value set via outlengthptr de-
pends on the type of error. For syntax errors in the replacement
string, the value is the offset in the replacement string where the er-
ror was detected. For other errors, the value is PCRE2_UNSET by de-
fault. This includes the case of the output buffer being too small, un-
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see below), in which case
the value is the minimum length needed, including space for the trail-
ing zero. Note that in order to compute the required length, pcre2_sub-
stitute() has to simulate all the matching and copying, instead of giv-
ing an error return as soon as the buffer overflows. Note also that the
length is in code units, not bytes.
less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
ORY immediately. If this option is set, however, pcre2_substitute()
continues to go through the motions of matching and substituting (with-
out, of course, writing anything) in order to compute the size of buf-
fer that is needed. This value is passed back via the outlengthptr
variable, with the result of the function still being PCRE2_ER-
ROR_NOMEMORY.
Passing a buffer size of zero is a permitted way of finding out how
much memory is needed for given substitution. However, this does mean
that the entire operation is carried out twice. Depending on the appli-
cation, it may be more efficient to allocate a large buffer and free
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
FLOW_LENGTH.
The replacement string, which is interpreted as a UTF string in UTF
mode, is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option
is set. If the PCRE2_SUBSTITUTE_LITERAL option is set, it is not inter-
preted in any way. By default, however, a dollar character is an escape
character that can specify the insertion of characters from capture
groups and names from (*MARK) or other control verbs in the pattern.
The following forms are always recognized:
mode, is checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An
invalid UTF replacement string causes an immediate return with the rel-
evant UTF error code.
If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not in-
terpreted in any way. By default, however, a dollar character is an es-
cape character that can specify the insertion of characters from cap-
ture groups and names from (*MARK) or other control verbs in the pat-
tern. The following forms are always recognized:
$$ insert a dollar character
$<n> or ${<n>} insert the contents of group <n>
@@ -3320,22 +3343,6 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
two characters are CR, LF. In this case, the offset is advanced by two
characters.
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output
buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
ORY immediately. If this option is set, however, pcre2_substitute()
continues to go through the motions of matching and substituting (with-
out, of course, writing anything) in order to compute the size of buf-
fer that is needed. This value is passed back via the outlengthptr
variable, with the result of the function still being PCRE2_ER-
ROR_NOMEMORY.
Passing a buffer size of zero is a permitted way of finding out how
much memory is needed for given substitution. However, this does mean
that the entire operation is carried out twice. Depending on the appli-
cation, it may be more efficient to allocate a large buffer and free
the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER-
FLOW_LENGTH.
PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
do not appear in the pattern to be treated as unset groups. This option
should be used with care, because it means that a typo in a group name
@@ -3754,7 +3761,7 @@ AUTHOR
REVISION
Last updated: 22 January 2020
Last updated: 16 February 2020
Copyright (c) 1997-2020 University of Cambridge.
------------------------------------------------------------------------------