Interpret NULL pointer, zero length as an empty string for subjects and replacements.

Philip Hazel 2021-11-30 16:34:39 +00:00
parent 7ab2769728
commit 4ef0c51d2b
16 changed files with 241 additions and 131 deletions
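
The rule this commit documents can be sketched in plain C. This is only an illustration under stated assumptions, not the code PCRE2 itself uses: the names sketch_normalize, SKETCH_ERROR_NULL and sketch_empty are hypothetical, and the sketch works on ordinary char strings rather than PCRE2_SPTR code units.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error code for this sketch; not a PCRE2 constant. */
#define SKETCH_ERROR_NULL (-1)

/* Shared empty string substituted when the caller passes NULL with length 0. */
static const char sketch_empty[] = "";

/*
 * Normalize a (pointer, length) argument pair:
 *   - NULL with zero length is treated as an empty string;
 *   - NULL with a non-zero length is reported as an error;
 *   - any non-NULL pointer is passed through unchanged.
 * Returns 0 on success and stores the pointer to use in *out.
 */
int sketch_normalize(const char *text, size_t length, const char **out)
{
  assert(out != NULL);
  if (text == NULL)
    {
    if (length != 0) return SKETCH_ERROR_NULL;
    text = sketch_empty;
    }
  *out = text;
  return 0;
}
```

A caller would pass each subject or replacement argument through such a check before touching the pointer, so that later code never dereferences NULL.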


@ -34,6 +34,11 @@ substituting.
 12. Add check for NULL replacement to pcre2_substitute().
+13. For the subject arguments of pcre2_match(), pcre2_dfa_match(), and
+pcre2_substitute(), and the replacement argument of the latter, if the pointer
+is NULL and the length is zero, treat as an empty string. Apparently a number
+of applications treat NULL/0 in this way.
 Version 10.39 29-October-2021
 -----------------------------


@ -2640,7 +2640,9 @@ The subject string is passed to <b>pcre2_match()</b> as a pointer in
 <i>startoffset</i>. The length and offset are in code units, not characters.
 That is, they are in bytes for the 8-bit library, 16-bit code units for the
 16-bit library, and 32-bit code units for the 32-bit library, whether or not
-UTF processing is enabled.
+UTF processing is enabled. As a special case, if <i>subject</i> is NULL and
+<i>length</i> is zero, the subject is assumed to be an empty string. If
+<i>length</i> is non-zero, an error occurs if <i>subject</i> is NULL.
 </P>
 <P>
 If <i>startoffset</i> is greater than the length of the subject,
@ -3394,12 +3396,17 @@ same number causes an error at compile time.
 <P>
 This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
 subject string in <i>outputbuffer</i>, replacing parts that were matched with
-the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
-can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
-option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
-replacement string(s). The default action is to perform just one replacement if
-the pattern matches, but there is an option that requests multiple replacements
-(see PCRE2_SUBSTITUTE_GLOBAL below).
+the <i>replacement</i> string, whose length is supplied in <b>rlength</b>, which
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. As a
+special case, if <i>replacement</i> is NULL and <i>rlength</i> is zero, the
+replacement is assumed to be an empty string. If <i>rlength</i> is non-zero, an
+error occurs if <i>replacement</i> is NULL.
+</P>
+<P>
+There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just
+the replacement string(s). The default action is to perform just one
+replacement if the pattern matches, but there is an option that requests
+multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below).
 </P>
 <P>
 If successful, <b>pcre2_substitute()</b> returns the number of substitutions
@ -3812,12 +3819,13 @@ other alternatives. Ultimately, when it runs out of matches,
 <P>
 The function <b>pcre2_dfa_match()</b> is called to match a subject string
 against a compiled pattern, using a matching algorithm that scans the subject
-string just once (not counting lookaround assertions), and does not backtrack.
-This has different characteristics to the normal algorithm, and is not
-compatible with Perl. Some of the features of PCRE2 patterns are not supported.
-Nevertheless, there are times when this kind of matching can be useful. For a
-discussion of the two matching algorithms, and a list of features that
-<b>pcre2_dfa_match()</b> does not support, see the
+string just once (not counting lookaround assertions), and does not backtrack
+(except when processing lookaround assertions). This has different
+characteristics to the normal algorithm, and is not compatible with Perl. Some
+of the features of PCRE2 patterns are not supported. Nevertheless, there are
+times when this kind of matching can be useful. For a discussion of the two
+matching algorithms, and a list of features that <b>pcre2_dfa_match()</b> does
+not support, see the
 <a href="pcre2matching.html"><b>pcre2matching</b></a>
 documentation.
 </P>
@ -4010,7 +4018,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 30 November 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>


@ -269,11 +269,11 @@ starts another match, that match must use a different JIT stack to the one used
 for currently suspended match(es).
 </P>
 <P>
-In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that
-is thread-safe, because each thread has its own machine stack. However, if you
-assign or pass back a non-NULL JIT stack, this must be a different stack for
-each thread so that the application is thread-safe.
+In a multithread application, if you do not specify a JIT stack, or if you
+assign or pass back NULL from a callback, that is thread-safe, because each
+thread has its own machine stack. However, if you assign or pass back a
+non-NULL JIT stack, this must be a different stack for each thread so that the
+application is thread-safe.
 </P>
 <P>
 Strictly speaking, even more is allowed. You can assign the same non-NULL stack
@ -382,8 +382,8 @@ out this complicated API.
 <b>void pcre2_jit_free_unused_memory(pcre2_general_context *<i>gcontext</i>);</b>
 </P>
 <P>
-The JIT executable allocator does not free all memory when it is possible.
-It expects new allocations, and keeps some free memory around to improve
+The JIT executable allocator does not free all memory when it is possible. It
+expects new allocations, and keeps some free memory around to improve
 allocation speed. However, in low memory conditions, it might be better to free
 all possible memory. You can cause this to happen by calling
 pcre2_jit_free_unused_memory(). Its argument is a general context, for custom
@ -442,10 +442,10 @@ that was not compiled.
 <P>
 When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
-the subject pointer is NULL, an immediate error is given. Also, unless
-PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the
-interests of speed, these checks do not happen on the JIT fast path, and if
-invalid data is passed, the result is undefined.
+the subject pointer is NULL but the length is non-zero, an immediate error is
+given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested
+for validity. In the interests of speed, these checks do not happen on the JIT
+fast path, and if invalid data is passed, the result is undefined.
 </P>
 <P>
 Bypassing the sanity checks and the <b>pcre2_match()</b> wrapping can give
@ -466,9 +466,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC14" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 May 2019
+Last updated: 30 November 2021
 <br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


@ -2579,7 +2579,9 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
and offset are in code units, not characters. That is, they are in and offset are in code units, not characters. That is, they are in
bytes for the 8-bit library, 16-bit code units for the 16-bit library, bytes for the 8-bit library, 16-bit code units for the 16-bit library,
and 32-bit code units for the 32-bit library, whether or not UTF pro- and 32-bit code units for the 32-bit library, whether or not UTF pro-
cessing is enabled. cessing is enabled. As a special case, if subject is NULL and length is
zero, the subject is assumed to be an empty string. If length is non-
zero, an error occurs if subject is NULL.
If startoffset is greater than the length of the subject, pcre2_match() If startoffset is greater than the length of the subject, pcre2_match()
returns PCRE2_ERROR_BADOFFSET. When the starting offset is zero, the returns PCRE2_ERROR_BADOFFSET. When the starting offset is zero, the
@ -3280,8 +3282,12 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
This function optionally calls pcre2_match() and then makes a copy of This function optionally calls pcre2_match() and then makes a copy of
the subject string in outputbuffer, replacing parts that were matched the subject string in outputbuffer, replacing parts that were matched
with the replacement string, whose length is supplied in rlength. This with the replacement string, whose length is supplied in rlength, which
can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. As
a special case, if replacement is NULL and rlength is zero, the re-
placement is assumed to be an empty string. If rlength is non-zero, an
error occurs if replacement is NULL.
There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re- There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
turn just the replacement string(s). The default action is to perform turn just the replacement string(s). The default action is to perform
just one replacement if the pattern matches, but there is an option just one replacement if the pattern matches, but there is an option
@ -3666,23 +3672,24 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
The function pcre2_dfa_match() is called to match a subject string The function pcre2_dfa_match() is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the against a compiled pattern, using a matching algorithm that scans the
subject string just once (not counting lookaround assertions), and does subject string just once (not counting lookaround assertions), and does
not backtrack. This has different characteristics to the normal algo- not backtrack (except when processing lookaround assertions). This has
rithm, and is not compatible with Perl. Some of the features of PCRE2 different characteristics to the normal algorithm, and is not compati-
patterns are not supported. Nevertheless, there are times when this ble with Perl. Some of the features of PCRE2 patterns are not sup-
kind of matching can be useful. For a discussion of the two matching ported. Nevertheless, there are times when this kind of matching can be
algorithms, and a list of features that pcre2_dfa_match() does not sup- useful. For a discussion of the two matching algorithms, and a list of
port, see the pcre2matching documentation. features that pcre2_dfa_match() does not support, see the pcre2matching
documentation.
The arguments for the pcre2_dfa_match() function are the same as for The arguments for the pcre2_dfa_match() function are the same as for
pcre2_match(), plus two extras. The ovector within the match data block pcre2_match(), plus two extras. The ovector within the match data block
is used in a different way, and this is described below. The other com- is used in a different way, and this is described below. The other com-
mon arguments are used in the same way as for pcre2_match(), so their mon arguments are used in the same way as for pcre2_match(), so their
description is not repeated here. description is not repeated here.
The two additional arguments provide workspace for the function. The The two additional arguments provide workspace for the function. The
workspace vector should contain at least 20 elements. It is used for workspace vector should contain at least 20 elements. It is used for
keeping track of multiple paths through the pattern tree. More keeping track of multiple paths through the pattern tree. More
workspace is needed for patterns and subjects where there are a lot of workspace is needed for patterns and subjects where there are a lot of
potential matches. potential matches.
Here is an example of a simple call to pcre2_dfa_match(): Here is an example of a simple call to pcre2_dfa_match():
@ -3702,45 +3709,45 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
Option bits for pcre2_dfa_match() Option bits for pcre2_dfa_match()
The unused bits of the options argument for pcre2_dfa_match() must be The unused bits of the options argument for pcre2_dfa_match() must be
zero. The only bits that may be set are PCRE2_ANCHORED, zero. The only bits that may be set are PCRE2_ANCHORED,
PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO- PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO-
TEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, TEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK,
PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and
PCRE2_DFA_RESTART. All but the last four of these are exactly the same PCRE2_DFA_RESTART. All but the last four of these are exactly the same
as for pcre2_match(), so their description is not repeated here. as for pcre2_match(), so their description is not repeated here.
PCRE2_PARTIAL_HARD PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT PCRE2_PARTIAL_SOFT
These have the same general effect as they do for pcre2_match(), but These have the same general effect as they do for pcre2_match(), but
the details are slightly different. When PCRE2_PARTIAL_HARD is set for the details are slightly different. When PCRE2_PARTIAL_HARD is set for
pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the
subject is reached and there is still at least one matching possibility subject is reached and there is still at least one matching possibility
that requires additional characters. This happens even if some complete that requires additional characters. This happens even if some complete
matches have already been found. When PCRE2_PARTIAL_SOFT is set, the matches have already been found. When PCRE2_PARTIAL_SOFT is set, the
return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
if the end of the subject is reached, there have been no complete if the end of the subject is reached, there have been no complete
matches, but there is still at least one matching possibility. The por- matches, but there is still at least one matching possibility. The por-
tion of the string that was inspected when the longest partial match tion of the string that was inspected when the longest partial match
was found is set as the first matching string in both cases. There is a was found is set as the first matching string in both cases. There is a
more detailed discussion of partial and multi-segment matching, with more detailed discussion of partial and multi-segment matching, with
examples, in the pcre2partial documentation. examples, in the pcre2partial documentation.
PCRE2_DFA_SHORTEST PCRE2_DFA_SHORTEST
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to
stop as soon as it has found one match. Because of the way the alterna- stop as soon as it has found one match. Because of the way the alterna-
tive algorithm works, this is necessarily the shortest possible match tive algorithm works, this is necessarily the shortest possible match
at the first possible matching point in the subject string. at the first possible matching point in the subject string.
PCRE2_DFA_RESTART PCRE2_DFA_RESTART
When pcre2_dfa_match() returns a partial match, it is possible to call When pcre2_dfa_match() returns a partial match, it is possible to call
it again, with additional subject characters, and have it continue with it again, with additional subject characters, and have it continue with
the same match. The PCRE2_DFA_RESTART option requests this action; when the same match. The PCRE2_DFA_RESTART option requests this action; when
it is set, the workspace and wscount options must reference the same it is set, the workspace and wscount options must reference the same
vector as before because data about the match so far is left in them vector as before because data about the match so far is left in them
after a partial match. There is more discussion of this facility in the after a partial match. There is more discussion of this facility in the
pcre2partial documentation. pcre2partial documentation.
@ -3748,8 +3755,8 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
When pcre2_dfa_match() succeeds, it may have matched more than one sub- When pcre2_dfa_match() succeeds, it may have matched more than one sub-
string in the subject. Note, however, that all the matches from one run string in the subject. Note, however, that all the matches from one run
of the function start at the same point in the subject. The shorter of the function start at the same point in the subject. The shorter
matches are all initial substrings of the longer matches. For example, matches are all initial substrings of the longer matches. For example,
if the pattern if the pattern
<.*> <.*>
@ -3764,80 +3771,80 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
<something> <something else> <something> <something else>
<something> <something>
On success, the yield of the function is a number greater than zero, On success, the yield of the function is a number greater than zero,
which is the number of matched substrings. The offsets of the sub- which is the number of matched substrings. The offsets of the sub-
strings are returned in the ovector, and can be extracted by number in strings are returned in the ovector, and can be extracted by number in
the same way as for pcre2_match(), but the numbers bear no relation to the same way as for pcre2_match(), but the numbers bear no relation to
any capture groups that may exist in the pattern, because DFA matching any capture groups that may exist in the pattern, because DFA matching
does not support capturing. does not support capturing.
Calls to the convenience functions that extract substrings by name re- Calls to the convenience functions that extract substrings by name re-
turn the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used af- turn the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used af-
ter a DFA match. The convenience functions that extract substrings by ter a DFA match. The convenience functions that extract substrings by
number never return PCRE2_ERROR_NOSUBSTRING. number never return PCRE2_ERROR_NOSUBSTRING.
The matched strings are stored in the ovector in reverse order of The matched strings are stored in the ovector in reverse order of
length; that is, the longest matching string is first. If there were length; that is, the longest matching string is first. If there were
too many matches to fit into the ovector, the yield of the function is too many matches to fit into the ovector, the yield of the function is
zero, and the vector is filled with the longest matches. zero, and the vector is filled with the longest matches.
NOTE: PCRE2's "auto-possessification" optimization usually applies to NOTE: PCRE2's "auto-possessification" optimization usually applies to
character repeats at the end of a pattern (as well as internally). For character repeats at the end of a pattern (as well as internally). For
example, the pattern "a\d+" is compiled as if it were "a\d++". For DFA example, the pattern "a\d+" is compiled as if it were "a\d++". For DFA
matching, this means that only one possible match is found. If you re- matching, this means that only one possible match is found. If you re-
ally do want multiple matches in such cases, either use an ungreedy re- ally do want multiple matches in such cases, either use an ungreedy re-
peat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when com- peat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when com-
piling. piling.
Error returns from pcre2_dfa_match() Error returns from pcre2_dfa_match()
The pcre2_dfa_match() function returns a negative number when it fails. The pcre2_dfa_match() function returns a negative number when it fails.
Many of the errors are the same as for pcre2_match(), as described Many of the errors are the same as for pcre2_match(), as described
above. There are in addition the following errors that are specific to above. There are in addition the following errors that are specific to
pcre2_dfa_match(): pcre2_dfa_match():
PCRE2_ERROR_DFA_UITEM PCRE2_ERROR_DFA_UITEM
This return is given if pcre2_dfa_match() encounters an item in the This return is given if pcre2_dfa_match() encounters an item in the
pattern that it does not support, for instance, the use of \C in a UTF pattern that it does not support, for instance, the use of \C in a UTF
mode or a backreference. mode or a backreference.
PCRE2_ERROR_DFA_UCOND PCRE2_ERROR_DFA_UCOND
This return is given if pcre2_dfa_match() encounters a condition item This return is given if pcre2_dfa_match() encounters a condition item
that uses a backreference for the condition, or a test for recursion in that uses a backreference for the condition, or a test for recursion in
a specific capture group. These are not supported. a specific capture group. These are not supported.
PCRE2_ERROR_DFA_UINVALID_UTF PCRE2_ERROR_DFA_UINVALID_UTF
This return is given if pcre2_dfa_match() is called for a pattern that This return is given if pcre2_dfa_match() is called for a pattern that
was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for
DFA matching. DFA matching.
PCRE2_ERROR_DFA_WSSIZE PCRE2_ERROR_DFA_WSSIZE
This return is given if pcre2_dfa_match() runs out of space in the This return is given if pcre2_dfa_match() runs out of space in the
workspace vector. workspace vector.
PCRE2_ERROR_DFA_RECURSE PCRE2_ERROR_DFA_RECURSE
When a recursion or subroutine call is processed, the matching function When a recursion or subroutine call is processed, the matching function
calls itself recursively, using private memory for the ovector and calls itself recursively, using private memory for the ovector and
workspace. This error is given if the internal ovector is not large workspace. This error is given if the internal ovector is not large
enough. This should be extremely rare, as a vector of size 1000 is enough. This should be extremely rare, as a vector of size 1000 is
used. used.
PCRE2_ERROR_DFA_BADRESTART PCRE2_ERROR_DFA_BADRESTART
When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option, When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option,
some plausibility checks are made on the contents of the workspace, some plausibility checks are made on the contents of the workspace,
which should contain data about the previous partial match. If any of which should contain data about the previous partial match. If any of
these checks fail, this error is given. these checks fail, this error is given.
SEE ALSO SEE ALSO
pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3), pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3),
pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3). pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
@ -3850,7 +3857,7 @@ AUTHOR
REVISION REVISION
Last updated: 30 August 2021 Last updated: 30 November 2021
Copyright (c) 1997-2021 University of Cambridge. Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -5436,7 +5443,7 @@ FREEING JIT SPECULATIVE MEMORY
void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);
The JIT executable allocator does not free all memory when it is possi- The JIT executable allocator does not free all memory when it is possi-
ble. It expects new allocations, and keeps some free memory around to ble. It expects new allocations, and keeps some free memory around to
improve allocation speed. However, in low memory conditions, it might improve allocation speed. However, in low memory conditions, it might
be better to free all possible memory. You can cause this to happen by be better to free all possible memory. You can cause this to happen by
calling pcre2_jit_free_unused_memory(). Its argument is a general con- calling pcre2_jit_free_unused_memory(). Its argument is a general con-
@ -5494,12 +5501,13 @@ JIT FAST PATH API
When you call pcre2_match(), as well as testing for invalid options, a When you call pcre2_match(), as well as testing for invalid options, a
number of other sanity checks are performed on the arguments. For exam- number of other sanity checks are performed on the arguments. For exam-
ple, if the subject pointer is NULL, an immediate error is given. Also, ple, if the subject pointer is NULL but the length is non-zero, an im-
unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for mediate error is given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF
validity. In the interests of speed, these checks do not happen on the subject string is tested for validity. In the interests of speed, these
JIT fast path, and if invalid data is passed, the result is undefined. checks do not happen on the JIT fast path, and if invalid data is
passed, the result is undefined.
Bypassing the sanity checks and the pcre2_match() wrapping can give Bypassing the sanity checks and the pcre2_match() wrapping can give
speedups of more than 10%. speedups of more than 10%.
@ -5517,8 +5525,8 @@ AUTHOR
REVISION REVISION
Last updated: 23 May 2019 Last updated: 30 November 2021
Copyright (c) 1997-2019 University of Cambridge. Copyright (c) 1997-2021 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
-.TH PCRE2API 3 "30 August 2021" "PCRE2 10.38"
+.TH PCRE2API 3 "30 November 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@ -2624,7 +2624,9 @@ The subject string is passed to \fBpcre2_match()\fP as a pointer in
 \fIstartoffset\fP. The length and offset are in code units, not characters.
 That is, they are in bytes for the 8-bit library, 16-bit code units for the
 16-bit library, and 32-bit code units for the 32-bit library, whether or not
-UTF processing is enabled.
+UTF processing is enabled. As a special case, if \fIsubject\fP is NULL and
+\fIlength\fP is zero, the subject is assumed to be an empty string. If
+\fIlength\fP is non-zero, an error occurs if \fIsubject\fP is NULL.
 .P
 If \fIstartoffset\fP is greater than the length of the subject,
 \fBpcre2_match()\fP returns PCRE2_ERROR_BADOFFSET. When the starting offset is
@ -3413,12 +3415,16 @@ same number causes an error at compile time.
 .P
 This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
 subject string in \fIoutputbuffer\fP, replacing parts that were matched with
-the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
-can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
-option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
-replacement string(s). The default action is to perform just one replacement if
-the pattern matches, but there is an option that requests multiple replacements
-(see PCRE2_SUBSTITUTE_GLOBAL below).
+the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP, which
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. As a
+special case, if \fIreplacement\fP is NULL and \fIrlength\fP is zero, the
+replacement is assumed to be an empty string. If \fIrlength\fP is non-zero, an
+error occurs if \fIreplacement\fP is NULL.
+.P
+There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just
+the replacement string(s). The default action is to perform just one
+replacement if the pattern matches, but there is an option that requests
+multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below).
 .P
 If successful, \fBpcre2_substitute()\fP returns the number of substitutions
 that were carried out. This may be zero if no match was found, and is never
@ -3813,12 +3819,13 @@ other alternatives. Ultimately, when it runs out of matches,
 .P
 The function \fBpcre2_dfa_match()\fP is called to match a subject string
 against a compiled pattern, using a matching algorithm that scans the subject
-string just once (not counting lookaround assertions), and does not backtrack.
-This has different characteristics to the normal algorithm, and is not
-compatible with Perl. Some of the features of PCRE2 patterns are not supported.
-Nevertheless, there are times when this kind of matching can be useful. For a
-discussion of the two matching algorithms, and a list of features that
-\fBpcre2_dfa_match()\fP does not support, see the
+string just once (not counting lookaround assertions), and does not backtrack
+(except when processing lookaround assertions). This has different
+characteristics to the normal algorithm, and is not compatible with Perl. Some
+of the features of PCRE2 patterns are not supported. Nevertheless, there are
+times when this kind of matching can be useful. For a discussion of the two
+matching algorithms, and a list of features that \fBpcre2_dfa_match()\fP does
+not support, see the
 .\" HREF
 \fBpcre2matching\fP
 .\"
@@ -4018,6 +4025,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
+Last updated: 30 November 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi

View File

@@ -1,4 +1,4 @@
-.TH PCRE2JIT 3 "23 May 2019" "PCRE2 10.34"
+.TH PCRE2JIT 3 "30 November 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
@@ -251,11 +251,11 @@ non-sequential matches in one thread is to use callouts: if a callout function
 starts another match, that match must use a different JIT stack to the one used
 for currently suspended match(es).
 .P
-In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that
-is thread-safe, because each thread has its own machine stack. However, if you
-assign or pass back a non-NULL JIT stack, this must be a different stack for
-each thread so that the application is thread-safe.
+In a multithread application, if you do not specify a JIT stack, or if you
+assign or pass back NULL from a callback, that is thread-safe, because each
+thread has its own machine stack. However, if you assign or pass back a
+non-NULL JIT stack, this must be a different stack for each thread so that the
+application is thread-safe.
 .P
 Strictly speaking, even more is allowed. You can assign the same non-NULL stack
 to a match context that is used by any number of patterns, as long as they are
@@ -355,8 +355,8 @@ out this complicated API.
 .B void pcre2_jit_free_unused_memory(pcre2_general_context *\fIgcontext\fP);
 .fi
 .P
-The JIT executable allocator does not free all memory when it is possible.
-It expects new allocations, and keeps some free memory around to improve
+The JIT executable allocator does not free all memory when it is possible. It
+expects new allocations, and keeps some free memory around to improve
 allocation speed. However, in low memory conditions, it might be better to free
 all possible memory. You can cause this to happen by calling
 pcre2_jit_free_unused_memory(). Its argument is a general context, for custom
@@ -416,10 +416,10 @@ that was not compiled.
 .P
 When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
-the subject pointer is NULL, an immediate error is given. Also, unless
-PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the
-interests of speed, these checks do not happen on the JIT fast path, and if
-invalid data is passed, the result is undefined.
+the subject pointer is NULL but the length is non-zero, an immediate error is
+given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested
+for validity. In the interests of speed, these checks do not happen on the JIT
+fast path, and if invalid data is passed, the result is undefined.
 .P
 Bypassing the sanity checks and the \fBpcre2_match()\fP wrapping can give
 speedups of more than 10%.
@@ -445,6 +445,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 May 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 30 November 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi

View File

@@ -3285,6 +3285,10 @@ rws->next = NULL;
 rws->size = RWS_BASE_SIZE;
 rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE;
 
+/* Recognize NULL, length 0 as an empty string. */
+
+if (subject == NULL && length == 0) subject = (PCRE2_SPTR)"";
+
 /* Plausibility checks */
 
 if ((options & ~PUBLIC_DFA_MATCH_OPTIONS) != 0) return PCRE2_ERROR_BADOPTION;

View File

@@ -253,7 +253,7 @@ static const unsigned char match_error_texts[] =
 "unknown substring\0"
 /* 50 */
 "non-unique substring name\0"
-"NULL argument passed\0"
+"NULL argument passed with non-zero length\0"
 "nested recursion at the same subject position\0"
 "matching depth limit exceeded\0"
 "requested value is not available\0"

View File

@@ -6170,6 +6170,10 @@ PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)]
 PCRE2_KEEP_UNINITIALIZED;
 mb->stack_frames = (heapframe *)stack_frames_vector;
 
+/* Recognize NULL, length 0 as an empty string. */
+
+if (subject == NULL && length == 0) subject = (PCRE2_SPTR)"";
+
 /* Plausibility checks */
 
 if ((options & ~PUBLIC_MATCH_OPTIONS) != 0) return PCRE2_ERROR_BADOPTION;
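
The subject handling added in this hunk can be illustrated outside the library with a minimal sketch. It assumes the later plausibility checks (not visible in the hunk) still reject a NULL subject, which is consistent with the error -51 shown in the test output for a NULL subject with non-zero length. The names `sketch_normalize_subject`, `SKETCH_OK` and `SKETCH_ERROR_NULL` are hypothetical and are not part of PCRE2.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; -51 mirrors the error
   number that appears in the test output of this commit. */
enum { SKETCH_OK = 0, SKETCH_ERROR_NULL = -51 };

/* Sketch of the rule: a NULL subject with length 0 is treated as an
   empty string; a NULL subject with any other length is an error.
   A non-NULL subject is left untouched. */
static int sketch_normalize_subject(const unsigned char **subject,
                                    size_t length)
{
  static const unsigned char empty[] = "";

  if (*subject == NULL)
    {
    if (length != 0) return SKETCH_ERROR_NULL;
    *subject = empty;   /* point at a real, zero-length string */
    }
  return SKETCH_OK;
}
```

Substituting a pointer to a static empty string, rather than special-casing NULL throughout the matcher, means the rest of the code can keep assuming a valid pointer.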

View File

@@ -260,9 +260,15 @@ PCRE2_UNSET, so as not to imply an offset in the replacement. */
 if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
   return PCRE2_ERROR_BADOPTION;
 
-/* Validate length and find the end of the replacement. */
+/* Validate length and find the end of the replacement. A NULL replacement of
+zero length is interpreted as an empty string. */
 
-if (replacement == NULL) return PCRE2_ERROR_NULL;
+if (replacement == NULL)
+  {
+  if (rlength != 0) return PCRE2_ERROR_NULL;
+  replacement = (PCRE2_SPTR)"";
+  }
+
 if (rlength == PCRE2_ZERO_TERMINATED) rlength = PRIV(strlen)(replacement);
 repend = replacement + rlength;

View File

@@ -304,4 +304,7 @@
 /[aCz]/mg,firstline,newline=lf
 match\nmatch
 
+//jitfast
+\=null_subject
+
 # End of testinput17

View File

@@ -135,4 +135,9 @@
 123ace
 123ace\=posix_startend=2:6
 
+//posix
+\= Expect errors
+\=null_subject
+abc\=null_subject
+
 # End of testdata/testinput18

21
testdata/testinput2 vendored
View File

@@ -5902,4 +5902,25 @@ a)"xI
 
 # ---------
 
+# Tests for zero-length NULL to be treated as an empty string.
+
+//
+\=null_subject
+\= Expect error
+abc\=null_subject
+
+//replace=[20]
+abc\=null_replacement
+\=null_subject
+\=null_replacement
+
+/X*/g,replace=xy
+\= Expect error
+>X<\=null_replacement
+
+/X+/replace=[20]
+>XX<\=null_replacement
+
+# ---------
+
 # End of testinput2

View File

@@ -550,4 +550,8 @@ Failed: error -47: match limit exceeded
 match\nmatch
 0: a (JIT)
 
+//jitfast
+\=null_subject
+0: (JIT)
+
 # End of testinput17

View File

@@ -215,4 +215,11 @@ Failed: POSIX code 16: bad argument at offset 0
 3: <unset>
 4: c
 
+//posix
+\= Expect errors
+\=null_subject
+No match: POSIX code 16: bad argument
+abc\=null_subject
+No match: POSIX code 16: bad argument
+
 # End of testdata/testinput18

28
testdata/testoutput2 vendored
View File

@@ -17674,6 +17674,34 @@ Failed: error 199 at offset 14: \K is not allowed in lookarounds (but see PCRE2_
 
 # ---------
 
+# Tests for zero-length NULL to be treated as an empty string.
+
+//
+\=null_subject
+0:
+\= Expect error
+abc\=null_subject
+Failed: error -51: NULL argument passed with non-zero length
+
+//replace=[20]
+abc\=null_replacement
+1: abc
+\=null_subject
+1:
+\=null_replacement
+1:
+
+/X*/g,replace=xy
+\= Expect error
+>X<\=null_replacement
+Failed: error -51: NULL argument passed with non-zero length
+
+/X+/replace=[20]
+>XX<\=null_replacement
+1: ><
+
+# ---------
+
 # End of testinput2
 Error -70: PCRE2_ERROR_BADDATA (unknown error number)
 Error -62: bad serialized data