Refactor match_data() to always use the heap instead of having an initial frames vector on the stack; some consequential adjustmentsneeded.

This commit is contained in:
Philip Hazel 2022-07-27 17:44:55 +01:00
parent e47fc51584
commit d90fb23878
15 changed files with 332 additions and 256 deletions

View File

@ -32,6 +32,19 @@ tidied up an untidy #ifdef arrangement in pcre2test.
8. Fixed an issue in the backtracking optimization of character repeats in
JIT. Furthermore optimize star repetitions, not just plus repetitions.
9. Removed the use of an initial backtracking frames vector on the system stack
in pcre2_match() so that it now always uses the heap. (In a multi-thread
environment with very small stacks there had been an issue.) This also is
tidier for JIT matching, which didn't need that vector. The heap vector is now
remembered in the match data block and re-used if that block itself is re-used.
It is freed with the match data block.
10. Adjusted the find_limits code in pcre2test to work with change 9 above.
11. Added find_limits_noheap to pcre2test, because the heap limits are now
different in different environments and so cannot be included in the standard
tests.
Version 10.40 15-April-2022
---------------------------

View File

@ -1,4 +1,4 @@
.TH PCRE2API 3 "22 April 2022" "PCRE2 10.41"
.TH PCRE2API 3 "27 July 2022" "PCRE2 10.41"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -953,7 +953,7 @@ has its own memory control arrangements (see the
documentation for more details). If the limit is reached, the negative error
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
is built; if it is not, the default is set very large and is essentially
"unlimited".
unlimited.
.P
A value for the heap limit may also be supplied by an item at the start of a
pattern of the form
@ -964,18 +964,18 @@ where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
limit is set, less than the default.
.P
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
stack for recording backtracking points. The more nested backtracking points
there are (that is, the deeper the search tree), the more memory is needed.
Heap memory is used only if the initial vector is too small. If the heap limit
is set to a value less than 21 (in particular, zero) no heap memory will be
used. In this case, only patterns that do not have a lot of nested backtracking
can be successfully processed.
The \fBpcre2_match()\fP function always needs some heap memory, so setting a
value of zero guarantees a "heap limit exceeded" error. Details of how
\fBpcre2_match()\fP uses the heap are given in the
.\" HREF
\fBpcre2perform\fP
.\"
documentation.
.P
Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used
when processing pattern recursions, lookarounds, or atomic groups, and only if
this is not big enough is heap memory used. In this case, too, setting a value
of zero disables the use of the heap.
For \fBpcre2_dfa_match()\fP, a vector on the system stack is used when
processing pattern recursions, lookarounds, or atomic groups, and only if this
is not big enough is heap memory used. In this case, setting a value of zero
disables the use of the heap.
.sp
.nf
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
@ -1019,10 +1019,10 @@ less than the limit set by the caller of \fBpcre2_match()\fP or
.fi
.sp
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
Each time a nested backtracking point is passed, a new memory "frame" is used
Each time a nested backtracking point is passed, a new memory frame is used
to remember the state of matching at that point. Thus, this parameter
indirectly limits the amount of memory that is used in a match. However,
because the size of each memory "frame" depends on the number of capturing
because the size of each memory frame depends on the number of capturing
parentheses, the actual memory limit varies from pattern to pattern. This limit
was more useful in versions before 10.30, where function recursion was used for
backtracking.
@ -3162,11 +3162,11 @@ The backtracking match limit was reached.
.sp
PCRE2_ERROR_NOMEMORY
.sp
If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default
or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
Heap memory is used to remember backgracking points. This error is given when
the memory allocation function (default or custom) fails. Note that a different
error, PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds
the heap limit. PCRE2_ERROR_NOMEMORY is also returned if
PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
.sp
PCRE2_ERROR_NULL
.sp
@ -4027,6 +4027,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 14 December 2021
Copyright (c) 1997-2021 University of Cambridge.
Last updated: 27 July 2022
Copyright (c) 1997-2022 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2BUILD 3 "08 December 2021" "PCRE2 10.40"
.TH PCRE2BUILD 3 "27 July 2022" "PCRE2 10.41"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.
@ -278,12 +278,11 @@ to the \fBconfigure\fP command. This setting also applies to the
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
counting is done differently).
.P
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
stack to record backtracking points. The more nested backtracking points there
are (that is, the deeper the search tree), the more memory is needed. If the
initial vector is not large enough, heap memory is used, up to a certain limit,
which is specified in kibibytes (units of 1024 bytes). The limit can be changed
at run time, as described in the
The \fBpcre2_match()\fP function uses heap memory to record backtracking
points. The more nested backtracking points there are (that is, the deeper the
search tree), the more memory is needed. There is an upper limit, specified in
kibibytes (units of 1024 bytes). This limit can be changed at run time, as
described in the
.\" HREF
\fBpcre2api\fP
.\"
@ -625,7 +624,7 @@ give a warning.
.sp
.nf
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
.fi
.
@ -634,6 +633,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 08 December 2021
Copyright (c) 1997-2021 University of Cambridge.
Last updated: 27 July 2022
Copyright (c) 1997-2022 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2GREP 1 "31 August 2021" "PCRE2 10.38"
.TH PCRE2GREP 1 "27 July 2022" "PCRE2 10.41"
.SH NAME
pcre2grep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
@ -516,10 +516,7 @@ counter that is incremented each time around its main processing loop. If the
value set by \fB--match-limit\fP is reached, an error occurs.
.sp
The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of
1024 bytes), the amount of heap memory that may be used for matching. Heap
memory is needed only if matching the pattern requires a significant number of
nested backtracking points to be remembered. This parameter can be set to zero
to forbid the use of heap memory altogether.
1024 bytes), the maximum amount of heap memory that may be used for matching.
.sp
The \fB--depth-limit\fP option limits the depth of nested backtracking points,
which indirectly limits the amount of memory that is used. The amount of memory
@ -960,6 +957,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 31 August 2021
Copyright (c) 1997-2021 University of Cambridge.
Last updated: 27 July 2022
Copyright (c) 1997-2022 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2LIMITS 3 "03 February 2019" "PCRE2 10.33"
.TH PCRE2LIMITS 3 "26 July 2022" "PCRE2 10.41"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "SIZE AND OTHER LIMITATIONS"
@ -51,6 +51,10 @@ is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
.P
The maximum length of a string argument to a callout is the largest number a
32-bit unsigned integer can hold.
.P
The maximum amount of heap memory used for matching is controlled by the heap
limit, which can be set in a pattern or in a match context. The default is a
very large number, effectively unlimited.
.
.
.SH AUTHOR
@ -58,7 +62,7 @@ The maximum length of a string argument to a callout is the largest number a
.sp
.nf
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
.fi
.
@ -67,6 +71,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 02 February 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 26 July 2022
Copyright (c) 1997-2022 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2PERFORM 3 "03 February 2019" "PCRE2 10.33"
.TH PCRE2PERFORM 3 "27 July 2022" "PCRE2 10.41"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 PERFORMANCE"
@ -69,12 +69,28 @@ From release 10.30, the interpretive (non-JIT) version of \fBpcre2_match()\fP
uses very little system stack at run time. In earlier releases recursive
function calls could use a great deal of stack, and this could cause problems,
but this usage has been eliminated. Backtracking positions are now explicitly
remembered in memory frames controlled by the code. An initial 20KiB vector of
frames is allocated on the system stack (enough for about 100 frames for small
patterns), but if this is insufficient, heap memory is used. The amount of heap
memory can be limited; if the limit is set to zero, only the initial stack
vector is used. Rewriting patterns to be time-efficient, as described below,
may also reduce the memory requirements.
remembered in memory frames controlled by the code.
.P
The size of each frame depends on the size of pointer variables and the number
of capturing parenthesized groups in the pattern being matched. On a 64-bit
system the frame size for a pattern with no captures is 128 bytes. For each
capturing group the size increases by 16 bytes.
.P
Until release 10.41, an initial 20KiB frames vector was allocated on the system
stack, but this still caused some issues for multi-thread applications where
each thread has a very small stack. From release 10.41 backtracking memory
frames are always held in heap memory. An initial heap allocation is obtained
the first time any match data block is passed to \fBpcre2_match()\fP. This is
remembered with the match data block and re-used if that block is used for
another match. It is freed when the match data block itself is freed.
.P
The size of the initial block is the larger of 20KiB or ten times the pattern's
frame size, unless the heap limit is less than this, in which case the heap
limit is used. If the initial block proves to be too small during matching, it
is replaced by a larger block, subject to the heap limit. The heap limit is
checked only when a new block is to be allocated. Reducing the heap limit
between calls to \fBpcre2_match()\fP with the same match data block does not
affect the saved block.
.P
In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
function calls, but only for processing atomic groups, lookaround assertions,
@ -230,7 +246,7 @@ pattern to match. This is done by repeatedly matching with different limits.
.sp
.nf
Philip Hazel
University Computing Service
Retired from University Computing Service
Cambridge, England.
.fi
.
@ -239,6 +255,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 03 February 2019
Copyright (c) 1997-2019 University of Cambridge.
Last updated: 27 July 2022
Copyright (c) 1997-2022 University of Cambridge.
.fi

View File

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "12 January 2022" "PCRE 10.40"
.TH PCRE2TEST 1 "27 July 2022" "PCRE 10.41"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -1206,7 +1206,8 @@ pattern, but can be overridden by modifiers on the subject.
copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit
dfa use \fBpcre2_dfa_match()\fP
find_limits find match and depth limits
find_limits find heap, match and depth limits
find_limits_noheap find match and depth limits
get=<number or name> extract captured substring
getall extract all captured substrings
/g global global matching
@ -1528,7 +1529,7 @@ value that was set on the pattern.
.sp
The \fBheap_limit\fP, \fBmatch_limit\fP, and \fBdepth_limit\fP modifiers set
the appropriate limits in the match context. These values are ignored when the
\fBfind_limits\fP modifier is specified.
\fBfind_limits\fP or \fBfind_limits_noheap\fP modifier is specified.
.
.
.SS "Finding minimum limits"
@ -1538,8 +1539,12 @@ If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
calls the relevant matching function several times, setting different values in
the match context via \fBpcre2_set_heap_limit()\fP,
\fBpcre2_set_match_limit()\fP, or \fBpcre2_set_depth_limit()\fP until it finds
the minimum values for each parameter that allows the match to complete without
error. If JIT is being used, only the match limit is relevant.
the smallest value for each parameter that allows the match to complete without
a "limit exceeded" error. The match itself may succeed or fail. An alternative
modifier, \fBfind_limits_noheap\fP, omits the heap limit. This is used in the
standard tests, because the minimum heap limit varies between systems. If JIT
is being used, only the match limit is relevant, and the other two are
automatically omitted.
.P
When using this modifier, the pattern should not contain any limit settings
such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
@ -1563,9 +1568,7 @@ and non-recursive, to the internal matching function, thus controlling the
overall amount of computing resource that is used.
.P
For both kinds of matching, the \fIheap_limit\fP number, which is in kibibytes
(units of 1024 bytes), limits the amount of heap memory used for matching. A
value of zero disables the use of any heap memory; many simple pattern matches
can be done without using the heap, so zero is not an unreasonable setting.
(units of 1024 bytes), limits the amount of heap memory used for matching.
.
.
.SS "Showing MARK names"
@ -1584,12 +1587,10 @@ is added to the non-match message.
.sp
The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap
memory allocation and freeing calls that occur during a call to
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. These occur only when a match
requires a bigger vector than the default for remembering backtracking points
(\fBpcre2_match()\fP) or for internal workspace (\fBpcre2_dfa_match()\fP). In
many cases there will be no heap memory used and therefore no additional
output. No heap memory is allocated during matching with JIT, so in that case
the \fBmemory\fP modifier never has any effect. For this modifier to work, the
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. In the latter case, heap memory
is used only when a match requires more internal workspace that the default
allocation on the stack, so in many cases there will be no output. No heap
memory is allocated during matching with JIT. For this modifier to work, the
\fBnull_context\fP modifier must not be set on both the pattern and the
subject, though it can be set on one or the other.
.
@ -1649,7 +1650,8 @@ Normally, \fBpcre2test\fP passes a context block to \fBpcre2_match()\fP,
If the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
testing that the matching and substitution functions behave correctly in this
case (they use default values). This modifier cannot be used with the
\fBfind_limits\fP or \fBsubstitute_callout\fP modifiers.
\fBfind_limits\fP, \fBfind_limits_noheap\fP, or \fBsubstitute_callout\fP
modifiers.
.P
Similarly, for testing purposes, if the \fBnull_subject\fP or
\fBnull_replacement\fP modifier is set, the subject or replacement string
@ -2119,6 +2121,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 12 January 2022
Last updated: 27 July 2022
Copyright (c) 1997-2022 University of Cambridge.
.fi

View File

@ -220,18 +220,17 @@ not rely on this. */
#define COMPILE_ERROR_BASE 100
/* The initial frames vector for remembering backtracking points in
pcre2_match() is allocated on the system stack, of this size (bytes). The size
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
on the number of capturing parentheses) so 20KiB handles quite a few frames. A
larger vector on the heap is obtained for patterns that need more frames. The
maximum size of this can be limited. */
/* The initial frames vector for remembering pcre2_match() backtracking points
is allocated on the heap, of this size (bytes) or ten times the frame size if
larger, unless the heap limit is smaller. Typical frame sizes are a few hundred
bytes (it depends on the number of capturing parentheses) so 20KiB handles
quite a few frames. A larger vector on the heap is obtained for matches that
need more frames, subject to the heap limit. */
#define START_FRAMES_SIZE 20480
/* Similarly, for DFA matching, an initial internal workspace vector is
allocated on the stack. */
/* For DFA matching, an initial internal workspace vector is allocated on the
stack. The heap is used only if this turns out to be too small. */
#define DFA_START_RWS_SIZE 30720

View File

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2018 University of Cambridge
New API code Copyright (c) 2016-2022 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -649,19 +649,23 @@ the size varies from call to call. As the maximum number of capturing
subpatterns is 65535 we must allow for 65536 strings to include the overall
match. (See also the heapframe structure below.) */
struct heapframe; /* Forward reference */
typedef struct pcre2_real_match_data {
pcre2_memctl memctl;
const pcre2_real_code *code; /* The pattern used for the match */
PCRE2_SPTR subject; /* The subject that was matched */
PCRE2_SPTR mark; /* Pointer to last mark */
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
PCRE2_SIZE startchar; /* Offset to starting code unit */
uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
uint8_t flags; /* Various flags */
uint16_t oveccount; /* Number of pairs */
int rc; /* The return code from the match */
PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
pcre2_memctl memctl; /* Memory control fields */
const pcre2_real_code *code; /* The pattern used for the match */
PCRE2_SPTR subject; /* The subject that was matched */
PCRE2_SPTR mark; /* Pointer to last mark */
struct heapframe *heapframes; /* Backtracking frames heap memory */
PCRE2_SIZE heapframes_size; /* Malloc-ed size */
PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
PCRE2_SIZE startchar; /* Offset to starting code unit */
uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
uint8_t flags; /* Various flags */
uint16_t oveccount; /* Number of pairs */
int rc; /* The return code from the match */
PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
} pcre2_real_match_data;
@ -854,10 +858,6 @@ doing traditional NFA matching (pcre2_match() and friends). */
typedef struct match_block {
pcre2_memctl memctl; /* For general use */
PCRE2_SIZE frame_vector_size; /* Size of a backtracking frame */
heapframe *match_frames; /* Points to vector of frames */
heapframe *match_frames_top; /* Points after the end of the vector */
heapframe *stack_frames; /* The original vector on the stack */
PCRE2_SIZE heap_limit; /* As it says */
uint32_t match_limit; /* As it says */
uint32_t match_limit_depth; /* As it says */

View File

@ -204,6 +204,7 @@ Arguments:
P a previous frame of interest
frame_size the frame size
mb points to the match block
match_data points to the match data block
s identification text
Returns: nothing
@ -211,7 +212,7 @@ Returns: nothing
static void
display_frames(FILE *f, heapframe *F, heapframe *P, PCRE2_SIZE frame_size,
match_block *mb, const char *s, ...)
match_block *mb, pcre2_match_data *match_data, const char *s, ...)
{
uint32_t i;
heapframe *Q;
@ -223,10 +224,10 @@ vfprintf(f, s, ap);
va_end(ap);
if (P != NULL) fprintf(f, " P=%lu",
((char *)P - (char *)(mb->match_frames))/frame_size);
((char *)P - (char *)(match_data->heapframes))/frame_size);
fprintf(f, "\n");
for (i = 0, Q = mb->match_frames;
for (i = 0, Q = match_data->heapframes;
Q <= F;
i++, Q = (heapframe *)((char *)Q + frame_size))
{
@ -490,10 +491,16 @@ A version did exist that used individual frames on the heap instead of calling
match() recursively, but this ran substantially slower. The current version is
a refactoring that uses a vector of frames to remember backtracking points.
This runs no slower, and possibly even a bit faster than the original recursive
implementation. An initial vector of size START_FRAMES_SIZE (enough for maybe
50 frames) is allocated on the system stack. If this is not big enough, the
heap is used for a larger vector.
implementation.
At first, an initial vector of size START_FRAMES_SIZE (enough for maybe 50
frames) was allocated on the system stack. If this was not big enough, the heap
was used for a larger vector. However, it turns out that there are environments
where taking as little as 20KiB from the system stack is an embarrassment.
After another refactoring, the heap is used exclusively, but a pointer the
frames vector and its size are cached in the match_data block, so that there is
no new memory allocation if the same match_data block is used for multiple
matches (unless the frames vector has to be extended).
*******************************************************************************
******************************************************************************/
@ -566,10 +573,9 @@ made performance worse.
Arguments:
start_eptr starting character in subject
start_ecode starting position in compiled code
ovector pointer to the final output vector
oveccount number of pairs in ovector
top_bracket number of capturing parentheses in the pattern
frame_size size of each backtracking frame
match_data pointer to the match_data block
mb pointer to "static" variables block
Returns: MATCH_MATCH if matched ) these values are >= 0
@ -580,17 +586,19 @@ Returns: MATCH_MATCH if matched ) these values are >= 0
*/
static int
match(PCRE2_SPTR start_eptr, PCRE2_SPTR start_ecode, PCRE2_SIZE *ovector,
uint16_t oveccount, uint16_t top_bracket, PCRE2_SIZE frame_size,
match_block *mb)
match(PCRE2_SPTR start_eptr, PCRE2_SPTR start_ecode, uint16_t top_bracket,
PCRE2_SIZE frame_size, pcre2_match_data *match_data, match_block *mb)
{
/* Frame-handling variables */
heapframe *F; /* Current frame pointer */
heapframe *N = NULL; /* Temporary frame pointers */
heapframe *P = NULL;
heapframe *frames_top; /* End of frames vector */
heapframe *assert_accept_frame = NULL; /* For passing back a frame with captures */
PCRE2_SIZE frame_copy_size; /* Amount to copy when creating a new frame */
PCRE2_SIZE heapframes_size; /* Usable size of frames vector */
PCRE2_SIZE frame_copy_size; /* Amount to copy when creating a new frame */
/* Local variables that do not need to be preserved over calls to RRMATCH(). */
@ -627,10 +635,14 @@ copied when a new frame is created. */
frame_copy_size = frame_size - offsetof(heapframe, eptr);
/* Set up the first current frame at the start of the vector, and initialize
fields that are not reset for new frames. */
/* Set up the first frame and the end of the frames vector. We set the local
heapframes_size to the usuable amount of the vector, that is, a whole number of
frames. */
F = match_data->heapframes;
heapframes_size = (match_data->heapframes_size / frame_size) * frame_size;
frames_top = (heapframe *)((char *)F + heapframes_size);
F = mb->match_frames;
Frdepth = 0; /* "Recursion" depth */
Fcapture_last = 0; /* Number of most recent capture */
Fcurrent_recurse = RECURSE_UNSET; /* Not pattern recursing. */
@ -646,34 +658,35 @@ backtracking point. */
MATCH_RECURSE:
/* Set up a new backtracking frame. If the vector is full, get a new one
on the heap, doubling the size, but constrained by the heap limit. */
/* Set up a new backtracking frame. If the vector is full, get a new one,
doubling the size, but constrained by the heap limit (which is in KiB). */
N = (heapframe *)((char *)F + frame_size);
if (N >= mb->match_frames_top)
if (N >= frames_top)
{
PCRE2_SIZE newsize = mb->frame_vector_size * 2;
heapframe *new;
PCRE2_SIZE newsize = match_data->heapframes_size * 2;
if ((newsize / 1024) > mb->heap_limit)
if (newsize > mb->heap_limit)
{
PCRE2_SIZE maxsize = ((mb->heap_limit * 1024)/frame_size) * frame_size;
if (mb->frame_vector_size >= maxsize) return PCRE2_ERROR_HEAPLIMIT;
PCRE2_SIZE maxsize = (mb->heap_limit/frame_size) * frame_size;
if (match_data->heapframes_size >= maxsize) return PCRE2_ERROR_HEAPLIMIT;
newsize = maxsize;
}
new = mb->memctl.malloc(newsize, mb->memctl.memory_data);
new = match_data->memctl.malloc(newsize, match_data->memctl.memory_data);
if (new == NULL) return PCRE2_ERROR_NOMEMORY;
memcpy(new, mb->match_frames, mb->frame_vector_size);
memcpy(new, match_data->heapframes, heapframes_size);
F = (heapframe *)((char *)new + ((char *)F - (char *)mb->match_frames));
F = (heapframe *)((char *)new + ((char *)F - (char *)match_data->heapframes));
N = (heapframe *)((char *)F + frame_size);
if (mb->match_frames != mb->stack_frames)
mb->memctl.free(mb->match_frames, mb->memctl.memory_data);
mb->match_frames = new;
mb->match_frames_top = (heapframe *)((char *)mb->match_frames + newsize);
mb->frame_vector_size = newsize;
match_data->memctl.free(match_data->heapframes, match_data->memctl.memory_data);
match_data->heapframes = new;
match_data->heapframes_size = newsize;
heapframes_size = (newsize / frame_size) * frame_size;
frames_top = (heapframe *)((char *)new + heapframes_size);
}
#ifdef DEBUG_SHOW_RMATCH
@ -731,7 +744,7 @@ recursion value. */
if (group_frame_type != 0)
{
Flast_group_offset = (char *)F - (char *)mb->match_frames;
Flast_group_offset = (char *)F - (char *)match_data->heapframes;
if (GF_IDMASK(group_frame_type) == GF_RECURSE)
Fcurrent_recurse = GF_DATAMASK(group_frame_type);
group_frame_type = 0;
@ -773,7 +786,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
for(;;)
{
if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL;
N = (heapframe *)((char *)mb->match_frames + offset);
N = (heapframe *)((char *)match_data->heapframes + offset);
P = (heapframe *)((char *)N - frame_size);
if (N->group_frame_type == (GF_CAPTURE | number)) break;
offset = P->last_group_offset;
@ -811,7 +824,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
for(;;)
{
if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL;
N = (heapframe *)((char *)mb->match_frames + offset);
N = (heapframe *)((char *)match_data->heapframes + offset);
P = (heapframe *)((char *)N - frame_size);
if (GF_IDMASK(N->group_frame_type) == GF_RECURSE) break;
offset = P->last_group_offset;
@ -864,14 +877,15 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
mb->mark = Fmark; /* and the last success mark */
if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
ovector[0] = Fstart_match - mb->start_subject;
ovector[1] = Feptr - mb->start_subject;
match_data->ovector[0] = Fstart_match - mb->start_subject;
match_data->ovector[1] = Feptr - mb->start_subject;
/* Set i to the smaller of the sizes of the external and frame ovectors. */
i = 2 * ((top_bracket + 1 > oveccount)? oveccount : top_bracket + 1);
memcpy(ovector + 2, Fovector, (i - 2) * sizeof(PCRE2_SIZE));
while (--i >= Foffset_top + 2) ovector[i] = PCRE2_UNSET;
i = 2 * ((top_bracket + 1 > match_data->oveccount)?
match_data->oveccount : top_bracket + 1);
memcpy(match_data->ovector + 2, Fovector, (i - 2) * sizeof(PCRE2_SIZE));
while (--i >= Foffset_top + 2) match_data->ovector[i] = PCRE2_UNSET;
return MATCH_MATCH; /* Note: NOT RRETURN */
@ -5328,7 +5342,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
offset = Flast_group_offset;
while (offset != PCRE2_UNSET)
{
N = (heapframe *)((char *)mb->match_frames + offset);
N = (heapframe *)((char *)match_data->heapframes + offset);
P = (heapframe *)((char *)N - frame_size);
if (N->group_frame_type == (GF_RECURSE | number))
{
@ -5729,7 +5743,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
if (*bracode != OP_BRA && *bracode != OP_COND)
{
N = (heapframe *)((char *)mb->match_frames + Flast_group_offset);
N = (heapframe *)((char *)match_data->heapframes + Flast_group_offset);
P = (heapframe *)((char *)N - frame_size);
Flast_group_offset = P->last_group_offset;
@ -6346,6 +6360,7 @@ BOOL jit_checked_utf = FALSE;
#endif /* SUPPORT_UNICODE */
PCRE2_SIZE frame_size;
PCRE2_SIZE heapframes_size;
/* We need to have mb as a pointer to a match block, because the IS_NEWLINE
macro is used below, and it expects NLBLOCK to be defined as a pointer. */
@ -6354,15 +6369,6 @@ pcre2_callout_block cb;
match_block actual_match_block;
match_block *mb = &actual_match_block;
/* Allocate an initial vector of backtracking frames on the stack. If this
proves to be too small, it is replaced by a larger one on the heap. To get a
vector of the size required that is aligned for pointers, allocate it as a
vector of pointers. */
PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)]
PCRE2_KEEP_UNINITIALIZED;
mb->stack_frames = (heapframe *)stack_frames_vector;
/* Recognize NULL, length 0 as an empty string. */
if (subject == NULL && length == 0) subject = (PCRE2_SPTR)"";
@ -6793,15 +6799,11 @@ switch(re->newline_convention)
vector at the end, whose size depends on the number of capturing parentheses in
the pattern. It is not used at all if there are no capturing parentheses.
frame_size is the total size of each frame
mb->frame_vector_size is the total usable size of the vector (rounded down
to a whole number of frames)
frame_size is the total size of each frame
match_data->heapframes is the pointer to the frames vector
match_data->heapframes_size is the total size of the vector
The last of these is changed within the match() function if the frame vector
has to be expanded. We therefore put it into the match block so that it is
correct when calling match() more than once for non-anchored patterns.
We must also pad frame_size for alignment to ensure subsequent frames are as
We must pad the frame_size for alignment to ensure subsequent frames are as
aligned as heapframe. Whilst ovector is word-aligned due to being a PCRE2_SIZE
array, that does not guarantee it is suitably aligned for pointers, as some
architectures have pointers that are larger than a size_t. */
@ -6813,8 +6815,8 @@ frame_size = (offsetof(heapframe, ovector) +
/* Limits set in the pattern override the match context only if they are
smaller. */
mb->heap_limit = (mcontext->heap_limit < re->limit_heap)?
mcontext->heap_limit : re->limit_heap;
mb->heap_limit = ((mcontext->heap_limit < re->limit_heap)?
mcontext->heap_limit : re->limit_heap) * 1024;
mb->match_limit = (mcontext->match_limit < re->limit_match)?
mcontext->match_limit : re->limit_match;
@ -6823,35 +6825,40 @@ mb->match_limit_depth = (mcontext->depth_limit < re->limit_depth)?
mcontext->depth_limit : re->limit_depth;
/* If a pattern has very many capturing parentheses, the frame size may be very
large. Ensure that there are at least 10 available frames by getting an initial
vector on the heap if necessary, except when the heap limit prevents this. Get
fewer if possible. (The heap limit is in kibibytes.) */
large. Set the initial frame vector size to ensure that there are at least 10
available frames, but enforce a minimum of START_FRAMES_SIZE. If this is
greater than the heap limit, get as large a vector as possible. Always round
the size to a multiple of the frame size. (The heap limit is in kibibytes.) */
if (frame_size <= START_FRAMES_SIZE/10)
heapframes_size = frame_size * 10;
if (heapframes_size < START_FRAMES_SIZE) heapframes_size = START_FRAMES_SIZE;
if (heapframes_size > mb->heap_limit)
{
mb->match_frames = mb->stack_frames; /* Initial frame vector on the stack */
mb->frame_vector_size = ((START_FRAMES_SIZE/frame_size) * frame_size);
if (frame_size > mb->heap_limit ) return PCRE2_ERROR_HEAPLIMIT;
heapframes_size = mb->heap_limit;
}
else
/* If an existing frame vector in the match_data block is large enough, we can
use it.Otherwise, free any pre-existing vector and get a new one. */
if (match_data->heapframes_size < heapframes_size)
{
mb->frame_vector_size = frame_size * 10;
if ((mb->frame_vector_size / 1024) > mb->heap_limit)
match_data->memctl.free(match_data->heapframes,
match_data->memctl.memory_data);
match_data->heapframes = match_data->memctl.malloc(heapframes_size,
match_data->memctl.memory_data);
if (match_data->heapframes == NULL)
{
if (frame_size > mb->heap_limit * 1024) return PCRE2_ERROR_HEAPLIMIT;
mb->frame_vector_size = ((mb->heap_limit * 1024)/frame_size) * frame_size;
}
mb->match_frames = mb->memctl.malloc(mb->frame_vector_size,
mb->memctl.memory_data);
if (mb->match_frames == NULL) return PCRE2_ERROR_NOMEMORY;
match_data->heapframes_size = 0;
return PCRE2_ERROR_NOMEMORY;
}
match_data->heapframes_size = heapframes_size;
}
mb->match_frames_top =
(heapframe *)((char *)mb->match_frames + mb->frame_vector_size);
/* Write to the ovector within the first frame to mark every capture unset and
to avoid uninitialized memory read errors when it is copied to a new frame. */
memset((char *)(mb->match_frames) + offsetof(heapframe, ovector), 0xff,
memset((char *)(match_data->heapframes) + offsetof(heapframe, ovector), 0xff,
frame_size - offsetof(heapframe, ovector));
/* Pointers to the individual character tables */
@ -7279,8 +7286,8 @@ for(;;)
mb->end_offset_top = 0;
mb->skip_arg_count = 0;
rc = match(start_match, mb->start_code, match_data->ovector,
match_data->oveccount, re->top_bracket, frame_size, mb);
rc = match(start_match, mb->start_code, re->top_bracket, frame_size,
match_data, mb);
if (mb->hitend && start_partial == NULL)
{
@ -7463,11 +7470,6 @@ if (utf && end_subject != true_end_subject &&
}
#endif /* SUPPORT_UNICODE */
/* Release an enlarged frame vector that is on the heap. */
if (mb->match_frames != mb->stack_frames)
mb->memctl.free(mb->match_frames, mb->memctl.memory_data);
/* Fill in fields that are always returned in the match data. */
match_data->code = re;

View File

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2019 University of Cambridge
New API code Copyright (c) 2016-2022 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -64,6 +64,8 @@ yield = PRIV(memctl_malloc)(
if (yield == NULL) return NULL;
yield->oveccount = oveccount;
yield->flags = 0;
yield->heapframes = NULL;
yield->heapframes_size = 0;
return yield;
}
@ -95,6 +97,9 @@ pcre2_match_data_free(pcre2_match_data *match_data)
{
if (match_data != NULL)
{
if (match_data->heapframes != NULL)
match_data->memctl.free(match_data->heapframes,
match_data->memctl.memory_data);
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
match_data->memctl.free((void *)match_data->subject,
match_data->memctl.memory_data);

View File

@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Original API code Copyright (c) 1997-2012 University of Cambridge
New API code Copyright (c) 2016-2021 University of Cambridge
New API code Copyright (c) 2016-2022 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@ -259,16 +259,16 @@ PCRE2_UNSET, so as not to imply an offset in the replacement. */
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
return PCRE2_ERROR_BADOPTION;
/* Validate length and find the end of the replacement. A NULL replacement of
/* Validate length and find the end of the replacement. A NULL replacement of
zero length is interpreted as an empty string. */
if (replacement == NULL)
if (replacement == NULL)
{
if (rlength != 0) return PCRE2_ERROR_NULL;
replacement = (PCRE2_SPTR)"";
}
replacement = (PCRE2_SPTR)"";
}
if (rlength == PCRE2_ZERO_TERMINATED) rlength = PRIV(strlen)(replacement);
repend = replacement + rlength;
@ -282,8 +282,9 @@ replacement_only = ((options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) != 0);
match data block. We create an internal match_data block in two cases: (a) an
external one is not supplied (and we are not starting from an existing match);
(b) an existing match is to be used for the first substitution. In the latter
case, we copy the existing match into the internal block. This ensures that no
changes are made to the existing match data block. */
case, we copy the existing match into the internal block, except for any cached
heap frame size and pointer. This ensures that no changes are made to the
external match data block. */
if (match_data == NULL)
{
@ -309,6 +310,8 @@ else if (use_existing_match)
if (internal_match_data == NULL) return PCRE2_ERROR_NOMEMORY;
memcpy(internal_match_data, match_data, offsetof(pcre2_match_data, ovector)
+ 2*pairs*sizeof(PCRE2_SIZE));
internal_match_data->heapframes = NULL;
internal_match_data->heapframes_size = 0;
match_data = internal_match_data;
}
@ -328,9 +331,9 @@ scb.ovector = ovector;
if (subject == NULL)
{
if (length != 0) return PCRE2_ERROR_NULL;
if (length != 0) return PCRE2_ERROR_NULL;
subject = (PCRE2_SPTR)"";
}
}
/* Find length of zero-terminated subject */

View File

@ -479,7 +479,7 @@ so many of them that they are split into two fields. */
#define CTL_DFA 0x00000200u
#define CTL_EXPAND 0x00000400u
#define CTL_FINDLIMITS 0x00000800u
#define CTL_FRAMESIZE 0x00001000u
#define CTL_FINDLIMITS_NOHEAP 0x00001000u
#define CTL_FULLBINCODE 0x00002000u
#define CTL_GETALL 0x00004000u
#define CTL_GLOBAL 0x00008000u
@ -522,6 +522,7 @@ so many of them that they are split into two fields. */
#define CTL2_ALLVECTOR 0x00000800u
#define CTL2_NULL_SUBJECT 0x00001000u
#define CTL2_NULL_REPLACEMENT 0x00002000u
#define CTL2_FRAMESIZE 0x00004000u
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */
@ -673,8 +674,9 @@ static modstruct modlist[] = {
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
{ "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, CO(extra_options) },
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
{ "find_limits_noheap", MOD_DAT, MOD_CTL, CTL_FINDLIMITS_NOHEAP, DO(control) },
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
{ "framesize", MOD_PAT, MOD_CTL, CTL2_FRAMESIZE, PO(control2) },
{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
@ -781,10 +783,11 @@ static modstruct modlist[] = {
#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \
CTL_BINCODE|CTL_CALLOUT_INFO|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO| \
CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \
CTL_JITVERIFY|CTL_MEMORY|CTL_PUSH|CTL_PUSHCOPY| \
CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_NL_SET)
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_FRAMESIZE| \
CTL2_NL_SET)
/* Controls that apply only at compile time with 'push'. */
@ -813,8 +816,9 @@ static uint32_t exclusive_pat_controls[] = {
first control word. */
static uint32_t exclusive_dat_controls[] = {
CTL_ALLUSEDTEXT | CTL_STARTCHAR,
CTL_FINDLIMITS | CTL_NULLCONTEXT };
CTL_ALLUSEDTEXT | CTL_STARTCHAR,
CTL_FINDLIMITS | CTL_NULLCONTEXT,
CTL_FINDLIMITS_NOHEAP | CTL_NULLCONTEXT };
/* Table of single-character abbreviated modifiers. The index field is
initialized to -1, but the first time the modifier is encountered, it is filled
@ -4112,7 +4116,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@ -4130,7 +4134,8 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_DFA) != 0)? " dfa" : "",
((controls & CTL_EXPAND) != 0)? " expand" : "",
((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "",
((controls & CTL_FRAMESIZE) != 0)? " framesize" : "",
((controls & CTL_FINDLIMITS_NOHEAP) != 0)? " find_limits_noheap" : "",
((controls2 & CTL2_FRAMESIZE) != 0)? " framesize" : "",
((controls & CTL_FULLBINCODE) != 0)? " fullbincode" : "",
((controls & CTL_GETALL) != 0)? " getall" : "",
((controls & CTL_GLOBAL) != 0)? " global" : "",
@ -4308,13 +4313,13 @@ if (test_mode == PCRE32_MODE) cblock_size = sizeof(pcre2_real_code_32);
(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
/* The uint32_t variables are cast before multiplying to stop code analyzers
/* The uint32_t variables are cast before multiplying to stop code analyzers
grumbling about potential overflow. */
fprintf(outfile, "Memory allocation (code space): %" SIZ_FORM "\n", size -
(size_t)name_count * (size_t)name_entry_size * (size_t)code_unit_size -
fprintf(outfile, "Memory allocation (code space): %" SIZ_FORM "\n", size -
(size_t)name_count * (size_t)name_entry_size * (size_t)code_unit_size -
cblock_size);
if (pat_patctl.jit != 0)
{
(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
@ -4986,7 +4991,7 @@ switch(cmd)
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
}
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
if ((pat_patctl.control & CTL_FRAMESIZE) != 0) show_framesize();
if ((pat_patctl.control2 & CTL2_FRAMESIZE) != 0) show_framesize();
if ((pat_patctl.control & CTL_ANYINFO) != 0)
{
rc = show_pattern_info();
@ -5948,7 +5953,7 @@ if ((pat_patctl.control2 & CTL2_NL_SET) != 0)
/* Output code size and other information if requested. */
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
if ((pat_patctl.control & CTL_FRAMESIZE) != 0) show_framesize();
if ((pat_patctl.control2 & CTL2_FRAMESIZE) != 0) show_framesize();
if ((pat_patctl.control & CTL_ANYINFO) != 0)
{
int rc = show_pattern_info();
@ -6027,10 +6032,46 @@ for (;;)
{
uint32_t stack_start = 0;
/* If we are checking the heap limit, free any frames vector that is cached
in the match_data so we always start without one. */
if (errnumber == PCRE2_ERROR_HEAPLIMIT)
{
PCRE2_SET_HEAP_LIMIT(dat_context, mid);
#ifdef SUPPORT_PCRE2_8
if (code_unit_size == 1)
{
match_data8->memctl.free(match_data8->heapframes,
match_data8->memctl.memory_data);
match_data8->heapframes = NULL;
match_data8->heapframes_size = 0;
}
#endif
#ifdef SUPPORT_PCRE2_16
if (code_unit_size == 2)
{
match_data16->memctl.free(match_data16->heapframes,
match_data16->memctl.memory_data);
match_data16->heapframes = NULL;
match_data16->heapframes_size = 0;
}
#endif
#ifdef SUPPORT_PCRE2_32
if (code_unit_size == 4)
{
match_data32->memctl.free(match_data32->heapframes,
match_data32->memctl.memory_data);
match_data32->heapframes = NULL;
match_data32->heapframes_size = 0;
}
#endif
}
/* No need to mess with the frames vector for match or depth limits. */
else if (errnumber == PCRE2_ERROR_MATCHLIMIT)
{
PCRE2_SET_MATCH_LIMIT(dat_context, mid);
@ -6040,6 +6081,8 @@ for (;;)
PCRE2_SET_DEPTH_LIMIT(dat_context, mid);
}
/* Do the appropriate match */
if ((dat_datctl.control & CTL_DFA) != 0)
{
stack_start = DFA_START_RWS_SIZE/1024;
@ -6058,7 +6101,6 @@ for (;;)
else
{
stack_start = START_FRAMES_SIZE/1024;
PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
dat_datctl.options, match_data, PTR(dat_context));
}
@ -7584,12 +7626,13 @@ for (gmatched = 0;; gmatched++)
limits are not relevant for JIT. The return from check_match_limit() is the
return from the final call to pcre2_match() or pcre2_dfa_match(). */
if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
if ((dat_datctl.control & (CTL_FINDLIMITS|CTL_FINDLIMITS_NOHEAP)) != 0)
{
capcount = 0; /* This stops compiler warnings */
if (FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0)
if ((dat_datctl.control & CTL_FINDLIMITS_NOHEAP) == 0 &&
(FLD(compiled_code, executable_jit) == NULL ||
(dat_datctl.options & PCRE2_NO_JIT) != 0))
{
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
}

35
testdata/testinput15 vendored
View File

@ -6,38 +6,44 @@
# (2) Other tests that must not be run with JIT.
# This test is first so that it doesn't inherit a large enough heap frame
# vector from a previous test.
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
\[a]{60}
/(a+)*zz/I
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
aaaaaaaaaaaaaz\=find_limits
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits_noheap
aaaaaaaaaaaaaz\=find_limits_noheap
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
/* this is a C style comment */\=find_limits
/* this is a C style comment */\=find_limits_noheap
/^(?>a)++/
aa\=find_limits
aaaaaaaaa\=find_limits
aa\=find_limits_noheap
aaaaaaaaa\=find_limits_noheap
/(a)(?1)++/
aa\=find_limits
aaaaaaaaa\=find_limits
aa\=find_limits_noheap
aaaaaaaaa\=find_limits_noheap
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits
aabbccddee\=find_limits_noheap
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits
aabbccddee\=find_limits_noheap
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits
aabbccddee\=find_limits_noheap
/(*LIMIT_MATCH=12bc)abc/
@ -228,9 +234,6 @@
/(|]+){2,2452}/
(|]+){2,2452}
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
\[a]{60}
/b(?<!ax)(?!cx)/allusedtext
abc
abcz

50
testdata/testoutput15 vendored
View File

@ -6,19 +6,24 @@
# (2) Other tests that must not be run with JIT.
# This test is first so that it doesn't inherit a large enough heap frame
# vector from a previous test.
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
\[a]{60}
Failed: error -63: heap limit exceeded
/(a+)*zz/I
Capture group count = 1
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
Minimum heap limit = 0
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits_noheap
Minimum match limit = 7
Minimum depth limit = 7
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz
1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaz\=find_limits
Minimum heap limit = 0
aaaaaaaaaaaaaz\=find_limits_noheap
Minimum match limit = 20481
Minimum depth limit = 30
No match
@ -27,70 +32,60 @@ No match
Capture group count = 1
May match empty string
Subject length lower bound = 0
/* this is a C style comment */\=find_limits
Minimum heap limit = 0
/* this is a C style comment */\=find_limits_noheap
Minimum match limit = 64
Minimum depth limit = 7
0: /* this is a C style comment */
1: /* this is a C style comment */
/^(?>a)++/
aa\=find_limits
Minimum heap limit = 0
aa\=find_limits_noheap
Minimum match limit = 5
Minimum depth limit = 3
0: aa
aaaaaaaaa\=find_limits
Minimum heap limit = 0
aaaaaaaaa\=find_limits_noheap
Minimum match limit = 12
Minimum depth limit = 3
0: aaaaaaaaa
/(a)(?1)++/
aa\=find_limits
Minimum heap limit = 0
aa\=find_limits_noheap
Minimum match limit = 7
Minimum depth limit = 5
0: aa
1: a
aaaaaaaaa\=find_limits
Minimum heap limit = 0
aaaaaaaaa\=find_limits_noheap
Minimum match limit = 21
Minimum depth limit = 5
0: aaaaaaaaa
1: a
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum heap limit = 0
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
Minimum match limit = 24
Minimum depth limit = 3
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum heap limit = 0
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
Minimum match limit = 66
Minimum depth limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum heap limit = 0
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
Minimum match limit = 66
Minimum depth limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits
Minimum heap limit = 0
aabbccddee\=find_limits_noheap
Minimum match limit = 7
Minimum depth limit = 7
0: aabbccddee
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits
Minimum heap limit = 0
aabbccddee\=find_limits_noheap
Minimum match limit = 12
Minimum depth limit = 12
0: aabbccddee
@ -101,8 +96,7 @@ Minimum depth limit = 12
5: ee
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits
Minimum heap limit = 0
aabbccddee\=find_limits_noheap
Minimum match limit = 10
Minimum depth limit = 10
0: aabbccddee
@ -521,10 +515,6 @@ No match
0:
1:
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
\[a]{60}
Failed: error -63: heap limit exceeded
/b(?<!ax)(?!cx)/allusedtext
abc
0: abc