Refactor match_data() to always use the heap instead of having an initial frames vector on the stack; some consequential adjustments needed.
parent
e47fc51584
commit
d90fb23878
13
ChangeLog
@@ -32,6 +32,19 @@ tidied up an untidy #ifdef arrangement in pcre2test.
 8. Fixed an issue in the backtracking optimization of character repeats in
 JIT. Furthermore optimize star repetitions, not just plus repetitions.
 
+9. Removed the use of an initial backtracking frames vector on the system stack
+in pcre2_match() so that it now always uses the heap. (In a multi-thread
+environment with very small stacks there had been an issue.) This also is
+tidier for JIT matching, which didn't need that vector. The heap vector is now
+remembered in the match data block and re-used if that block itself is re-used.
+It is freed with the match data block.
+
+10. Adjusted the find_limits code in pcre2test to work with change 9 above.
+
+11. Added find_limits_noheap to pcre2test, because the heap limits are now
+different in different environments and so cannot be included in the standard
+tests.
+
 
 Version 10.40 15-April-2022
 ---------------------------
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "22 April 2022" "PCRE2 10.41"
|
.TH PCRE2API 3 "27 July 2022" "PCRE2 10.41"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -953,7 +953,7 @@ has its own memory control arrangements (see the
|
||||||
documentation for more details). If the limit is reached, the negative error
|
documentation for more details). If the limit is reached, the negative error
|
||||||
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
|
code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
|
||||||
is built; if it is not, the default is set very large and is essentially
|
is built; if it is not, the default is set very large and is essentially
|
||||||
"unlimited".
|
unlimited.
|
||||||
.P
|
.P
|
||||||
A value for the heap limit may also be supplied by an item at the start of a
|
A value for the heap limit may also be supplied by an item at the start of a
|
||||||
pattern of the form
|
pattern of the form
|
||||||
|
@@ -964,18 +964,18 @@ where ddd is a decimal number. However, such a setting is ignored unless ddd is
 less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
 limit is set, less than the default.
 .P
-The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
-stack for recording backtracking points. The more nested backtracking points
-there are (that is, the deeper the search tree), the more memory is needed.
-Heap memory is used only if the initial vector is too small. If the heap limit
-is set to a value less than 21 (in particular, zero) no heap memory will be
-used. In this case, only patterns that do not have a lot of nested backtracking
-can be successfully processed.
+The \fBpcre2_match()\fP function always needs some heap memory, so setting a
+value of zero guarantees a "heap limit exceeded" error. Details of how
+\fBpcre2_match()\fP uses the heap are given in the
+.\" HREF
+\fBpcre2perform\fP
+.\"
+documentation.
 .P
-Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used
-when processing pattern recursions, lookarounds, or atomic groups, and only if
-this is not big enough is heap memory used. In this case, too, setting a value
-of zero disables the use of the heap.
+For \fBpcre2_dfa_match()\fP, a vector on the system stack is used when
+processing pattern recursions, lookarounds, or atomic groups, and only if this
+is not big enough is heap memory used. In this case, setting a value of zero
+disables the use of the heap.
 .sp
 .nf
 .B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
@ -1019,10 +1019,10 @@ less than the limit set by the caller of \fBpcre2_match()\fP or
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
|
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
|
||||||
Each time a nested backtracking point is passed, a new memory "frame" is used
|
Each time a nested backtracking point is passed, a new memory frame is used
|
||||||
to remember the state of matching at that point. Thus, this parameter
|
to remember the state of matching at that point. Thus, this parameter
|
||||||
indirectly limits the amount of memory that is used in a match. However,
|
indirectly limits the amount of memory that is used in a match. However,
|
||||||
because the size of each memory "frame" depends on the number of capturing
|
because the size of each memory frame depends on the number of capturing
|
||||||
parentheses, the actual memory limit varies from pattern to pattern. This limit
|
parentheses, the actual memory limit varies from pattern to pattern. This limit
|
||||||
was more useful in versions before 10.30, where function recursion was used for
|
was more useful in versions before 10.30, where function recursion was used for
|
||||||
backtracking.
|
backtracking.
|
||||||
|
@@ -3162,11 +3162,11 @@ The backtracking match limit was reached.
 .sp
 PCRE2_ERROR_NOMEMORY
 .sp
-If a pattern contains many nested backtracking points, heap memory is used to
-remember them. This error is given when the memory allocation function (default
-or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
-if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
-also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
+Heap memory is used to remember backtracking points. This error is given when
+the memory allocation function (default or custom) fails. Note that a different
+error, PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds
+the heap limit. PCRE2_ERROR_NOMEMORY is also returned if
+PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
 .sp
 PCRE2_ERROR_NULL
 .sp
@ -4027,6 +4027,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 14 December 2021
|
Last updated: 27 July 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2BUILD 3 "08 December 2021" "PCRE2 10.40"
|
.TH PCRE2BUILD 3 "27 July 2022" "PCRE2 10.41"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.
|
.
|
||||||
|
@@ -278,12 +278,11 @@ to the \fBconfigure\fP command. This setting also applies to the
 \fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
 counting is done differently).
 .P
-The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
-stack to record backtracking points. The more nested backtracking points there
-are (that is, the deeper the search tree), the more memory is needed. If the
-initial vector is not large enough, heap memory is used, up to a certain limit,
-which is specified in kibibytes (units of 1024 bytes). The limit can be changed
-at run time, as described in the
+The \fBpcre2_match()\fP function uses heap memory to record backtracking
+points. The more nested backtracking points there are (that is, the deeper the
+search tree), the more memory is needed. There is an upper limit, specified in
+kibibytes (units of 1024 bytes). This limit can be changed at run time, as
+described in the
 .\" HREF
 \fBpcre2api\fP
 .\"
@ -625,7 +624,7 @@ give a warning.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
|
@ -634,6 +633,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 08 December 2021
|
Last updated: 27 July 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2GREP 1 "31 August 2021" "PCRE2 10.38"
|
.TH PCRE2GREP 1 "27 July 2022" "PCRE2 10.41"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2grep - a grep with Perl-compatible regular expressions.
|
pcre2grep - a grep with Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@@ -516,10 +516,7 @@ counter that is incremented each time around its main processing loop. If the
 value set by \fB--match-limit\fP is reached, an error occurs.
 .sp
 The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of
-1024 bytes), the amount of heap memory that may be used for matching. Heap
-memory is needed only if matching the pattern requires a significant number of
-nested backtracking points to be remembered. This parameter can be set to zero
-to forbid the use of heap memory altogether.
+1024 bytes), the maximum amount of heap memory that may be used for matching.
 .sp
 The \fB--depth-limit\fP option limits the depth of nested backtracking points,
 which indirectly limits the amount of memory that is used. The amount of memory
@ -960,6 +957,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 31 August 2021
|
Last updated: 27 July 2022
|
||||||
Copyright (c) 1997-2021 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2LIMITS 3 "03 February 2019" "PCRE2 10.33"
|
.TH PCRE2LIMITS 3 "26 July 2022" "PCRE2 10.41"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "SIZE AND OTHER LIMITATIONS"
|
.SH "SIZE AND OTHER LIMITATIONS"
|
||||||
|
@ -51,6 +51,10 @@ is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
|
||||||
.P
|
.P
|
||||||
The maximum length of a string argument to a callout is the largest number a
|
The maximum length of a string argument to a callout is the largest number a
|
||||||
32-bit unsigned integer can hold.
|
32-bit unsigned integer can hold.
|
||||||
|
.P
|
||||||
|
The maximum amount of heap memory used for matching is controlled by the heap
|
||||||
|
limit, which can be set in a pattern or in a match context. The default is a
|
||||||
|
very large number, effectively unlimited.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH AUTHOR
|
.SH AUTHOR
|
||||||
|
@ -58,7 +62,7 @@ The maximum length of a string argument to a callout is the largest number a
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
|
@ -67,6 +71,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 02 February 2019
|
Last updated: 26 July 2022
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PERFORM 3 "03 February 2019" "PCRE2 10.33"
|
.TH PCRE2PERFORM 3 "27 July 2022" "PCRE2 10.41"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 PERFORMANCE"
|
.SH "PCRE2 PERFORMANCE"
|
||||||
|
@@ -69,12 +69,28 @@ From release 10.30, the interpretive (non-JIT) version of \fBpcre2_match()\fP
 uses very little system stack at run time. In earlier releases recursive
 function calls could use a great deal of stack, and this could cause problems,
 but this usage has been eliminated. Backtracking positions are now explicitly
-remembered in memory frames controlled by the code. An initial 20KiB vector of
-frames is allocated on the system stack (enough for about 100 frames for small
-patterns), but if this is insufficient, heap memory is used. The amount of heap
-memory can be limited; if the limit is set to zero, only the initial stack
-vector is used. Rewriting patterns to be time-efficient, as described below,
-may also reduce the memory requirements.
+remembered in memory frames controlled by the code.
+.P
+The size of each frame depends on the size of pointer variables and the number
+of capturing parenthesized groups in the pattern being matched. On a 64-bit
+system the frame size for a pattern with no captures is 128 bytes. For each
+capturing group the size increases by 16 bytes.
+.P
+Until release 10.41, an initial 20KiB frames vector was allocated on the system
+stack, but this still caused some issues for multi-thread applications where
+each thread has a very small stack. From release 10.41 backtracking memory
+frames are always held in heap memory. An initial heap allocation is obtained
+the first time any match data block is passed to \fBpcre2_match()\fP. This is
+remembered with the match data block and re-used if that block is used for
+another match. It is freed when the match data block itself is freed.
+.P
+The size of the initial block is the larger of 20KiB or ten times the pattern's
+frame size, unless the heap limit is less than this, in which case the heap
+limit is used. If the initial block proves to be too small during matching, it
+is replaced by a larger block, subject to the heap limit. The heap limit is
+checked only when a new block is to be allocated. Reducing the heap limit
+between calls to \fBpcre2_match()\fP with the same match data block does not
+affect the saved block.
 .P
 In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive
 function calls, but only for processing atomic groups, lookaround assertions,
|
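The sizing rules described in the pcre2perform hunk above can be checked with a little arithmetic. The following is a minimal sketch in C, not the code from pcre2_match.c: the function names are hypothetical, the 128-byte and 16-byte figures are the 64-bit values quoted in the documentation text, and START_FRAMES_SIZE is taken to be the 20480 defined in the internal header. The heap limit is assumed to be given in kibibytes, as the documentation states.

```c
#include <assert.h>
#include <stddef.h>

#define START_FRAMES_SIZE 20480   /* 20KiB minimum for the initial block */

/* Frame size on a 64-bit system as quoted in the documentation:
   128 bytes with no captures, plus 16 bytes per capturing group.
   (Hypothetical helper; the library derives this from its own structures.) */
static size_t frame_size_64bit(unsigned captures)
{
  return 128 + 16 * (size_t)captures;
}

/* Initial block size: the larger of 20KiB or ten frames, reduced to the
   heap limit when the limit is smaller. The comparison is done in whole
   kibibytes so that a very large limit cannot overflow when multiplied. */
static size_t initial_block_size(size_t frame_size, size_t heap_limit_kib)
{
  size_t size = START_FRAMES_SIZE;
  assert(frame_size > 0);
  if (size < 10 * frame_size) size = 10 * frame_size;
  if (size / 1024 > heap_limit_kib) size = heap_limit_kib * 1024;
  return size;
}
```

Under these assumptions a pattern with three capturing groups has 176-byte frames, so the 20KiB minimum applies, while a heap limit of 10 (KiB) reduces the initial block to 10240 bytes. A limit of zero leaves no room at all, which is consistent with the documented "heap limit exceeded" error.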
@ -230,7 +246,7 @@ pattern to match. This is done by repeatedly matching with different limits.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
University Computing Service
|
Retired from University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
|
@ -239,6 +255,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 03 February 2019
|
Last updated: 27 July 2022
|
||||||
Copyright (c) 1997-2019 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "12 January 2022" "PCRE 10.40"
|
.TH PCRE2TEST 1 "27 July 2022" "PCRE 10.41"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -1206,7 +1206,8 @@ pattern, but can be overridden by modifiers on the subject.
|
||||||
copy=<number or name> copy captured substring
|
copy=<number or name> copy captured substring
|
||||||
depth_limit=<n> set a depth limit
|
depth_limit=<n> set a depth limit
|
||||||
dfa use \fBpcre2_dfa_match()\fP
|
dfa use \fBpcre2_dfa_match()\fP
|
||||||
find_limits find match and depth limits
|
find_limits find heap, match and depth limits
|
||||||
|
find_limits_noheap find match and depth limits
|
||||||
get=<number or name> extract captured substring
|
get=<number or name> extract captured substring
|
||||||
getall extract all captured substrings
|
getall extract all captured substrings
|
||||||
/g global global matching
|
/g global global matching
|
||||||
|
@ -1528,7 +1529,7 @@ value that was set on the pattern.
|
||||||
.sp
|
.sp
|
||||||
The \fBheap_limit\fP, \fBmatch_limit\fP, and \fBdepth_limit\fP modifiers set
|
The \fBheap_limit\fP, \fBmatch_limit\fP, and \fBdepth_limit\fP modifiers set
|
||||||
the appropriate limits in the match context. These values are ignored when the
|
the appropriate limits in the match context. These values are ignored when the
|
||||||
\fBfind_limits\fP modifier is specified.
|
\fBfind_limits\fP or \fBfind_limits_noheap\fP modifier is specified.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Finding minimum limits"
|
.SS "Finding minimum limits"
|
||||||
|
@@ -1538,8 +1539,12 @@ If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP
 calls the relevant matching function several times, setting different values in
 the match context via \fBpcre2_set_heap_limit()\fP,
 \fBpcre2_set_match_limit()\fP, or \fBpcre2_set_depth_limit()\fP until it finds
-the minimum values for each parameter that allows the match to complete without
-error. If JIT is being used, only the match limit is relevant.
+the smallest value for each parameter that allows the match to complete without
+a "limit exceeded" error. The match itself may succeed or fail. An alternative
+modifier, \fBfind_limits_noheap\fP, omits the heap limit. This is used in the
+standard tests, because the minimum heap limit varies between systems. If JIT
+is being used, only the match limit is relevant, and the other two are
+automatically omitted.
 .P
 When using this modifier, the pattern should not contain any limit settings
 such as (*LIMIT_MATCH=...) within it. If such a setting is present and is
|
@@ -1563,9 +1568,7 @@ and non-recursive, to the internal matching function, thus controlling the
 overall amount of computing resource that is used.
 .P
 For both kinds of matching, the \fIheap_limit\fP number, which is in kibibytes
-(units of 1024 bytes), limits the amount of heap memory used for matching. A
-value of zero disables the use of any heap memory; many simple pattern matches
-can be done without using the heap, so zero is not an unreasonable setting.
+(units of 1024 bytes), limits the amount of heap memory used for matching.
 .
 .
 .SS "Showing MARK names"
@@ -1584,12 +1587,10 @@ is added to the non-match message.
 .sp
 The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap
 memory allocation and freeing calls that occur during a call to
-\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. These occur only when a match
-requires a bigger vector than the default for remembering backtracking points
-(\fBpcre2_match()\fP) or for internal workspace (\fBpcre2_dfa_match()\fP). In
-many cases there will be no heap memory used and therefore no additional
-output. No heap memory is allocated during matching with JIT, so in that case
-the \fBmemory\fP modifier never has any effect. For this modifier to work, the
+\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. In the latter case, heap memory
+is used only when a match requires more internal workspace than the default
+allocation on the stack, so in many cases there will be no output. No heap
+memory is allocated during matching with JIT. For this modifier to work, the
 \fBnull_context\fP modifier must not be set on both the pattern and the
 subject, though it can be set on one or the other.
 .
@ -1649,7 +1650,8 @@ Normally, \fBpcre2test\fP passes a context block to \fBpcre2_match()\fP,
|
||||||
If the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
|
If the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
|
||||||
testing that the matching and substitution functions behave correctly in this
|
testing that the matching and substitution functions behave correctly in this
|
||||||
case (they use default values). This modifier cannot be used with the
|
case (they use default values). This modifier cannot be used with the
|
||||||
\fBfind_limits\fP or \fBsubstitute_callout\fP modifiers.
|
\fBfind_limits\fP, \fBfind_limits_noheap\fP, or \fBsubstitute_callout\fP
|
||||||
|
modifiers.
|
||||||
.P
|
.P
|
||||||
Similarly, for testing purposes, if the \fBnull_subject\fP or
|
Similarly, for testing purposes, if the \fBnull_subject\fP or
|
||||||
\fBnull_replacement\fP modifier is set, the subject or replacement string
|
\fBnull_replacement\fP modifier is set, the subject or replacement string
|
||||||
|
@ -2119,6 +2121,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 12 January 2022
|
Last updated: 27 July 2022
|
||||||
Copyright (c) 1997-2022 University of Cambridge.
|
Copyright (c) 1997-2022 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@@ -220,18 +220,17 @@ not rely on this. */
 
 #define COMPILE_ERROR_BASE 100
 
-/* The initial frames vector for remembering backtracking points in
-pcre2_match() is allocated on the system stack, of this size (bytes). The size
-must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
-multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
-on the number of capturing parentheses) so 20KiB handles quite a few frames. A
-larger vector on the heap is obtained for patterns that need more frames. The
-maximum size of this can be limited. */
+/* The initial frames vector for remembering pcre2_match() backtracking points
+is allocated on the heap, of this size (bytes) or ten times the frame size if
+larger, unless the heap limit is smaller. Typical frame sizes are a few hundred
+bytes (it depends on the number of capturing parentheses) so 20KiB handles
+quite a few frames. A larger vector on the heap is obtained for matches that
+need more frames, subject to the heap limit. */
 
 #define START_FRAMES_SIZE 20480
 
-/* Similarly, for DFA matching, an initial internal workspace vector is
-allocated on the stack. */
+/* For DFA matching, an initial internal workspace vector is allocated on the
+stack. The heap is used only if this turns out to be too small. */
 
 #define DFA_START_RWS_SIZE 30720
 
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2018 University of Cambridge
|
New API code Copyright (c) 2016-2022 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@@ -649,19 +649,23 @@ the size varies from call to call. As the maximum number of capturing
 subpatterns is 65535 we must allow for 65536 strings to include the overall
 match. (See also the heapframe structure below.) */
 
+struct heapframe; /* Forward reference */
+
 typedef struct pcre2_real_match_data {
-pcre2_memctl memctl;
+pcre2_memctl memctl; /* Memory control fields */
 const pcre2_real_code *code; /* The pattern used for the match */
 PCRE2_SPTR subject; /* The subject that was matched */
 PCRE2_SPTR mark; /* Pointer to last mark */
-PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
-PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
-PCRE2_SIZE startchar; /* Offset to starting code unit */
-uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
-uint8_t flags; /* Various flags */
-uint16_t oveccount; /* Number of pairs */
-int rc; /* The return code from the match */
-PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
+struct heapframe *heapframes; /* Backtracking frames heap memory */
+PCRE2_SIZE heapframes_size; /* Malloc-ed size */
+PCRE2_SIZE leftchar; /* Offset to leftmost code unit */
+PCRE2_SIZE rightchar; /* Offset to rightmost code unit */
+PCRE2_SIZE startchar; /* Offset to starting code unit */
+uint8_t matchedby; /* Type of match (normal, JIT, DFA) */
+uint8_t flags; /* Various flags */
+uint16_t oveccount; /* Number of pairs */
+int rc; /* The return code from the match */
+PCRE2_SIZE ovector[131072]; /* Must be last in the structure */
 } pcre2_real_match_data;
 
 
@ -854,10 +858,6 @@ doing traditional NFA matching (pcre2_match() and friends). */
|
||||||
|
|
||||||
typedef struct match_block {
|
typedef struct match_block {
|
||||||
pcre2_memctl memctl; /* For general use */
|
pcre2_memctl memctl; /* For general use */
|
||||||
PCRE2_SIZE frame_vector_size; /* Size of a backtracking frame */
|
|
||||||
heapframe *match_frames; /* Points to vector of frames */
|
|
||||||
heapframe *match_frames_top; /* Points after the end of the vector */
|
|
||||||
heapframe *stack_frames; /* The original vector on the stack */
|
|
||||||
PCRE2_SIZE heap_limit; /* As it says */
|
PCRE2_SIZE heap_limit; /* As it says */
|
||||||
uint32_t match_limit; /* As it says */
|
uint32_t match_limit; /* As it says */
|
||||||
uint32_t match_limit_depth; /* As it says */
|
uint32_t match_limit_depth; /* As it says */
|
||||||
|
|
|
@ -204,6 +204,7 @@ Arguments:
|
||||||
P a previous frame of interest
|
P a previous frame of interest
|
||||||
frame_size the frame size
|
frame_size the frame size
|
||||||
mb points to the match block
|
mb points to the match block
|
||||||
|
match_data points to the match data block
|
||||||
s identification text
|
s identification text
|
||||||
|
|
||||||
Returns: nothing
|
Returns: nothing
|
||||||
|
@ -211,7 +212,7 @@ Returns: nothing
|
||||||
|
|
||||||
static void
|
static void
|
||||||
display_frames(FILE *f, heapframe *F, heapframe *P, PCRE2_SIZE frame_size,
|
display_frames(FILE *f, heapframe *F, heapframe *P, PCRE2_SIZE frame_size,
|
||||||
match_block *mb, const char *s, ...)
|
match_block *mb, pcre2_match_data *match_data, const char *s, ...)
|
||||||
{
|
{
|
||||||
uint32_t i;
|
uint32_t i;
|
||||||
heapframe *Q;
|
heapframe *Q;
|
||||||
|
@ -223,10 +224,10 @@ vfprintf(f, s, ap);
|
||||||
va_end(ap);
|
va_end(ap);
|
||||||
|
|
||||||
if (P != NULL) fprintf(f, " P=%lu",
|
if (P != NULL) fprintf(f, " P=%lu",
|
||||||
((char *)P - (char *)(mb->match_frames))/frame_size);
|
((char *)P - (char *)(match_data->heapframes))/frame_size);
|
||||||
fprintf(f, "\n");
|
fprintf(f, "\n");
|
||||||
|
|
||||||
for (i = 0, Q = mb->match_frames;
|
for (i = 0, Q = match_data->heapframes;
|
||||||
Q <= F;
|
Q <= F;
|
||||||
i++, Q = (heapframe *)((char *)Q + frame_size))
|
i++, Q = (heapframe *)((char *)Q + frame_size))
|
||||||
{
|
{
|
||||||
|
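The display_frames() hunk above indexes and walks the frames vector with byte arithmetic rather than ordinary array indexing, because the frame size is only known at run time (it grows with the number of capturing groups). A minimal sketch of that pattern follows; demo_frame is a hypothetical stand-in for the real heapframe structure, and the helper names are not from the PCRE2 source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for heapframe. The real structure is followed by
   per-pattern capture data, so sizeof() does not give the stride. */
typedef struct demo_frame {
  unsigned id;
} demo_frame;

/* Index of frame P in a vector starting at base, given the run-time
   stride in bytes. Mirrors the (char *) subtraction in display_frames(). */
static size_t frame_index(const demo_frame *base, const demo_frame *P,
  size_t frame_size)
{
  assert(frame_size >= sizeof(demo_frame));
  return (size_t)((const char *)P - (const char *)base) / frame_size;
}

/* Advance to the next frame by adding the stride in bytes. */
static demo_frame *next_frame(demo_frame *Q, size_t frame_size)
{
  return (demo_frame *)((char *)Q + frame_size);
}
```

The stride must keep every frame suitably aligned for the frame type. In this sketch a stride of 48 bytes over suitably aligned storage satisfies that for demo_frame.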
@ -490,10 +491,16 @@ A version did exist that used individual frames on the heap instead of calling
|
||||||
match() recursively, but this ran substantially slower. The current version is
|
match() recursively, but this ran substantially slower. The current version is
|
||||||
a refactoring that uses a vector of frames to remember backtracking points.
|
a refactoring that uses a vector of frames to remember backtracking points.
|
||||||
This runs no slower, and possibly even a bit faster than the original recursive
|
This runs no slower, and possibly even a bit faster than the original recursive
|
||||||
implementation. An initial vector of size START_FRAMES_SIZE (enough for maybe
|
implementation.
|
||||||
50 frames) is allocated on the system stack. If this is not big enough, the
|
|
||||||
heap is used for a larger vector.
|
|
||||||
|
|
||||||
|
At first, an initial vector of size START_FRAMES_SIZE (enough for maybe 50
|
||||||
|
frames) was allocated on the system stack. If this was not big enough, the heap
|
||||||
|
was used for a larger vector. However, it turns out that there are environments
|
||||||
|
where taking as little as 20KiB from the system stack is an embarrassment.
|
||||||
|
After another refactoring, the heap is used exclusively, but a pointer to the
|
||||||
|
frames vector and its size are cached in the match_data block, so that there is
|
||||||
|
no new memory allocation if the same match_data block is used for multiple
|
||||||
|
matches (unless the frames vector has to be extended).
|
||||||
*******************************************************************************
|
*******************************************************************************
|
||||||
******************************************************************************/
|
******************************************************************************/
|
||||||
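The caching described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the PCRE2 code: the type demo_block and the functions demo_ensure_frames() and demo_release() are hypothetical names, and plain malloc()/free() stand in for the memctl functions held in the real match data block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of a match data block that caches the
frames vector. The names are illustrative only. */

typedef struct {
  void  *frames;       /* Cached frames vector, or NULL */
  size_t frames_size;  /* Its size in bytes */
} demo_block;

/* Make sure the cached vector holds at least "wanted" bytes. Returns 1 if the
existing vector was re-used, 0 if a new vector was obtained, and -1 if memory
could not be obtained. */

static int
demo_ensure_frames(demo_block *b, size_t wanted)
{
assert(b != NULL);
if (b->frames_size >= wanted) return 1;   /* Re-use: no new allocation */
free(b->frames);                          /* free(NULL) is a no-op */
b->frames = malloc(wanted);
if (b->frames == NULL)
  {
  b->frames_size = 0;                     /* Nothing is cached any more */
  return -1;
  }
b->frames_size = wanted;
return 0;
}

/* Release the cached vector together with its owner. */

static void
demo_release(demo_block *b)
{
assert(b != NULL);
free(b->frames);
b->frames = NULL;
b->frames_size = 0;
}
```

A second call that asks for the same or a smaller size returns 1 without touching the allocator, which is the saving the comment refers to when one match data block is used for several matches.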
|
|
||||||
|
@ -566,10 +573,9 @@ made performance worse.
|
||||||
Arguments:
|
Arguments:
|
||||||
start_eptr starting character in subject
|
start_eptr starting character in subject
|
||||||
start_ecode starting position in compiled code
|
start_ecode starting position in compiled code
|
||||||
ovector pointer to the final output vector
|
|
||||||
oveccount number of pairs in ovector
|
|
||||||
top_bracket number of capturing parentheses in the pattern
|
top_bracket number of capturing parentheses in the pattern
|
||||||
frame_size size of each backtracking frame
|
frame_size size of each backtracking frame
|
||||||
|
match_data pointer to the match_data block
|
||||||
mb pointer to "static" variables block
|
mb pointer to "static" variables block
|
||||||
|
|
||||||
Returns: MATCH_MATCH if matched ) these values are >= 0
|
Returns: MATCH_MATCH if matched ) these values are >= 0
|
||||||
|
@ -580,17 +586,19 @@ Returns: MATCH_MATCH if matched ) these values are >= 0
|
||||||
*/
|
*/
|
||||||
|
|
||||||
static int
|
static int
|
||||||
match(PCRE2_SPTR start_eptr, PCRE2_SPTR start_ecode, PCRE2_SIZE *ovector,
|
match(PCRE2_SPTR start_eptr, PCRE2_SPTR start_ecode, uint16_t top_bracket,
|
||||||
uint16_t oveccount, uint16_t top_bracket, PCRE2_SIZE frame_size,
|
PCRE2_SIZE frame_size, pcre2_match_data *match_data, match_block *mb)
|
||||||
match_block *mb)
|
|
||||||
{
|
{
|
||||||
/* Frame-handling variables */
|
/* Frame-handling variables */
|
||||||
|
|
||||||
heapframe *F; /* Current frame pointer */
|
heapframe *F; /* Current frame pointer */
|
||||||
heapframe *N = NULL; /* Temporary frame pointers */
|
heapframe *N = NULL; /* Temporary frame pointers */
|
||||||
heapframe *P = NULL;
|
heapframe *P = NULL;
|
||||||
|
|
||||||
|
heapframe *frames_top; /* End of frames vector */
|
||||||
heapframe *assert_accept_frame = NULL; /* For passing back a frame with captures */
|
heapframe *assert_accept_frame = NULL; /* For passing back a frame with captures */
|
||||||
PCRE2_SIZE frame_copy_size; /* Amount to copy when creating a new frame */
|
PCRE2_SIZE heapframes_size; /* Usable size of frames vector */
|
||||||
|
PCRE2_SIZE frame_copy_size; /* Amount to copy when creating a new frame */
|
||||||
|
|
||||||
/* Local variables that do not need to be preserved over calls to RRMATCH(). */
|
/* Local variables that do not need to be preserved over calls to RRMATCH(). */
|
||||||
|
|
||||||
|
@ -627,10 +635,14 @@ copied when a new frame is created. */
|
||||||
|
|
||||||
frame_copy_size = frame_size - offsetof(heapframe, eptr);
|
frame_copy_size = frame_size - offsetof(heapframe, eptr);
|
||||||
|
|
||||||
/* Set up the first current frame at the start of the vector, and initialize
|
/* Set up the first frame and the end of the frames vector. We set the local
|
||||||
fields that are not reset for new frames. */
|
heapframes_size to the usable amount of the vector, that is, a whole number of
|
||||||
|
frames. */
|
||||||
|
|
||||||
|
F = match_data->heapframes;
|
||||||
|
heapframes_size = (match_data->heapframes_size / frame_size) * frame_size;
|
||||||
|
frames_top = (heapframe *)((char *)F + heapframes_size);
|
||||||
|
|
||||||
F = mb->match_frames;
|
|
||||||
Frdepth = 0; /* "Recursion" depth */
|
Frdepth = 0; /* "Recursion" depth */
|
||||||
Fcapture_last = 0; /* Number of most recent capture */
|
Fcapture_last = 0; /* Number of most recent capture */
|
||||||
Fcurrent_recurse = RECURSE_UNSET; /* Not pattern recursing. */
|
Fcurrent_recurse = RECURSE_UNSET; /* Not pattern recursing. */
|
||||||
|
@ -646,34 +658,35 @@ backtracking point. */
|
||||||
|
|
||||||
MATCH_RECURSE:
|
MATCH_RECURSE:
|
||||||
|
|
||||||
/* Set up a new backtracking frame. If the vector is full, get a new one
|
/* Set up a new backtracking frame. If the vector is full, get a new one,
|
||||||
on the heap, doubling the size, but constrained by the heap limit. */
|
doubling the size, but constrained by the heap limit (which is in KiB). */
|
||||||
|
|
||||||
N = (heapframe *)((char *)F + frame_size);
|
N = (heapframe *)((char *)F + frame_size);
|
||||||
if (N >= mb->match_frames_top)
|
if (N >= frames_top)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE newsize = mb->frame_vector_size * 2;
|
|
||||||
heapframe *new;
|
heapframe *new;
|
||||||
|
PCRE2_SIZE newsize = match_data->heapframes_size * 2;
|
||||||
|
|
||||||
if ((newsize / 1024) > mb->heap_limit)
|
if (newsize > mb->heap_limit)
|
||||||
{
|
{
|
||||||
PCRE2_SIZE maxsize = ((mb->heap_limit * 1024)/frame_size) * frame_size;
|
PCRE2_SIZE maxsize = (mb->heap_limit/frame_size) * frame_size;
|
||||||
if (mb->frame_vector_size >= maxsize) return PCRE2_ERROR_HEAPLIMIT;
|
if (match_data->heapframes_size >= maxsize) return PCRE2_ERROR_HEAPLIMIT;
|
||||||
newsize = maxsize;
|
newsize = maxsize;
|
||||||
}
|
}
|
||||||
|
|
||||||
new = mb->memctl.malloc(newsize, mb->memctl.memory_data);
|
new = match_data->memctl.malloc(newsize, match_data->memctl.memory_data);
|
||||||
if (new == NULL) return PCRE2_ERROR_NOMEMORY;
|
if (new == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
memcpy(new, mb->match_frames, mb->frame_vector_size);
|
memcpy(new, match_data->heapframes, heapframes_size);
|
||||||
|
|
||||||
F = (heapframe *)((char *)new + ((char *)F - (char *)mb->match_frames));
|
F = (heapframe *)((char *)new + ((char *)F - (char *)match_data->heapframes));
|
||||||
N = (heapframe *)((char *)F + frame_size);
|
N = (heapframe *)((char *)F + frame_size);
|
||||||
|
|
||||||
if (mb->match_frames != mb->stack_frames)
|
match_data->memctl.free(match_data->heapframes, match_data->memctl.memory_data);
|
||||||
mb->memctl.free(mb->match_frames, mb->memctl.memory_data);
|
match_data->heapframes = new;
|
||||||
mb->match_frames = new;
|
match_data->heapframes_size = newsize;
|
||||||
mb->match_frames_top = (heapframe *)((char *)mb->match_frames + newsize);
|
|
||||||
mb->frame_vector_size = newsize;
|
heapframes_size = (newsize / frame_size) * frame_size;
|
||||||
|
frames_top = (heapframe *)((char *)new + heapframes_size);
|
||||||
}
|
}
|
||||||
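The size arithmetic in the new version of this hunk can be isolated as below. It is a sketch under stated assumptions: demo_grow_size() and demo_usable() are hypothetical helpers, the limit is taken to be already converted to bytes (as the new code does when it sets mb->heap_limit), and a return of 0 stands in for the PCRE2_ERROR_HEAPLIMIT case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the current vector size, the frame size and a
heap limit (all in bytes), return the size to allocate next, or 0 if the
vector is already as large as the limit allows. */

static size_t
demo_grow_size(size_t cursize, size_t frame_size, size_t limit_bytes)
{
size_t newsize = cursize * 2;              /* Normal case: double */
assert(frame_size > 0);
if (newsize > limit_bytes)
  {
  /* Largest whole number of frames that fits within the limit */
  size_t maxsize = (limit_bytes / frame_size) * frame_size;
  if (cursize >= maxsize) return 0;        /* Cannot grow any further */
  newsize = maxsize;
  }
return newsize;
}

/* Hypothetical helper: the usable part of a vector is a whole number of
frames, so any remainder at the end is ignored. */

static size_t
demo_usable(size_t size, size_t frame_size)
{
assert(frame_size > 0);
return (size / frame_size) * frame_size;
}
```

In the diff, the stored match_data->heapframes_size keeps the allocated size, while the local heapframes_size and frames_top are derived from the rounded-down usable size; demo_usable() corresponds to that second computation.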
|
|
||||||
#ifdef DEBUG_SHOW_RMATCH
|
#ifdef DEBUG_SHOW_RMATCH
|
||||||
|
@ -731,7 +744,7 @@ recursion value. */
|
||||||
|
|
||||||
if (group_frame_type != 0)
|
if (group_frame_type != 0)
|
||||||
{
|
{
|
||||||
Flast_group_offset = (char *)F - (char *)mb->match_frames;
|
Flast_group_offset = (char *)F - (char *)match_data->heapframes;
|
||||||
if (GF_IDMASK(group_frame_type) == GF_RECURSE)
|
if (GF_IDMASK(group_frame_type) == GF_RECURSE)
|
||||||
Fcurrent_recurse = GF_DATAMASK(group_frame_type);
|
Fcurrent_recurse = GF_DATAMASK(group_frame_type);
|
||||||
group_frame_type = 0;
|
group_frame_type = 0;
|
||||||
|
@ -773,7 +786,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
for(;;)
|
for(;;)
|
||||||
{
|
{
|
||||||
if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL;
|
if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL;
|
||||||
N = (heapframe *)((char *)mb->match_frames + offset);
|
N = (heapframe *)((char *)match_data->heapframes + offset);
|
||||||
P = (heapframe *)((char *)N - frame_size);
|
P = (heapframe *)((char *)N - frame_size);
|
||||||
if (N->group_frame_type == (GF_CAPTURE | number)) break;
|
if (N->group_frame_type == (GF_CAPTURE | number)) break;
|
||||||
offset = P->last_group_offset;
|
offset = P->last_group_offset;
|
||||||
|
@ -811,7 +824,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
for(;;)
|
for(;;)
|
||||||
{
|
{
|
||||||
if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL;
|
if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL;
|
||||||
N = (heapframe *)((char *)mb->match_frames + offset);
|
N = (heapframe *)((char *)match_data->heapframes + offset);
|
||||||
P = (heapframe *)((char *)N - frame_size);
|
P = (heapframe *)((char *)N - frame_size);
|
||||||
if (GF_IDMASK(N->group_frame_type) == GF_RECURSE) break;
|
if (GF_IDMASK(N->group_frame_type) == GF_RECURSE) break;
|
||||||
offset = P->last_group_offset;
|
offset = P->last_group_offset;
|
||||||
|
@ -864,14 +877,15 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
mb->mark = Fmark; /* and the last success mark */
|
mb->mark = Fmark; /* and the last success mark */
|
||||||
if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
|
if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr;
|
||||||
|
|
||||||
ovector[0] = Fstart_match - mb->start_subject;
|
match_data->ovector[0] = Fstart_match - mb->start_subject;
|
||||||
ovector[1] = Feptr - mb->start_subject;
|
match_data->ovector[1] = Feptr - mb->start_subject;
|
||||||
|
|
||||||
/* Set i to the smaller of the sizes of the external and frame ovectors. */
|
/* Set i to the smaller of the sizes of the external and frame ovectors. */
|
||||||
|
|
||||||
i = 2 * ((top_bracket + 1 > oveccount)? oveccount : top_bracket + 1);
|
i = 2 * ((top_bracket + 1 > match_data->oveccount)?
|
||||||
memcpy(ovector + 2, Fovector, (i - 2) * sizeof(PCRE2_SIZE));
|
match_data->oveccount : top_bracket + 1);
|
||||||
while (--i >= Foffset_top + 2) ovector[i] = PCRE2_UNSET;
|
memcpy(match_data->ovector + 2, Fovector, (i - 2) * sizeof(PCRE2_SIZE));
|
||||||
|
while (--i >= Foffset_top + 2) match_data->ovector[i] = PCRE2_UNSET;
|
||||||
return MATCH_MATCH; /* Note: NOT RRETURN */
|
return MATCH_MATCH; /* Note: NOT RRETURN */
|
||||||
|
|
||||||
|
|
||||||
|
@ -5328,7 +5342,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
offset = Flast_group_offset;
|
offset = Flast_group_offset;
|
||||||
while (offset != PCRE2_UNSET)
|
while (offset != PCRE2_UNSET)
|
||||||
{
|
{
|
||||||
N = (heapframe *)((char *)mb->match_frames + offset);
|
N = (heapframe *)((char *)match_data->heapframes + offset);
|
||||||
P = (heapframe *)((char *)N - frame_size);
|
P = (heapframe *)((char *)N - frame_size);
|
||||||
if (N->group_frame_type == (GF_RECURSE | number))
|
if (N->group_frame_type == (GF_RECURSE | number))
|
||||||
{
|
{
|
||||||
|
@ -5729,7 +5743,7 @@ fprintf(stderr, "++ op=%d\n", *Fecode);
|
||||||
|
|
||||||
if (*bracode != OP_BRA && *bracode != OP_COND)
|
if (*bracode != OP_BRA && *bracode != OP_COND)
|
||||||
{
|
{
|
||||||
N = (heapframe *)((char *)mb->match_frames + Flast_group_offset);
|
N = (heapframe *)((char *)match_data->heapframes + Flast_group_offset);
|
||||||
P = (heapframe *)((char *)N - frame_size);
|
P = (heapframe *)((char *)N - frame_size);
|
||||||
Flast_group_offset = P->last_group_offset;
|
Flast_group_offset = P->last_group_offset;
|
||||||
|
|
||||||
|
@ -6346,6 +6360,7 @@ BOOL jit_checked_utf = FALSE;
|
||||||
#endif /* SUPPORT_UNICODE */
|
#endif /* SUPPORT_UNICODE */
|
||||||
|
|
||||||
PCRE2_SIZE frame_size;
|
PCRE2_SIZE frame_size;
|
||||||
|
PCRE2_SIZE heapframes_size;
|
||||||
|
|
||||||
/* We need to have mb as a pointer to a match block, because the IS_NEWLINE
|
/* We need to have mb as a pointer to a match block, because the IS_NEWLINE
|
||||||
macro is used below, and it expects NLBLOCK to be defined as a pointer. */
|
macro is used below, and it expects NLBLOCK to be defined as a pointer. */
|
||||||
|
@ -6354,15 +6369,6 @@ pcre2_callout_block cb;
|
||||||
match_block actual_match_block;
|
match_block actual_match_block;
|
||||||
match_block *mb = &actual_match_block;
|
match_block *mb = &actual_match_block;
|
||||||
|
|
||||||
/* Allocate an initial vector of backtracking frames on the stack. If this
|
|
||||||
proves to be too small, it is replaced by a larger one on the heap. To get a
|
|
||||||
vector of the size required that is aligned for pointers, allocate it as a
|
|
||||||
vector of pointers. */
|
|
||||||
|
|
||||||
PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)]
|
|
||||||
PCRE2_KEEP_UNINITIALIZED;
|
|
||||||
mb->stack_frames = (heapframe *)stack_frames_vector;
|
|
||||||
|
|
||||||
/* Recognize NULL, length 0 as an empty string. */
|
/* Recognize NULL, length 0 as an empty string. */
|
||||||
|
|
||||||
if (subject == NULL && length == 0) subject = (PCRE2_SPTR)"";
|
if (subject == NULL && length == 0) subject = (PCRE2_SPTR)"";
|
||||||
|
@ -6793,15 +6799,11 @@ switch(re->newline_convention)
|
||||||
vector at the end, whose size depends on the number of capturing parentheses in
|
vector at the end, whose size depends on the number of capturing parentheses in
|
||||||
the pattern. It is not used at all if there are no capturing parentheses.
|
the pattern. It is not used at all if there are no capturing parentheses.
|
||||||
|
|
||||||
frame_size is the total size of each frame
|
frame_size is the total size of each frame
|
||||||
mb->frame_vector_size is the total usable size of the vector (rounded down
|
match_data->heapframes is the pointer to the frames vector
|
||||||
to a whole number of frames)
|
match_data->heapframes_size is the total size of the vector
|
||||||
|
|
||||||
The last of these is changed within the match() function if the frame vector
|
We must pad the frame_size for alignment to ensure subsequent frames are as
|
||||||
has to be expanded. We therefore put it into the match block so that it is
|
|
||||||
correct when calling match() more than once for non-anchored patterns.
|
|
||||||
|
|
||||||
We must also pad frame_size for alignment to ensure subsequent frames are as
|
|
||||||
aligned as heapframe. Whilst ovector is word-aligned due to being a PCRE2_SIZE
|
aligned as heapframe. Whilst ovector is word-aligned due to being a PCRE2_SIZE
|
||||||
array, that does not guarantee it is suitably aligned for pointers, as some
|
array, that does not guarantee it is suitably aligned for pointers, as some
|
||||||
architectures have pointers that are larger than a size_t. */
|
architectures have pointers that are larger than a size_t. */
|
||||||
|
@ -6813,8 +6815,8 @@ frame_size = (offsetof(heapframe, ovector) +
|
||||||
/* Limits set in the pattern override the match context only if they are
|
/* Limits set in the pattern override the match context only if they are
|
||||||
smaller. */
|
smaller. */
|
||||||
|
|
||||||
mb->heap_limit = (mcontext->heap_limit < re->limit_heap)?
|
mb->heap_limit = ((mcontext->heap_limit < re->limit_heap)?
|
||||||
mcontext->heap_limit : re->limit_heap;
|
mcontext->heap_limit : re->limit_heap) * 1024;
|
||||||
|
|
||||||
mb->match_limit = (mcontext->match_limit < re->limit_match)?
|
mb->match_limit = (mcontext->match_limit < re->limit_match)?
|
||||||
mcontext->match_limit : re->limit_match;
|
mcontext->match_limit : re->limit_match;
|
||||||
|
@ -6823,35 +6825,40 @@ mb->match_limit_depth = (mcontext->depth_limit < re->limit_depth)?
|
||||||
mcontext->depth_limit : re->limit_depth;
|
mcontext->depth_limit : re->limit_depth;
|
||||||
|
|
||||||
/* If a pattern has very many capturing parentheses, the frame size may be very
|
/* If a pattern has very many capturing parentheses, the frame size may be very
|
||||||
large. Ensure that there are at least 10 available frames by getting an initial
|
large. Set the initial frame vector size to ensure that there are at least 10
|
||||||
vector on the heap if necessary, except when the heap limit prevents this. Get
|
available frames, but enforce a minimum of START_FRAMES_SIZE. If this is
|
||||||
fewer if possible. (The heap limit is in kibibytes.) */
|
greater than the heap limit, get as large a vector as possible. Always round
|
||||||
|
the size to a multiple of the frame size. (The heap limit is in kibibytes.) */
|
||||||
|
|
||||||
if (frame_size <= START_FRAMES_SIZE/10)
|
heapframes_size = frame_size * 10;
|
||||||
|
if (heapframes_size < START_FRAMES_SIZE) heapframes_size = START_FRAMES_SIZE;
|
||||||
|
if (heapframes_size > mb->heap_limit)
|
||||||
{
|
{
|
||||||
mb->match_frames = mb->stack_frames; /* Initial frame vector on the stack */
|
if (frame_size > mb->heap_limit ) return PCRE2_ERROR_HEAPLIMIT;
|
||||||
mb->frame_vector_size = ((START_FRAMES_SIZE/frame_size) * frame_size);
|
heapframes_size = mb->heap_limit;
|
||||||
}
|
}
|
||||||
else
|
|
||||||
|
/* If an existing frame vector in the match_data block is large enough, we can
|
||||||
|
use it. Otherwise, free any pre-existing vector and get a new one. */
|
||||||
|
|
||||||
|
if (match_data->heapframes_size < heapframes_size)
|
||||||
{
|
{
|
||||||
mb->frame_vector_size = frame_size * 10;
|
match_data->memctl.free(match_data->heapframes,
|
||||||
if ((mb->frame_vector_size / 1024) > mb->heap_limit)
|
match_data->memctl.memory_data);
|
||||||
|
match_data->heapframes = match_data->memctl.malloc(heapframes_size,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
|
if (match_data->heapframes == NULL)
|
||||||
{
|
{
|
||||||
if (frame_size > mb->heap_limit * 1024) return PCRE2_ERROR_HEAPLIMIT;
|
match_data->heapframes_size = 0;
|
||||||
mb->frame_vector_size = ((mb->heap_limit * 1024)/frame_size) * frame_size;
|
return PCRE2_ERROR_NOMEMORY;
|
||||||
}
|
}
|
||||||
mb->match_frames = mb->memctl.malloc(mb->frame_vector_size,
|
match_data->heapframes_size = heapframes_size;
|
||||||
mb->memctl.memory_data);
|
|
||||||
if (mb->match_frames == NULL) return PCRE2_ERROR_NOMEMORY;
|
|
||||||
}
|
}
|
||||||
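The choice of the initial vector size in the new code can be sketched as a small function. The names demo_initial_size() and DEMO_START_FRAMES_SIZE are hypothetical, and the value 20480 is an assumption based on the "20KiB" mentioned in the comment earlier in this diff. The sketch follows the code as shown, which leaves the rounding to a whole number of frames to match().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration only (20 KiB). */
#define DEMO_START_FRAMES_SIZE 20480

/* Hypothetical helper: choose the initial frames vector size in bytes.
Returns 0 and sets *result on success, or -1 if not even one frame fits
within the limit (the PCRE2_ERROR_HEAPLIMIT case in the diff). */

static int
demo_initial_size(size_t frame_size, size_t limit_bytes, size_t *result)
{
size_t size = frame_size * 10;             /* Room for at least 10 frames */
assert(result != NULL);
if (size < DEMO_START_FRAMES_SIZE) size = DEMO_START_FRAMES_SIZE;
if (size > limit_bytes)
  {
  if (frame_size > limit_bytes) return -1; /* Limit too small for one frame */
  size = limit_bytes;                      /* As large as the limit allows */
  }
*result = size;
return 0;
}
```

With small frames the minimum applies, with large frames the "10 frames" rule applies, and a tight heap limit overrides both.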
|
|
||||||
mb->match_frames_top =
|
|
||||||
(heapframe *)((char *)mb->match_frames + mb->frame_vector_size);
|
|
||||||
|
|
||||||
/* Write to the ovector within the first frame to mark every capture unset and
|
/* Write to the ovector within the first frame to mark every capture unset and
|
||||||
to avoid uninitialized memory read errors when it is copied to a new frame. */
|
to avoid uninitialized memory read errors when it is copied to a new frame. */
|
||||||
|
|
||||||
memset((char *)(mb->match_frames) + offsetof(heapframe, ovector), 0xff,
|
memset((char *)(match_data->heapframes) + offsetof(heapframe, ovector), 0xff,
|
||||||
frame_size - offsetof(heapframe, ovector));
|
frame_size - offsetof(heapframe, ovector));
|
||||||
|
|
||||||
/* Pointers to the individual character tables */
|
/* Pointers to the individual character tables */
|
||||||
|
@ -7279,8 +7286,8 @@ for(;;)
|
||||||
mb->end_offset_top = 0;
|
mb->end_offset_top = 0;
|
||||||
mb->skip_arg_count = 0;
|
mb->skip_arg_count = 0;
|
||||||
|
|
||||||
rc = match(start_match, mb->start_code, match_data->ovector,
|
rc = match(start_match, mb->start_code, re->top_bracket, frame_size,
|
||||||
match_data->oveccount, re->top_bracket, frame_size, mb);
|
match_data, mb);
|
||||||
|
|
||||||
if (mb->hitend && start_partial == NULL)
|
if (mb->hitend && start_partial == NULL)
|
||||||
{
|
{
|
||||||
|
@ -7463,11 +7470,6 @@ if (utf && end_subject != true_end_subject &&
|
||||||
}
|
}
|
||||||
#endif /* SUPPORT_UNICODE */
|
#endif /* SUPPORT_UNICODE */
|
||||||
|
|
||||||
/* Release an enlarged frame vector that is on the heap. */
|
|
||||||
|
|
||||||
if (mb->match_frames != mb->stack_frames)
|
|
||||||
mb->memctl.free(mb->match_frames, mb->memctl.memory_data);
|
|
||||||
|
|
||||||
/* Fill in fields that are always returned in the match data. */
|
/* Fill in fields that are always returned in the match data. */
|
||||||
|
|
||||||
match_data->code = re;
|
match_data->code = re;
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2019 University of Cambridge
|
New API code Copyright (c) 2016-2022 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -64,6 +64,8 @@ yield = PRIV(memctl_malloc)(
|
||||||
if (yield == NULL) return NULL;
|
if (yield == NULL) return NULL;
|
||||||
yield->oveccount = oveccount;
|
yield->oveccount = oveccount;
|
||||||
yield->flags = 0;
|
yield->flags = 0;
|
||||||
|
yield->heapframes = NULL;
|
||||||
|
yield->heapframes_size = 0;
|
||||||
return yield;
|
return yield;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -95,6 +97,9 @@ pcre2_match_data_free(pcre2_match_data *match_data)
|
||||||
{
|
{
|
||||||
if (match_data != NULL)
|
if (match_data != NULL)
|
||||||
{
|
{
|
||||||
|
if (match_data->heapframes != NULL)
|
||||||
|
match_data->memctl.free(match_data->heapframes,
|
||||||
|
match_data->memctl.memory_data);
|
||||||
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0)
|
||||||
match_data->memctl.free((void *)match_data->subject,
|
match_data->memctl.free((void *)match_data->subject,
|
||||||
match_data->memctl.memory_data);
|
match_data->memctl.memory_data);
|
||||||
|
|
|
@ -7,7 +7,7 @@ and semantics are as close as possible to those of the Perl 5 language.
|
||||||
|
|
||||||
Written by Philip Hazel
|
Written by Philip Hazel
|
||||||
Original API code Copyright (c) 1997-2012 University of Cambridge
|
Original API code Copyright (c) 1997-2012 University of Cambridge
|
||||||
New API code Copyright (c) 2016-2021 University of Cambridge
|
New API code Copyright (c) 2016-2022 University of Cambridge
|
||||||
|
|
||||||
-----------------------------------------------------------------------------
|
-----------------------------------------------------------------------------
|
||||||
Redistribution and use in source and binary forms, with or without
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
@ -259,16 +259,16 @@ PCRE2_UNSET, so as not to imply an offset in the replacement. */
|
||||||
|
|
||||||
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
|
if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
|
||||||
return PCRE2_ERROR_BADOPTION;
|
return PCRE2_ERROR_BADOPTION;
|
||||||
|
|
||||||
/* Validate length and find the end of the replacement. A NULL replacement of
|
/* Validate length and find the end of the replacement. A NULL replacement of
|
||||||
zero length is interpreted as an empty string. */
|
zero length is interpreted as an empty string. */
|
||||||
|
|
||||||
if (replacement == NULL)
|
if (replacement == NULL)
|
||||||
{
|
{
|
||||||
if (rlength != 0) return PCRE2_ERROR_NULL;
|
if (rlength != 0) return PCRE2_ERROR_NULL;
|
||||||
replacement = (PCRE2_SPTR)"";
|
replacement = (PCRE2_SPTR)"";
|
||||||
}
|
}
|
||||||
|
|
||||||
if (rlength == PCRE2_ZERO_TERMINATED) rlength = PRIV(strlen)(replacement);
|
if (rlength == PCRE2_ZERO_TERMINATED) rlength = PRIV(strlen)(replacement);
|
||||||
repend = replacement + rlength;
|
repend = replacement + rlength;
|
||||||
|
|
||||||
|
@ -282,8 +282,9 @@ replacement_only = ((options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) != 0);
|
||||||
match data block. We create an internal match_data block in two cases: (a) an
|
match data block. We create an internal match_data block in two cases: (a) an
|
||||||
external one is not supplied (and we are not starting from an existing match);
|
external one is not supplied (and we are not starting from an existing match);
|
||||||
(b) an existing match is to be used for the first substitution. In the latter
|
(b) an existing match is to be used for the first substitution. In the latter
|
||||||
case, we copy the existing match into the internal block. This ensures that no
|
case, we copy the existing match into the internal block, except for any cached
|
||||||
changes are made to the existing match data block. */
|
heap frame size and pointer. This ensures that no changes are made to the
|
||||||
|
external match data block. */
|
||||||
|
|
||||||
if (match_data == NULL)
|
if (match_data == NULL)
|
||||||
{
|
{
|
||||||
|
@ -309,6 +310,8 @@ else if (use_existing_match)
|
||||||
if (internal_match_data == NULL) return PCRE2_ERROR_NOMEMORY;
|
if (internal_match_data == NULL) return PCRE2_ERROR_NOMEMORY;
|
||||||
memcpy(internal_match_data, match_data, offsetof(pcre2_match_data, ovector)
|
memcpy(internal_match_data, match_data, offsetof(pcre2_match_data, ovector)
|
||||||
+ 2*pairs*sizeof(PCRE2_SIZE));
|
+ 2*pairs*sizeof(PCRE2_SIZE));
|
||||||
|
internal_match_data->heapframes = NULL;
|
||||||
|
internal_match_data->heapframes_size = 0;
|
||||||
match_data = internal_match_data;
|
match_data = internal_match_data;
|
||||||
}
|
}
|
||||||
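The reason for clearing the two fields after the memcpy() is that a byte-wise copy would otherwise leave two blocks pointing at the same heap vector, and each would try to free it. A minimal sketch, using a hypothetical demo_md type rather than the real pcre2_match_data layout:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block with one owned pointer and some plain data. */

typedef struct {
  void  *frames;       /* Owned: freed when the block is freed */
  size_t frames_size;
  int    rc;           /* Plain data that is safe to copy */
} demo_md;

/* Copy the plain data, but do not share ownership of the cached vector. The
copy starts with no vector and obtains its own if it ever needs one. */

static void
demo_copy_md(demo_md *dst, const demo_md *src)
{
assert(dst != NULL && src != NULL);
memcpy(dst, src, sizeof(demo_md));
dst->frames = NULL;
dst->frames_size = 0;
}
```

After the copy, freeing the internal block cannot disturb the vector that is cached in the external one.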
|
|
||||||
|
@ -328,9 +331,9 @@ scb.ovector = ovector;
|
||||||
|
|
||||||
if (subject == NULL)
|
if (subject == NULL)
|
||||||
{
|
{
|
||||||
if (length != 0) return PCRE2_ERROR_NULL;
|
if (length != 0) return PCRE2_ERROR_NULL;
|
||||||
subject = (PCRE2_SPTR)"";
|
subject = (PCRE2_SPTR)"";
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Find length of zero-terminated subject */
|
/* Find length of zero-terminated subject */
|
||||||
|
|
||||||
|
|
|
@ -479,7 +479,7 @@ so many of them that they are split into two fields. */
|
||||||
#define CTL_DFA 0x00000200u
|
#define CTL_DFA 0x00000200u
|
||||||
#define CTL_EXPAND 0x00000400u
|
#define CTL_EXPAND 0x00000400u
|
||||||
#define CTL_FINDLIMITS 0x00000800u
|
#define CTL_FINDLIMITS 0x00000800u
|
||||||
#define CTL_FRAMESIZE 0x00001000u
|
#define CTL_FINDLIMITS_NOHEAP 0x00001000u
|
||||||
#define CTL_FULLBINCODE 0x00002000u
|
#define CTL_FULLBINCODE 0x00002000u
|
||||||
#define CTL_GETALL 0x00004000u
|
#define CTL_GETALL 0x00004000u
|
||||||
#define CTL_GLOBAL 0x00008000u
|
#define CTL_GLOBAL 0x00008000u
|
||||||
|
@ -522,6 +522,7 @@ so many of them that they are split into two fields. */
|
||||||
#define CTL2_ALLVECTOR 0x00000800u
|
#define CTL2_ALLVECTOR 0x00000800u
|
||||||
#define CTL2_NULL_SUBJECT 0x00001000u
|
#define CTL2_NULL_SUBJECT 0x00001000u
|
||||||
#define CTL2_NULL_REPLACEMENT 0x00002000u
|
#define CTL2_NULL_REPLACEMENT 0x00002000u
|
||||||
|
#define CTL2_FRAMESIZE 0x00004000u
|
||||||
|
|
||||||
#define CTL2_NL_SET 0x40000000u /* Informational */
|
#define CTL2_NL_SET 0x40000000u /* Informational */
|
||||||
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
#define CTL2_BSR_SET 0x80000000u /* Informational */
|
||||||
|
@ -673,8 +674,9 @@ static modstruct modlist[] = {
|
||||||
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
|
{ "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) },
|
||||||
{ "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, CO(extra_options) },
|
{ "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, CO(extra_options) },
|
||||||
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
|
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
|
||||||
|
{ "find_limits_noheap", MOD_DAT, MOD_CTL, CTL_FINDLIMITS_NOHEAP, DO(control) },
|
||||||
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
|
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
|
||||||
{ "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) },
|
{ "framesize", MOD_PAT, MOD_CTL, CTL2_FRAMESIZE, PO(control2) },
|
||||||
{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
|
{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
|
||||||
{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
|
{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
|
||||||
{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
|
{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
|
||||||
|
@ -781,10 +783,11 @@ static modstruct modlist[] = {
|
||||||
|
|
||||||
#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \
|
#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \
|
||||||
CTL_BINCODE|CTL_CALLOUT_INFO|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO| \
|
CTL_BINCODE|CTL_CALLOUT_INFO|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO| \
|
||||||
CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \
|
CTL_JITVERIFY|CTL_MEMORY|CTL_PUSH|CTL_PUSHCOPY| \
|
||||||
CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)
|
CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)
|
||||||
|
|
||||||
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_NL_SET)
|
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_FRAMESIZE| \
|
||||||
|
CTL2_NL_SET)
|
||||||
|
|
||||||
/* Controls that apply only at compile time with 'push'. */
|
/* Controls that apply only at compile time with 'push'. */
|
||||||
|
|
||||||
|
@ -813,8 +816,9 @@ static uint32_t exclusive_pat_controls[] = {
|
||||||
first control word. */
|
first control word. */
|
||||||
|
|
||||||
static uint32_t exclusive_dat_controls[] = {
|
static uint32_t exclusive_dat_controls[] = {
|
||||||
CTL_ALLUSEDTEXT | CTL_STARTCHAR,
|
CTL_ALLUSEDTEXT | CTL_STARTCHAR,
|
||||||
CTL_FINDLIMITS | CTL_NULLCONTEXT };
|
CTL_FINDLIMITS | CTL_NULLCONTEXT,
|
||||||
|
CTL_FINDLIMITS_NOHEAP | CTL_NULLCONTEXT };
|
||||||
|
|
||||||
/* Table of single-character abbreviated modifiers. The index field is
|
/* Table of single-character abbreviated modifiers. The index field is
|
||||||
initialized to -1, but the first time the modifier is encountered, it is filled
|
initialized to -1, but the first time the modifier is encountered, it is filled
|
||||||
|
@ -4112,7 +4116,7 @@ Returns: nothing
|
||||||
static void
|
static void
|
||||||
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
show_controls(uint32_t controls, uint32_t controls2, const char *before)
|
||||||
{
|
{
|
||||||
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
|
||||||
before,
|
before,
|
||||||
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
|
||||||
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
|
||||||
|
@ -4130,7 +4134,8 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
|
||||||
((controls & CTL_DFA) != 0)? " dfa" : "",
|
((controls & CTL_DFA) != 0)? " dfa" : "",
|
||||||
((controls & CTL_EXPAND) != 0)? " expand" : "",
|
((controls & CTL_EXPAND) != 0)? " expand" : "",
|
||||||
((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "",
|
((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "",
|
||||||
((controls & CTL_FRAMESIZE) != 0)? " framesize" : "",
|
((controls & CTL_FINDLIMITS_NOHEAP) != 0)? " find_limits_noheap" : "",
|
||||||
|
((controls2 & CTL2_FRAMESIZE) != 0)? " framesize" : "",
|
||||||
((controls & CTL_FULLBINCODE) != 0)? " fullbincode" : "",
|
((controls & CTL_FULLBINCODE) != 0)? " fullbincode" : "",
|
||||||
((controls & CTL_GETALL) != 0)? " getall" : "",
|
((controls & CTL_GETALL) != 0)? " getall" : "",
|
||||||
((controls & CTL_GLOBAL) != 0)? " global" : "",
|
((controls & CTL_GLOBAL) != 0)? " global" : "",
|
||||||
|
@ -4308,13 +4313,13 @@ if (test_mode == PCRE32_MODE) cblock_size = sizeof(pcre2_real_code_32);
|
||||||
(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
|
(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE);
|
||||||
(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
|
(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE);
|
||||||
|
|
||||||
/* The uint32_t variables are cast before multiplying to stop code analyzers
|
/* The uint32_t variables are cast before multiplying to stop code analyzers
|
||||||
grumbling about potential overflow. */
|
grumbling about potential overflow. */
|
||||||
|
|
||||||
fprintf(outfile, "Memory allocation (code space): %" SIZ_FORM "\n", size -
|
fprintf(outfile, "Memory allocation (code space): %" SIZ_FORM "\n", size -
|
||||||
(size_t)name_count * (size_t)name_entry_size * (size_t)code_unit_size -
|
(size_t)name_count * (size_t)name_entry_size * (size_t)code_unit_size -
|
||||||
cblock_size);
|
cblock_size);
|
||||||
|
|
||||||
if (pat_patctl.jit != 0)
|
if (pat_patctl.jit != 0)
|
||||||
{
|
{
|
||||||
(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
|
(void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE);
|
||||||
|
@ -4986,7 +4991,7 @@ switch(cmd)
|
||||||
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
|
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
|
||||||
}
|
}
|
||||||
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
||||||
if ((pat_patctl.control & CTL_FRAMESIZE) != 0) show_framesize();
|
if ((pat_patctl.control2 & CTL2_FRAMESIZE) != 0) show_framesize();
|
||||||
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
||||||
{
|
{
|
||||||
rc = show_pattern_info();
|
rc = show_pattern_info();
|
||||||
|
@ -5948,7 +5953,7 @@ if ((pat_patctl.control2 & CTL2_NL_SET) != 0)
|
||||||
/* Output code size and other information if requested. */
|
/* Output code size and other information if requested. */
|
||||||
|
|
||||||
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
|
||||||
if ((pat_patctl.control & CTL_FRAMESIZE) != 0) show_framesize();
|
if ((pat_patctl.control2 & CTL2_FRAMESIZE) != 0) show_framesize();
|
||||||
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
if ((pat_patctl.control & CTL_ANYINFO) != 0)
|
||||||
{
|
{
|
||||||
int rc = show_pattern_info();
|
int rc = show_pattern_info();
|
||||||
|
@ -6027,10 +6032,46 @@ for (;;)
|
||||||
{
|
{
|
||||||
uint32_t stack_start = 0;
|
uint32_t stack_start = 0;
|
||||||
|
|
||||||
|
/* If we are checking the heap limit, free any frames vector that is cached
|
||||||
|
in the match_data so we always start without one. */
|
||||||
|
|
||||||
if (errnumber == PCRE2_ERROR_HEAPLIMIT)
|
if (errnumber == PCRE2_ERROR_HEAPLIMIT)
|
||||||
{
|
{
|
||||||
PCRE2_SET_HEAP_LIMIT(dat_context, mid);
|
PCRE2_SET_HEAP_LIMIT(dat_context, mid);
|
||||||
|
|
||||||
|
#ifdef SUPPORT_PCRE2_8
|
||||||
|
if (code_unit_size == 1)
|
||||||
|
{
|
||||||
|
match_data8->memctl.free(match_data8->heapframes,
|
||||||
|
match_data8->memctl.memory_data);
|
||||||
|
match_data8->heapframes = NULL;
|
||||||
|
match_data8->heapframes_size = 0;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef SUPPORT_PCRE2_16
|
||||||
|
if (code_unit_size == 2)
|
||||||
|
{
|
||||||
|
match_data16->memctl.free(match_data16->heapframes,
|
||||||
|
match_data16->memctl.memory_data);
|
||||||
|
match_data16->heapframes = NULL;
|
||||||
|
match_data16->heapframes_size = 0;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef SUPPORT_PCRE2_32
|
||||||
|
if (code_unit_size == 4)
|
||||||
|
{
|
||||||
|
match_data32->memctl.free(match_data32->heapframes,
|
||||||
|
match_data32->memctl.memory_data);
|
||||||
|
match_data32->heapframes = NULL;
|
||||||
|
match_data32->heapframes_size = 0;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* No need to mess with the frames vector for match or depth limits. */
|
||||||
|
|
||||||
else if (errnumber == PCRE2_ERROR_MATCHLIMIT)
|
else if (errnumber == PCRE2_ERROR_MATCHLIMIT)
|
||||||
{
|
{
|
||||||
PCRE2_SET_MATCH_LIMIT(dat_context, mid);
|
PCRE2_SET_MATCH_LIMIT(dat_context, mid);
|
||||||
|
@ -6040,6 +6081,8 @@ for (;;)
|
||||||
PCRE2_SET_DEPTH_LIMIT(dat_context, mid);
|
PCRE2_SET_DEPTH_LIMIT(dat_context, mid);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Do the appropriate match */
|
||||||
|
|
||||||
if ((dat_datctl.control & CTL_DFA) != 0)
|
if ((dat_datctl.control & CTL_DFA) != 0)
|
||||||
{
|
{
|
||||||
stack_start = DFA_START_RWS_SIZE/1024;
|
stack_start = DFA_START_RWS_SIZE/1024;
|
||||||
|
@ -6058,7 +6101,6 @@ for (;;)
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
stack_start = START_FRAMES_SIZE/1024;
|
|
||||||
PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
|
PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset,
|
||||||
dat_datctl.options, match_data, PTR(dat_context));
|
dat_datctl.options, match_data, PTR(dat_context));
|
||||||
}
|
}
|
||||||
|
@ -7584,12 +7626,13 @@ for (gmatched = 0;; gmatched++)
|
||||||
limits are not relevant for JIT. The return from check_match_limit() is the
|
limits are not relevant for JIT. The return from check_match_limit() is the
|
||||||
return from the final call to pcre2_match() or pcre2_dfa_match(). */
|
return from the final call to pcre2_match() or pcre2_dfa_match(). */
|
||||||
|
|
||||||
if ((dat_datctl.control & CTL_FINDLIMITS) != 0)
|
if ((dat_datctl.control & (CTL_FINDLIMITS|CTL_FINDLIMITS_NOHEAP)) != 0)
|
||||||
{
|
{
|
||||||
capcount = 0; /* This stops compiler warnings */
|
capcount = 0; /* This stops compiler warnings */
|
||||||
|
|
||||||
if (FLD(compiled_code, executable_jit) == NULL ||
|
if ((dat_datctl.control & CTL_FINDLIMITS_NOHEAP) == 0 &&
|
||||||
(dat_datctl.options & PCRE2_NO_JIT) != 0)
|
(FLD(compiled_code, executable_jit) == NULL ||
|
||||||
|
(dat_datctl.options & PCRE2_NO_JIT) != 0))
|
||||||
{
|
{
|
||||||
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
|
(void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap");
|
||||||
}
|
}
|
||||||
|
|
|
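Note on the check_match_limit() hunk above: the heap-limit search sets a trial limit (`mid`) and, per the new comment, frees any frames vector cached in the match data first. The following is a minimal toy model of why that reset matters, not code from pcre2test. All `toy_*` names are hypothetical, and the limit is counted in bytes here for simplicity, whereas the real heap limit is in kibibytes. The assumption modelled is that a matcher which finds a large enough cached vector never needs to allocate, so it never hits the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a match data block that caches a frames vector. */
typedef struct {
  void  *heapframes;       /* cached frames vector, or NULL */
  size_t heapframes_size;  /* size of the cached vector in bytes */
} toy_match_data;

/* Free the cached vector so the next match starts without one. */
static void toy_drop_frames(toy_match_data *md)
{
  assert(md != NULL);
  free(md->heapframes);
  md->heapframes = NULL;
  md->heapframes_size = 0;
}

/* Toy matcher that needs `needed` bytes of frames. A cached vector that is
   already big enough is reused without consulting the limit. Returns 0 on
   success, -1 if the limit (or the allocator) refuses the request. */
static int toy_match(toy_match_data *md, size_t needed, size_t limit_bytes)
{
  void *p;
  if (md->heapframes_size >= needed) return 0;
  if (needed > limit_bytes) return -1;
  p = realloc(md->heapframes, needed);
  if (p == NULL) return -1;
  md->heapframes = p;
  md->heapframes_size = needed;
  return 0;
}

/* Binary search for the smallest limit in [0, max] at which toy_match()
   succeeds, assuming it succeeds at max. If reset_each_probe is non-zero,
   the cached vector is dropped before every probe. */
static size_t toy_find_min_limit(toy_match_data *md, size_t needed,
  size_t max, int reset_each_probe)
{
  size_t lo = 0, hi = max;
  while (lo < hi)
    {
    size_t mid = lo + (hi - lo) / 2;
    if (reset_each_probe) toy_drop_frames(md);
    if (toy_match(md, needed, mid) == 0) hi = mid; else lo = mid + 1;
    }
  return lo;
}
```

In this model, searching with the reset finds the true requirement, while searching without it reports a minimum of zero, because the first successful probe leaves behind a vector that satisfies every later probe. The same effect would explain why the LIMIT_HEAP=21 test was moved to the top of the test file.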
@ -6,38 +6,44 @@
|
||||||
|
|
||||||
# (2) Other tests that must not be run with JIT.
|
# (2) Other tests that must not be run with JIT.
|
||||||
|
|
||||||
|
# This test is first so that it doesn't inherit a large enough heap frame
|
||||||
|
# vector from a previous test.
|
||||||
|
|
||||||
|
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
|
||||||
|
\[a]{60}
|
||||||
|
|
||||||
/(a+)*zz/I
|
/(a+)*zz/I
|
||||||
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
|
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits_noheap
|
||||||
aaaaaaaaaaaaaz\=find_limits
|
aaaaaaaaaaaaaz\=find_limits_noheap
|
||||||
|
|
||||||
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
|
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
|
||||||
/* this is a C style comment */\=find_limits
|
/* this is a C style comment */\=find_limits_noheap
|
||||||
|
|
||||||
/^(?>a)++/
|
/^(?>a)++/
|
||||||
aa\=find_limits
|
aa\=find_limits_noheap
|
||||||
aaaaaaaaa\=find_limits
|
aaaaaaaaa\=find_limits_noheap
|
||||||
|
|
||||||
/(a)(?1)++/
|
/(a)(?1)++/
|
||||||
aa\=find_limits
|
aa\=find_limits_noheap
|
||||||
aaaaaaaaa\=find_limits
|
aaaaaaaaa\=find_limits_noheap
|
||||||
|
|
||||||
/a(?:.)*?a/ims
|
/a(?:.)*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
|
||||||
|
|
||||||
/a(?:.(*THEN))*?a/ims
|
/a(?:.(*THEN))*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
|
||||||
|
|
||||||
/a(?:.(*THEN:ABC))*?a/ims
|
/a(?:.(*THEN:ABC))*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
|
||||||
|
|
||||||
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
|
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
|
||||||
aabbccddee\=find_limits
|
aabbccddee\=find_limits_noheap
|
||||||
|
|
||||||
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
|
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
|
||||||
aabbccddee\=find_limits
|
aabbccddee\=find_limits_noheap
|
||||||
|
|
||||||
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
|
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
|
||||||
aabbccddee\=find_limits
|
aabbccddee\=find_limits_noheap
|
||||||
|
|
||||||
/(*LIMIT_MATCH=12bc)abc/
|
/(*LIMIT_MATCH=12bc)abc/
|
||||||
|
|
||||||
|
@ -228,9 +234,6 @@
|
||||||
/(|]+){2,2452}/
|
/(|]+){2,2452}/
|
||||||
(|]+){2,2452}
|
(|]+){2,2452}
|
||||||
|
|
||||||
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
|
|
||||||
\[a]{60}
|
|
||||||
|
|
||||||
/b(?<!ax)(?!cx)/allusedtext
|
/b(?<!ax)(?!cx)/allusedtext
|
||||||
abc
|
abc
|
||||||
abcz
|
abcz
|
||||||
|
|
|
@ -6,19 +6,24 @@
|
||||||
|
|
||||||
# (2) Other tests that must not be run with JIT.
|
# (2) Other tests that must not be run with JIT.
|
||||||
|
|
||||||
|
# This test is first so that it doesn't inherit a large enough heap frame
|
||||||
|
# vector from a previous test.
|
||||||
|
|
||||||
|
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
|
||||||
|
\[a]{60}
|
||||||
|
Failed: error -63: heap limit exceeded
|
||||||
|
|
||||||
/(a+)*zz/I
|
/(a+)*zz/I
|
||||||
Capture group count = 1
|
Capture group count = 1
|
||||||
Starting code units: a z
|
Starting code units: a z
|
||||||
Last code unit = 'z'
|
Last code unit = 'z'
|
||||||
Subject length lower bound = 2
|
Subject length lower bound = 2
|
||||||
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
|
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 7
|
Minimum match limit = 7
|
||||||
Minimum depth limit = 7
|
Minimum depth limit = 7
|
||||||
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz
|
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz
|
||||||
1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
|
1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
|
||||||
aaaaaaaaaaaaaz\=find_limits
|
aaaaaaaaaaaaaz\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 20481
|
Minimum match limit = 20481
|
||||||
Minimum depth limit = 30
|
Minimum depth limit = 30
|
||||||
No match
|
No match
|
||||||
|
@ -27,70 +32,60 @@ No match
|
||||||
Capture group count = 1
|
Capture group count = 1
|
||||||
May match empty string
|
May match empty string
|
||||||
Subject length lower bound = 0
|
Subject length lower bound = 0
|
||||||
/* this is a C style comment */\=find_limits
|
/* this is a C style comment */\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 64
|
Minimum match limit = 64
|
||||||
Minimum depth limit = 7
|
Minimum depth limit = 7
|
||||||
0: /* this is a C style comment */
|
0: /* this is a C style comment */
|
||||||
1: /* this is a C style comment */
|
1: /* this is a C style comment */
|
||||||
|
|
||||||
/^(?>a)++/
|
/^(?>a)++/
|
||||||
aa\=find_limits
|
aa\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 5
|
Minimum match limit = 5
|
||||||
Minimum depth limit = 3
|
Minimum depth limit = 3
|
||||||
0: aa
|
0: aa
|
||||||
aaaaaaaaa\=find_limits
|
aaaaaaaaa\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 12
|
Minimum match limit = 12
|
||||||
Minimum depth limit = 3
|
Minimum depth limit = 3
|
||||||
0: aaaaaaaaa
|
0: aaaaaaaaa
|
||||||
|
|
||||||
/(a)(?1)++/
|
/(a)(?1)++/
|
||||||
aa\=find_limits
|
aa\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 7
|
Minimum match limit = 7
|
||||||
Minimum depth limit = 5
|
Minimum depth limit = 5
|
||||||
0: aa
|
0: aa
|
||||||
1: a
|
1: a
|
||||||
aaaaaaaaa\=find_limits
|
aaaaaaaaa\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 21
|
Minimum match limit = 21
|
||||||
Minimum depth limit = 5
|
Minimum depth limit = 5
|
||||||
0: aaaaaaaaa
|
0: aaaaaaaaa
|
||||||
1: a
|
1: a
|
||||||
|
|
||||||
/a(?:.)*?a/ims
|
/a(?:.)*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 24
|
Minimum match limit = 24
|
||||||
Minimum depth limit = 3
|
Minimum depth limit = 3
|
||||||
0: abbbbbbbbbbbbbbbbbbbbba
|
0: abbbbbbbbbbbbbbbbbbbbba
|
||||||
|
|
||||||
/a(?:.(*THEN))*?a/ims
|
/a(?:.(*THEN))*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 66
|
Minimum match limit = 66
|
||||||
Minimum depth limit = 45
|
Minimum depth limit = 45
|
||||||
0: abbbbbbbbbbbbbbbbbbbbba
|
0: abbbbbbbbbbbbbbbbbbbbba
|
||||||
|
|
||||||
/a(?:.(*THEN:ABC))*?a/ims
|
/a(?:.(*THEN:ABC))*?a/ims
|
||||||
abbbbbbbbbbbbbbbbbbbbba\=find_limits
|
abbbbbbbbbbbbbbbbbbbbba\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 66
|
Minimum match limit = 66
|
||||||
Minimum depth limit = 45
|
Minimum depth limit = 45
|
||||||
0: abbbbbbbbbbbbbbbbbbbbba
|
0: abbbbbbbbbbbbbbbbbbbbba
|
||||||
|
|
||||||
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
|
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
|
||||||
aabbccddee\=find_limits
|
aabbccddee\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 7
|
Minimum match limit = 7
|
||||||
Minimum depth limit = 7
|
Minimum depth limit = 7
|
||||||
0: aabbccddee
|
0: aabbccddee
|
||||||
|
|
||||||
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
|
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
|
||||||
aabbccddee\=find_limits
|
aabbccddee\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 12
|
Minimum match limit = 12
|
||||||
Minimum depth limit = 12
|
Minimum depth limit = 12
|
||||||
0: aabbccddee
|
0: aabbccddee
|
||||||
|
@ -101,8 +96,7 @@ Minimum depth limit = 12
|
||||||
5: ee
|
5: ee
|
||||||
|
|
||||||
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
|
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
|
||||||
aabbccddee\=find_limits
|
aabbccddee\=find_limits_noheap
|
||||||
Minimum heap limit = 0
|
|
||||||
Minimum match limit = 10
|
Minimum match limit = 10
|
||||||
Minimum depth limit = 10
|
Minimum depth limit = 10
|
||||||
0: aabbccddee
|
0: aabbccddee
|
||||||
|
@ -521,10 +515,6 @@ No match
|
||||||
0:
|
0:
|
||||||
1:
|
1:
|
||||||
|
|
||||||
/(*LIMIT_HEAP=21)\[(a)]{60}/expand
|
|
||||||
\[a]{60}
|
|
||||||
Failed: error -63: heap limit exceeded
|
|
||||||
|
|
||||||
/b(?<!ax)(?!cx)/allusedtext
|
/b(?<!ax)(?!cx)/allusedtext
|
||||||
abc
|
abc
|
||||||
0: abc
|
0: abc
|
||||||
|
|