More typos and changes to "Kibibytes" for "Kilobytes".
parent fabea723cf
commit e75410a5d8
10
ChangeLog
|
@ -370,8 +370,8 @@ tests to improve coverage.
|
||||||
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
|
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
|
||||||
pcre2test, a crash could occur.
|
pcre2test, a crash could occur.
|
||||||
|
|
||||||
32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
|
32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16 MiB) so
|
||||||
all the tests can run with clang's sanitizing options.
|
that all the tests can run with clang's sanitizing options.
|
||||||
|
|
||||||
33. Implement extra compile options in the compile context and add the first
|
33. Implement extra compile options in the compile context and add the first
|
||||||
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
||||||
|
@ -964,9 +964,9 @@ to the same code as '.' when PCRE2_DOTALL is set).
|
||||||
40. Fix two clang compiler warnings in pcre2test when only one code unit width
|
40. Fix two clang compiler warnings in pcre2test when only one code unit width
|
||||||
is supported.
|
is supported.
|
||||||
|
|
||||||
41. Upgrade RunTest to automatically re-run test 2 with a large (64M) stack if
|
41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
|
||||||
it fails when running the interpreter with a 16M stack (and if changing the
|
if it fails when running the interpreter with a 16MiB stack (and if changing
|
||||||
stack size via pcre2test is possible). This avoids having to manually set a
|
the stack size via pcre2test is possible). This avoids having to manually set a
|
||||||
large stack size when testing with clang.
|
large stack size when testing with clang.
|
||||||
|
|
||||||
42. Fix register overwite in JIT when SSE2 acceleration is enabled.
|
42. Fix register overwite in JIT when SSE2 acceleration is enabled.
|
||||||
|
|
2
HACKING
|
@ -370,7 +370,7 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
|
||||||
only be 4. The 8-bit library can be compiled to used 3-byte or 4-byte values,
|
only be 4. The 8-bit library can be compiled to used 3-byte or 4-byte values,
|
||||||
and the 16-bit library can be compiled to use 4-byte values, though this
|
and the 16-bit library can be compiled to use 4-byte values, though this
|
||||||
impairs performance. Specifing a LINK_SIZE larger than 2 for these libraries is
|
impairs performance. Specifing a LINK_SIZE larger than 2 for these libraries is
|
||||||
necessary only when patterns whose compiled length is greater than 64K code
|
necessary only when patterns whose compiled length is greater than 65535 code
|
||||||
units are going to be processed. When a LINK_SIZE value uses more than one code
|
units are going to be processed. When a LINK_SIZE value uses more than one code
|
||||||
unit, the most significant unit is first.
|
unit, the most significant unit is first.
|
||||||
|
|
||||||
|
|
|
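The HACKING hunk above replaces "64K" with the exact figure 65535, which is the largest value that fits in a two-byte link stored most significant byte first. The following is a minimal sketch of that arithmetic for the 8-bit library case only. The helper names (max_link_offset, put_link, get_link) are hypothetical and are not PCRE2's own macros, and the limits PCRE2 actually enforces for larger link sizes may be lower than the raw representable maximum computed here.

```c
#include <assert.h>
#include <stdint.h>

/* Largest offset representable in link_size bytes (2, 3, or 4).
   Illustrative only: this is plain arithmetic, not a PCRE2 API. */
static uint32_t max_link_offset(unsigned link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return (link_size == 4) ? UINT32_MAX
                            : ((UINT32_C(1) << (8 * link_size)) - 1);
}

/* Store an offset in link_size bytes, most significant byte first. */
static void put_link(uint8_t *p, unsigned link_size, uint32_t offset)
{
    assert(offset <= max_link_offset(link_size));
    for (unsigned i = 0; i < link_size; i++)
        p[i] = (uint8_t)(offset >> (8 * (link_size - 1 - i)));
}

/* Read back an offset stored by put_link(). */
static uint32_t get_link(const uint8_t *p, unsigned link_size)
{
    uint32_t offset = 0;
    for (unsigned i = 0; i < link_size; i++)
        offset = (offset << 8) | p[i];
    return offset;
}
```

With link_size set to 2, max_link_offset() yields 65535, which matches the figure the commit substitutes for "64K".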
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
|
||||||
|
|
||||||
STACK SIZE IN WINDOWS ENVIRONMENTS
|
STACK SIZE IN WINDOWS ENVIRONMENTS
|
||||||
|
|
||||||
Prior to release 10.30 the default system stack size of 1MB in some Windows
|
Prior to release 10.30 the default system stack size of 1MiB in some Windows
|
||||||
environments caused issues with some tests. This should no longer be the case
|
environments caused issues with some tests. This should no longer be the case
|
||||||
for 10.30 and later releases.
|
for 10.30 and later releases.
|
||||||
|
|
||||||
|
|
2
README
|
@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
pcre2_set_heap_limit).
|
pcre2_set_heap_limit).
|
||||||
|
|
||||||
. In the 8-bit library, the default maximum compiled pattern size is around
|
. In the 8-bit library, the default maximum compiled pattern size is around
|
||||||
64K bytes. You can increase this by adding --with-link-size=3 to the
|
64 kibibytes. You can increase this by adding --with-link-size=3 to the
|
||||||
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
||||||
to different parts of the compiled pattern. In the 16-bit library,
|
to different parts of the compiled pattern. In the 16-bit library,
|
||||||
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
||||||
|
|
|
@ -706,8 +706,8 @@ fi
|
||||||
AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
|
AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
|
||||||
The value of LINK_SIZE determines the number of bytes used to store
|
The value of LINK_SIZE determines the number of bytes used to store
|
||||||
links as offsets within the compiled regex. The default is 2, which
|
links as offsets within the compiled regex. The default is 2, which
|
||||||
allows for compiled patterns up to 64K long. This covers the vast
|
allows for compiled patterns up to 65535 code units long. This covers the
|
||||||
majority of cases. However, PCRE2 can also be compiled to use 3 or 4
|
vast majority of cases. However, PCRE2 can also be compiled to use 3 or 4
|
||||||
bytes instead. This allows for longer patterns in extreme cases.])
|
bytes instead. This allows for longer patterns in extreme cases.])
|
||||||
|
|
||||||
AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
|
AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
|
||||||
|
|
|
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
|
||||||
|
|
||||||
STACK SIZE IN WINDOWS ENVIRONMENTS
|
STACK SIZE IN WINDOWS ENVIRONMENTS
|
||||||
|
|
||||||
Prior to release 10.30 the default system stack size of 1MB in some Windows
|
Prior to release 10.30 the default system stack size of 1MiB in some Windows
|
||||||
environments caused issues with some tests. This should no longer be the case
|
environments caused issues with some tests. This should no longer be the case
|
||||||
for 10.30 and later releases.
|
for 10.30 and later releases.
|
||||||
|
|
||||||
|
|
|
@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
pcre2_set_heap_limit).
|
pcre2_set_heap_limit).
|
||||||
|
|
||||||
. In the 8-bit library, the default maximum compiled pattern size is around
|
. In the 8-bit library, the default maximum compiled pattern size is around
|
||||||
64K bytes. You can increase this by adding --with-link-size=3 to the
|
64 kibibytes. You can increase this by adding --with-link-size=3 to the
|
||||||
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
||||||
to different parts of the compiled pattern. In the 16-bit library,
|
to different parts of the compiled pattern. In the 16-bit library,
|
||||||
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
||||||
|
|
|
@ -38,7 +38,7 @@ passed to a matching function. The arguments of this function are:
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32K
|
If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32KiB
|
||||||
block on the machine stack is used.
|
block on the machine stack is used.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -49,8 +49,9 @@ If <i>callback</i> is NULL and <i>callback_data</i> is not NULL,
|
||||||
<P>
|
<P>
|
||||||
If <i>callback</i> not NULL, it is called with <i>callback_data</i> as an
|
If <i>callback</i> not NULL, it is called with <i>callback_data</i> as an
|
||||||
argument at the start of matching, in order to set up a JIT stack. If the
|
argument at the start of matching, in order to set up a JIT stack. If the
|
||||||
result is NULL, the internal 32K stack is used; otherwise the return value must
|
result is NULL, the internal 32KiB stack is used; otherwise the return value
|
||||||
be a valid JIT stack, the result of calling <b>pcre2_jit_stack_create()</b>.
|
must be a valid JIT stack, the result of calling
|
||||||
|
<b>pcre2_jit_stack_create()</b>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
You may safely use the same JIT stack for multiple patterns, as long as they
|
You may safely use the same JIT stack for multiple patterns, as long as they
|
||||||
|
|
|
@ -33,8 +33,8 @@ context, for memory allocation functions, or NULL for standard memory
|
||||||
allocation. The result can be passed to the JIT run-time code by calling
|
allocation. The result can be passed to the JIT run-time code by calling
|
||||||
<b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
|
<b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
|
||||||
which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
|
which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
|
||||||
A maximum stack size of 512K to 1M should be more than enough for any pattern.
|
A maximum stack size of 512KiB to 1MiB should be more than enough for any
|
||||||
For more details, see the
|
pattern. For more details, see the
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||||
page.
|
page.
|
||||||
</P>
|
</P>
|
||||||
|
|
|
@ -973,7 +973,7 @@ less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
|
||||||
limit is set, less than the default.
|
limit is set, less than the default.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
|
The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
|
||||||
stack for recording backtracking points. The more nested backtracking points
|
stack for recording backtracking points. The more nested backtracking points
|
||||||
there are (that is, the deeper the search tree), the more memory is needed.
|
there are (that is, the deeper the search tree), the more memory is needed.
|
||||||
Heap memory is used only if the initial vector is too small. If the heap limit
|
Heap memory is used only if the initial vector is too small. If the heap limit
|
||||||
|
@ -1155,7 +1155,7 @@ relevant.
|
||||||
<P>
|
<P>
|
||||||
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
||||||
but the most massive patterns, since it allows the size of the compiled pattern
|
but the most massive patterns, since it allows the size of the compiled pattern
|
||||||
to be up to 64K code units. Larger values allow larger regular expressions to
|
to be up to 65535 code units. Larger values allow larger regular expressions to
|
||||||
be compiled by those two libraries, but at the expense of slower matching.
|
be compiled by those two libraries, but at the expense of slower matching.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
|
|
|
@ -252,10 +252,10 @@ Within a compiled pattern, offset values are used to point from one part to
|
||||||
another (for example, from an opening parenthesis to an alternation
|
another (for example, from an opening parenthesis to an alternation
|
||||||
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
||||||
are used for these offsets, leading to a maximum size for a compiled pattern of
|
are used for these offsets, leading to a maximum size for a compiled pattern of
|
||||||
around 64K code units. This is sufficient to handle all but the most gigantic
|
around 64 thousand code units. This is sufficient to handle all but the most
|
||||||
patterns. Nevertheless, some people do want to process truly enormous patterns,
|
gigantic patterns. Nevertheless, some people do want to process truly enormous
|
||||||
so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
|
patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
|
||||||
adding a setting such as
|
offsets by adding a setting such as
|
||||||
<pre>
|
<pre>
|
||||||
--with-link-size=3
|
--with-link-size=3
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -282,7 +282,7 @@ to the <b>configure</b> command. This setting also applies to the
|
||||||
counting is done differently).
|
counting is done differently).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
|
The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
|
@ -399,13 +399,13 @@ they are not.
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
|
<b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
|
||||||
scanning, in order to be able to output "before" and "after" lines when it
|
scanning, in order to be able to output "before" and "after" lines when it
|
||||||
finds a match. The starting size of the buffer is controlled by a parameter
|
finds a match. The default starting size of the buffer is 20KiB. The buffer
|
||||||
whose default value is 20K. The buffer itself is three times this size, but
|
itself is three times this size, but because of the way it is used for holding
|
||||||
because of the way it is used for holding "before" lines, the longest line that
|
"before" lines, the longest line that is guaranteed to be processable is the
|
||||||
is guaranteed to be processable is the parameter size. If a longer line is
|
notional buffer size. If a longer line is encountered, <b>pcre2grep</b>
|
||||||
encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
|
automatically expands the buffer, up to a specified maximum size, whose default
|
||||||
specified maximum size, whose default is 1M or the starting size, whichever is
|
is 1MiB or the starting size, whichever is the larger. You can change the
|
||||||
the larger. You can change the default parameter values by adding, for example,
|
default parameter values by adding, for example,
|
||||||
<pre>
|
<pre>
|
||||||
--with-pcre2grep-bufsize=51200
|
--with-pcre2grep-bufsize=51200
|
||||||
--with-pcre2grep-max-bufsize=2097152
|
--with-pcre2grep-max-bufsize=2097152
|
||||||
|
|
|
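The binary units introduced in the hunk above can be checked against the configure values it quotes: 51200 bytes is 50KiB and 2097152 bytes is 2MiB. The sketch below spells out that arithmetic, together with the "three times this size" and "whichever is the larger" rules from the text. The macro and function names are hypothetical and do not come from the pcre2grep sources.

```c
#include <assert.h>
#include <stddef.h>

#define KIB ((size_t)1024)      /* 1 KiB = 1024 bytes */
#define MIB (KIB * 1024)        /* 1 MiB = 1024 KiB   */

/* The example configure values from the text, expressed in binary units. */
static_assert(51200 == 50 * 1024, "51200 bytes is 50KiB");
static_assert(2097152 == 2 * 1024 * 1024, "2097152 bytes is 2MiB");

/* Per the text, the buffer itself is three times the nominal size. */
static size_t grep_allocated_bytes(size_t bufsize)
{
    return 3 * bufsize;
}

/* Per the text, the maximum is the configured maximum or the starting
   size, whichever is the larger. */
static size_t grep_effective_max(size_t bufsize, size_t max_bufsize)
{
    return (max_bufsize > bufsize) ? max_bufsize : bufsize;
}
```

Under these assumptions the default 20KiB starting size corresponds to a 60KiB allocation, while the longest line guaranteed to be processable stays at 20KiB.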
@ -87,7 +87,7 @@ that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
||||||
default values for these parameters can be set when <b>pcre2grep</b> is
|
default values for these parameters can be set when <b>pcre2grep</b> is
|
||||||
built; if nothing is specified, the defaults are set to 20K and 1M
|
built; if nothing is specified, the defaults are set to 20KiB and 1MiB
|
||||||
respectively. An error occurs if a line is too long and the buffer can no
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
longer be expanded.
|
longer be expanded.
|
||||||
</P>
|
</P>
|
||||||
|
@ -97,7 +97,7 @@ allow for buffering "before" and "after" lines. If the buffer size is too
|
||||||
small, fewer than requested "before" and "after" lines may be output.
|
small, fewer than requested "before" and "after" lines may be output.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
|
||||||
BUFSIZ is defined in <b><stdio.h></b>. When there is more than one pattern
|
BUFSIZ is defined in <b><stdio.h></b>. When there is more than one pattern
|
||||||
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
|
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
|
||||||
each line in the order in which they are defined, except that all the <b>-e</b>
|
each line in the order in which they are defined, except that all the <b>-e</b>
|
||||||
|
|
|
@ -179,7 +179,7 @@ when JIT matching is used.
|
||||||
<br><a name="SEC6" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
|
<br><a name="SEC6" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
|
||||||
<P>
|
<P>
|
||||||
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
||||||
By default, it uses 32K on the machine stack. However, some large or
|
By default, it uses 32KiB on the machine stack. However, some large or
|
||||||
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
||||||
is given when there is not enough stack. Three functions are provided for
|
is given when there is not enough stack. Three functions are provided for
|
||||||
managing blocks of memory for use as JIT stacks. There is further discussion
|
managing blocks of memory for use as JIT stacks. There is further discussion
|
||||||
|
@ -194,8 +194,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
|
||||||
pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
|
pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
|
||||||
is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
|
is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
|
||||||
that is no longer needed. (For the technically minded: the address space is
|
that is no longer needed. (For the technically minded: the address space is
|
||||||
allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
|
allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
|
||||||
be more than enough for any pattern.
|
should be more than enough for any pattern.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
|
The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
|
||||||
|
@ -209,7 +209,7 @@ The first argument is a pointer to a match context. When this is subsequently
|
||||||
passed to a matching function, its information determines which JIT stack is
|
passed to a matching function, its information determines which JIT stack is
|
||||||
used. There are three cases for the values of the other two options:
|
used. There are three cases for the values of the other two options:
|
||||||
<pre>
|
<pre>
|
||||||
(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
|
(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32KiB block
|
||||||
on the machine stack is used. This is the default when a match
|
on the machine stack is used. This is the default when a match
|
||||||
context is created.
|
context is created.
|
||||||
|
|
||||||
|
@ -220,7 +220,7 @@ used. There are three cases for the values of the other two options:
|
||||||
(3) If <i>callback</i> is not NULL, it must point to a function that is
|
(3) If <i>callback</i> is not NULL, it must point to a function that is
|
||||||
called with <i>data</i> as an argument at the start of matching, in
|
called with <i>data</i> as an argument at the start of matching, in
|
||||||
order to set up a JIT stack. If the return from the callback
|
order to set up a JIT stack. If the return from the callback
|
||||||
function is NULL, the internal 32K stack is used; otherwise the
|
function is NULL, the internal 32KiB stack is used; otherwise the
|
||||||
return value must be a valid JIT stack, the result of calling
|
return value must be a valid JIT stack, the result of calling
|
||||||
<b>pcre2_jit_stack_create()</b>.
|
<b>pcre2_jit_stack_create()</b>.
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -286,9 +286,9 @@ we do the recursion in memory.
|
||||||
Modern operating systems have a nice feature: they can reserve an address space
|
Modern operating systems have a nice feature: they can reserve an address space
|
||||||
instead of allocating memory. We can safely allocate memory pages inside this
|
instead of allocating memory. We can safely allocate memory pages inside this
|
||||||
address space, so the stack could grow without moving memory data (this is
|
address space, so the stack could grow without moving memory data (this is
|
||||||
important because of pointers). Thus we can allocate 1M address space, and use
|
important because of pointers). Thus we can allocate 1MiB address space, and
|
||||||
only a single memory page (usually 4K) if that is enough. However, we can still
|
use only a single memory page (usually 4KiB) if that is enough. However, we can
|
||||||
grow up to 1M anytime if needed.
|
still grow up to 1MiB anytime if needed.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(3) Who "owns" a JIT stack?
|
(3) Who "owns" a JIT stack?
|
||||||
|
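The hunk above describes reserving 1MiB of address space while committing only as many pages (usually 4KiB each) as the stack needs. As a rough sketch of that bookkeeping, the hypothetical helper below counts the pages that would have to be committed for a given amount of stack in use. It is plain arithmetic under the page size stated in the text, not the JIT's actual allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages that must be committed to hold used_bytes of stack.
   reserved_bytes is the size of the reserved address space, which caps
   growth. At least one page is always counted, following the text's
   "only a single memory page ... if that is enough". */
static size_t committed_pages(size_t used_bytes, size_t page_size,
                              size_t reserved_bytes)
{
    assert(page_size > 0 && used_bytes <= reserved_bytes);
    if (used_bytes == 0)
        return 1;
    return (used_bytes + page_size - 1) / page_size;  /* round up */
}
```

With 4KiB pages, a fully used 1MiB reservation corresponds to 256 committed pages, while a small pattern that stays within 4KiB needs only one.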
@ -328,7 +328,7 @@ list of patterns.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(6) OK, the stack is for long term memory allocation. But what happens if a
|
(6) OK, the stack is for long term memory allocation. But what happens if a
|
||||||
pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
|
pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
|
||||||
stack is freed?
|
stack is freed?
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -20,12 +20,12 @@ There are some size limitations in PCRE2 but it is hoped that they will never
|
||||||
in practice be relevant.
|
in practice be relevant.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The maximum size of a compiled pattern is approximately 64K code units for the
|
The maximum size of a compiled pattern is approximately 64 thousand code units
|
||||||
8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
|
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
|
||||||
linkage size, which is 2 bytes for these libraries. If you want to process
|
internal linkage size, which is 2 bytes for these libraries. If you want to
|
||||||
regular expressions that are truly enormous, you can compile PCRE2 with an
|
process regular expressions that are truly enormous, you can compile PCRE2 with
|
||||||
internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
|
an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
|
||||||
up to 4). See the <b>README</b> file in the source distribution and the
|
rounded up to 4). See the <b>README</b> file in the source distribution and the
|
||||||
<a href="pcre2build.html"><b>pcre2build</b></a>
|
<a href="pcre2build.html"><b>pcre2build</b></a>
|
||||||
documentation for details. In these cases the limit is substantially larger.
|
documentation for details. In these cases the limit is substantially larger.
|
||||||
However, the speed of execution is slower. In the 32-bit library, the internal
|
However, the speed of execution is slower. In the 32-bit library, the internal
|
||||||
|
|
|
@ -549,7 +549,7 @@ Absolute and relative backreferences
|
||||||
<P>
|
<P>
|
||||||
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative backreference. A named backreference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \g{name}. backreferences are discussed
|
can be coded as \g{name}. Backreferences are discussed
|
||||||
<a href="#backreferences">later,</a>
|
<a href="#backreferences">later,</a>
|
||||||
following the discussion of
|
following the discussion of
|
||||||
<a href="#subpattern">parenthesized subpatterns.</a>
|
<a href="#subpattern">parenthesized subpatterns.</a>
|
||||||
|
@ -2247,7 +2247,7 @@ done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
backreferences of this type cause the group that they reference to be treated
|
Backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
<a href="#atomicgroup">atomic group.</a>
|
<a href="#atomicgroup">atomic group.</a>
|
||||||
Once the whole group has been matched, a subsequent matching failure cannot
|
Once the whole group has been matched, a subsequent matching failure cannot
|
||||||
|
|
|
@ -52,9 +52,9 @@ example, the very simple pattern
|
||||||
<pre>
|
<pre>
|
||||||
((ab){1,1000}c){1,3}
|
((ab){1,1000}c){1,3}
|
||||||
</pre>
|
</pre>
|
||||||
uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
|
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
|
||||||
compiled with its default internal pointer size of two bytes, the size limit on
|
compiled with its default internal pointer size of two bytes, the size limit on
|
||||||
a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
|
a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
|
||||||
this is reached with the above pattern if the outer repetition is increased
|
this is reached with the above pattern if the outer repetition is increased
|
||||||
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
||||||
handle larger compiled patterns, but it is better to try to rewrite your
|
handle larger compiled patterns, but it is better to try to rewrite your
|
||||||
|
@ -68,14 +68,14 @@ facility. Re-writing the above pattern as
|
||||||
<pre>
|
<pre>
|
||||||
((ab)(?2){0,999}c)(?1){0,2}
|
((ab)(?2){0,999}c)(?1){0,2}
|
||||||
</pre>
|
</pre>
|
||||||
reduces the memory requirements to around 16K, and indeed it remains under 20K
|
reduces the memory requirements to around 16KiB, and indeed it remains under
|
||||||
even with the outer repetition increased to 100. However, this kind of pattern
|
20KiB even with the outer repetition increased to 100. However, this kind of
|
||||||
is not always exactly equivalent, because any captures within subroutine calls
|
pattern is not always exactly equivalent, because any captures within
|
||||||
are lost when the subroutine completes. If this is not a problem, this kind of
|
subroutine calls are lost when the subroutine completes. If this is not a
|
||||||
rewriting will allow you to process patterns that PCRE2 cannot otherwise
|
problem, this kind of rewriting will allow you to process patterns that PCRE2
|
||||||
handle. The matching performance of the two different versions of the pattern
|
cannot otherwise handle. The matching performance of the two different versions
|
||||||
are roughly the same. (This applies from release 10.30 - things were different
|
of the pattern are roughly the same. (This applies from release 10.30 - things
|
||||||
in earlier releases.)
|
were different in earlier releases.)
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
|
<br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -83,7 +83,7 @@ From release 10.30, the interpretive (non-JIT) version of <b>pcre2_match()</b>
|
||||||
uses very little system stack at run time. In earlier releases recursive
|
uses very little system stack at run time. In earlier releases recursive
|
||||||
function calls could use a great deal of stack, and this could cause problems,
|
function calls could use a great deal of stack, and this could cause problems,
|
||||||
but this usage has been eliminated. Backtracking positions are now explicitly
|
but this usage has been eliminated. Backtracking positions are now explicitly
|
||||||
remembered in memory frames controlled by the code. An initial 20K vector of
|
remembered in memory frames controlled by the code. An initial 20KiB vector of
|
||||||
frames is allocated on the system stack (enough for about 100 frames for small
|
frames is allocated on the system stack (enough for about 100 frames for small
|
||||||
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
||||||
memory can be limited; if the limit is set to zero, only the initial stack
|
memory can be limited; if the limit is set to zero, only the initial stack
|
||||||
|
|
125
doc/pcre2.txt
|
@ -979,13 +979,14 @@ PCRE2 CONTEXTS
|
||||||
unless ddd is less than the limit set by the caller of pcre2_match()
|
unless ddd is less than the limit set by the caller of pcre2_match()
|
||||||
or, if no such limit is set, less than the default.
|
or, if no such limit is set, less than the default.
|
||||||
|
|
||||||
The pcre2_match() function starts out using a 20K vector on the system
|
The pcre2_match() function starts out using a 20KiB vector on the sys-
|
||||||
stack for recording backtracking points. The more nested backtracking
|
tem stack for recording backtracking points. The more nested backtrack-
|
||||||
points there are (that is, the deeper the search tree), the more memory
|
ing points there are (that is, the deeper the search tree), the more
|
||||||
is needed. Heap memory is used only if the initial vector is too
|
memory is needed. Heap memory is used only if the initial vector is
|
||||||
small. If the heap limit is set to a value less than 21 (in particular,
|
too small. If the heap limit is set to a value less than 21 (in partic-
|
||||||
zero) no heap memory will be used. In this case, only patterns that do
|
ular, zero) no heap memory will be used. In this case, only patterns
|
||||||
not have a lot of nested backtracking can be successfully processed.
|
that do not have a lot of nested backtracking can be successfully pro-
|
||||||
|
cessed.
|
||||||
|
|
||||||
Similarly, for pcre2_dfa_match(), a vector on the system stack is used
|
Similarly, for pcre2_dfa_match(), a vector on the system stack is used
|
||||||
when processing pattern recursions, lookarounds, or atomic groups, and
|
when processing pattern recursions, lookarounds, or atomic groups, and
|
||||||
|
@ -1152,9 +1153,9 @@ CHECKING BUILD-TIME OPTIONS
|
||||||
|
|
||||||
The default value of 2 for the 8-bit and 16-bit libraries is sufficient
|
The default value of 2 for the 8-bit and 16-bit libraries is sufficient
|
||||||
for all but the most massive patterns, since it allows the size of the
|
for all but the most massive patterns, since it allows the size of the
|
||||||
compiled pattern to be up to 64K code units. Larger values allow larger
|
compiled pattern to be up to 65535 code units. Larger values allow
|
||||||
regular expressions to be compiled by those two libraries, but at the
|
larger regular expressions to be compiled by those two libraries, but
|
||||||
expense of slower matching.
|
at the expense of slower matching.
|
||||||
|
|
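The 65535 figure in the revised wording follows from the width of the offset. The sketch below is an illustration only, not PCRE2 code: the helper name `max_offset` is hypothetical, and it computes just the largest value an unsigned offset of a given byte width can hold. The practical pattern-size limit for link sizes 3 and 4 is assumed, not confirmed, to track this arithmetic.

```python
def max_offset(link_size: int) -> int:
    """Largest value an unsigned offset of `link_size` bytes can hold."""
    if link_size not in (2, 3, 4):
        raise ValueError("link size must be 2, 3, or 4")
    # An n-byte unsigned value ranges from 0 to 2**(8*n) - 1.
    return (1 << (8 * link_size)) - 1


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, max_offset(size))
```

For a link size of 2 this gives 65535, which is why "64K" was ambiguous: it could be read as 64000 or 65536, and neither is the documented limit.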
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
|
|
||||||
|
@ -3710,11 +3711,11 @@ HANDLING VERY LARGE PATTERNS
|
||||||
part to another (for example, from an opening parenthesis to an alter-
|
part to another (for example, from an opening parenthesis to an alter-
|
||||||
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
|
nation metacharacter). By default, in the 8-bit and 16-bit libraries,
|
||||||
two-byte values are used for these offsets, leading to a maximum size
|
two-byte values are used for these offsets, leading to a maximum size
|
||||||
for a compiled pattern of around 64K code units. This is sufficient to
|
for a compiled pattern of around 64 thousand code units. This is suffi-
|
||||||
handle all but the most gigantic patterns. Nevertheless, some people do
|
cient to handle all but the most gigantic patterns. Nevertheless, some
|
||||||
want to process truly enormous patterns, so it is possible to compile
|
people do want to process truly enormous patterns, so it is possible to
|
||||||
PCRE2 to use three-byte or four-byte offsets by adding a setting such
|
compile PCRE2 to use three-byte or four-byte offsets by adding a set-
|
||||||
as
|
ting such as
|
||||||
|
|
||||||
--with-link-size=3
|
--with-link-size=3
|
||||||
|
|
||||||
|
@ -3741,8 +3742,8 @@ LIMITING PCRE2 RESOURCE USAGE
|
||||||
pcre2_dfa_match() matching function, and to JIT matching (though the
|
pcre2_dfa_match() matching function, and to JIT matching (though the
|
||||||
counting is done differently).
|
counting is done differently).
|
||||||
|
|
||||||
The pcre2_match() function starts out using a 20K vector on the system
|
The pcre2_match() function starts out using a 20KiB vector on the sys-
|
||||||
stack to record backtracking points. The more nested backtracking
|
tem stack to record backtracking points. The more nested backtracking
|
||||||
points there are (that is, the deeper the search tree), the more memory
|
points there are (that is, the deeper the search tree), the more memory
|
||||||
is needed. If the initial vector is not large enough, heap memory is
|
is needed. If the initial vector is not large enough, heap memory is
|
||||||
used, up to a certain limit, which is specified in kibibytes (units of
|
used, up to a certain limit, which is specified in kibibytes (units of
|
||||||
|
@ -3857,14 +3858,14 @@ PCRE2GREP BUFFER SIZE
|
||||||
|
|
||||||
pcre2grep uses an internal buffer to hold a "window" on the file it is
|
pcre2grep uses an internal buffer to hold a "window" on the file it is
|
||||||
scanning, in order to be able to output "before" and "after" lines when
|
scanning, in order to be able to output "before" and "after" lines when
|
||||||
it finds a match. The starting size of the buffer is controlled by a
|
it finds a match. The default starting size of the buffer is 20KiB. The
|
||||||
parameter whose default value is 20K. The buffer itself is three times
|
buffer itself is three times this size, but because of the way it is
|
||||||
this size, but because of the way it is used for holding "before"
|
used for holding "before" lines, the longest line that is guaranteed to
|
||||||
lines, the longest line that is guaranteed to be processable is the
|
be processable is the notional buffer size. If a longer line is encoun-
|
||||||
parameter size. If a longer line is encountered, pcre2grep automati-
|
tered, pcre2grep automatically expands the buffer, up to a specified
|
||||||
cally expands the buffer, up to a specified maximum size, whose default
|
maximum size, whose default is 1MiB or the starting size, whichever is
|
||||||
is 1M or the starting size, whichever is the larger. You can change the
|
the larger. You can change the default parameter values by adding, for
|
||||||
default parameter values by adding, for example,
|
example,
|
||||||
|
|
||||||
--with-pcre2grep-bufsize=51200
|
--with-pcre2grep-bufsize=51200
|
||||||
--with-pcre2grep-max-bufsize=2097152
|
--with-pcre2grep-max-bufsize=2097152
|
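To relate the byte counts in the two configure options above to the binary units this change introduces, here is a small sketch. The helper names `describe_bufsize` and `allocated_block` are hypothetical and are not part of pcre2grep; the factor of three is taken from the paragraph above.

```python
KIB = 1024
MIB = 1024 * KIB


def describe_bufsize(nbytes: int) -> str:
    """Render a byte count in binary units when it divides evenly."""
    if nbytes % MIB == 0:
        return f"{nbytes // MIB}MiB"
    if nbytes % KIB == 0:
        return f"{nbytes // KIB}KiB"
    return f"{nbytes} bytes"


def allocated_block(bufsize: int) -> int:
    """The text states that the buffer itself is three times the starting size."""
    return 3 * bufsize


if __name__ == "__main__":
    print(describe_bufsize(51200))    # value passed to --with-pcre2grep-bufsize
    print(describe_bufsize(2097152))  # value passed to --with-pcre2grep-max-bufsize
    print(allocated_block(51200))
```

With the example settings, the starting size is 50KiB, the maximum is 2MiB, and the block initially allocated would be 153600 bytes.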
||||||
|
@ -4818,7 +4819,7 @@ RETURN VALUES FROM JIT MATCHING
|
||||||
CONTROLLING THE JIT STACK
|
CONTROLLING THE JIT STACK
|
||||||
|
|
||||||
When the compiled JIT code runs, it needs a block of memory to use as a
|
When the compiled JIT code runs, it needs a block of memory to use as a
|
||||||
stack. By default, it uses 32K on the machine stack. However, some
|
stack. By default, it uses 32KiB on the machine stack. However, some
|
||||||
large or complicated patterns need more than this. The error
|
large or complicated patterns need more than this. The error
|
||||||
PCRE2_ERROR_JIT_STACKLIMIT is given when there is not enough stack.
|
PCRE2_ERROR_JIT_STACKLIMIT is given when there is not enough stack.
|
||||||
Three functions are provided for managing blocks of memory for use as
|
Three functions are provided for managing blocks of memory for use as
|
||||||
|
@ -4832,7 +4833,8 @@ CONTROLLING THE JIT STACK
|
||||||
NULL if there is an error. The pcre2_jit_stack_free() function is used
|
NULL if there is an error. The pcre2_jit_stack_free() function is used
|
||||||
to free a stack that is no longer needed. (For the technically minded:
|
to free a stack that is no longer needed. (For the technically minded:
|
||||||
the address space is allocated by mmap or VirtualAlloc.) A maximum
|
the address space is allocated by mmap or VirtualAlloc.) A maximum
|
||||||
stack size of 512K to 1M should be more than enough for any pattern.
|
stack size of 512KiB to 1MiB should be more than enough for any pat-
|
||||||
|
tern.
|
||||||
|
|
||||||
The pcre2_jit_stack_assign() function specifies which stack JIT code
|
The pcre2_jit_stack_assign() function specifies which stack JIT code
|
||||||
should use. Its arguments are as follows:
|
should use. Its arguments are as follows:
|
||||||
|
@ -4846,7 +4848,7 @@ CONTROLLING THE JIT STACK
|
||||||
JIT stack is used. There are three cases for the values of the other
|
JIT stack is used. There are three cases for the values of the other
|
||||||
two options:
|
two options:
|
||||||
|
|
||||||
(1) If callback is NULL and data is NULL, an internal 32K block
|
(1) If callback is NULL and data is NULL, an internal 32KiB block
|
||||||
on the machine stack is used. This is the default when a match
|
on the machine stack is used. This is the default when a match
|
||||||
context is created.
|
context is created.
|
||||||
|
|
||||||
|
@ -4857,7 +4859,7 @@ CONTROLLING THE JIT STACK
|
||||||
(3) If callback is not NULL, it must point to a function that is
|
(3) If callback is not NULL, it must point to a function that is
|
||||||
called with data as an argument at the start of matching, in
|
called with data as an argument at the start of matching, in
|
||||||
order to set up a JIT stack. If the return from the callback
|
order to set up a JIT stack. If the return from the callback
|
||||||
function is NULL, the internal 32K stack is used; otherwise the
|
function is NULL, the internal 32KiB stack is used; otherwise the
|
||||||
return value must be a valid JIT stack, the result of calling
|
return value must be a valid JIT stack, the result of calling
|
||||||
pcre2_jit_stack_create().
|
pcre2_jit_stack_create().
|
||||||
|
|
||||||
|
@ -4921,9 +4923,9 @@ JIT STACK FAQ
|
||||||
address space instead of allocating memory. We can safely allocate mem-
|
address space instead of allocating memory. We can safely allocate mem-
|
||||||
ory pages inside this address space, so the stack could grow without
|
ory pages inside this address space, so the stack could grow without
|
||||||
moving memory data (this is important because of pointers). Thus we can
|
moving memory data (this is important because of pointers). Thus we can
|
||||||
allocate 1M address space, and use only a single memory page (usually
|
allocate 1MiB address space, and use only a single memory page (usually
|
||||||
4K) if that is enough. However, we can still grow up to 1M anytime if
|
4KiB) if that is enough. However, we can still grow up to 1MiB anytime
|
||||||
needed.
|
if needed.
|
||||||
|
|
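The relationship between the reserved address space and the pages actually committed can be sketched with simple arithmetic. This is an illustration only: the function name `pages_in_reservation` is hypothetical, and the 4KiB page size is the "usual" value quoted in the text, not something queried from the operating system.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_in_reservation(reserved_bytes: int, page_bytes: int = 4 * KIB) -> int:
    """Number of whole pages that fit in a reserved address range."""
    return reserved_bytes // page_bytes


if __name__ == "__main__":
    # A 1MiB reservation with 4KiB pages: one page may be in use at first,
    # and the stack can grow page by page up to the full reservation.
    print(pages_in_reservation(1 * MIB))
```

Under these assumptions a 1MiB reservation spans 256 pages, of which only one needs to be backed by memory when the stack is first used.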
||||||
(3) Who "owns" a JIT stack?
|
(3) Who "owns" a JIT stack?
|
||||||
|
|
||||||
|
@ -4956,8 +4958,8 @@ JIT STACK FAQ
|
||||||
this without keeping a list of patterns.
|
this without keeping a list of patterns.
|
||||||
|
|
||||||
(6) OK, the stack is for long term memory allocation. But what happens
|
(6) OK, the stack is for long term memory allocation. But what happens
|
||||||
if a pattern causes stack overflow with a stack of 1M? Is that 1M kept
|
if a pattern causes stack overflow with a stack of 1MiB? Is that 1MiB
|
||||||
until the stack is freed?
|
kept until the stack is freed?
|
||||||
|
|
||||||
Especially on embedded systems, it might be a good idea to release mem-
|
Especially on embedded systems, it might be a good idea to release mem-
|
||||||
ory sometimes without freeing the stack. There is no API for this at
|
ory sometimes without freeing the stack. There is no API for this at
|
||||||
|
@ -5073,16 +5075,16 @@ SIZE AND OTHER LIMITATIONS
|
||||||
There are some size limitations in PCRE2 but it is hoped that they will
|
There are some size limitations in PCRE2 but it is hoped that they will
|
||||||
never in practice be relevant.
|
never in practice be relevant.
|
||||||
|
|
||||||
The maximum size of a compiled pattern is approximately 64K code units
|
The maximum size of a compiled pattern is approximately 64 thousand
|
||||||
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the
|
code units for the 8-bit and 16-bit libraries if PCRE2 is compiled with
|
||||||
default internal linkage size, which is 2 bytes for these libraries. If
|
the default internal linkage size, which is 2 bytes for these
|
||||||
you want to process regular expressions that are truly enormous, you
|
libraries. If you want to process regular expressions that are truly
|
||||||
can compile PCRE2 with an internal linkage size of 3 or 4 (when build-
|
enormous, you can compile PCRE2 with an internal linkage size of 3 or 4
|
||||||
ing the 16-bit library, 3 is rounded up to 4). See the README file in
|
(when building the 16-bit library, 3 is rounded up to 4). See the
|
||||||
the source distribution and the pcre2build documentation for details.
|
README file in the source distribution and the pcre2build documentation
|
||||||
In these cases the limit is substantially larger. However, the speed
|
for details. In these cases the limit is substantially larger. How-
|
||||||
of execution is slower. In the 32-bit library, the internal linkage
|
ever, the speed of execution is slower. In the 32-bit library, the
|
||||||
size is always 4.
|
internal linkage size is always 4.
|
||||||
|
|
||||||
The maximum length of a source pattern string is essentially unlimited;
|
The maximum length of a source pattern string is essentially unlimited;
|
||||||
it is the largest number a PCRE2_SIZE variable can hold. However, the
|
it is the largest number a PCRE2_SIZE variable can hold. However, the
|
||||||
|
@ -6258,7 +6260,7 @@ BACKSLASH
|
||||||
|
|
||||||
The sequence \g followed by a signed or unsigned number, optionally
|
The sequence \g followed by a signed or unsigned number, optionally
|
||||||
enclosed in braces, is an absolute or relative backreference. A named
|
enclosed in braces, is an absolute or relative backreference. A named
|
||||||
backreference can be coded as \g{name}. backreferences are discussed
|
backreference can be coded as \g{name}. Backreferences are discussed
|
||||||
later, following the discussion of parenthesized subpatterns.
|
later, following the discussion of parenthesized subpatterns.
|
||||||
|
|
||||||
Absolute and relative subroutine calls
|
Absolute and relative subroutine calls
|
||||||
|
@ -7737,7 +7739,7 @@ BACKREFERENCES
|
||||||
the backreference. This can be done using alternation, as in the exam-
|
the backreference. This can be done using alternation, as in the exam-
|
||||||
ple above, or by a quantifier with a minimum of zero.
|
ple above, or by a quantifier with a minimum of zero.
|
||||||
|
|
||||||
backreferences of this type cause the group that they reference to be
|
Backreferences of this type cause the group that they reference to be
|
||||||
treated as an atomic group. Once the whole group has been matched, a
|
treated as an atomic group. Once the whole group has been matched, a
|
||||||
subsequent matching failure cannot cause backtracking into the middle
|
subsequent matching failure cannot cause backtracking into the middle
|
||||||
of the group.
|
of the group.
|
||||||
|
@ -8937,22 +8939,21 @@ COMPILED PATTERN MEMORY USAGE
|
||||||
|
|
||||||
((ab){1,1000}c){1,3}
|
((ab){1,1000}c){1,3}
|
||||||
|
|
||||||
uses over 50K bytes when compiled using the 8-bit library. When PCRE2
|
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
|
||||||
is compiled with its default internal pointer size of two bytes, the
|
compiled with its default internal pointer size of two bytes, the size
|
||||||
size limit on a compiled pattern is 64K code units in the 8-bit and
|
limit on a compiled pattern is 65535 code units in the 8-bit and 16-bit
|
||||||
16-bit libraries, and this is reached with the above pattern if the
|
libraries, and this is reached with the above pattern if the outer rep-
|
||||||
outer repetition is increased from 3 to 4. PCRE2 can be compiled to use
|
etition is increased from 3 to 4. PCRE2 can be compiled to use larger
|
||||||
larger internal pointers and thus handle larger compiled patterns, but
|
internal pointers and thus handle larger compiled patterns, but it is
|
||||||
it is better to try to rewrite your pattern to use less memory if you
|
better to try to rewrite your pattern to use less memory if you can.
|
||||||
can.
|
|
||||||
|
|
||||||
One way of reducing the memory usage for such patterns is to make use
|
One way of reducing the memory usage for such patterns is to make use
|
||||||
of PCRE2's "subroutine" facility. Re-writing the above pattern as
|
of PCRE2's "subroutine" facility. Re-writing the above pattern as
|
||||||
|
|
||||||
((ab)(?2){0,999}c)(?1){0,2}
|
((ab)(?2){0,999}c)(?1){0,2}
|
||||||
|
|
||||||
reduces the memory requirements to around 16K, and indeed it remains
|
reduces the memory requirements to around 16KiB, and indeed it remains
|
||||||
under 20K even with the outer repetition increased to 100. However,
|
under 20KiB even with the outer repetition increased to 100. However,
|
||||||
this kind of pattern is not always exactly equivalent, because any cap-
|
this kind of pattern is not always exactly equivalent, because any cap-
|
||||||
tures within subroutine calls are lost when the subroutine completes.
|
tures within subroutine calls are lost when the subroutine completes.
|
||||||
If this is not a problem, this kind of rewriting will allow you to
|
If this is not a problem, this kind of rewriting will allow you to
|
||||||
|
@ -8969,12 +8970,12 @@ STACK AND HEAP USAGE AT RUN TIME
|
||||||
sive function calls could use a great deal of stack, and this could
|
sive function calls could use a great deal of stack, and this could
|
||||||
cause problems, but this usage has been eliminated. Backtracking posi-
|
cause problems, but this usage has been eliminated. Backtracking posi-
|
||||||
tions are now explicitly remembered in memory frames controlled by the
|
tions are now explicitly remembered in memory frames controlled by the
|
||||||
code. An initial 20K vector of frames is allocated on the system stack
|
code. An initial 20KiB vector of frames is allocated on the system
|
||||||
(enough for about 100 frames for small patterns), but if this is insuf-
|
stack (enough for about 100 frames for small patterns), but if this is
|
||||||
ficient, heap memory is used. The amount of heap memory can be limited;
|
insufficient, heap memory is used. The amount of heap memory can be
|
||||||
if the limit is set to zero, only the initial stack vector is used.
|
limited; if the limit is set to zero, only the initial stack vector is
|
||||||
Rewriting patterns to be time-efficient, as described below, may also
|
used. Rewriting patterns to be time-efficient, as described below, may
|
||||||
reduce the memory requirements.
|
also reduce the memory requirements.
|
||||||
|
|
||||||
In contrast to pcre2_match(), pcre2_dfa_match() does use recursive
|
In contrast to pcre2_match(), pcre2_dfa_match() does use recursive
|
||||||
function calls, but only for processing atomic groups, lookaround
|
function calls, but only for processing atomic groups, lookaround
|
||||||
|
|
|
@ -24,7 +24,7 @@ passed to a matching function. The arguments of this function are:
|
||||||
callback a callback function
|
callback a callback function
|
||||||
callback_data a JIT stack or a value to be passed to the callback
|
callback_data a JIT stack or a value to be passed to the callback
|
||||||
.P
|
.P
|
||||||
If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32K
|
If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32KiB
|
||||||
block on the machine stack is used.
|
block on the machine stack is used.
|
||||||
.P
|
.P
|
||||||
If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
|
If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
|
||||||
|
@ -33,8 +33,9 @@ If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
|
||||||
.P
|
.P
|
||||||
If \fIcallback\fP is not NULL, it is called with \fIcallback_data\fP as an
|
If \fIcallback\fP is not NULL, it is called with \fIcallback_data\fP as an
|
||||||
argument at the start of matching, in order to set up a JIT stack. If the
|
argument at the start of matching, in order to set up a JIT stack. If the
|
||||||
result is NULL, the internal 32K stack is used; otherwise the return value must
|
result is NULL, the internal 32KiB stack is used; otherwise the return value
|
||||||
be a valid JIT stack, the result of calling \fBpcre2_jit_stack_create()\fP.
|
must be a valid JIT stack, the result of calling
|
||||||
|
\fBpcre2_jit_stack_create()\fP.
|
||||||
.P
|
.P
|
||||||
You may safely use the same JIT stack for multiple patterns, as long as they
|
You may safely use the same JIT stack for multiple patterns, as long as they
|
||||||
are all matched in the same thread. In a multithread application, each thread
|
are all matched in the same thread. In a multithread application, each thread
|
||||||
|
|
|
@ -21,8 +21,8 @@ context, for memory allocation functions, or NULL for standard memory
|
||||||
allocation. The result can be passed to the JIT run-time code by calling
|
allocation. The result can be passed to the JIT run-time code by calling
|
||||||
\fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
|
\fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
|
||||||
which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
|
which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
|
||||||
A maximum stack size of 512K to 1M should be more than enough for any pattern.
|
A maximum stack size of 512KiB to 1MiB should be more than enough for any
|
||||||
For more details, see the
|
pattern. For more details, see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
|
|
|
@ -909,7 +909,7 @@ where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||||
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
||||||
limit is set, less than the default.
|
limit is set, less than the default.
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
|
||||||
stack for recording backtracking points. The more nested backtracking points
|
stack for recording backtracking points. The more nested backtracking points
|
||||||
there are (that is, the deeper the search tree), the more memory is needed.
|
there are (that is, the deeper the search tree), the more memory is needed.
|
||||||
Heap memory is used only if the initial vector is too small. If the heap limit
|
Heap memory is used only if the initial vector is too small. If the heap limit
|
||||||
|
@ -1084,7 +1084,7 @@ relevant.
|
||||||
.P
|
.P
|
||||||
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
||||||
but the most massive patterns, since it allows the size of the compiled pattern
|
but the most massive patterns, since it allows the size of the compiled pattern
|
||||||
to be up to 64K code units. Larger values allow larger regular expressions to
|
to be up to 65535 code units. Larger values allow larger regular expressions to
|
||||||
be compiled by those two libraries, but at the expense of slower matching.
|
be compiled by those two libraries, but at the expense of slower matching.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
|
|
|
@ -244,10 +244,10 @@ Within a compiled pattern, offset values are used to point from one part to
|
||||||
another (for example, from an opening parenthesis to an alternation
|
another (for example, from an opening parenthesis to an alternation
|
||||||
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
||||||
are used for these offsets, leading to a maximum size for a compiled pattern of
|
are used for these offsets, leading to a maximum size for a compiled pattern of
|
||||||
around 64K code units. This is sufficient to handle all but the most gigantic
|
around 64 thousand code units. This is sufficient to handle all but the most
|
||||||
patterns. Nevertheless, some people do want to process truly enormous patterns,
|
gigantic patterns. Nevertheless, some people do want to process truly enormous
|
||||||
so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
|
patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
|
||||||
adding a setting such as
|
offsets by adding a setting such as
|
||||||
.sp
|
.sp
|
||||||
--with-link-size=3
|
--with-link-size=3
|
||||||
.sp
|
.sp
|
||||||
|
@ -277,7 +277,7 @@ to the \fBconfigure\fP command. This setting also applies to the
|
||||||
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
|
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
|
||||||
counting is done differently).
|
counting is done differently).
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
|
@ -403,13 +403,13 @@ they are not.
|
||||||
.sp
|
.sp
|
||||||
\fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is
|
\fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is
|
||||||
scanning, in order to be able to output "before" and "after" lines when it
|
scanning, in order to be able to output "before" and "after" lines when it
|
||||||
finds a match. The starting size of the buffer is controlled by a parameter
|
finds a match. The default starting size of the buffer is 20KiB. The buffer
|
||||||
whose default value is 20K. The buffer itself is three times this size, but
|
itself is three times this size, but because of the way it is used for holding
|
||||||
because of the way it is used for holding "before" lines, the longest line that
|
"before" lines, the longest line that is guaranteed to be processable is the
|
||||||
is guaranteed to be processable is the parameter size. If a longer line is
|
notional buffer size. If a longer line is encountered, \fBpcre2grep\fP
|
||||||
encountered, \fBpcre2grep\fP automatically expands the buffer, up to a
|
automatically expands the buffer, up to a specified maximum size, whose default
|
||||||
specified maximum size, whose default is 1M or the starting size, whichever is
|
is 1MiB or the starting size, whichever is the larger. You can change the
|
||||||
the larger. You can change the default parameter values by adding, for example,
|
default parameter values by adding, for example,
|
||||||
.sp
|
.sp
|
||||||
--with-pcre2grep-bufsize=51200
|
--with-pcre2grep-bufsize=51200
|
||||||
--with-pcre2grep-max-bufsize=2097152
|
--with-pcre2grep-max-bufsize=2097152
|
||||||
|
|
|
@ -58,7 +58,7 @@ that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
||||||
default values for these parameters can be set when \fBpcre2grep\fP is
|
default values for these parameters can be set when \fBpcre2grep\fP is
|
||||||
built; if nothing is specified, the defaults are set to 20K and 1M
|
built; if nothing is specified, the defaults are set to 20KiB and 1MiB
|
||||||
respectively. An error occurs if a line is too long and the buffer can no
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
longer be expanded.
|
longer be expanded.
|
||||||
.P
|
.P
|
||||||
|
@ -66,7 +66,7 @@ The block of memory that is actually used is three times the "buffer size", to
|
||||||
allow for buffering "before" and "after" lines. If the buffer size is too
|
allow for buffering "before" and "after" lines. If the buffer size is too
|
||||||
small, fewer than requested "before" and "after" lines may be output.
|
small, fewer than requested "before" and "after" lines may be output.
|
||||||
.P
|
.P
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
|
||||||
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
||||||
(specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
|
(specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
|
||||||
each line in the order in which they are defined, except that all the \fB-e\fP
|
each line in the order in which they are defined, except that all the \fB-e\fP
|
||||||
|
|
|
@ -58,15 +58,15 @@ DESCRIPTION
|
||||||
automatically extending the buffer, up to the limit specified by --max-
|
automatically extending the buffer, up to the limit specified by --max-
|
||||||
buffer-size. The default values for these parameters can be set when
|
buffer-size. The default values for these parameters can be set when
|
||||||
pcre2grep is built; if nothing is specified, the defaults are set to
|
pcre2grep is built; if nothing is specified, the defaults are set to
|
||||||
20K and 1M respectively. An error occurs if a line is too long and the
|
20KiB and 1MiB respectively. An error occurs if a line is too long and
|
||||||
buffer can no longer be expanded.
|
the buffer can no longer be expanded.
|
||||||
|
|
||||||
The block of memory that is actually used is three times the "buffer
|
The block of memory that is actually used is three times the "buffer
|
||||||
size", to allow for buffering "before" and "after" lines. If the buffer
|
size", to allow for buffering "before" and "after" lines. If the buffer
|
||||||
size is too small, fewer than requested "before" and "after" lines may
|
size is too small, fewer than requested "before" and "after" lines may
|
||||||
be output.
|
be output.
|
||||||
|
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the
|
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
|
||||||
greater. BUFSIZ is defined in <stdio.h>. When there is more than one
|
greater. BUFSIZ is defined in <stdio.h>. When there is more than one
|
||||||
pattern (specified by the use of -e and/or -f), each pattern is applied
|
pattern (specified by the use of -e and/or -f), each pattern is applied
|
||||||
to each line in the order in which they are defined, except that all
|
to each line in the order in which they are defined, except that all
|
||||||
|
|
|
@ -161,7 +161,7 @@ when JIT matching is used.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
||||||
By default, it uses 32K on the machine stack. However, some large or
|
By default, it uses 32KiB on the machine stack. However, some large or
|
||||||
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
||||||
is given when there is not enough stack. Three functions are provided for
|
is given when there is not enough stack. Three functions are provided for
|
||||||
managing blocks of memory for use as JIT stacks. There is further discussion
|
managing blocks of memory for use as JIT stacks. There is further discussion
|
||||||
|
@ -178,8 +178,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
|
||||||
pointer to an opaque structure of type \fBpcre2_jit_stack\fP, or NULL if there
|
pointer to an opaque structure of type \fBpcre2_jit_stack\fP, or NULL if there
|
||||||
is an error. The \fBpcre2_jit_stack_free()\fP function is used to free a stack
|
is an error. The \fBpcre2_jit_stack_free()\fP function is used to free a stack
|
||||||
that is no longer needed. (For the technically minded: the address space is
|
that is no longer needed. (For the technically minded: the address space is
|
||||||
allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
|
allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
|
||||||
be more than enough for any pattern.
|
should be more than enough for any pattern.
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_jit_stack_assign()\fP function specifies which stack JIT code
|
The \fBpcre2_jit_stack_assign()\fP function specifies which stack JIT code
|
||||||
should use. Its arguments are as follows:
|
should use. Its arguments are as follows:
|
||||||
|
@ -192,7 +192,7 @@ The first argument is a pointer to a match context. When this is subsequently
|
||||||
passed to a matching function, its information determines which JIT stack is
|
passed to a matching function, its information determines which JIT stack is
|
||||||
used. There are three cases for the values of the other two options:
|
used. There are three cases for the values of the other two options:
|
||||||
.sp
|
.sp
|
||||||
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
|
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32KiB block
|
||||||
on the machine stack is used. This is the default when a match
|
on the machine stack is used. This is the default when a match
|
||||||
context is created.
|
context is created.
|
||||||
.sp
|
.sp
|
||||||
|
@ -203,7 +203,7 @@ used. There are three cases for the values of the other two options:
|
||||||
(3) If \fIcallback\fP is not NULL, it must point to a function that is
|
(3) If \fIcallback\fP is not NULL, it must point to a function that is
|
||||||
called with \fIdata\fP as an argument at the start of matching, in
|
called with \fIdata\fP as an argument at the start of matching, in
|
||||||
order to set up a JIT stack. If the return from the callback
|
order to set up a JIT stack. If the return from the callback
|
||||||
function is NULL, the internal 32K stack is used; otherwise the
|
function is NULL, the internal 32KiB stack is used; otherwise the
|
||||||
return value must be a valid JIT stack, the result of calling
|
return value must be a valid JIT stack, the result of calling
|
||||||
\fBpcre2_jit_stack_create()\fP.
|
\fBpcre2_jit_stack_create()\fP.
|
||||||
.sp
|
.sp
|
||||||
|
@ -265,9 +265,9 @@ we do the recursion in memory.
|
||||||
Modern operating systems have a nice feature: they can reserve an address space
|
Modern operating systems have a nice feature: they can reserve an address space
|
||||||
instead of allocating memory. We can safely allocate memory pages inside this
|
instead of allocating memory. We can safely allocate memory pages inside this
|
||||||
address space, so the stack could grow without moving memory data (this is
|
address space, so the stack could grow without moving memory data (this is
|
||||||
important because of pointers). Thus we can allocate 1M address space, and use
|
important because of pointers). Thus we can allocate 1MiB address space, and
|
||||||
only a single memory page (usually 4K) if that is enough. However, we can still
|
use only a single memory page (usually 4KiB) if that is enough. However, we can
|
||||||
grow up to 1M anytime if needed.
|
still grow up to 1MiB anytime if needed.
|
||||||
.P
|
.P
|
||||||
(3) Who "owns" a JIT stack?
|
(3) Who "owns" a JIT stack?
|
||||||
.sp
|
.sp
|
||||||
|
@ -300,7 +300,7 @@ say two minutes. The JIT callback can help to achieve this without keeping a
|
||||||
list of patterns.
|
list of patterns.
|
||||||
.P
|
.P
|
||||||
(6) OK, the stack is for long term memory allocation. But what happens if a
|
(6) OK, the stack is for long term memory allocation. But what happens if a
|
||||||
pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
|
pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
|
||||||
stack is freed?
|
stack is freed?
|
||||||
.sp
|
.sp
|
||||||
Especially on embedded sytems, it might be a good idea to release memory
|
Especially on embedded sytems, it might be a good idea to release memory
|
||||||
|
|
|
@ -7,12 +7,12 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
There are some size limitations in PCRE2 but it is hoped that they will never
|
There are some size limitations in PCRE2 but it is hoped that they will never
|
||||||
in practice be relevant.
|
in practice be relevant.
|
||||||
.P
|
.P
|
||||||
The maximum size of a compiled pattern is approximately 64K code units for the
|
The maximum size of a compiled pattern is approximately 64 thousand code units
|
||||||
8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
|
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
|
||||||
linkage size, which is 2 bytes for these libraries. If you want to process
|
internal linkage size, which is 2 bytes for these libraries. If you want to
|
||||||
regular expressions that are truly enormous, you can compile PCRE2 with an
|
process regular expressions that are truly enormous, you can compile PCRE2 with
|
||||||
internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
|
an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
|
||||||
up to 4). See the \fBREADME\fP file in the source distribution and the
|
rounded up to 4). See the \fBREADME\fP file in the source distribution and the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2build\fP
|
\fBpcre2build\fP
|
||||||
.\"
|
.\"
|
||||||
|
|
|
@ -528,7 +528,7 @@ by code point, as described above.
|
||||||
.sp
|
.sp
|
||||||
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative backreference. A named backreference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \eg{name}. backreferences are discussed
|
can be coded as \eg{name}. Backreferences are discussed
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
later,
|
later,
|
||||||
|
@ -2243,7 +2243,7 @@ that the first iteration does not need to match the backreference. This can be
|
||||||
done using alternation, as in the example above, or by a quantifier with a
|
done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
.P
|
.P
|
||||||
backreferences of this type cause the group that they reference to be treated
|
Backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
.\" HTML <a href="#atomicgroup">
|
.\" HTML <a href="#atomicgroup">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
|
|
@ -34,9 +34,9 @@ example, the very simple pattern
|
||||||
.sp
|
.sp
|
||||||
((ab){1,1000}c){1,3}
|
((ab){1,1000}c){1,3}
|
||||||
.sp
|
.sp
|
||||||
uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
|
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
|
||||||
compiled with its default internal pointer size of two bytes, the size limit on
|
compiled with its default internal pointer size of two bytes, the size limit on
|
||||||
a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
|
a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
|
||||||
this is reached with the above pattern if the outer repetition is increased
|
this is reached with the above pattern if the outer repetition is increased
|
||||||
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
||||||
handle larger compiled patterns, but it is better to try to rewrite your
|
handle larger compiled patterns, but it is better to try to rewrite your
|
||||||
|
@ -52,14 +52,14 @@ facility. Re-writing the above pattern as
|
||||||
.sp
|
.sp
|
||||||
((ab)(?2){0,999}c)(?1){0,2}
|
((ab)(?2){0,999}c)(?1){0,2}
|
||||||
.sp
|
.sp
|
||||||
reduces the memory requirements to around 16K, and indeed it remains under 20K
|
reduces the memory requirements to around 16KiB, and indeed it remains under
|
||||||
even with the outer repetition increased to 100. However, this kind of pattern
|
20KiB even with the outer repetition increased to 100. However, this kind of
|
||||||
is not always exactly equivalent, because any captures within subroutine calls
|
pattern is not always exactly equivalent, because any captures within
|
||||||
are lost when the subroutine completes. If this is not a problem, this kind of
|
subroutine calls are lost when the subroutine completes. If this is not a
|
||||||
rewriting will allow you to process patterns that PCRE2 cannot otherwise
|
problem, this kind of rewriting will allow you to process patterns that PCRE2
|
||||||
handle. The matching performance of the two different versions of the pattern
|
cannot otherwise handle. The matching performance of the two different versions
|
||||||
are roughly the same. (This applies from release 10.30 - things were different
|
of the pattern are roughly the same. (This applies from release 10.30 - things
|
||||||
in earlier releases.)
|
were different in earlier releases.)
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "STACK AND HEAP USAGE AT RUN TIME"
|
.SH "STACK AND HEAP USAGE AT RUN TIME"
|
||||||
|
@ -69,7 +69,7 @@ From release 10.30, the interpretive (non-JIT) version of \fBpcre2_match()\fP
|
||||||
uses very little system stack at run time. In earlier releases recursive
|
uses very little system stack at run time. In earlier releases recursive
|
||||||
function calls could use a great deal of stack, and this could cause problems,
|
function calls could use a great deal of stack, and this could cause problems,
|
||||||
but this usage has been eliminated. Backtracking positions are now explicitly
|
but this usage has been eliminated. Backtracking positions are now explicitly
|
||||||
remembered in memory frames controlled by the code. An initial 20K vector of
|
remembered in memory frames controlled by the code. An initial 20KiB vector of
|
||||||
frames is allocated on the system stack (enough for about 100 frames for small
|
frames is allocated on the system stack (enough for about 100 frames for small
|
||||||
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
||||||
memory can be limited; if the limit is set to zero, only the initial stack
|
memory can be limited; if the limit is set to zero, only the initial stack
|
||||||
|
|
|
@ -134,16 +134,16 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* This limits the amount of memory that may be used while matching a pattern.
|
/* This limits the amount of memory that may be used while matching a pattern.
|
||||||
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||||
to JIT matching. The value is in kilobytes. */
|
to JIT matching. The value is in kibibytes (units of 1024 bytes). */
|
||||||
#ifndef HEAP_LIMIT
|
#ifndef HEAP_LIMIT
|
||||||
#define HEAP_LIMIT 20000000
|
#define HEAP_LIMIT 20000000
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||||
as offsets within the compiled regex. The default is 2, which allows for
|
as offsets within the compiled regex. The default is 2, which allows for
|
||||||
compiled patterns up to 64K long. This covers the vast majority of cases.
|
compiled patterns up to 65535 code units long. This covers the vast
|
||||||
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
|
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
|
||||||
allows for longer patterns in extreme cases. */
|
instead. This allows for longer patterns in extreme cases. */
|
||||||
#ifndef LINK_SIZE
|
#ifndef LINK_SIZE
|
||||||
#define LINK_SIZE 2
|
#define LINK_SIZE 2
|
||||||
#endif
|
#endif
|
||||||
|
|
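Note on the units changed above: the following is a minimal sketch, not part of the commit, of the arithmetic behind the two reworded comments. The helper names `heap_limit_bytes` and `max_link_offset` are hypothetical and do not exist in PCRE2. The sketch assumes only what the comments state: HEAP_LIMIT is counted in units of 1024 bytes, and LINK_SIZE is the number of bytes used to store an offset.

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): convert a heap limit given in
   kibibytes (units of 1024 bytes) into bytes. A 64-bit result is used because
   the default of 20000000 KiB does not fit in 32 bits once multiplied. */
static uint64_t heap_limit_bytes(uint32_t limit_kib)
{
  return (uint64_t)limit_kib * 1024u;
}

/* Hypothetical helper (not a PCRE2 function): the largest unsigned offset that
   fits in link_size bytes, for link_size between 1 and 4. With link_size 2
   this is 65535, the figure now quoted in the LINK_SIZE comment. */
static uint64_t max_link_offset(unsigned link_size)
{
  return (UINT64_C(1) << (8u * link_size)) - 1u;
}
```

Under these assumptions the default HEAP_LIMIT of 20000000 corresponds to 20480000000 bytes, rather than the 20000000000 bytes that a reading in units of 1000 bytes would give, which is the ambiguity the switch from "kilobytes" to "kibibytes" removes.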
|
@ -139,9 +139,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||||
as offsets within the compiled regex. The default is 2, which allows for
|
as offsets within the compiled regex. The default is 2, which allows for
|
||||||
compiled patterns up to 64K long. This covers the vast majority of cases.
|
compiled patterns up to 65535 code units long. This covers the vast
|
||||||
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
|
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
|
||||||
allows for longer patterns in extreme cases. */
|
instead. This allows for longer patterns in extreme cases. */
|
||||||
#undef LINK_SIZE
|
#undef LINK_SIZE
|
||||||
|
|
||||||
/* Define to the sub-directory where libtool stores uninstalled libraries. */
|
/* Define to the sub-directory where libtool stores uninstalled libraries. */
|
||||||
|
|
|
@ -412,7 +412,7 @@ if (rws->next != NULL)
|
||||||
}
|
}
|
||||||
|
|
||||||
/* All sizes are in units of sizeof(int), except for mb->heaplimit, which is in
|
/* All sizes are in units of sizeof(int), except for mb->heaplimit, which is in
|
||||||
kilobytes. */
|
kibibytes. */
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
|
|
|
@ -247,7 +247,7 @@ not rely on this. */
|
||||||
pcre2_match() is allocated on the system stack, of this size (bytes). The size
|
pcre2_match() is allocated on the system stack, of this size (bytes). The size
|
||||||
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
|
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
|
||||||
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
|
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
|
||||||
on the number of capturing parentheses) so 20K handles quite a few frames. A
|
on the number of capturing parentheses) so 20KiB handles quite a few frames. A
|
||||||
larger vector on the heap is obtained for patterns that need more frames. The
|
larger vector on the heap is obtained for patterns that need more frames. The
|
||||||
maximum size of this can be limited. */
|
maximum size of this can be limited. */
|
||||||
|
|
||||||
|
|
|
@ -6283,7 +6283,7 @@ mb->match_limit_depth = (mcontext->depth_limit < re->limit_depth)?
|
||||||
/* If a pattern has very many capturing parentheses, the frame size may be very
|
/* If a pattern has very many capturing parentheses, the frame size may be very
|
||||||
large. Ensure that there are at least 10 available frames by getting an initial
|
large. Ensure that there are at least 10 available frames by getting an initial
|
||||||
vector on the heap if necessary, except when the heap limit prevents this. Get
|
vector on the heap if necessary, except when the heap limit prevents this. Get
|
||||||
fewer if possible. (The heap limit is in kilobytes.) */
|
fewer if possible. (The heap limit is in kibibytes.) */
|
||||||
|
|
||||||
if (frame_size <= START_FRAMES_SIZE/10)
|
if (frame_size <= START_FRAMES_SIZE/10)
|
||||||
{
|
{
|
||||||
|
|
|
@ -416,7 +416,7 @@ static option_item optionlist[] = {
|
||||||
{ OP_NODATA, N_LBUFFER, NULL, "line-buffered", "use line buffering" },
|
{ OP_NODATA, N_LBUFFER, NULL, "line-buffered", "use line buffering" },
|
||||||
{ OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" },
|
{ OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" },
|
||||||
{ OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" },
|
{ OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" },
|
||||||
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kilobytes)" },
|
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kibibytes)" },
|
||||||
{ OP_U32NUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE2 match limit option" },
|
{ OP_U32NUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE2 match limit option" },
|
||||||
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "depth-limit=number", "set PCRE2 depth limit option" },
|
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "depth-limit=number", "set PCRE2 depth limit option" },
|
||||||
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "recursion-limit=number", "obsolete synonym for depth-limit" },
|
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "recursion-limit=number", "obsolete synonym for depth-limit" },
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
This is a file of miscellaneous text that is used as test data for checking
|
This is a file of miscellaneous text that is used as test data for checking
|
||||||
that the pcregrep command is working correctly. The file must be more than 24K
|
that the pcregrep command is working correctly. The file must be more than
|
||||||
long so that it needs more than a single read() call to process it. New
|
24KiB long so that it needs more than a single read() call to process it. New
|
||||||
features should be added at the end, because some of the tests involve the
|
features should be added at the end, because some of the tests involve the
|
||||||
output of line numbers, and we don't want these to change.
|
output of line numbers, and we don't want these to change.
|
||||||
|
|
||||||
|
@ -9,7 +9,7 @@ In the middle of a line, PATTERN appears.
|
||||||
|
|
||||||
This pattern is in lower case.
|
This pattern is in lower case.
|
||||||
|
|
||||||
Here follows a whole lot of stuff that makes the file over 24K long.
|
Here follows a whole lot of stuff that makes the file over 24KiB long.
|
||||||
|
|
||||||
-------------------------------------------------------------------------------
|
-------------------------------------------------------------------------------
|
||||||
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
|
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
|
||||||
|
|
|
@ -346,7 +346,7 @@ RC=0
|
||||||
./testdata/grepinput-9-
|
./testdata/grepinput-9-
|
||||||
./testdata/grepinput:10:This pattern is in lower case.
|
./testdata/grepinput:10:This pattern is in lower case.
|
||||||
./testdata/grepinput-11-
|
./testdata/grepinput-11-
|
||||||
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24K long.
|
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24KiB long.
|
||||||
./testdata/grepinput-13-
|
./testdata/grepinput-13-
|
||||||
--
|
--
|
||||||
./testdata/grepinput:623:Check up on PATTERN near the end.
|
./testdata/grepinput:623:Check up on PATTERN near the end.
|
||||||
|
@ -379,6 +379,7 @@ RC=0
|
||||||
./testdata/grepinputx
|
./testdata/grepinputx
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 37 -----------------------------
|
---------------------------- Test 37 -----------------------------
|
||||||
|
24KiB long so that it needs more than a single read() call to process it. New
|
||||||
aaaaa0
|
aaaaa0
|
||||||
aaaaa2
|
aaaaa2
|
||||||
010203040506
|
010203040506
|
||||||
|
@ -465,11 +466,11 @@ fox [1;31mjumps[0m
|
||||||
This time it [1;31mjumps[0m and [1;31mjumps[0m and [1;31mjumps[0m.
|
This time it [1;31mjumps[0m and [1;31mjumps[0m and [1;31mjumps[0m.
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 53 ------------------------------
|
---------------------------- Test 53 ------------------------------
|
||||||
36972,6
|
36976,6
|
||||||
36990,4
|
36994,4
|
||||||
37024,4
|
37028,4
|
||||||
37066,5
|
37070,5
|
||||||
37083,4
|
37087,4
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 54 ------------------------------
|
---------------------------- Test 54 ------------------------------
|
||||||
595:15,6
|
595:15,6
|
||||||
|
@ -519,8 +520,8 @@ RC=0
|
||||||
pcre2grep: pcre2_match() gave error -47 while matching text that starts:
|
pcre2grep: pcre2_match() gave error -47 while matching text that starts:
|
||||||
|
|
||||||
This is a file of miscellaneous text that is used as test data for checking
|
This is a file of miscellaneous text that is used as test data for checking
|
||||||
that the pcregrep command is working correctly. The file must be more than 24K
|
that the pcregrep command is working correctly. The file must be more than
|
||||||
long so that it needs more than a single read
|
24KiB long so that it needs more than a single re
|
||||||
|
|
||||||
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
||||||
pcre2grep: Check your regex for nested unlimited loops.
|
pcre2grep: Check your regex for nested unlimited loops.
|
||||||
|
@ -529,8 +530,8 @@ RC=1
|
||||||
pcre2grep: pcre2_match() gave error -53 while matching text that starts:
|
pcre2grep: pcre2_match() gave error -53 while matching text that starts:
|
||||||
|
|
||||||
This is a file of miscellaneous text that is used as test data for checking
|
This is a file of miscellaneous text that is used as test data for checking
|
||||||
that the pcregrep command is working correctly. The file must be more than 24K
|
that the pcregrep command is working correctly. The file must be more than
|
||||||
long so that it needs more than a single read
|
24KiB long so that it needs more than a single re
|
||||||
|
|
||||||
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
||||||
pcre2grep: Check your regex for nested unlimited loops.
|
pcre2grep: Check your regex for nested unlimited loops.
|
||||||
|
@ -814,11 +815,11 @@ RC=0
|
||||||
615:0,12
|
615:0,12
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 112 -----------------------------
|
---------------------------- Test 112 -----------------------------
|
||||||
37168,12
|
37172,12
|
||||||
37180,12
|
37184,12
|
||||||
37192,12
|
37196,12
|
||||||
37204,12
|
37208,12
|
||||||
37216,12
|
37220,12
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 113 -----------------------------
|
---------------------------- Test 113 -----------------------------
|
||||||
480
|
480
|
||||||
|
|