More typos and changes to "Kibibytes" for "Kilobytes".
parent
fabea723cf
commit
e75410a5d8
10
ChangeLog
|
@ -370,8 +370,8 @@ tests to improve coverage.
|
||||||
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
|
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
|
||||||
pcre2test, a crash could occur.
|
pcre2test, a crash could occur.
|
||||||
|
|
||||||
32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
|
32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16 MiB) so
|
||||||
all the tests can run with clang's sanitizing options.
|
that all the tests can run with clang's sanitizing options.
|
||||||
|
|
||||||
33. Implement extra compile options in the compile context and add the first
|
33. Implement extra compile options in the compile context and add the first
|
||||||
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
|
||||||
|
@ -964,9 +964,9 @@ to the same code as '.' when PCRE2_DOTALL is set).
|
||||||
40. Fix two clang compiler warnings in pcre2test when only one code unit width
|
40. Fix two clang compiler warnings in pcre2test when only one code unit width
|
||||||
is supported.
|
is supported.
|
||||||
|
|
||||||
41. Upgrade RunTest to automatically re-run test 2 with a large (64M) stack if
|
41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
|
||||||
it fails when running the interpreter with a 16M stack (and if changing the
|
if it fails when running the interpreter with a 16MiB stack (and if changing
|
||||||
stack size via pcre2test is possible). This avoids having to manually set a
|
the stack size via pcre2test is possible). This avoids having to manually set a
|
||||||
large stack size when testing with clang.
|
large stack size when testing with clang.
|
||||||
|
|
||||||
42. Fix register overwrite in JIT when SSE2 acceleration is enabled.
|
42. Fix register overwrite in JIT when SSE2 acceleration is enabled.
|
||||||
|
|
2
HACKING
|
@ -370,7 +370,7 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
|
||||||
only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values,
|
only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values,
|
||||||
and the 16-bit library can be compiled to use 4-byte values, though this
|
and the 16-bit library can be compiled to use 4-byte values, though this
|
||||||
impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is
|
impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is
|
||||||
necessary only when patterns whose compiled length is greater than 64K code
|
necessary only when patterns whose compiled length is greater than 65535 code
|
||||||
units are going to be processed. When a LINK_SIZE value uses more than one code
|
units are going to be processed. When a LINK_SIZE value uses more than one code
|
||||||
unit, the most significant unit is first.
|
unit, the most significant unit is first.
|
||||||
|
|
||||||
|
|
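Note on the HACKING hunk above: the change from "64K" to "65535" follows from the width of the link field. The sketch below is illustrative arithmetic only, not PCRE2 code. The helper names `max_link_offset` and `encode_link` are hypothetical, and the real library may impose further limits beyond the raw field width.

```python
def max_link_offset(link_size_bytes: int) -> int:
    """Largest unsigned value that fits in a link of the given byte width."""
    return 256 ** link_size_bytes - 1


def encode_link(offset: int, link_size_bytes: int) -> bytes:
    """Encode an offset with the most significant byte first (big-endian),
    matching the statement that the most significant unit comes first."""
    return offset.to_bytes(link_size_bytes, "big")


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(f"LINK_SIZE={size}: max offset {max_link_offset(size)}")
```

With a two-byte link the largest representable offset is 65535, one less than 64KiB (65536), which is presumably why the exact figure replaces "64K" in this hunk.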
|
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
|
||||||
|
|
||||||
STACK SIZE IN WINDOWS ENVIRONMENTS
|
STACK SIZE IN WINDOWS ENVIRONMENTS
|
||||||
|
|
||||||
Prior to release 10.30 the default system stack size of 1MB in some Windows
|
Prior to release 10.30 the default system stack size of 1MiB in some Windows
|
||||||
environments caused issues with some tests. This should no longer be the case
|
environments caused issues with some tests. This should no longer be the case
|
||||||
for 10.30 and later releases.
|
for 10.30 and later releases.
|
||||||
|
|
||||||
|
|
2
README
|
@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
pcre2_set_heap_limit).
|
pcre2_set_heap_limit).
|
||||||
|
|
||||||
. In the 8-bit library, the default maximum compiled pattern size is around
|
. In the 8-bit library, the default maximum compiled pattern size is around
|
||||||
64K bytes. You can increase this by adding --with-link-size=3 to the
|
64 kibibytes. You can increase this by adding --with-link-size=3 to the
|
||||||
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
||||||
to different parts of the compiled pattern. In the 16-bit library,
|
to different parts of the compiled pattern. In the 16-bit library,
|
||||||
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
||||||
|
|
|
@ -706,8 +706,8 @@ fi
|
||||||
AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
|
AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
|
||||||
The value of LINK_SIZE determines the number of bytes used to store
|
The value of LINK_SIZE determines the number of bytes used to store
|
||||||
links as offsets within the compiled regex. The default is 2, which
|
links as offsets within the compiled regex. The default is 2, which
|
||||||
allows for compiled patterns up to 64K long. This covers the vast
|
allows for compiled patterns up to 65535 code units long. This covers the
|
||||||
majority of cases. However, PCRE2 can also be compiled to use 3 or 4
|
vast majority of cases. However, PCRE2 can also be compiled to use 3 or 4
|
||||||
bytes instead. This allows for longer patterns in extreme cases.])
|
bytes instead. This allows for longer patterns in extreme cases.])
|
||||||
|
|
||||||
AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
|
AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
|
||||||
|
|
|
@ -186,7 +186,7 @@ can skip ahead to the CMake section.
|
||||||
|
|
||||||
STACK SIZE IN WINDOWS ENVIRONMENTS
|
STACK SIZE IN WINDOWS ENVIRONMENTS
|
||||||
|
|
||||||
Prior to release 10.30 the default system stack size of 1MB in some Windows
|
Prior to release 10.30 the default system stack size of 1MiB in some Windows
|
||||||
environments caused issues with some tests. This should no longer be the case
|
environments caused issues with some tests. This should no longer be the case
|
||||||
for 10.30 and later releases.
|
for 10.30 and later releases.
|
||||||
|
|
||||||
|
|
|
@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
|
||||||
pcre2_set_heap_limit).
|
pcre2_set_heap_limit).
|
||||||
|
|
||||||
. In the 8-bit library, the default maximum compiled pattern size is around
|
. In the 8-bit library, the default maximum compiled pattern size is around
|
||||||
64K bytes. You can increase this by adding --with-link-size=3 to the
|
64 kibibytes. You can increase this by adding --with-link-size=3 to the
|
||||||
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
"configure" command. PCRE2 then uses three bytes instead of two for offsets
|
||||||
to different parts of the compiled pattern. In the 16-bit library,
|
to different parts of the compiled pattern. In the 16-bit library,
|
||||||
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
--with-link-size=3 is the same as --with-link-size=4, which (in both
|
||||||
|
|
|
@ -38,7 +38,7 @@ passed to a matching function. The arguments of this function are:
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32K
|
If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32KiB
|
||||||
block on the machine stack is used.
|
block on the machine stack is used.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -49,8 +49,9 @@ If <i>callback</i> is NULL and <i>callback_data</i> is not NULL,
|
||||||
<P>
|
<P>
|
||||||
If <i>callback</i> is not NULL, it is called with <i>callback_data</i> as an
|
If <i>callback</i> is not NULL, it is called with <i>callback_data</i> as an
|
||||||
argument at the start of matching, in order to set up a JIT stack. If the
|
argument at the start of matching, in order to set up a JIT stack. If the
|
||||||
result is NULL, the internal 32K stack is used; otherwise the return value must
|
result is NULL, the internal 32KiB stack is used; otherwise the return value
|
||||||
be a valid JIT stack, the result of calling <b>pcre2_jit_stack_create()</b>.
|
must be a valid JIT stack, the result of calling
|
||||||
|
<b>pcre2_jit_stack_create()</b>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
You may safely use the same JIT stack for multiple patterns, as long as they
|
You may safely use the same JIT stack for multiple patterns, as long as they
|
||||||
|
|
|
@ -33,8 +33,8 @@ context, for memory allocation functions, or NULL for standard memory
|
||||||
allocation. The result can be passed to the JIT run-time code by calling
|
allocation. The result can be passed to the JIT run-time code by calling
|
||||||
<b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
|
<b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
|
||||||
which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
|
which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
|
||||||
A maximum stack size of 512K to 1M should be more than enough for any pattern.
|
A maximum stack size of 512KiB to 1MiB should be more than enough for any
|
||||||
For more details, see the
|
pattern. For more details, see the
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||||
page.
|
page.
|
||||||
</P>
|
</P>
|
||||||
|
|
|
@ -973,7 +973,7 @@ less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
|
||||||
limit is set, less than the default.
|
limit is set, less than the default.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
|
The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
|
||||||
stack for recording backtracking points. The more nested backtracking points
|
stack for recording backtracking points. The more nested backtracking points
|
||||||
there are (that is, the deeper the search tree), the more memory is needed.
|
there are (that is, the deeper the search tree), the more memory is needed.
|
||||||
Heap memory is used only if the initial vector is too small. If the heap limit
|
Heap memory is used only if the initial vector is too small. If the heap limit
|
||||||
|
@ -1155,7 +1155,7 @@ relevant.
|
||||||
<P>
|
<P>
|
||||||
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
||||||
but the most massive patterns, since it allows the size of the compiled pattern
|
but the most massive patterns, since it allows the size of the compiled pattern
|
||||||
to be up to 64K code units. Larger values allow larger regular expressions to
|
to be up to 65535 code units. Larger values allow larger regular expressions to
|
||||||
be compiled by those two libraries, but at the expense of slower matching.
|
be compiled by those two libraries, but at the expense of slower matching.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
|
|
|
@ -252,10 +252,10 @@ Within a compiled pattern, offset values are used to point from one part to
|
||||||
another (for example, from an opening parenthesis to an alternation
|
another (for example, from an opening parenthesis to an alternation
|
||||||
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
||||||
are used for these offsets, leading to a maximum size for a compiled pattern of
|
are used for these offsets, leading to a maximum size for a compiled pattern of
|
||||||
around 64K code units. This is sufficient to handle all but the most gigantic
|
around 64 thousand code units. This is sufficient to handle all but the most
|
||||||
patterns. Nevertheless, some people do want to process truly enormous patterns,
|
gigantic patterns. Nevertheless, some people do want to process truly enormous
|
||||||
so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
|
patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
|
||||||
adding a setting such as
|
offsets by adding a setting such as
|
||||||
<pre>
|
<pre>
|
||||||
--with-link-size=3
|
--with-link-size=3
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -282,7 +282,7 @@ to the <b>configure</b> command. This setting also applies to the
|
||||||
counting is done differently).
|
counting is done differently).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
|
The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
|
@ -399,13 +399,13 @@ they are not.
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
|
<b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
|
||||||
scanning, in order to be able to output "before" and "after" lines when it
|
scanning, in order to be able to output "before" and "after" lines when it
|
||||||
finds a match. The starting size of the buffer is controlled by a parameter
|
finds a match. The default starting size of the buffer is 20KiB. The buffer
|
||||||
whose default value is 20K. The buffer itself is three times this size, but
|
itself is three times this size, but because of the way it is used for holding
|
||||||
because of the way it is used for holding "before" lines, the longest line that
|
"before" lines, the longest line that is guaranteed to be processable is the
|
||||||
is guaranteed to be processable is the parameter size. If a longer line is
|
notional buffer size. If a longer line is encountered, <b>pcre2grep</b>
|
||||||
encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
|
automatically expands the buffer, up to a specified maximum size, whose default
|
||||||
specified maximum size, whose default is 1M or the starting size, whichever is
|
is 1MiB or the starting size, whichever is the larger. You can change the
|
||||||
the larger. You can change the default parameter values by adding, for example,
|
default parameter values by adding, for example,
|
||||||
<pre>
|
<pre>
|
||||||
--with-pcre2grep-bufsize=51200
|
--with-pcre2grep-bufsize=51200
|
||||||
--with-pcre2grep-max-bufsize=2097152
|
--with-pcre2grep-max-bufsize=2097152
|
||||||
|
|
|
@ -87,7 +87,7 @@ that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
|
||||||
default values for these parameters can be set when <b>pcre2grep</b> is
|
default values for these parameters can be set when <b>pcre2grep</b> is
|
||||||
built; if nothing is specified, the defaults are set to 20K and 1M
|
built; if nothing is specified, the defaults are set to 20KiB and 1MiB
|
||||||
respectively. An error occurs if a line is too long and the buffer can no
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
longer be expanded.
|
longer be expanded.
|
||||||
</P>
|
</P>
|
||||||
|
@ -97,7 +97,7 @@ allow for buffering "before" and "after" lines. If the buffer size is too
|
||||||
small, fewer than requested "before" and "after" lines may be output.
|
small, fewer than requested "before" and "after" lines may be output.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
|
||||||
BUFSIZ is defined in <b><stdio.h></b>. When there is more than one pattern
|
BUFSIZ is defined in <b><stdio.h></b>. When there is more than one pattern
|
||||||
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
|
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
|
||||||
each line in the order in which they are defined, except that all the <b>-e</b>
|
each line in the order in which they are defined, except that all the <b>-e</b>
|
||||||
|
|
|
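Note on the pcre2grep hunks above: the sizes quoted there are binary multiples, which is what the KiB/MiB spelling makes explicit. The sketch below is plain arithmetic, not pcre2grep code, and `format_binary` is a hypothetical helper. It checks that the example configure values 51200 and 2097152 correspond to 50KiB and 2MiB, and that the 20KiB default is 20480 bytes rather than 20000.

```python
KIB = 1024          # one kibibyte in bytes
MIB = 1024 * KIB    # one mebibyte in bytes


def format_binary(n_bytes: int) -> str:
    """Render a byte count with a binary prefix when it divides exactly."""
    if n_bytes % MIB == 0:
        return f"{n_bytes // MIB}MiB"
    if n_bytes % KIB == 0:
        return f"{n_bytes // KIB}KiB"
    return f"{n_bytes} bytes"


if __name__ == "__main__":
    # Default starting and maximum buffer sizes quoted in the documentation.
    print(20 * KIB, 1 * MIB)
    # Example values passed to configure in the documentation.
    print(format_binary(51200), format_binary(2097152))
```

Read as decimal kilobytes, 20K would be 20000 bytes, 480 bytes short of the actual default. That ambiguity is what this commit removes.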
@ -179,7 +179,7 @@ when JIT matching is used.
|
||||||
<br><a name="SEC6" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
|
<br><a name="SEC6" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
|
||||||
<P>
|
<P>
|
||||||
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
||||||
By default, it uses 32K on the machine stack. However, some large or
|
By default, it uses 32KiB on the machine stack. However, some large or
|
||||||
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
||||||
is given when there is not enough stack. Three functions are provided for
|
is given when there is not enough stack. Three functions are provided for
|
||||||
managing blocks of memory for use as JIT stacks. There is further discussion
|
managing blocks of memory for use as JIT stacks. There is further discussion
|
||||||
|
@ -194,8 +194,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
|
||||||
pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
|
pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
|
||||||
is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
|
is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
|
||||||
that is no longer needed. (For the technically minded: the address space is
|
that is no longer needed. (For the technically minded: the address space is
|
||||||
allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
|
allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
|
||||||
be more than enough for any pattern.
|
should be more than enough for any pattern.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
|
The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
|
||||||
|
@ -209,7 +209,7 @@ The first argument is a pointer to a match context. When this is subsequently
|
||||||
passed to a matching function, its information determines which JIT stack is
|
passed to a matching function, its information determines which JIT stack is
|
||||||
used. There are three cases for the values of the other two options:
|
used. There are three cases for the values of the other two options:
|
||||||
<pre>
|
<pre>
|
||||||
(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
|
(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32KiB block
|
||||||
on the machine stack is used. This is the default when a match
|
on the machine stack is used. This is the default when a match
|
||||||
context is created.
|
context is created.
|
||||||
|
|
||||||
|
@ -220,7 +220,7 @@ used. There are three cases for the values of the other two options:
|
||||||
(3) If <i>callback</i> is not NULL, it must point to a function that is
|
(3) If <i>callback</i> is not NULL, it must point to a function that is
|
||||||
called with <i>data</i> as an argument at the start of matching, in
|
called with <i>data</i> as an argument at the start of matching, in
|
||||||
order to set up a JIT stack. If the return from the callback
|
order to set up a JIT stack. If the return from the callback
|
||||||
function is NULL, the internal 32K stack is used; otherwise the
|
function is NULL, the internal 32KiB stack is used; otherwise the
|
||||||
return value must be a valid JIT stack, the result of calling
|
return value must be a valid JIT stack, the result of calling
|
||||||
<b>pcre2_jit_stack_create()</b>.
|
<b>pcre2_jit_stack_create()</b>.
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -286,9 +286,9 @@ we do the recursion in memory.
|
||||||
Modern operating systems have a nice feature: they can reserve an address space
|
Modern operating systems have a nice feature: they can reserve an address space
|
||||||
instead of allocating memory. We can safely allocate memory pages inside this
|
instead of allocating memory. We can safely allocate memory pages inside this
|
||||||
address space, so the stack could grow without moving memory data (this is
|
address space, so the stack could grow without moving memory data (this is
|
||||||
important because of pointers). Thus we can allocate 1M address space, and use
|
important because of pointers). Thus we can allocate 1MiB address space, and
|
||||||
only a single memory page (usually 4K) if that is enough. However, we can still
|
use only a single memory page (usually 4KiB) if that is enough. However, we can
|
||||||
grow up to 1M anytime if needed.
|
still grow up to 1MiB anytime if needed.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(3) Who "owns" a JIT stack?
|
(3) Who "owns" a JIT stack?
|
||||||
|
@ -328,7 +328,7 @@ list of patterns.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(6) OK, the stack is for long term memory allocation. But what happens if a
|
(6) OK, the stack is for long term memory allocation. But what happens if a
|
||||||
pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
|
pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
|
||||||
stack is freed?
|
stack is freed?
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -20,12 +20,12 @@ There are some size limitations in PCRE2 but it is hoped that they will never
|
||||||
in practice be relevant.
|
in practice be relevant.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The maximum size of a compiled pattern is approximately 64K code units for the
|
The maximum size of a compiled pattern is approximately 64 thousand code units
|
||||||
8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
|
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
|
||||||
linkage size, which is 2 bytes for these libraries. If you want to process
|
internal linkage size, which is 2 bytes for these libraries. If you want to
|
||||||
regular expressions that are truly enormous, you can compile PCRE2 with an
|
process regular expressions that are truly enormous, you can compile PCRE2 with
|
||||||
internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
|
an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
|
||||||
up to 4). See the <b>README</b> file in the source distribution and the
|
rounded up to 4). See the <b>README</b> file in the source distribution and the
|
||||||
<a href="pcre2build.html"><b>pcre2build</b></a>
|
<a href="pcre2build.html"><b>pcre2build</b></a>
|
||||||
documentation for details. In these cases the limit is substantially larger.
|
documentation for details. In these cases the limit is substantially larger.
|
||||||
However, the speed of execution is slower. In the 32-bit library, the internal
|
However, the speed of execution is slower. In the 32-bit library, the internal
|
||||||
|
|
|
@ -549,7 +549,7 @@ Absolute and relative backreferences
|
||||||
<P>
|
<P>
|
||||||
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
The sequence \g followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative backreference. A named backreference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \g{name}. backreferences are discussed
|
can be coded as \g{name}. Backreferences are discussed
|
||||||
<a href="#backreferences">later,</a>
|
<a href="#backreferences">later,</a>
|
||||||
following the discussion of
|
following the discussion of
|
||||||
<a href="#subpattern">parenthesized subpatterns.</a>
|
<a href="#subpattern">parenthesized subpatterns.</a>
|
||||||
|
@ -2247,7 +2247,7 @@ done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
backreferences of this type cause the group that they reference to be treated
|
Backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
<a href="#atomicgroup">atomic group.</a>
|
<a href="#atomicgroup">atomic group.</a>
|
||||||
Once the whole group has been matched, a subsequent matching failure cannot
|
Once the whole group has been matched, a subsequent matching failure cannot
|
||||||
|
|
|
@ -52,9 +52,9 @@ example, the very simple pattern
|
||||||
<pre>
|
<pre>
|
||||||
((ab){1,1000}c){1,3}
|
((ab){1,1000}c){1,3}
|
||||||
</pre>
|
</pre>
|
||||||
uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
|
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
|
||||||
compiled with its default internal pointer size of two bytes, the size limit on
|
compiled with its default internal pointer size of two bytes, the size limit on
|
||||||
a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
|
a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
|
||||||
this is reached with the above pattern if the outer repetition is increased
|
this is reached with the above pattern if the outer repetition is increased
|
||||||
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
||||||
handle larger compiled patterns, but it is better to try to rewrite your
|
handle larger compiled patterns, but it is better to try to rewrite your
|
||||||
|
@ -68,14 +68,14 @@ facility. Re-writing the above pattern as
|
||||||
<pre>
|
<pre>
|
||||||
((ab)(?2){0,999}c)(?1){0,2}
|
((ab)(?2){0,999}c)(?1){0,2}
|
||||||
</pre>
|
</pre>
|
||||||
reduces the memory requirements to around 16K, and indeed it remains under 20K
|
reduces the memory requirements to around 16KiB, and indeed it remains under
|
||||||
even with the outer repetition increased to 100. However, this kind of pattern
|
20KiB even with the outer repetition increased to 100. However, this kind of
|
||||||
is not always exactly equivalent, because any captures within subroutine calls
|
pattern is not always exactly equivalent, because any captures within
|
||||||
are lost when the subroutine completes. If this is not a problem, this kind of
|
subroutine calls are lost when the subroutine completes. If this is not a
|
||||||
rewriting will allow you to process patterns that PCRE2 cannot otherwise
|
problem, this kind of rewriting will allow you to process patterns that PCRE2
|
||||||
handle. The matching performance of the two different versions of the pattern
|
cannot otherwise handle. The matching performance of the two different versions
|
||||||
are roughly the same. (This applies from release 10.30 - things were different
|
of the pattern are roughly the same. (This applies from release 10.30 - things
|
||||||
in earlier releases.)
|
were different in earlier releases.)
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
|
<br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -83,7 +83,7 @@ From release 10.30, the interpretive (non-JIT) version of <b>pcre2_match()</b>
|
||||||
uses very little system stack at run time. In earlier releases recursive
|
uses very little system stack at run time. In earlier releases recursive
|
||||||
function calls could use a great deal of stack, and this could cause problems,
|
function calls could use a great deal of stack, and this could cause problems,
|
||||||
but this usage has been eliminated. Backtracking positions are now explicitly
|
but this usage has been eliminated. Backtracking positions are now explicitly
|
||||||
remembered in memory frames controlled by the code. An initial 20K vector of
|
remembered in memory frames controlled by the code. An initial 20KiB vector of
|
||||||
frames is allocated on the system stack (enough for about 100 frames for small
|
frames is allocated on the system stack (enough for about 100 frames for small
|
||||||
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
||||||
memory can be limited; if the limit is set to zero, only the initial stack
|
memory can be limited; if the limit is set to zero, only the initial stack
|
||||||
|
|
2465
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -24,7 +24,7 @@ passed to a matching function. The arguments of this function are:
|
||||||
callback a callback function
|
callback a callback function
|
||||||
callback_data a JIT stack or a value to be passed to the callback
|
callback_data a JIT stack or a value to be passed to the callback
|
||||||
.P
|
.P
|
||||||
If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32K
|
If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32KiB
|
||||||
block on the machine stack is used.
|
block on the machine stack is used.
|
||||||
.P
|
.P
|
||||||
If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
|
If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
|
||||||
|
@ -33,8 +33,9 @@ If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
|
||||||
.P
|
.P
|
||||||
If \fIcallback\fP is not NULL, it is called with \fIcallback_data\fP as an
|
If \fIcallback\fP is not NULL, it is called with \fIcallback_data\fP as an
|
||||||
argument at the start of matching, in order to set up a JIT stack. If the
|
argument at the start of matching, in order to set up a JIT stack. If the
|
||||||
result is NULL, the internal 32K stack is used; otherwise the return value must
|
result is NULL, the internal 32KiB stack is used; otherwise the return value
|
||||||
be a valid JIT stack, the result of calling \fBpcre2_jit_stack_create()\fP.
|
must be a valid JIT stack, the result of calling
|
||||||
|
\fBpcre2_jit_stack_create()\fP.
|
||||||
.P
|
.P
|
||||||
You may safely use the same JIT stack for multiple patterns, as long as they
|
You may safely use the same JIT stack for multiple patterns, as long as they
|
||||||
are all matched in the same thread. In a multithread application, each thread
|
are all matched in the same thread. In a multithread application, each thread
|
||||||
|
|
|
@ -21,8 +21,8 @@ context, for memory allocation functions, or NULL for standard memory
|
||||||
allocation. The result can be passed to the JIT run-time code by calling
|
allocation. The result can be passed to the JIT run-time code by calling
|
||||||
\fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
|
\fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
|
||||||
which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
|
which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
|
||||||
A maximum stack size of 512K to 1M should be more than enough for any pattern.
|
A maximum stack size of 512KiB to 1MiB should be more than enough for any
|
||||||
For more details, see the
|
pattern. For more details, see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
|
|
|
@ -909,7 +909,7 @@ where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||||
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
||||||
limit is set, less than the default.
|
limit is set, less than the default.
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
|
||||||
stack for recording backtracking points. The more nested backtracking points
|
stack for recording backtracking points. The more nested backtracking points
|
||||||
there are (that is, the deeper the search tree), the more memory is needed.
|
there are (that is, the deeper the search tree), the more memory is needed.
|
||||||
Heap memory is used only if the initial vector is too small. If the heap limit
|
Heap memory is used only if the initial vector is too small. If the heap limit
|
||||||
|
@ -1084,7 +1084,7 @@ relevant.
|
||||||
.P
|
.P
|
||||||
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
|
||||||
but the most massive patterns, since it allows the size of the compiled pattern
|
but the most massive patterns, since it allows the size of the compiled pattern
|
||||||
to be up to 64K code units. Larger values allow larger regular expressions to
|
to be up to 65535 code units. Larger values allow larger regular expressions to
|
||||||
be compiled by those two libraries, but at the expense of slower matching.
|
be compiled by those two libraries, but at the expense of slower matching.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
|
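The "65535 code units" figure in the hunk above follows from the width of a two-byte offset. A minimal sketch of that arithmetic, assuming only that an n-byte link stores an unsigned value; the function name is hypothetical and is not part of PCRE2, and the library may enforce a lower ceiling than the raw representable maximum:

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): the largest unsigned value
   that fits in link_size bytes, for link sizes 1 to 4. This is only the
   representable maximum; the library's real limit may be lower. */
uint32_t max_offset_for_link_size(unsigned link_size)
{
  if (link_size == 0 || link_size > 4) return 0;  /* outside the sketch's range */
  if (link_size == 4) return UINT32_MAX;          /* avoid shifting by 32 bits */
  return ((uint32_t)1 << (8 * link_size)) - 1;    /* 2^(8n) - 1 */
}
```

With a link size of 2 this gives 65535, which is why the revised text can state an exact number where the old text said "64K".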
|
|
@ -244,10 +244,10 @@ Within a compiled pattern, offset values are used to point from one part to
|
||||||
another (for example, from an opening parenthesis to an alternation
|
another (for example, from an opening parenthesis to an alternation
|
||||||
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
|
||||||
are used for these offsets, leading to a maximum size for a compiled pattern of
|
are used for these offsets, leading to a maximum size for a compiled pattern of
|
||||||
around 64K code units. This is sufficient to handle all but the most gigantic
|
around 64 thousand code units. This is sufficient to handle all but the most
|
||||||
patterns. Nevertheless, some people do want to process truly enormous patterns,
|
gigantic patterns. Nevertheless, some people do want to process truly enormous
|
||||||
so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
|
patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
|
||||||
adding a setting such as
|
offsets by adding a setting such as
|
||||||
.sp
|
.sp
|
||||||
--with-link-size=3
|
--with-link-size=3
|
||||||
.sp
|
.sp
|
||||||
|
@ -277,7 +277,7 @@ to the \fBconfigure\fP command. This setting also applies to the
|
||||||
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
|
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
|
||||||
counting is done differently).
|
counting is done differently).
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
|
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
|
||||||
stack to record backtracking points. The more nested backtracking points there
|
stack to record backtracking points. The more nested backtracking points there
|
||||||
are (that is, the deeper the search tree), the more memory is needed. If the
|
are (that is, the deeper the search tree), the more memory is needed. If the
|
||||||
initial vector is not large enough, heap memory is used, up to a certain limit,
|
initial vector is not large enough, heap memory is used, up to a certain limit,
|
||||||
|
@ -403,13 +403,13 @@ they are not.
|
||||||
.sp
|
.sp
|
||||||
\fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is
|
\fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is
|
||||||
scanning, in order to be able to output "before" and "after" lines when it
|
scanning, in order to be able to output "before" and "after" lines when it
|
||||||
finds a match. The starting size of the buffer is controlled by a parameter
|
finds a match. The default starting size of the buffer is 20KiB. The buffer
|
||||||
whose default value is 20K. The buffer itself is three times this size, but
|
itself is three times this size, but because of the way it is used for holding
|
||||||
because of the way it is used for holding "before" lines, the longest line that
|
"before" lines, the longest line that is guaranteed to be processable is the
|
||||||
is guaranteed to be processable is the parameter size. If a longer line is
|
notional buffer size. If a longer line is encountered, \fBpcre2grep\fP
|
||||||
encountered, \fBpcre2grep\fP automatically expands the buffer, up to a
|
automatically expands the buffer, up to a specified maximum size, whose default
|
||||||
specified maximum size, whose default is 1M or the starting size, whichever is
|
is 1MiB or the starting size, whichever is the larger. You can change the
|
||||||
the larger. You can change the default parameter values by adding, for example,
|
default parameter values by adding, for example,
|
||||||
.sp
|
.sp
|
||||||
--with-pcre2grep-bufsize=51200
|
--with-pcre2grep-bufsize=51200
|
||||||
--with-pcre2grep-max-bufsize=2097152
|
--with-pcre2grep-max-bufsize=2097152
|
||||||
|
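The buffer sizing described in the hunk above can be sketched as plain arithmetic. This is an illustration under stated assumptions, not pcre2grep's actual code: both function names are hypothetical, and the only facts taken from the text are that the allocated block is three times the buffer size and that the default maximum is 1MiB or the starting size, whichever is larger.

```c
#include <stddef.h>

/* Hypothetical helper: bytes actually allocated for a given "buffer size",
   using the three-times rule stated in the documentation. */
size_t grep_block_bytes(size_t bufsize)
{
  return 3 * bufsize;
}

/* Hypothetical helper: the default maximum buffer size, taken as the larger
   of 1MiB (1024 * 1024 bytes) and the starting size. */
size_t grep_default_max(size_t bufsize)
{
  const size_t one_mib = (size_t)1024 * 1024;
  return bufsize > one_mib ? bufsize : one_mib;
}
```

Under these assumptions the default 20KiB starting size (20480 bytes) corresponds to a 61440-byte block, and the configure values in the example above are 50KiB (51200) and 2MiB (2097152).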
|
|
@ -58,7 +58,7 @@ that is obtained at the start of processing. If an input file contains very
|
||||||
long lines, a larger buffer may be needed; this is handled by automatically
|
long lines, a larger buffer may be needed; this is handled by automatically
|
||||||
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
|
||||||
default values for these parameters can be set when \fBpcre2grep\fP is
|
default values for these parameters can be set when \fBpcre2grep\fP is
|
||||||
built; if nothing is specified, the defaults are set to 20K and 1M
|
built; if nothing is specified, the defaults are set to 20KiB and 1MiB
|
||||||
respectively. An error occurs if a line is too long and the buffer can no
|
respectively. An error occurs if a line is too long and the buffer can no
|
||||||
longer be expanded.
|
longer be expanded.
|
||||||
.P
|
.P
|
||||||
|
@ -66,7 +66,7 @@ The block of memory that is actually used is three times the "buffer size", to
|
||||||
allow for buffering "before" and "after" lines. If the buffer size is too
|
allow for buffering "before" and "after" lines. If the buffer size is too
|
||||||
small, fewer than requested "before" and "after" lines may be output.
|
small, fewer than requested "before" and "after" lines may be output.
|
||||||
.P
|
.P
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
|
||||||
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
||||||
(specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
|
(specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
|
||||||
each line in the order in which they are defined, except that all the \fB-e\fP
|
each line in the order in which they are defined, except that all the \fB-e\fP
|
||||||
|
|
|
@ -58,15 +58,15 @@ DESCRIPTION
|
||||||
automatically extending the buffer, up to the limit specified by --max-
|
automatically extending the buffer, up to the limit specified by --max-
|
||||||
buffer-size. The default values for these parameters can be set when
|
buffer-size. The default values for these parameters can be set when
|
||||||
pcre2grep is built; if nothing is specified, the defaults are set to
|
pcre2grep is built; if nothing is specified, the defaults are set to
|
||||||
20K and 1M respectively. An error occurs if a line is too long and the
|
20KiB and 1MiB respectively. An error occurs if a line is too long and
|
||||||
buffer can no longer be expanded.
|
the buffer can no longer be expanded.
|
||||||
|
|
||||||
The block of memory that is actually used is three times the "buffer
|
The block of memory that is actually used is three times the "buffer
|
||||||
size", to allow for buffering "before" and "after" lines. If the buffer
|
size", to allow for buffering "before" and "after" lines. If the buffer
|
||||||
size is too small, fewer than requested "before" and "after" lines may
|
size is too small, fewer than requested "before" and "after" lines may
|
||||||
be output.
|
be output.
|
||||||
|
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the
|
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
|
||||||
greater. BUFSIZ is defined in <stdio.h>. When there is more than one
|
greater. BUFSIZ is defined in <stdio.h>. When there is more than one
|
||||||
pattern (specified by the use of -e and/or -f), each pattern is applied
|
pattern (specified by the use of -e and/or -f), each pattern is applied
|
||||||
to each line in the order in which they are defined, except that all
|
to each line in the order in which they are defined, except that all
|
||||||
|
|
|
@ -161,7 +161,7 @@ when JIT matching is used.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
When the compiled JIT code runs, it needs a block of memory to use as a stack.
|
||||||
By default, it uses 32K on the machine stack. However, some large or
|
By default, it uses 32KiB on the machine stack. However, some large or
|
||||||
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
|
||||||
is given when there is not enough stack. Three functions are provided for
|
is given when there is not enough stack. Three functions are provided for
|
||||||
managing blocks of memory for use as JIT stacks. There is further discussion
|
managing blocks of memory for use as JIT stacks. There is further discussion
|
||||||
|
@ -178,8 +178,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
|
||||||
pointer to an opaque structure of type \fBpcre2_jit_stack\fP, or NULL if there
|
pointer to an opaque structure of type \fBpcre2_jit_stack\fP, or NULL if there
|
||||||
is an error. The \fBpcre2_jit_stack_free()\fP function is used to free a stack
|
is an error. The \fBpcre2_jit_stack_free()\fP function is used to free a stack
|
||||||
that is no longer needed. (For the technically minded: the address space is
|
that is no longer needed. (For the technically minded: the address space is
|
||||||
allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
|
allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
|
||||||
be more than enough for any pattern.
|
should be more than enough for any pattern.
|
||||||
.P
|
.P
|
||||||
The \fBpcre2_jit_stack_assign()\fP function specifies which stack JIT code
|
The \fBpcre2_jit_stack_assign()\fP function specifies which stack JIT code
|
||||||
should use. Its arguments are as follows:
|
should use. Its arguments are as follows:
|
||||||
|
@ -192,7 +192,7 @@ The first argument is a pointer to a match context. When this is subsequently
|
||||||
passed to a matching function, its information determines which JIT stack is
|
passed to a matching function, its information determines which JIT stack is
|
||||||
used. There are three cases for the values of the other two options:
|
used. There are three cases for the values of the other two options:
|
||||||
.sp
|
.sp
|
||||||
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
|
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32KiB block
|
||||||
on the machine stack is used. This is the default when a match
|
on the machine stack is used. This is the default when a match
|
||||||
context is created.
|
context is created.
|
||||||
.sp
|
.sp
|
||||||
|
@ -203,7 +203,7 @@ used. There are three cases for the values of the other two options:
|
||||||
(3) If \fIcallback\fP is not NULL, it must point to a function that is
|
(3) If \fIcallback\fP is not NULL, it must point to a function that is
|
||||||
called with \fIdata\fP as an argument at the start of matching, in
|
called with \fIdata\fP as an argument at the start of matching, in
|
||||||
order to set up a JIT stack. If the return from the callback
|
order to set up a JIT stack. If the return from the callback
|
||||||
function is NULL, the internal 32K stack is used; otherwise the
|
function is NULL, the internal 32KiB stack is used; otherwise the
|
||||||
return value must be a valid JIT stack, the result of calling
|
return value must be a valid JIT stack, the result of calling
|
||||||
\fBpcre2_jit_stack_create()\fP.
|
\fBpcre2_jit_stack_create()\fP.
|
||||||
.sp
|
.sp
|
||||||
|
@ -265,9 +265,9 @@ we do the recursion in memory.
|
||||||
Modern operating systems have a nice feature: they can reserve an address space
|
Modern operating systems have a nice feature: they can reserve an address space
|
||||||
instead of allocating memory. We can safely allocate memory pages inside this
|
instead of allocating memory. We can safely allocate memory pages inside this
|
||||||
address space, so the stack could grow without moving memory data (this is
|
address space, so the stack could grow without moving memory data (this is
|
||||||
important because of pointers). Thus we can allocate 1M address space, and use
|
important because of pointers). Thus we can allocate 1MiB address space, and
|
||||||
only a single memory page (usually 4K) if that is enough. However, we can still
|
use only a single memory page (usually 4KiB) if that is enough. However, we can
|
||||||
grow up to 1M anytime if needed.
|
still grow up to 1MiB anytime if needed.
|
||||||
.P
|
.P
|
||||||
(3) Who "owns" a JIT stack?
|
(3) Who "owns" a JIT stack?
|
||||||
.sp
|
.sp
|
||||||
|
@ -300,7 +300,7 @@ say two minutes. The JIT callback can help to achieve this without keeping a
|
||||||
list of patterns.
|
list of patterns.
|
||||||
.P
|
.P
|
||||||
(6) OK, the stack is for long term memory allocation. But what happens if a
|
(6) OK, the stack is for long term memory allocation. But what happens if a
|
||||||
pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
|
pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
|
||||||
stack is freed?
|
stack is freed?
|
||||||
.sp
|
.sp
|
||||||
Especially on embedded systems, it might be a good idea to release memory
|
Especially on embedded systems, it might be a good idea to release memory
|
||||||
|
|
|
@ -7,12 +7,12 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
There are some size limitations in PCRE2 but it is hoped that they will never
|
There are some size limitations in PCRE2 but it is hoped that they will never
|
||||||
in practice be relevant.
|
in practice be relevant.
|
||||||
.P
|
.P
|
||||||
The maximum size of a compiled pattern is approximately 64K code units for the
|
The maximum size of a compiled pattern is approximately 64 thousand code units
|
||||||
8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
|
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
|
||||||
linkage size, which is 2 bytes for these libraries. If you want to process
|
internal linkage size, which is 2 bytes for these libraries. If you want to
|
||||||
regular expressions that are truly enormous, you can compile PCRE2 with an
|
process regular expressions that are truly enormous, you can compile PCRE2 with
|
||||||
internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
|
an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
|
||||||
up to 4). See the \fBREADME\fP file in the source distribution and the
|
rounded up to 4). See the \fBREADME\fP file in the source distribution and the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2build\fP
|
\fBpcre2build\fP
|
||||||
.\"
|
.\"
|
||||||
|
|
|
@ -528,7 +528,7 @@ by code point, as described above.
|
||||||
.sp
|
.sp
|
||||||
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
The sequence \eg followed by a signed or unsigned number, optionally enclosed
|
||||||
in braces, is an absolute or relative backreference. A named backreference
|
in braces, is an absolute or relative backreference. A named backreference
|
||||||
can be coded as \eg{name}. backreferences are discussed
|
can be coded as \eg{name}. Backreferences are discussed
|
||||||
.\" HTML <a href="#backreferences">
|
.\" HTML <a href="#backreferences">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
later,
|
later,
|
||||||
|
@ -2243,7 +2243,7 @@ that the first iteration does not need to match the backreference. This can be
|
||||||
done using alternation, as in the example above, or by a quantifier with a
|
done using alternation, as in the example above, or by a quantifier with a
|
||||||
minimum of zero.
|
minimum of zero.
|
||||||
.P
|
.P
|
||||||
backreferences of this type cause the group that they reference to be treated
|
Backreferences of this type cause the group that they reference to be treated
|
||||||
as an
|
as an
|
||||||
.\" HTML <a href="#atomicgroup">
|
.\" HTML <a href="#atomicgroup">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
|
|
|
@ -34,9 +34,9 @@ example, the very simple pattern
|
||||||
.sp
|
.sp
|
||||||
((ab){1,1000}c){1,3}
|
((ab){1,1000}c){1,3}
|
||||||
.sp
|
.sp
|
||||||
uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
|
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
|
||||||
compiled with its default internal pointer size of two bytes, the size limit on
|
compiled with its default internal pointer size of two bytes, the size limit on
|
||||||
a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
|
a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
|
||||||
this is reached with the above pattern if the outer repetition is increased
|
this is reached with the above pattern if the outer repetition is increased
|
||||||
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
|
||||||
handle larger compiled patterns, but it is better to try to rewrite your
|
handle larger compiled patterns, but it is better to try to rewrite your
|
||||||
|
@ -52,14 +52,14 @@ facility. Re-writing the above pattern as
|
||||||
.sp
|
.sp
|
||||||
((ab)(?2){0,999}c)(?1){0,2}
|
((ab)(?2){0,999}c)(?1){0,2}
|
||||||
.sp
|
.sp
|
||||||
reduces the memory requirements to around 16K, and indeed it remains under 20K
|
reduces the memory requirements to around 16KiB, and indeed it remains under
|
||||||
even with the outer repetition increased to 100. However, this kind of pattern
|
20KiB even with the outer repetition increased to 100. However, this kind of
|
||||||
is not always exactly equivalent, because any captures within subroutine calls
|
pattern is not always exactly equivalent, because any captures within
|
||||||
are lost when the subroutine completes. If this is not a problem, this kind of
|
subroutine calls are lost when the subroutine completes. If this is not a
|
||||||
rewriting will allow you to process patterns that PCRE2 cannot otherwise
|
problem, this kind of rewriting will allow you to process patterns that PCRE2
|
||||||
handle. The matching performance of the two different versions of the pattern
|
cannot otherwise handle. The matching performance of the two different versions
|
||||||
are roughly the same. (This applies from release 10.30 - things were different
|
of the pattern are roughly the same. (This applies from release 10.30 - things
|
||||||
in earlier releases.)
|
were different in earlier releases.)
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "STACK AND HEAP USAGE AT RUN TIME"
|
.SH "STACK AND HEAP USAGE AT RUN TIME"
|
||||||
|
@ -69,7 +69,7 @@ From release 10.30, the interpretive (non-JIT) version of \fBpcre2_match()\fP
|
||||||
uses very little system stack at run time. In earlier releases recursive
|
uses very little system stack at run time. In earlier releases recursive
|
||||||
function calls could use a great deal of stack, and this could cause problems,
|
function calls could use a great deal of stack, and this could cause problems,
|
||||||
but this usage has been eliminated. Backtracking positions are now explicitly
|
but this usage has been eliminated. Backtracking positions are now explicitly
|
||||||
remembered in memory frames controlled by the code. An initial 20K vector of
|
remembered in memory frames controlled by the code. An initial 20KiB vector of
|
||||||
frames is allocated on the system stack (enough for about 100 frames for small
|
frames is allocated on the system stack (enough for about 100 frames for small
|
||||||
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
patterns), but if this is insufficient, heap memory is used. The amount of heap
|
||||||
memory can be limited; if the limit is set to zero, only the initial stack
|
memory can be limited; if the limit is set to zero, only the initial stack
|
||||||
|
|
|
@ -134,16 +134,16 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* This limits the amount of memory that may be used while matching a pattern.
|
/* This limits the amount of memory that may be used while matching a pattern.
|
||||||
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
|
||||||
to JIT matching. The value is in kilobytes. */
|
to JIT matching. The value is in kibibytes (units of 1024 bytes). */
|
||||||
#ifndef HEAP_LIMIT
|
#ifndef HEAP_LIMIT
|
||||||
#define HEAP_LIMIT 20000000
|
#define HEAP_LIMIT 20000000
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||||
as offsets within the compiled regex. The default is 2, which allows for
|
as offsets within the compiled regex. The default is 2, which allows for
|
||||||
compiled patterns up to 64K long. This covers the vast majority of cases.
|
compiled patterns up to 65535 code units long. This covers the vast
|
||||||
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
|
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
|
||||||
allows for longer patterns in extreme cases. */
|
instead. This allows for longer patterns in extreme cases. */
|
||||||
#ifndef LINK_SIZE
|
#ifndef LINK_SIZE
|
||||||
#define LINK_SIZE 2
|
#define LINK_SIZE 2
|
||||||
#endif
|
#endif
|
||||||
|
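The change from "kilobytes" to "kibibytes" in the HEAP_LIMIT comment matters once the value is converted to bytes. A minimal sketch, assuming the limit is a 32-bit count of 1024-byte units; the function name is hypothetical and does not appear in the PCRE2 sources:

```c
#include <stdint.h>

/* Hypothetical helper: convert a heap limit given in kibibytes (units of
   1024 bytes) to bytes. The result is widened to 64 bits because the
   default of 20000000 KiB does not fit in 32 bits once multiplied. */
uint64_t heap_limit_bytes(uint32_t limit_kib)
{
  return (uint64_t)limit_kib * 1024u;
}
```

For the default of 20000000 this is 20480000000 bytes, about 480 million bytes more than reading the same number as units of 1000 bytes would give.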
|
|
@ -139,9 +139,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
|
|
||||||
/* The value of LINK_SIZE determines the number of bytes used to store links
|
/* The value of LINK_SIZE determines the number of bytes used to store links
|
||||||
as offsets within the compiled regex. The default is 2, which allows for
|
as offsets within the compiled regex. The default is 2, which allows for
|
||||||
compiled patterns up to 64K long. This covers the vast majority of cases.
|
compiled patterns up to 65535 code units long. This covers the vast
|
||||||
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
|
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
|
||||||
allows for longer patterns in extreme cases. */
|
instead. This allows for longer patterns in extreme cases. */
|
||||||
#undef LINK_SIZE
|
#undef LINK_SIZE
|
||||||
|
|
||||||
/* Define to the sub-directory where libtool stores uninstalled libraries. */
|
/* Define to the sub-directory where libtool stores uninstalled libraries. */
|
||||||
|
|
|
@ -412,7 +412,7 @@ if (rws->next != NULL)
|
||||||
}
|
}
|
||||||
|
|
||||||
/* All sizes are in units of sizeof(int), except for mb->heaplimit, which is in
|
/* All sizes are in units of sizeof(int), except for mb->heaplimit, which is in
|
||||||
kilobytes. */
|
kibibytes. */
|
||||||
|
|
||||||
else
|
else
|
||||||
{
|
{
|
||||||
|
|
|
@ -247,7 +247,7 @@ not rely on this. */
|
||||||
pcre2_match() is allocated on the system stack, of this size (bytes). The size
|
pcre2_match() is allocated on the system stack, of this size (bytes). The size
|
||||||
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
|
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
|
||||||
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
|
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
|
||||||
on the number of capturing parentheses) so 20K handles quite a few frames. A
|
on the number of capturing parentheses) so 20KiB handles quite a few frames. A
|
||||||
larger vector on the heap is obtained for patterns that need more frames. The
|
larger vector on the heap is obtained for patterns that need more frames. The
|
||||||
maximum size of this can be limited. */
|
maximum size of this can be limited. */
|
||||||
|
|
||||||
|
|
|
@ -6283,7 +6283,7 @@ mb->match_limit_depth = (mcontext->depth_limit < re->limit_depth)?
|
||||||
/* If a pattern has very many capturing parentheses, the frame size may be very
|
/* If a pattern has very many capturing parentheses, the frame size may be very
|
||||||
large. Ensure that there are at least 10 available frames by getting an initial
|
large. Ensure that there are at least 10 available frames by getting an initial
|
||||||
vector on the heap if necessary, except when the heap limit prevents this. Get
|
vector on the heap if necessary, except when the heap limit prevents this. Get
|
||||||
fewer if possible. (The heap limit is in kilobytes.) */
|
fewer if possible. (The heap limit is in kibibytes.) */
|
||||||
|
|
||||||
if (frame_size <= START_FRAMES_SIZE/10)
|
if (frame_size <= START_FRAMES_SIZE/10)
|
||||||
{
|
{
|
||||||
|
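The test `frame_size <= START_FRAMES_SIZE/10` in the hunk above checks whether at least ten frames fit in the initial stack vector. A rough sketch of that check, assuming the vector is exactly 20KiB (20480 bytes) as the revised comments say; the macro and both function names are invented for illustration and are not the library's own:

```c
#include <stddef.h>

/* Assumed size of the initial frames vector: 20KiB, per the comments. */
#define EXAMPLE_START_FRAMES_SIZE 20480

/* Hypothetical helper: how many whole frames of the given size fit in the
   initial vector. */
size_t frames_in_initial_vector(size_t frame_size)
{
  return frame_size == 0 ? 0 : EXAMPLE_START_FRAMES_SIZE / frame_size;
}

/* Hypothetical helper: nonzero when fewer than ten frames would fit, which
   is the case where the comment says a heap vector is obtained instead. */
int needs_heap_for_ten_frames(size_t frame_size)
{
  return frame_size > EXAMPLE_START_FRAMES_SIZE / 10;
}
```

A 200-byte frame gives 102 frames, in line with the "about 100 frames for small patterns" remark in the performance documentation; a frame larger than 2048 bytes falls into the heap case.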
|
|
@ -416,7 +416,7 @@ static option_item optionlist[] = {
|
||||||
{ OP_NODATA, N_LBUFFER, NULL, "line-buffered", "use line buffering" },
|
{ OP_NODATA, N_LBUFFER, NULL, "line-buffered", "use line buffering" },
|
||||||
{ OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" },
|
{ OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" },
|
||||||
{ OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" },
|
{ OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" },
|
||||||
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kilobytes)" },
|
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kibibytes)" },
|
||||||
{ OP_U32NUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE2 match limit option" },
|
{ OP_U32NUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE2 match limit option" },
|
||||||
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "depth-limit=number", "set PCRE2 depth limit option" },
|
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "depth-limit=number", "set PCRE2 depth limit option" },
|
||||||
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "recursion-limit=number", "obsolete synonym for depth-limit" },
|
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "recursion-limit=number", "obsolete synonym for depth-limit" },
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
This is a file of miscellaneous text that is used as test data for checking
|
This is a file of miscellaneous text that is used as test data for checking
|
||||||
that the pcregrep command is working correctly. The file must be more than 24K
|
that the pcregrep command is working correctly. The file must be more than
|
||||||
long so that it needs more than a single read() call to process it. New
|
24KiB long so that it needs more than a single read() call to process it. New
|
||||||
features should be added at the end, because some of the tests involve the
|
features should be added at the end, because some of the tests involve the
|
||||||
output of line numbers, and we don't want these to change.
|
output of line numbers, and we don't want these to change.
|
||||||
|
|
||||||
|
@ -9,7 +9,7 @@ In the middle of a line, PATTERN appears.
|
||||||
|
|
||||||
This pattern is in lower case.
|
This pattern is in lower case.
|
||||||
|
|
||||||
Here follows a whole lot of stuff that makes the file over 24K long.
|
Here follows a whole lot of stuff that makes the file over 24KiB long.
|
||||||
|
|
||||||
-------------------------------------------------------------------------------
|
-------------------------------------------------------------------------------
|
||||||
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
|
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
|
||||||
|
|
|
@ -346,7 +346,7 @@ RC=0
|
||||||
./testdata/grepinput-9-
|
./testdata/grepinput-9-
|
||||||
./testdata/grepinput:10:This pattern is in lower case.
|
./testdata/grepinput:10:This pattern is in lower case.
|
||||||
./testdata/grepinput-11-
|
./testdata/grepinput-11-
|
||||||
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24K long.
|
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24KiB long.
|
||||||
./testdata/grepinput-13-
|
./testdata/grepinput-13-
|
||||||
--
|
--
|
||||||
./testdata/grepinput:623:Check up on PATTERN near the end.
|
./testdata/grepinput:623:Check up on PATTERN near the end.
|
||||||
|
@ -379,6 +379,7 @@ RC=0
|
||||||
./testdata/grepinputx
|
./testdata/grepinputx
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 37 -----------------------------
|
---------------------------- Test 37 -----------------------------
|
||||||
|
24KiB long so that it needs more than a single read() call to process it. New
|
||||||
aaaaa0
|
aaaaa0
|
||||||
aaaaa2
|
aaaaa2
|
||||||
010203040506
|
010203040506
|
||||||
|
@ -465,11 +466,11 @@ fox [1;31mjumps[0m
|
||||||
This time it [1;31mjumps[0m and [1;31mjumps[0m and [1;31mjumps[0m.
|
This time it [1;31mjumps[0m and [1;31mjumps[0m and [1;31mjumps[0m.
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 53 ------------------------------
|
---------------------------- Test 53 ------------------------------
|
||||||
36972,6
|
36976,6
|
||||||
36990,4
|
36994,4
|
||||||
37024,4
|
37028,4
|
||||||
37066,5
|
37070,5
|
||||||
37083,4
|
37087,4
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 54 ------------------------------
|
---------------------------- Test 54 ------------------------------
|
||||||
595:15,6
|
595:15,6
|
||||||
|
@ -519,8 +520,8 @@ RC=0
|
||||||
pcre2grep: pcre2_match() gave error -47 while matching text that starts:
|
pcre2grep: pcre2_match() gave error -47 while matching text that starts:
|
||||||
|
|
||||||
This is a file of miscellaneous text that is used as test data for checking
|
This is a file of miscellaneous text that is used as test data for checking
|
||||||
that the pcregrep command is working correctly. The file must be more than 24K
|
that the pcregrep command is working correctly. The file must be more than
|
||||||
long so that it needs more than a single read
|
24KiB long so that it needs more than a single re
|
||||||
|
|
||||||
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
||||||
pcre2grep: Check your regex for nested unlimited loops.
|
pcre2grep: Check your regex for nested unlimited loops.
|
||||||
|
@ -529,8 +530,8 @@ RC=1
|
||||||
pcre2grep: pcre2_match() gave error -53 while matching text that starts:
|
pcre2grep: pcre2_match() gave error -53 while matching text that starts:
|
||||||
|
|
||||||
This is a file of miscellaneous text that is used as test data for checking
|
This is a file of miscellaneous text that is used as test data for checking
|
||||||
that the pcregrep command is working correctly. The file must be more than 24K
|
that the pcregrep command is working correctly. The file must be more than
|
||||||
long so that it needs more than a single read
|
24KiB long so that it needs more than a single re
|
||||||
|
|
||||||
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
|
||||||
pcre2grep: Check your regex for nested unlimited loops.
|
pcre2grep: Check your regex for nested unlimited loops.
|
||||||
|
@ -814,11 +815,11 @@ RC=0
|
||||||
615:0,12
|
615:0,12
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 112 -----------------------------
|
---------------------------- Test 112 -----------------------------
|
||||||
37168,12
|
37172,12
|
||||||
37180,12
|
37184,12
|
||||||
37192,12
|
37196,12
|
||||||
37204,12
|
37208,12
|
||||||
37216,12
|
37220,12
|
||||||
RC=0
|
RC=0
|
||||||
---------------------------- Test 113 -----------------------------
|
---------------------------- Test 113 -----------------------------
|
||||||
480
|
480
|
||||||
|
|