More typos and changes to "Kibibytes" for "Kilobytes".
parent fabea723cf
commit e75410a5d8

ChangeLog (10 lines changed)
@@ -370,8 +370,8 @@ tests to improve coverage.
 31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
 pcre2test, a crash could occur.

-32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
-all the tests can run with clang's sanitizing options.
+32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16 MiB) so
+that all the tests can run with clang's sanitizing options.

 33. Implement extra compile options in the compile context and add the first
 one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
@@ -964,9 +964,9 @@ to the same code as '.' when PCRE2_DOTALL is set).
 40. Fix two clang compiler warnings in pcre2test when only one code unit width
 is supported.

-41. Upgrade RunTest to automatically re-run test 2 with a large (64M) stack if
-it fails when running the interpreter with a 16M stack (and if changing the
-stack size via pcre2test is possible). This avoids having to manually set a
+41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
+if it fails when running the interpreter with a 16MiB stack (and if changing
+the stack size via pcre2test is possible). This avoids having to manually set a
 large stack size when testing with clang.

 42. Fix register overwite in JIT when SSE2 acceleration is enabled.

HACKING (2 lines changed)
@@ -370,7 +370,7 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
 only be 4. The 8-bit library can be compiled to used 3-byte or 4-byte values,
 and the 16-bit library can be compiled to use 4-byte values, though this
 impairs performance. Specifing a LINK_SIZE larger than 2 for these libraries is
-necessary only when patterns whose compiled length is greater than 64K code
+necessary only when patterns whose compiled length is greater than 65535 code
 units are going to be processed. When a LINK_SIZE value uses more than one code
 unit, the most significant unit is first.

@@ -186,7 +186,7 @@ can skip ahead to the CMake section.

 STACK SIZE IN WINDOWS ENVIRONMENTS

-Prior to release 10.30 the default system stack size of 1MB in some Windows
+Prior to release 10.30 the default system stack size of 1MiB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.


README (2 lines changed)
@@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
 pcre2_set_heap_limit).

 . In the 8-bit library, the default maximum compiled pattern size is around
-64K bytes. You can increase this by adding --with-link-size=3 to the
+64 kibibytes. You can increase this by adding --with-link-size=3 to the
 "configure" command. PCRE2 then uses three bytes instead of two for offsets
 to different parts of the compiled pattern. In the 16-bit library,
 --with-link-size=3 is the same as --with-link-size=4, which (in both
@@ -706,8 +706,8 @@ fi
 AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
 The value of LINK_SIZE determines the number of bytes used to store
 links as offsets within the compiled regex. The default is 2, which
-allows for compiled patterns up to 64K long. This covers the vast
-majority of cases. However, PCRE2 can also be compiled to use 3 or 4
+allows for compiled patterns up to 65535 code units long. This covers the
+vast majority of cases. However, PCRE2 can also be compiled to use 3 or 4
 bytes instead. This allows for longer patterns in extreme cases.])

 AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [
@@ -186,7 +186,7 @@ can skip ahead to the CMake section.

 STACK SIZE IN WINDOWS ENVIRONMENTS

-Prior to release 10.30 the default system stack size of 1MB in some Windows
+Prior to release 10.30 the default system stack size of 1MiB in some Windows
 environments caused issues with some tests. This should no longer be the case
 for 10.30 and later releases.

@@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
 pcre2_set_heap_limit).

 . In the 8-bit library, the default maximum compiled pattern size is around
-64K bytes. You can increase this by adding --with-link-size=3 to the
+64 kibibytes. You can increase this by adding --with-link-size=3 to the
 "configure" command. PCRE2 then uses three bytes instead of two for offsets
 to different parts of the compiled pattern. In the 16-bit library,
 --with-link-size=3 is the same as --with-link-size=4, which (in both
@@ -38,7 +38,7 @@ passed to a matching function. The arguments of this function are:
 </PRE>
 </P>
 <P>
-If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32K
+If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32KiB
 block on the machine stack is used.
 </P>
 <P>
@@ -49,8 +49,9 @@ If <i>callback</i> is NULL and <i>callback_data</i> is not NULL,
 <P>
 If <i>callback</i> not NULL, it is called with <i>callback_data</i> as an
 argument at the start of matching, in order to set up a JIT stack. If the
-result is NULL, the internal 32K stack is used; otherwise the return value must
-be a valid JIT stack, the result of calling <b>pcre2_jit_stack_create()</b>.
+result is NULL, the internal 32KiB stack is used; otherwise the return value
+must be a valid JIT stack, the result of calling
+<b>pcre2_jit_stack_create()</b>.
 </P>
 <P>
 You may safely use the same JIT stack for multiple patterns, as long as they
@@ -33,8 +33,8 @@ context, for memory allocation functions, or NULL for standard memory
 allocation. The result can be passed to the JIT run-time code by calling
 <b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
 which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
-A maximum stack size of 512K to 1M should be more than enough for any pattern.
-For more details, see the
+A maximum stack size of 512KiB to 1MiB should be more than enough for any
+pattern. For more details, see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 page.
 </P>
@@ -973,7 +973,7 @@ less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
 limit is set, less than the default.
 </P>
 <P>
-The <b>pcre2_match()</b> function starts out using a 20K vector on the system
+The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
 stack for recording backtracking points. The more nested backtracking points
 there are (that is, the deeper the search tree), the more memory is needed.
 Heap memory is used only if the initial vector is too small. If the heap limit
@@ -1155,7 +1155,7 @@ relevant.
 <P>
 The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
 but the most massive patterns, since it allows the size of the compiled pattern
-to be up to 64K code units. Larger values allow larger regular expressions to
+to be up to 65535 code units. Larger values allow larger regular expressions to
 be compiled by those two libraries, but at the expense of slower matching.
 <pre>
 PCRE2_CONFIG_MATCHLIMIT
@@ -252,10 +252,10 @@ Within a compiled pattern, offset values are used to point from one part to
 another (for example, from an opening parenthesis to an alternation
 metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
 are used for these offsets, leading to a maximum size for a compiled pattern of
-around 64K code units. This is sufficient to handle all but the most gigantic
-patterns. Nevertheless, some people do want to process truly enormous patterns,
-so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
-adding a setting such as
+around 64 thousand code units. This is sufficient to handle all but the most
+gigantic patterns. Nevertheless, some people do want to process truly enormous
+patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
+offsets by adding a setting such as
 <pre>
 --with-link-size=3
 </pre>
@@ -282,7 +282,7 @@ to the <b>configure</b> command. This setting also applies to the
 counting is done differently).
 </P>
 <P>
-The <b>pcre2_match()</b> function starts out using a 20K vector on the system
+The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
 stack to record backtracking points. The more nested backtracking points there
 are (that is, the deeper the search tree), the more memory is needed. If the
 initial vector is not large enough, heap memory is used, up to a certain limit,
@@ -399,13 +399,13 @@ they are not.
 <P>
 <b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
-finds a match. The starting size of the buffer is controlled by a parameter
-whose default value is 20K. The buffer itself is three times this size, but
-because of the way it is used for holding "before" lines, the longest line that
-is guaranteed to be processable is the parameter size. If a longer line is
-encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
-specified maximum size, whose default is 1M or the starting size, whichever is
-the larger. You can change the default parameter values by adding, for example,
+finds a match. The default starting size of the buffer is 20KiB. The buffer
+itself is three times this size, but because of the way it is used for holding
+"before" lines, the longest line that is guaranteed to be processable is the
+notional buffer size. If a longer line is encountered, <b>pcre2grep</b>
+automatically expands the buffer, up to a specified maximum size, whose default
+is 1MiB or the starting size, whichever is the larger. You can change the
+default parameter values by adding, for example,
 <pre>
 --with-pcre2grep-bufsize=51200
 --with-pcre2grep-max-bufsize=2097152
@@ -87,7 +87,7 @@ that is obtained at the start of processing. If an input file contains very
 long lines, a larger buffer may be needed; this is handled by automatically
 extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
 default values for these parameters can be set when <b>pcre2grep</b> is
-built; if nothing is specified, the defaults are set to 20K and 1M
+built; if nothing is specified, the defaults are set to 20KiB and 1MiB
 respectively. An error occurs if a line is too long and the buffer can no
 longer be expanded.
 </P>
@@ -97,7 +97,7 @@ allow for buffering "before" and "after" lines. If the buffer size is too
 small, fewer than requested "before" and "after" lines may be output.
 </P>
 <P>
-Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
+Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
 BUFSIZ is defined in <b><stdio.h></b>. When there is more than one pattern
 (specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
 each line in the order in which they are defined, except that all the <b>-e</b>
@@ -179,7 +179,7 @@ when JIT matching is used.
 <br><a name="SEC6" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
 <P>
 When the compiled JIT code runs, it needs a block of memory to use as a stack.
-By default, it uses 32K on the machine stack. However, some large or
+By default, it uses 32KiB on the machine stack. However, some large or
 complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
 is given when there is not enough stack. Three functions are provided for
 managing blocks of memory for use as JIT stacks. There is further discussion
@@ -194,8 +194,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
 pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
 is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
 that is no longer needed. (For the technically minded: the address space is
-allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
-be more than enough for any pattern.
+allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
+should be more than enough for any pattern.
 </P>
 <P>
 The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
@@ -209,7 +209,7 @@ The first argument is a pointer to a match context. When this is subsequently
 passed to a matching function, its information determines which JIT stack is
 used. There are three cases for the values of the other two options:
 <pre>
-(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
+(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32KiB block
 on the machine stack is used. This is the default when a match
 context is created.

@@ -220,7 +220,7 @@ used. There are three cases for the values of the other two options:
 (3) If <i>callback</i> is not NULL, it must point to a function that is
 called with <i>data</i> as an argument at the start of matching, in
 order to set up a JIT stack. If the return from the callback
-function is NULL, the internal 32K stack is used; otherwise the
+function is NULL, the internal 32KiB stack is used; otherwise the
 return value must be a valid JIT stack, the result of calling
 <b>pcre2_jit_stack_create()</b>.
 </pre>
@@ -286,9 +286,9 @@ we do the recursion in memory.
 Modern operating systems have a nice feature: they can reserve an address space
 instead of allocating memory. We can safely allocate memory pages inside this
 address space, so the stack could grow without moving memory data (this is
-important because of pointers). Thus we can allocate 1M address space, and use
-only a single memory page (usually 4K) if that is enough. However, we can still
-grow up to 1M anytime if needed.
+important because of pointers). Thus we can allocate 1MiB address space, and
+use only a single memory page (usually 4KiB) if that is enough. However, we can
+still grow up to 1MiB anytime if needed.
 </P>
 <P>
 (3) Who "owns" a JIT stack?
@@ -328,7 +328,7 @@ list of patterns.
 </P>
 <P>
 (6) OK, the stack is for long term memory allocation. But what happens if a
-pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
+pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
 stack is freed?
 <br>
 <br>
@@ -20,12 +20,12 @@ There are some size limitations in PCRE2 but it is hoped that they will never
 in practice be relevant.
 </P>
 <P>
-The maximum size of a compiled pattern is approximately 64K code units for the
-8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
-linkage size, which is 2 bytes for these libraries. If you want to process
-regular expressions that are truly enormous, you can compile PCRE2 with an
-internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
-up to 4). See the <b>README</b> file in the source distribution and the
+The maximum size of a compiled pattern is approximately 64 thousand code units
+for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
+internal linkage size, which is 2 bytes for these libraries. If you want to
+process regular expressions that are truly enormous, you can compile PCRE2 with
+an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
+rounded up to 4). See the <b>README</b> file in the source distribution and the
 <a href="pcre2build.html"><b>pcre2build</b></a>
 documentation for details. In these cases the limit is substantially larger.
 However, the speed of execution is slower. In the 32-bit library, the internal
@@ -549,7 +549,7 @@ Absolute and relative backreferences
 <P>
 The sequence \g followed by a signed or unsigned number, optionally enclosed
 in braces, is an absolute or relative backreference. A named backreference
-can be coded as \g{name}. backreferences are discussed
+can be coded as \g{name}. Backreferences are discussed
 <a href="#backreferences">later,</a>
 following the discussion of
 <a href="#subpattern">parenthesized subpatterns.</a>
@@ -2247,7 +2247,7 @@ done using alternation, as in the example above, or by a quantifier with a
 minimum of zero.
 </P>
 <P>
-backreferences of this type cause the group that they reference to be treated
+Backreferences of this type cause the group that they reference to be treated
 as an
 <a href="#atomicgroup">atomic group.</a>
 Once the whole group has been matched, a subsequent matching failure cannot
@@ -52,9 +52,9 @@ example, the very simple pattern
 <pre>
 ((ab){1,1000}c){1,3}
 </pre>
-uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
+uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
 compiled with its default internal pointer size of two bytes, the size limit on
-a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
+a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
 this is reached with the above pattern if the outer repetition is increased
 from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
 handle larger compiled patterns, but it is better to try to rewrite your
@@ -68,14 +68,14 @@ facility. Re-writing the above pattern as
 <pre>
 ((ab)(?2){0,999}c)(?1){0,2}
 </pre>
-reduces the memory requirements to around 16K, and indeed it remains under 20K
-even with the outer repetition increased to 100. However, this kind of pattern
-is not always exactly equivalent, because any captures within subroutine calls
-are lost when the subroutine completes. If this is not a problem, this kind of
-rewriting will allow you to process patterns that PCRE2 cannot otherwise
-handle. The matching performance of the two different versions of the pattern
-are roughly the same. (This applies from release 10.30 - things were different
-in earlier releases.)
+reduces the memory requirements to around 16KiB, and indeed it remains under
+20KiB even with the outer repetition increased to 100. However, this kind of
+pattern is not always exactly equivalent, because any captures within
+subroutine calls are lost when the subroutine completes. If this is not a
+problem, this kind of rewriting will allow you to process patterns that PCRE2
+cannot otherwise handle. The matching performance of the two different versions
+of the pattern are roughly the same. (This applies from release 10.30 - things
+were different in earlier releases.)
 </P>
 <br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
 <P>
@@ -83,7 +83,7 @@ From release 10.30, the interpretive (non-JIT) version of <b>pcre2_match()</b>
 uses very little system stack at run time. In earlier releases recursive
 function calls could use a great deal of stack, and this could cause problems,
 but this usage has been eliminated. Backtracking positions are now explicitly
-remembered in memory frames controlled by the code. An initial 20K vector of
+remembered in memory frames controlled by the code. An initial 20KiB vector of
 frames is allocated on the system stack (enough for about 100 frames for small
 patterns), but if this is insufficient, heap memory is used. The amount of heap
 memory can be limited; if the limit is set to zero, only the initial stack

doc/pcre2.txt (2465 lines changed)
File diff suppressed because it is too large
@@ -24,7 +24,7 @@ passed to a matching function. The arguments of this function are:
 callback a callback function
 callback_data a JIT stack or a value to be passed to the callback
 .P
-If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32K
+If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32KiB
 block on the machine stack is used.
 .P
 If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
@@ -33,8 +33,9 @@ If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
 .P
 If \fIcallback\fP not NULL, it is called with \fIcallback_data\fP as an
 argument at the start of matching, in order to set up a JIT stack. If the
-result is NULL, the internal 32K stack is used; otherwise the return value must
-be a valid JIT stack, the result of calling \fBpcre2_jit_stack_create()\fP.
+result is NULL, the internal 32KiB stack is used; otherwise the return value
+must be a valid JIT stack, the result of calling
+\fBpcre2_jit_stack_create()\fP.
 .P
 You may safely use the same JIT stack for multiple patterns, as long as they
 are all matched in the same thread. In a multithread application, each thread
@@ -21,8 +21,8 @@ context, for memory allocation functions, or NULL for standard memory
 allocation. The result can be passed to the JIT run-time code by calling
 \fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
 which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
-A maximum stack size of 512K to 1M should be more than enough for any pattern.
-For more details, see the
+A maximum stack size of 512KiB to 1MiB should be more than enough for any
+pattern. For more details, see the
 .\" HREF
 \fBpcre2jit\fP
 .\"
@@ -909,7 +909,7 @@ where ddd is a decimal number. However, such a setting is ignored unless ddd is
 less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
 limit is set, less than the default.
 .P
-The \fBpcre2_match()\fP function starts out using a 20K vector on the system
+The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
 stack for recording backtracking points. The more nested backtracking points
 there are (that is, the deeper the search tree), the more memory is needed.
 Heap memory is used only if the initial vector is too small. If the heap limit
@@ -1084,7 +1084,7 @@ relevant.
 .P
 The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
 but the most massive patterns, since it allows the size of the compiled pattern
-to be up to 64K code units. Larger values allow larger regular expressions to
+to be up to 65535 code units. Larger values allow larger regular expressions to
 be compiled by those two libraries, but at the expense of slower matching.
 .sp
 PCRE2_CONFIG_MATCHLIMIT
@@ -244,10 +244,10 @@ Within a compiled pattern, offset values are used to point from one part to
 another (for example, from an opening parenthesis to an alternation
 metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
 are used for these offsets, leading to a maximum size for a compiled pattern of
-around 64K code units. This is sufficient to handle all but the most gigantic
-patterns. Nevertheless, some people do want to process truly enormous patterns,
-so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
-adding a setting such as
+around 64 thousand code units. This is sufficient to handle all but the most
+gigantic patterns. Nevertheless, some people do want to process truly enormous
+patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
+offsets by adding a setting such as
 .sp
 --with-link-size=3
 .sp
@@ -277,7 +277,7 @@ to the \fBconfigure\fP command. This setting also applies to the
 \fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
 counting is done differently).
 .P
-The \fBpcre2_match()\fP function starts out using a 20K vector on the system
+The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
 stack to record backtracking points. The more nested backtracking points there
 are (that is, the deeper the search tree), the more memory is needed. If the
 initial vector is not large enough, heap memory is used, up to a certain limit,
@@ -403,13 +403,13 @@ they are not.
 .sp
 \fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
-finds a match. The starting size of the buffer is controlled by a parameter
-whose default value is 20K. The buffer itself is three times this size, but
-because of the way it is used for holding "before" lines, the longest line that
-is guaranteed to be processable is the parameter size. If a longer line is
-encountered, \fBpcre2grep\fP automatically expands the buffer, up to a
-specified maximum size, whose default is 1M or the starting size, whichever is
-the larger. You can change the default parameter values by adding, for example,
+finds a match. The default starting size of the buffer is 20KiB. The buffer
+itself is three times this size, but because of the way it is used for holding
+"before" lines, the longest line that is guaranteed to be processable is the
+notional buffer size. If a longer line is encountered, \fBpcre2grep\fP
+automatically expands the buffer, up to a specified maximum size, whose default
+is 1MiB or the starting size, whichever is the larger. You can change the
+default parameter values by adding, for example,
 .sp
 --with-pcre2grep-bufsize=51200
 --with-pcre2grep-max-bufsize=2097152
@@ -58,7 +58,7 @@ that is obtained at the start of processing. If an input file contains very
 long lines, a larger buffer may be needed; this is handled by automatically
 extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
 default values for these parameters can be set when \fBpcre2grep\fP is
-built; if nothing is specified, the defaults are set to 20K and 1M
+built; if nothing is specified, the defaults are set to 20KiB and 1MiB
 respectively. An error occurs if a line is too long and the buffer can no
 longer be expanded.
 .P
@@ -66,7 +66,7 @@ The block of memory that is actually used is three times the "buffer size", to
 allow for buffering "before" and "after" lines. If the buffer size is too
 small, fewer than requested "before" and "after" lines may be output.
 .P
-Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
+Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
 BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
 (specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
 each line in the order in which they are defined, except that all the \fB-e\fP
@@ -58,15 +58,15 @@ DESCRIPTION
 automatically extending the buffer, up to the limit specified by --max-
 buffer-size. The default values for these parameters can be set when
 pcre2grep is built; if nothing is specified, the defaults are set to
-20K and 1M respectively. An error occurs if a line is too long and the
-buffer can no longer be expanded.
+20KiB and 1MiB respectively. An error occurs if a line is too long and
+the buffer can no longer be expanded.

 The block of memory that is actually used is three times the "buffer
 size", to allow for buffering "before" and "after" lines. If the buffer
 size is too small, fewer than requested "before" and "after" lines may
 be output.

-Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the
+Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
 greater. BUFSIZ is defined in <stdio.h>. When there is more than one
 pattern (specified by the use of -e and/or -f), each pattern is applied
 to each line in the order in which they are defined, except that all
@@ -161,7 +161,7 @@ when JIT matching is used.
 .rs
 .sp
 When the compiled JIT code runs, it needs a block of memory to use as a stack.
-By default, it uses 32K on the machine stack. However, some large or
+By default, it uses 32KiB on the machine stack. However, some large or
 complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
 is given when there is not enough stack. Three functions are provided for
 managing blocks of memory for use as JIT stacks. There is further discussion
@@ -178,8 +178,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
 pointer to an opaque structure of type \fBpcre2_jit_stack\fP, or NULL if there
 is an error. The \fBpcre2_jit_stack_free()\fP function is used to free a stack
 that is no longer needed. (For the technically minded: the address space is
-allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
-be more than enough for any pattern.
+allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
+should be more than enough for any pattern.
 .P
 The \fBpcre2_jit_stack_assign()\fP function specifies which stack JIT code
 should use. Its arguments are as follows:
@@ -192,7 +192,7 @@ The first argument is a pointer to a match context. When this is subsequently
 passed to a matching function, its information determines which JIT stack is
 used. There are three cases for the values of the other two options:
 .sp
-(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
+(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32KiB block
 on the machine stack is used. This is the default when a match
 context is created.
 .sp
@ -203,7 +203,7 @@ used. There are three cases for the values of the other two options:
(3) If \fIcallback\fP is not NULL, it must point to a function that is
called with \fIdata\fP as an argument at the start of matching, in
order to set up a JIT stack. If the return from the callback
function is NULL, the internal 32K stack is used; otherwise the
function is NULL, the internal 32KiB stack is used; otherwise the
return value must be a valid JIT stack, the result of calling
\fBpcre2_jit_stack_create()\fP.
.sp
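The stack-selection cases documented for pcre2_jit_stack_assign() can be modelled as a small decision function. This is only a sketch: demo_jit_stack, demo_jit_callback, demo_select_stack and the two sample callbacks are hypothetical stand-ins, not PCRE2 types or functions. Case (2) falls between the hunks shown here and is assumed to mean that a non-NULL data with a NULL callback is itself the JIT stack to use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT PCRE2's types. */
typedef struct demo_jit_stack { size_t size; } demo_jit_stack;
typedef demo_jit_stack *(*demo_jit_callback)(void *data);

/* Pick the stack a match would use, following the documented cases.
   internal_32kib stands for the internal 32KiB block on the machine stack. */
static demo_jit_stack *demo_select_stack(demo_jit_callback callback, void *data,
                                         demo_jit_stack *internal_32kib)
{
  demo_jit_stack *chosen;

  assert(internal_32kib != NULL);
  if (callback == NULL)
  {
    /* Case (1): callback and data both NULL, so use the internal block. */
    if (data == NULL) return internal_32kib;
    /* Case (2), assumed: no callback, so data is taken to be the stack. */
    return (demo_jit_stack *)data;
  }
  /* Case (3): call the callback with data; NULL means the internal block. */
  chosen = callback(data);
  return (chosen == NULL) ? internal_32kib : chosen;
}

/* Sample callback that hands back the stack passed in as data. */
static demo_jit_stack *demo_cb_use_data(void *data)
{
  return (demo_jit_stack *)data;
}

/* Sample callback that declines, so the internal block is used. */
static demo_jit_stack *demo_cb_decline(void *data)
{
  (void)data;
  return NULL;
}
```

In real code the callback would typically return a stack previously obtained from pcre2_jit_stack_create(), for example one kept per thread. The callback form lets that choice be made when matching starts rather than when the match context is set up.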
@ -265,9 +265,9 @@ we do the recursion in memory.
Modern operating systems have a nice feature: they can reserve an address space
instead of allocating memory. We can safely allocate memory pages inside this
address space, so the stack could grow without moving memory data (this is
important because of pointers). Thus we can allocate 1M address space, and use
only a single memory page (usually 4K) if that is enough. However, we can still
grow up to 1M anytime if needed.
important because of pointers). Thus we can allocate 1MiB address space, and
use only a single memory page (usually 4KiB) if that is enough. However, we can
still grow up to 1MiB anytime if needed.
.P
(3) Who "owns" a JIT stack?
.sp
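The reserve-then-use idea described above can be sketched with POSIX mmap() and mprotect(). This is an illustration under stated assumptions, not PCRE2's allocator. It reserves 1MiB of inaccessible address space and then makes only the first page usable. demo_reserve_and_commit and demo_pages_needed are hypothetical names. MAP_ANONYMOUS is a widely supported extension rather than strict POSIX, and the page size is queried at run time rather than assumed to be 4KiB.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Pages of page_size bytes needed to cover `used` bytes, rounded up. */
static size_t demo_pages_needed(size_t used, size_t page_size)
{
  assert(page_size > 0);
  return (used + page_size - 1) / page_size;
}

/* Reserve `reserve` bytes of address space with no access rights, then make
   only the first `commit` bytes readable and writable. Returns the base
   address, or NULL on failure. Release the whole region with munmap(). */
static void *demo_reserve_and_commit(size_t reserve, size_t commit)
{
  void *base = mmap(NULL, reserve, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) return NULL;
  if (mprotect(base, commit, PROT_READ | PROT_WRITE) != 0)
  {
    munmap(base, reserve);
    return NULL;
  }
  return base;
}
```

Growing later would be another mprotect() call over a longer prefix of the same region. The base address never changes, so pointers into the stack stay valid. With 4KiB pages, the full 1MiB corresponds to 256 pages.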
@ -300,7 +300,7 @@ say two minutes. The JIT callback can help to achieve this without keeping a
list of patterns.
.P
(6) OK, the stack is for long term memory allocation. But what happens if a
pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
stack is freed?
.sp
Especially on embedded sytems, it might be a good idea to release memory
@ -7,12 +7,12 @@ PCRE2 - Perl-compatible regular expressions (revised API)
There are some size limitations in PCRE2 but it is hoped that they will never
in practice be relevant.
.P
The maximum size of a compiled pattern is approximately 64K code units for the
8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
linkage size, which is 2 bytes for these libraries. If you want to process
regular expressions that are truly enormous, you can compile PCRE2 with an
internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
up to 4). See the \fBREADME\fP file in the source distribution and the
The maximum size of a compiled pattern is approximately 64 thousand code units
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
internal linkage size, which is 2 bytes for these libraries. If you want to
process regular expressions that are truly enormous, you can compile PCRE2 with
an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
rounded up to 4). See the \fBREADME\fP file in the source distribution and the
.\" HREF
\fBpcre2build\fP
.\"
@ -528,7 +528,7 @@ by code point, as described above.
.sp
The sequence \eg followed by a signed or unsigned number, optionally enclosed
in braces, is an absolute or relative backreference. A named backreference
can be coded as \eg{name}. backreferences are discussed
can be coded as \eg{name}. Backreferences are discussed
.\" HTML <a href="#backreferences">
.\" </a>
later,
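The difference between the absolute and relative forms of \g can be shown with a small resolver. This is a hypothetical sketch, not PCRE2 code. demo_resolve_backref is an invented name, `opened` is assumed to be the number of capture groups started before the \g item, and positive signed forms such as \g{+1} are deliberately not modelled. In the pattern (abc(def)ghi)\g{-1}, two groups have been started before the reference, so \g{-1} resolves to group 2 and \g{-2} to group 1.

```c
#include <assert.h>

/* Hypothetical helper (not PCRE2 code): map the number after \g to an
   absolute group number. A positive, unsigned n is already absolute. A
   negative n counts back from the most recently started group, where
   `opened` is how many capture groups were started before the \g.
   Returns 0 when the reference cannot name a group. */
static unsigned demo_resolve_backref(int n, unsigned opened)
{
  long group;

  if (n > 0) return (unsigned)n;   /* absolute: \g2 or \g{2} */
  if (n == 0) return 0;            /* zero is not a valid reference here */
  group = (long)opened + n + 1;    /* relative: \g{-1} is group `opened` */
  return (group >= 1) ? (unsigned)group : 0;
}
```

Relative references keep working if the fragment is moved inside a larger pattern that adds groups in front of it. An absolute number would then point at the wrong group.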
@ -2243,7 +2243,7 @@ that the first iteration does not need to match the backreference. This can be
done using alternation, as in the example above, or by a quantifier with a
minimum of zero.
.P
backreferences of this type cause the group that they reference to be treated
Backreferences of this type cause the group that they reference to be treated
as an
.\" HTML <a href="#atomicgroup">
.\" </a>
@ -34,9 +34,9 @@ example, the very simple pattern
.sp
((ab){1,1000}c){1,3}
.sp
uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
compiled with its default internal pointer size of two bytes, the size limit on
a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
this is reached with the above pattern if the outer repetition is increased
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
handle larger compiled patterns, but it is better to try to rewrite your
@ -52,14 +52,14 @@ facility. Re-writing the above pattern as
.sp
((ab)(?2){0,999}c)(?1){0,2}
.sp
reduces the memory requirements to around 16K, and indeed it remains under 20K
even with the outer repetition increased to 100. However, this kind of pattern
is not always exactly equivalent, because any captures within subroutine calls
are lost when the subroutine completes. If this is not a problem, this kind of
rewriting will allow you to process patterns that PCRE2 cannot otherwise
handle. The matching performance of the two different versions of the pattern
are roughly the same. (This applies from release 10.30 - things were different
in earlier releases.)
reduces the memory requirements to around 16KiB, and indeed it remains under
20KiB even with the outer repetition increased to 100. However, this kind of
pattern is not always exactly equivalent, because any captures within
subroutine calls are lost when the subroutine completes. If this is not a
problem, this kind of rewriting will allow you to process patterns that PCRE2
cannot otherwise handle. The matching performance of the two different versions
of the pattern are roughly the same. (This applies from release 10.30 - things
were different in earlier releases.)
.
.
.SH "STACK AND HEAP USAGE AT RUN TIME"
@ -69,7 +69,7 @@ From release 10.30, the interpretive (non-JIT) version of \fBpcre2_match()\fP
uses very little system stack at run time. In earlier releases recursive
function calls could use a great deal of stack, and this could cause problems,
but this usage has been eliminated. Backtracking positions are now explicitly
remembered in memory frames controlled by the code. An initial 20K vector of
remembered in memory frames controlled by the code. An initial 20KiB vector of
frames is allocated on the system stack (enough for about 100 frames for small
patterns), but if this is insufficient, heap memory is used. The amount of heap
memory can be limited; if the limit is set to zero, only the initial stack
@ -134,16 +134,16 @@ sure both macros are undefined; an emulation function will then be used. */

/* This limits the amount of memory that may be used while matching a pattern.
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
to JIT matching. The value is in kilobytes. */
to JIT matching. The value is in kibibytes (units of 1024 bytes). */
#ifndef HEAP_LIMIT
#define HEAP_LIMIT 20000000
#endif

/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
allows for longer patterns in extreme cases. */
compiled patterns up to 65535 code units long. This covers the vast
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
instead. This allows for longer patterns in extreme cases. */
#ifndef LINK_SIZE
#define LINK_SIZE 2
#endif
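Changing "kilobytes" to "kibibytes" is more than cosmetic, because the two units differ by 2.4%. A minimal sketch, with hypothetical helper names, converts the default HEAP_LIMIT of 20000000 into bytes under each reading.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BYTES_PER_KIB 1024u  /* kibibyte (IEC): 2^10 bytes */
#define DEMO_BYTES_PER_KB  1000u  /* kilobyte (SI): 10^3 bytes */

/* 64-bit arithmetic so that large limits such as 20000000 cannot overflow. */
static uint64_t demo_kib_to_bytes(uint32_t kib)
{
  return (uint64_t)kib * DEMO_BYTES_PER_KIB;
}

static uint64_t demo_kb_to_bytes(uint32_t kb)
{
  return (uint64_t)kb * DEMO_BYTES_PER_KB;
}
```

For the value 20000000, the kibibyte reading gives 20480000000 bytes and the kilobyte reading gives 20000000000 bytes, a difference of 480000000 bytes.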
@ -139,9 +139,9 @@ sure both macros are undefined; an emulation function will then be used. */

/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
allows for longer patterns in extreme cases. */
compiled patterns up to 65535 code units long. This covers the vast
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
instead. This allows for longer patterns in extreme cases. */
#undef LINK_SIZE

/* Define to the sub-directory where libtool stores uninstalled libraries. */
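The figure of 65535 follows from the largest unsigned value that fits in a 2-byte link. The sketch below assumes a link is stored as an unsigned big-endian offset in LINK_SIZE bytes, one byte per code unit as in the 8-bit library. demo_max_link_offset, demo_put_link and demo_get_link are hypothetical names, not PCRE2's own macros, and the sketch only shows the representable range.

```c
#include <assert.h>
#include <stdint.h>

/* Largest offset an unsigned link of link_size bytes can hold. */
static uint32_t demo_max_link_offset(unsigned link_size)
{
  assert(link_size >= 1 && link_size <= 4);
  if (link_size == 4) return UINT32_MAX;    /* avoid shifting by 32 */
  return ((uint32_t)1 << (8 * link_size)) - 1;
}

/* Store `value` big-endian in link_size bytes at p (8-bit code units). */
static void demo_put_link(uint8_t *p, unsigned link_size, uint32_t value)
{
  assert(value <= demo_max_link_offset(link_size));
  for (unsigned i = 0; i < link_size; i++)
    p[i] = (uint8_t)(value >> (8 * (link_size - 1 - i)));
}

/* Read back a link stored by demo_put_link(). */
static uint32_t demo_get_link(const uint8_t *p, unsigned link_size)
{
  uint32_t value = 0;
  for (unsigned i = 0; i < link_size; i++)
    value = (value << 8) | p[i];
  return value;
}
```

An offset of 70000 fits in a 3-byte link but not in a 2-byte one. This is the kind of "extreme case" the comment has in mind.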
@ -412,7 +412,7 @@ if (rws->next != NULL)
}

/* All sizes are in units of sizeof(int), except for mb->heaplimit, which is in
kilobytes. */
kibibytes. */

else
{
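The comment above mixes two units: workspace sizes counted in ints and a heap limit in kibibytes. A hypothetical check, not the code in pcre2_dfa_match.c, has to convert both to bytes before comparing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: would `int_units` units of sizeof(int) bytes exceed a heap
   limit given in kibibytes? Both sides are converted to bytes in 64 bits. */
static int demo_exceeds_heap_limit(size_t int_units, uint32_t heap_limit_kib)
{
  uint64_t bytes = (uint64_t)int_units * sizeof(int);
  uint64_t limit = (uint64_t)heap_limit_kib * 1024u;
  return bytes > limit;
}
```

When int is 4 bytes, comparing the raw int count against the raw kibibyte count would be off by a factor of 256.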
@ -247,7 +247,7 @@ not rely on this. */
pcre2_match() is allocated on the system stack, of this size (bytes). The size
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
on the number of capturing parentheses) so 20K handles quite a few frames. A
on the number of capturing parentheses) so 20KiB handles quite a few frames. A
larger vector on the heap is obtained for patterns that need more frames. The
maximum size of this can be limited. */

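The arithmetic in this comment can be checked with a short sketch. The names are hypothetical, and START_FRAMES_SIZE is assumed to be 20480 bytes, the 20KiB the comment mentions. The rounding to a multiple of 8 follows the comment's advice rather than any particular PCRE2 code, and the 200-byte frame is an assumed "small pattern" size.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_START_FRAMES_SIZE 20480u   /* assumed: 20KiB = 20 * 1024 bytes */

/* Round n up to the next multiple of 8 (a multiple of the pointer size on
   common 32-bit and 64-bit platforms). */
static size_t demo_round_up_8(size_t n)
{
  return (n + 7u) & ~(size_t)7u;
}

/* How many whole frames of frame_size bytes fit in vector_size bytes. */
static size_t demo_frames_that_fit(size_t vector_size, size_t frame_size)
{
  assert(frame_size > 0);
  return vector_size / frame_size;
}
```

With a 200-byte frame, 102 frames fit in 20KiB. This agrees with the "about 100 frames for small patterns" figure in the man page text above. A 400-byte frame halves that to 51.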
@ -6283,7 +6283,7 @@ mb->match_limit_depth = (mcontext->depth_limit < re->limit_depth)?
/* If a pattern has very many capturing parentheses, the frame size may be very
large. Ensure that there are at least 10 available frames by getting an initial
vector on the heap if necessary, except when the heap limit prevents this. Get
fewer if possible. (The heap limit is in kilobytes.) */
fewer if possible. (The heap limit is in kibibytes.) */

if (frame_size <= START_FRAMES_SIZE/10)
{
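The policy in this comment aims for at least 10 frames and takes fewer when the heap limit forbids 10. It can be sketched as follows. This is a hypothetical model, not the pcre2_match() code. The names, the enum and the exact rounding are assumptions, and treating "not even one frame fits" as an error is also assumed. The heap limit is taken in kibibytes, as the comment states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_START_FRAMES_SIZE 20480u   /* assumed 20KiB on-stack vector */

typedef enum { DEMO_USE_STACK, DEMO_USE_HEAP, DEMO_HEAP_LIMIT_ERROR } demo_frame_source;

/* Decide where the initial frame vector comes from and how big it is.
   heap_limit_kib is in kibibytes; *vector_bytes receives the size in bytes. */
static demo_frame_source demo_initial_frames(size_t frame_size,
                                             uint32_t heap_limit_kib,
                                             uint64_t *vector_bytes)
{
  uint64_t wanted, limit_bytes;

  assert(frame_size > 0 && vector_bytes != NULL);
  if (frame_size <= DEMO_START_FRAMES_SIZE / 10)
  {
    *vector_bytes = DEMO_START_FRAMES_SIZE;   /* 10 or more frames fit */
    return DEMO_USE_STACK;
  }
  wanted = (uint64_t)frame_size * 10;         /* aim for 10 frames on the heap */
  limit_bytes = (uint64_t)heap_limit_kib * 1024u;
  if (wanted > limit_bytes)
  {
    if (frame_size > limit_bytes)             /* not even one frame allowed */
    {
      *vector_bytes = 0;
      return DEMO_HEAP_LIMIT_ERROR;
    }
    wanted = (limit_bytes / frame_size) * frame_size;   /* fewer frames */
  }
  *vector_bytes = wanted;
  return DEMO_USE_HEAP;
}
```

With a 20KiB heap limit and 4096-byte frames, this model gets five frames (20480 bytes) instead of ten. A 2KiB limit cannot hold even one such frame.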
@ -416,7 +416,7 @@ static option_item optionlist[] = {
{ OP_NODATA, N_LBUFFER, NULL, "line-buffered", "use line buffering" },
{ OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" },
{ OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" },
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kilobytes)" },
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kibibytes)" },
{ OP_U32NUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE2 match limit option" },
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "depth-limit=number", "set PCRE2 depth limit option" },
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "recursion-limit=number", "obsolete synonym for depth-limit" },
@ -1,6 +1,6 @@
This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read() call to process it. New
that the pcregrep command is working correctly. The file must be more than
24KiB long so that it needs more than a single read() call to process it. New
features should be added at the end, because some of the tests involve the
output of line numbers, and we don't want these to change.

@ -9,7 +9,7 @@ In the middle of a line, PATTERN appears.

This pattern is in lower case.

Here follows a whole lot of stuff that makes the file over 24K long.
Here follows a whole lot of stuff that makes the file over 24KiB long.

-------------------------------------------------------------------------------
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the
@ -346,7 +346,7 @@ RC=0
./testdata/grepinput-9-
./testdata/grepinput:10:This pattern is in lower case.
./testdata/grepinput-11-
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24K long.
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24KiB long.
./testdata/grepinput-13-
--
./testdata/grepinput:623:Check up on PATTERN near the end.
@ -379,6 +379,7 @@ RC=0
./testdata/grepinputx
RC=0
---------------------------- Test 37 -----------------------------
24KiB long so that it needs more than a single read() call to process it. New
aaaaa0
aaaaa2
010203040506
@ -465,11 +466,11 @@ fox [1;31mjumps[0m
This time it [1;31mjumps[0m and [1;31mjumps[0m and [1;31mjumps[0m.
RC=0
---------------------------- Test 53 ------------------------------
36972,6
36990,4
37024,4
37066,5
37083,4
36976,6
36994,4
37028,4
37070,5
37087,4
RC=0
---------------------------- Test 54 ------------------------------
595:15,6
@ -519,8 +520,8 @@ RC=0
pcre2grep: pcre2_match() gave error -47 while matching text that starts:

This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read
that the pcregrep command is working correctly. The file must be more than
24KiB long so that it needs more than a single re

pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
pcre2grep: Check your regex for nested unlimited loops.
@ -529,8 +530,8 @@ RC=1
pcre2grep: pcre2_match() gave error -53 while matching text that starts:

This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read
that the pcregrep command is working correctly. The file must be more than
24KiB long so that it needs more than a single re

pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
pcre2grep: Check your regex for nested unlimited loops.
@ -814,11 +815,11 @@ RC=0
615:0,12
RC=0
---------------------------- Test 112 -----------------------------
37168,12
37180,12
37192,12
37204,12
37216,12
37172,12
37184,12
37196,12
37208,12
37220,12
RC=0
---------------------------- Test 113 -----------------------------
480