More typos and changes to "Kibibytes" for "Kilobytes".

Philip.Hazel 2018-06-18 14:03:33 +00:00
parent fabea723cf
commit e75410a5d8
35 changed files with 1378 additions and 1374 deletions

@@ -370,8 +370,8 @@ tests to improve coverage.
31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
pcre2test, a crash could occur.
32. Make -bigstack in RunTest allocate a 64MB stack (instead of 16 MB) so that
all the tests can run with clang's sanitizing options.
32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16 MiB) so
that all the tests can run with clang's sanitizing options.
33. Implement extra compile options in the compile context and add the first
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
@@ -964,9 +964,9 @@ to the same code as '.' when PCRE2_DOTALL is set).
40. Fix two clang compiler warnings in pcre2test when only one code unit width
is supported.
41. Upgrade RunTest to automatically re-run test 2 with a large (64M) stack if
it fails when running the interpreter with a 16M stack (and if changing the
stack size via pcre2test is possible). This avoids having to manually set a
41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
if it fails when running the interpreter with a 16MiB stack (and if changing
the stack size via pcre2test is possible). This avoids having to manually set a
large stack size when testing with clang.
42. Fix register overwite in JIT when SSE2 acceleration is enabled.

@@ -370,7 +370,7 @@ default value for LINK_SIZE is 2, except for the 32-bit library, where it can
only be 4. The 8-bit library can be compiled to used 3-byte or 4-byte values,
and the 16-bit library can be compiled to use 4-byte values, though this
impairs performance. Specifing a LINK_SIZE larger than 2 for these libraries is
necessary only when patterns whose compiled length is greater than 64K code
necessary only when patterns whose compiled length is greater than 65535 code
units are going to be processed. When a LINK_SIZE value uses more than one code
unit, the most significant unit is first.

@@ -186,7 +186,7 @@ can skip ahead to the CMake section.
STACK SIZE IN WINDOWS ENVIRONMENTS
Prior to release 10.30 the default system stack size of 1MB in some Windows
Prior to release 10.30 the default system stack size of 1MiB in some Windows
environments caused issues with some tests. This should no longer be the case
for 10.30 and later releases.

README

@@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
pcre2_set_heap_limit).
. In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the
64 kibibytes. You can increase this by adding --with-link-size=3 to the
"configure" command. PCRE2 then uses three bytes instead of two for offsets
to different parts of the compiled pattern. In the 16-bit library,
--with-link-size=3 is the same as --with-link-size=4, which (in both

@@ -706,8 +706,8 @@ fi
AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
The value of LINK_SIZE determines the number of bytes used to store
links as offsets within the compiled regex. The default is 2, which
allows for compiled patterns up to 64K long. This covers the vast
majority of cases. However, PCRE2 can also be compiled to use 3 or 4
allows for compiled patterns up to 65535 code units long. This covers the
vast majority of cases. However, PCRE2 can also be compiled to use 3 or 4
bytes instead. This allows for longer patterns in extreme cases.])
AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [

@@ -186,7 +186,7 @@ can skip ahead to the CMake section.
STACK SIZE IN WINDOWS ENVIRONMENTS
Prior to release 10.30 the default system stack size of 1MB in some Windows
Prior to release 10.30 the default system stack size of 1MiB in some Windows
environments caused issues with some tests. This should no longer be the case
for 10.30 and later releases.

@@ -263,7 +263,7 @@ library. They are also documented in the pcre2build man page.
pcre2_set_heap_limit).
. In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the
64 kibibytes. You can increase this by adding --with-link-size=3 to the
"configure" command. PCRE2 then uses three bytes instead of two for offsets
to different parts of the compiled pattern. In the 16-bit library,
--with-link-size=3 is the same as --with-link-size=4, which (in both

@@ -38,7 +38,7 @@ passed to a matching function. The arguments of this function are:
</PRE>
</P>
<P>
If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32K
If <i>callback</i> is NULL and <i>callback_data</i> is NULL, an internal 32KiB
block on the machine stack is used.
</P>
<P>
@@ -49,8 +49,9 @@ If <i>callback</i> is NULL and <i>callback_data</i> is not NULL,
<P>
If <i>callback</i> not NULL, it is called with <i>callback_data</i> as an
argument at the start of matching, in order to set up a JIT stack. If the
result is NULL, the internal 32K stack is used; otherwise the return value must
be a valid JIT stack, the result of calling <b>pcre2_jit_stack_create()</b>.
result is NULL, the internal 32KiB stack is used; otherwise the return value
must be a valid JIT stack, the result of calling
<b>pcre2_jit_stack_create()</b>.
</P>
<P>
You may safely use the same JIT stack for multiple patterns, as long as they

@@ -33,8 +33,8 @@ context, for memory allocation functions, or NULL for standard memory
allocation. The result can be passed to the JIT run-time code by calling
<b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
A maximum stack size of 512K to 1M should be more than enough for any pattern.
For more details, see the
A maximum stack size of 512KiB to 1MiB should be more than enough for any
pattern. For more details, see the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
page.
</P>

@@ -973,7 +973,7 @@ less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default.
</P>
<P>
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
stack for recording backtracking points. The more nested backtracking points
there are (that is, the deeper the search tree), the more memory is needed.
Heap memory is used only if the initial vector is too small. If the heap limit
@@ -1155,7 +1155,7 @@ relevant.
<P>
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
but the most massive patterns, since it allows the size of the compiled pattern
to be up to 64K code units. Larger values allow larger regular expressions to
to be up to 65535 code units. Larger values allow larger regular expressions to
be compiled by those two libraries, but at the expense of slower matching.
<pre>
PCRE2_CONFIG_MATCHLIMIT

@@ -252,10 +252,10 @@ Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
are used for these offsets, leading to a maximum size for a compiled pattern of
around 64K code units. This is sufficient to handle all but the most gigantic
patterns. Nevertheless, some people do want to process truly enormous patterns,
so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
adding a setting such as
around 64 thousand code units. This is sufficient to handle all but the most
gigantic patterns. Nevertheless, some people do want to process truly enormous
patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
offsets by adding a setting such as
<pre>
--with-link-size=3
</pre>
@@ -282,7 +282,7 @@ to the <b>configure</b> command. This setting also applies to the
counting is done differently).
</P>
<P>
The <b>pcre2_match()</b> function starts out using a 20K vector on the system
The <b>pcre2_match()</b> function starts out using a 20KiB vector on the system
stack to record backtracking points. The more nested backtracking points there
are (that is, the deeper the search tree), the more memory is needed. If the
initial vector is not large enough, heap memory is used, up to a certain limit,
@@ -399,13 +399,13 @@ they are not.
<P>
<b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when it
finds a match. The starting size of the buffer is controlled by a parameter
whose default value is 20K. The buffer itself is three times this size, but
because of the way it is used for holding "before" lines, the longest line that
is guaranteed to be processable is the parameter size. If a longer line is
encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
specified maximum size, whose default is 1M or the starting size, whichever is
the larger. You can change the default parameter values by adding, for example,
finds a match. The default starting size of the buffer is 20KiB. The buffer
itself is three times this size, but because of the way it is used for holding
"before" lines, the longest line that is guaranteed to be processable is the
notional buffer size. If a longer line is encountered, <b>pcre2grep</b>
automatically expands the buffer, up to a specified maximum size, whose default
is 1MiB or the starting size, whichever is the larger. You can change the
default parameter values by adding, for example,
<pre>
--with-pcre2grep-bufsize=51200
--with-pcre2grep-max-bufsize=2097152

@@ -87,7 +87,7 @@ that is obtained at the start of processing. If an input file contains very
long lines, a larger buffer may be needed; this is handled by automatically
extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
default values for these parameters can be set when <b>pcre2grep</b> is
built; if nothing is specified, the defaults are set to 20K and 1M
built; if nothing is specified, the defaults are set to 20KiB and 1MiB
respectively. An error occurs if a line is too long and the buffer can no
longer be expanded.
</P>
@@ -97,7 +97,7 @@ allow for buffering "before" and "after" lines. If the buffer size is too
small, fewer than requested "before" and "after" lines may be output.
</P>
<P>
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
each line in the order in which they are defined, except that all the <b>-e</b>

@@ -179,7 +179,7 @@ when JIT matching is used.
<br><a name="SEC6" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
<P>
When the compiled JIT code runs, it needs a block of memory to use as a stack.
By default, it uses 32K on the machine stack. However, some large or
By default, it uses 32KiB on the machine stack. However, some large or
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
is given when there is not enough stack. Three functions are provided for
managing blocks of memory for use as JIT stacks. There is further discussion
@@ -194,8 +194,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
that is no longer needed. (For the technically minded: the address space is
allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
be more than enough for any pattern.
allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
should be more than enough for any pattern.
</P>
<P>
The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
@@ -209,7 +209,7 @@ The first argument is a pointer to a match context. When this is subsequently
passed to a matching function, its information determines which JIT stack is
used. There are three cases for the values of the other two options:
<pre>
(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
(1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32KiB block
on the machine stack is used. This is the default when a match
context is created.
@@ -220,7 +220,7 @@ used. There are three cases for the values of the other two options:
(3) If <i>callback</i> is not NULL, it must point to a function that is
called with <i>data</i> as an argument at the start of matching, in
order to set up a JIT stack. If the return from the callback
function is NULL, the internal 32K stack is used; otherwise the
function is NULL, the internal 32KiB stack is used; otherwise the
return value must be a valid JIT stack, the result of calling
<b>pcre2_jit_stack_create()</b>.
</pre>
@@ -286,9 +286,9 @@ we do the recursion in memory.
Modern operating systems have a nice feature: they can reserve an address space
instead of allocating memory. We can safely allocate memory pages inside this
address space, so the stack could grow without moving memory data (this is
important because of pointers). Thus we can allocate 1M address space, and use
only a single memory page (usually 4K) if that is enough. However, we can still
grow up to 1M anytime if needed.
important because of pointers). Thus we can allocate 1MiB address space, and
use only a single memory page (usually 4KiB) if that is enough. However, we can
still grow up to 1MiB anytime if needed.
</P>
<P>
(3) Who "owns" a JIT stack?
@@ -328,7 +328,7 @@ list of patterns.
</P>
<P>
(6) OK, the stack is for long term memory allocation. But what happens if a
pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
stack is freed?
<br>
<br>

@@ -20,12 +20,12 @@ There are some size limitations in PCRE2 but it is hoped that they will never
in practice be relevant.
</P>
<P>
The maximum size of a compiled pattern is approximately 64K code units for the
8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
linkage size, which is 2 bytes for these libraries. If you want to process
regular expressions that are truly enormous, you can compile PCRE2 with an
internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
up to 4). See the <b>README</b> file in the source distribution and the
The maximum size of a compiled pattern is approximately 64 thousand code units
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
internal linkage size, which is 2 bytes for these libraries. If you want to
process regular expressions that are truly enormous, you can compile PCRE2 with
an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
rounded up to 4). See the <b>README</b> file in the source distribution and the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation for details. In these cases the limit is substantially larger.
However, the speed of execution is slower. In the 32-bit library, the internal

@@ -549,7 +549,7 @@ Absolute and relative backreferences
<P>
The sequence \g followed by a signed or unsigned number, optionally enclosed
in braces, is an absolute or relative backreference. A named backreference
can be coded as \g{name}. backreferences are discussed
can be coded as \g{name}. Backreferences are discussed
<a href="#backreferences">later,</a>
following the discussion of
<a href="#subpattern">parenthesized subpatterns.</a>
@@ -2247,7 +2247,7 @@ done using alternation, as in the example above, or by a quantifier with a
minimum of zero.
</P>
<P>
backreferences of this type cause the group that they reference to be treated
Backreferences of this type cause the group that they reference to be treated
as an
<a href="#atomicgroup">atomic group.</a>
Once the whole group has been matched, a subsequent matching failure cannot

@@ -52,9 +52,9 @@ example, the very simple pattern
<pre>
((ab){1,1000}c){1,3}
</pre>
uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
compiled with its default internal pointer size of two bytes, the size limit on
a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
this is reached with the above pattern if the outer repetition is increased
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
handle larger compiled patterns, but it is better to try to rewrite your
@@ -68,14 +68,14 @@ facility. Re-writing the above pattern as
<pre>
((ab)(?2){0,999}c)(?1){0,2}
</pre>
reduces the memory requirements to around 16K, and indeed it remains under 20K
even with the outer repetition increased to 100. However, this kind of pattern
is not always exactly equivalent, because any captures within subroutine calls
are lost when the subroutine completes. If this is not a problem, this kind of
rewriting will allow you to process patterns that PCRE2 cannot otherwise
handle. The matching performance of the two different versions of the pattern
are roughly the same. (This applies from release 10.30 - things were different
in earlier releases.)
reduces the memory requirements to around 16KiB, and indeed it remains under
20KiB even with the outer repetition increased to 100. However, this kind of
pattern is not always exactly equivalent, because any captures within
subroutine calls are lost when the subroutine completes. If this is not a
problem, this kind of rewriting will allow you to process patterns that PCRE2
cannot otherwise handle. The matching performance of the two different versions
of the pattern are roughly the same. (This applies from release 10.30 - things
were different in earlier releases.)
</P>
<br><a name="SEC3" href="#TOC1">STACK AND HEAP USAGE AT RUN TIME</a><br>
<P>
@@ -83,7 +83,7 @@ From release 10.30, the interpretive (non-JIT) version of <b>pcre2_match()</b>
uses very little system stack at run time. In earlier releases recursive
function calls could use a great deal of stack, and this could cause problems,
but this usage has been eliminated. Backtracking positions are now explicitly
remembered in memory frames controlled by the code. An initial 20K vector of
remembered in memory frames controlled by the code. An initial 20KiB vector of
frames is allocated on the system stack (enough for about 100 frames for small
patterns), but if this is insufficient, heap memory is used. The amount of heap
memory can be limited; if the limit is set to zero, only the initial stack

File diff suppressed because it is too large.

@@ -24,7 +24,7 @@ passed to a matching function. The arguments of this function are:
callback a callback function
callback_data a JIT stack or a value to be passed to the callback
.P
If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32K
If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32KiB
block on the machine stack is used.
.P
If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
@@ -33,8 +33,9 @@ If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL,
.P
If \fIcallback\fP not NULL, it is called with \fIcallback_data\fP as an
argument at the start of matching, in order to set up a JIT stack. If the
result is NULL, the internal 32K stack is used; otherwise the return value must
be a valid JIT stack, the result of calling \fBpcre2_jit_stack_create()\fP.
result is NULL, the internal 32KiB stack is used; otherwise the return value
must be a valid JIT stack, the result of calling
\fBpcre2_jit_stack_create()\fP.
.P
You may safely use the same JIT stack for multiple patterns, as long as they
are all matched in the same thread. In a multithread application, each thread

@@ -21,8 +21,8 @@ context, for memory allocation functions, or NULL for standard memory
allocation. The result can be passed to the JIT run-time code by calling
\fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
A maximum stack size of 512K to 1M should be more than enough for any pattern.
For more details, see the
A maximum stack size of 512KiB to 1MiB should be more than enough for any
pattern. For more details, see the
.\" HREF
\fBpcre2jit\fP
.\"

@@ -909,7 +909,7 @@ where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
limit is set, less than the default.
.P
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
stack for recording backtracking points. The more nested backtracking points
there are (that is, the deeper the search tree), the more memory is needed.
Heap memory is used only if the initial vector is too small. If the heap limit
@@ -1084,7 +1084,7 @@ relevant.
.P
The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
but the most massive patterns, since it allows the size of the compiled pattern
to be up to 64K code units. Larger values allow larger regular expressions to
to be up to 65535 code units. Larger values allow larger regular expressions to
be compiled by those two libraries, but at the expense of slower matching.
.sp
PCRE2_CONFIG_MATCHLIMIT

@@ -244,10 +244,10 @@ Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
are used for these offsets, leading to a maximum size for a compiled pattern of
around 64K code units. This is sufficient to handle all but the most gigantic
patterns. Nevertheless, some people do want to process truly enormous patterns,
so it is possible to compile PCRE2 to use three-byte or four-byte offsets by
adding a setting such as
around 64 thousand code units. This is sufficient to handle all but the most
gigantic patterns. Nevertheless, some people do want to process truly enormous
patterns, so it is possible to compile PCRE2 to use three-byte or four-byte
offsets by adding a setting such as
.sp
--with-link-size=3
.sp
@@ -277,7 +277,7 @@ to the \fBconfigure\fP command. This setting also applies to the
\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the
counting is done differently).
.P
The \fBpcre2_match()\fP function starts out using a 20K vector on the system
The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system
stack to record backtracking points. The more nested backtracking points there
are (that is, the deeper the search tree), the more memory is needed. If the
initial vector is not large enough, heap memory is used, up to a certain limit,
@@ -403,13 +403,13 @@ they are not.
.sp
\fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when it
finds a match. The starting size of the buffer is controlled by a parameter
whose default value is 20K. The buffer itself is three times this size, but
because of the way it is used for holding "before" lines, the longest line that
is guaranteed to be processable is the parameter size. If a longer line is
encountered, \fBpcre2grep\fP automatically expands the buffer, up to a
specified maximum size, whose default is 1M or the starting size, whichever is
the larger. You can change the default parameter values by adding, for example,
finds a match. The default starting size of the buffer is 20KiB. The buffer
itself is three times this size, but because of the way it is used for holding
"before" lines, the longest line that is guaranteed to be processable is the
notional buffer size. If a longer line is encountered, \fBpcre2grep\fP
automatically expands the buffer, up to a specified maximum size, whose default
is 1MiB or the starting size, whichever is the larger. You can change the
default parameter values by adding, for example,
.sp
--with-pcre2grep-bufsize=51200
--with-pcre2grep-max-bufsize=2097152

@@ -58,7 +58,7 @@ that is obtained at the start of processing. If an input file contains very
long lines, a larger buffer may be needed; this is handled by automatically
extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
default values for these parameters can be set when \fBpcre2grep\fP is
built; if nothing is specified, the defaults are set to 20K and 1M
built; if nothing is specified, the defaults are set to 20KiB and 1MiB
respectively. An error occurs if a line is too long and the buffer can no
longer be expanded.
.P
@@ -66,7 +66,7 @@ The block of memory that is actually used is three times the "buffer size", to
allow for buffering "before" and "after" lines. If the buffer size is too
small, fewer than requested "before" and "after" lines may be output.
.P
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
(specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
each line in the order in which they are defined, except that all the \fB-e\fP

@@ -58,15 +58,15 @@ DESCRIPTION
automatically extending the buffer, up to the limit specified by --max-
buffer-size. The default values for these parameters can be set when
pcre2grep is built; if nothing is specified, the defaults are set to
20K and 1M respectively. An error occurs if a line is too long and the
buffer can no longer be expanded.
20KiB and 1MiB respectively. An error occurs if a line is too long and
the buffer can no longer be expanded.
The block of memory that is actually used is three times the "buffer
size", to allow for buffering "before" and "after" lines. If the buffer
size is too small, fewer than requested "before" and "after" lines may
be output.
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
greater. BUFSIZ is defined in <stdio.h>. When there is more than one
pattern (specified by the use of -e and/or -f), each pattern is applied
to each line in the order in which they are defined, except that all

@@ -161,7 +161,7 @@ when JIT matching is used.
.rs
.sp
When the compiled JIT code runs, it needs a block of memory to use as a stack.
By default, it uses 32K on the machine stack. However, some large or
By default, it uses 32KiB on the machine stack. However, some large or
complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT
is given when there is not enough stack. Three functions are provided for
managing blocks of memory for use as JIT stacks. There is further discussion
@@ -178,8 +178,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
pointer to an opaque structure of type \fBpcre2_jit_stack\fP, or NULL if there
is an error. The \fBpcre2_jit_stack_free()\fP function is used to free a stack
that is no longer needed. (For the technically minded: the address space is
allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
be more than enough for any pattern.
allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to 1MiB
should be more than enough for any pattern.
.P
The \fBpcre2_jit_stack_assign()\fP function specifies which stack JIT code
should use. Its arguments are as follows:
@@ -192,7 +192,7 @@ The first argument is a pointer to a match context. When this is subsequently
passed to a matching function, its information determines which JIT stack is
used. There are three cases for the values of the other two options:
.sp
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32KiB block
on the machine stack is used. This is the default when a match
context is created.
.sp
@@ -203,7 +203,7 @@ used. There are three cases for the values of the other two options:
(3) If \fIcallback\fP is not NULL, it must point to a function that is
called with \fIdata\fP as an argument at the start of matching, in
order to set up a JIT stack. If the return from the callback
function is NULL, the internal 32K stack is used; otherwise the
function is NULL, the internal 32KiB stack is used; otherwise the
return value must be a valid JIT stack, the result of calling
\fBpcre2_jit_stack_create()\fP.
.sp
@@ -265,9 +265,9 @@ we do the recursion in memory.
Modern operating systems have a nice feature: they can reserve an address space
instead of allocating memory. We can safely allocate memory pages inside this
address space, so the stack could grow without moving memory data (this is
important because of pointers). Thus we can allocate 1M address space, and use
only a single memory page (usually 4K) if that is enough. However, we can still
grow up to 1M anytime if needed.
important because of pointers). Thus we can allocate 1MiB address space, and
use only a single memory page (usually 4KiB) if that is enough. However, we can
still grow up to 1MiB anytime if needed.
.P
(3) Who "owns" a JIT stack?
.sp
@@ -300,7 +300,7 @@ say two minutes. The JIT callback can help to achieve this without keeping a
list of patterns.
.P
(6) OK, the stack is for long term memory allocation. But what happens if a
pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the
stack is freed?
.sp
Especially on embedded sytems, it might be a good idea to release memory

@@ -7,12 +7,12 @@ PCRE2 - Perl-compatible regular expressions (revised API)
There are some size limitations in PCRE2 but it is hoped that they will never
in practice be relevant.
.P
The maximum size of a compiled pattern is approximately 64K code units for the
8-bit and 16-bit libraries if PCRE2 is compiled with the default internal
linkage size, which is 2 bytes for these libraries. If you want to process
regular expressions that are truly enormous, you can compile PCRE2 with an
internal linkage size of 3 or 4 (when building the 16-bit library, 3 is rounded
up to 4). See the \fBREADME\fP file in the source distribution and the
The maximum size of a compiled pattern is approximately 64 thousand code units
for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default
internal linkage size, which is 2 bytes for these libraries. If you want to
process regular expressions that are truly enormous, you can compile PCRE2 with
an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is
rounded up to 4). See the \fBREADME\fP file in the source distribution and the
.\" HREF
\fBpcre2build\fP
.\"

@@ -528,7 +528,7 @@ by code point, as described above.
.sp
The sequence \eg followed by a signed or unsigned number, optionally enclosed
in braces, is an absolute or relative backreference. A named backreference
can be coded as \eg{name}. backreferences are discussed
can be coded as \eg{name}. Backreferences are discussed
.\" HTML <a href="#backreferences">
.\" </a>
later,
@@ -2243,7 +2243,7 @@ that the first iteration does not need to match the backreference. This can be
done using alternation, as in the example above, or by a quantifier with a
minimum of zero.
.P
backreferences of this type cause the group that they reference to be treated
Backreferences of this type cause the group that they reference to be treated
as an
.\" HTML <a href="#atomicgroup">
.\" </a>


@ -34,9 +34,9 @@ example, the very simple pattern
.sp
((ab){1,1000}c){1,3}
.sp
uses over 50K bytes when compiled using the 8-bit library. When PCRE2 is
uses over 50KiB when compiled using the 8-bit library. When PCRE2 is
compiled with its default internal pointer size of two bytes, the size limit on
a compiled pattern is 64K code units in the 8-bit and 16-bit libraries, and
a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and
this is reached with the above pattern if the outer repetition is increased
from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus
handle larger compiled patterns, but it is better to try to rewrite your
@ -52,14 +52,14 @@ facility. Re-writing the above pattern as
.sp
((ab)(?2){0,999}c)(?1){0,2}
.sp
reduces the memory requirements to around 16K, and indeed it remains under 20K
even with the outer repetition increased to 100. However, this kind of pattern
is not always exactly equivalent, because any captures within subroutine calls
are lost when the subroutine completes. If this is not a problem, this kind of
rewriting will allow you to process patterns that PCRE2 cannot otherwise
handle. The matching performance of the two different versions of the pattern
are roughly the same. (This applies from release 10.30 - things were different
in earlier releases.)
reduces the memory requirements to around 16KiB, and indeed it remains under
20KiB even with the outer repetition increased to 100. However, this kind of
pattern is not always exactly equivalent, because any captures within
subroutine calls are lost when the subroutine completes. If this is not a
problem, this kind of rewriting will allow you to process patterns that PCRE2
cannot otherwise handle. The matching performance of the two different versions
of the pattern are roughly the same. (This applies from release 10.30 - things
were different in earlier releases.)
.
.
.SH "STACK AND HEAP USAGE AT RUN TIME"
@ -69,7 +69,7 @@ From release 10.30, the interpretive (non-JIT) version of \fBpcre2_match()\fP
uses very little system stack at run time. In earlier releases recursive
function calls could use a great deal of stack, and this could cause problems,
but this usage has been eliminated. Backtracking positions are now explicitly
remembered in memory frames controlled by the code. An initial 20K vector of
remembered in memory frames controlled by the code. An initial 20KiB vector of
frames is allocated on the system stack (enough for about 100 frames for small
patterns), but if this is insufficient, heap memory is used. The amount of heap
memory can be limited; if the limit is set to zero, only the initial stack


@ -134,16 +134,16 @@ sure both macros are undefined; an emulation function will then be used. */
/* This limits the amount of memory that may be used while matching a pattern.
It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply
to JIT matching. The value is in kilobytes. */
to JIT matching. The value is in kibibytes (units of 1024 bytes). */
#ifndef HEAP_LIMIT
#define HEAP_LIMIT 20000000
#endif
/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
allows for longer patterns in extreme cases. */
compiled patterns up to 65535 code units long. This covers the vast
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
instead. This allows for longer patterns in extreme cases. */
#ifndef LINK_SIZE
#define LINK_SIZE 2
#endif
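The "65535 code units" wording follows from storing each link offset in LINK_SIZE bytes. The sketch below shows only that raw arithmetic. The helper name `max_link_offset` is hypothetical and does not come from PCRE2. The library's real pattern-size limits are defined in its own source and are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper: the largest unsigned value that fits in
   link_size bytes. Each byte holds 8 bits, so n bytes can hold at
   most 2^(8n) - 1. This illustrates the arithmetic only; it is not
   how PCRE2 defines its own compiled-pattern limit. */
static uint64_t max_link_offset(unsigned int link_size)
{
  return (UINT64_C(1) << (8u * link_size)) - 1u;
}
```

With the default of 2 bytes this gives 65535. With 3 bytes it gives 16777215, which is why a larger LINK_SIZE permits much longer compiled patterns.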


@ -139,9 +139,9 @@ sure both macros are undefined; an emulation function will then be used. */
/* The value of LINK_SIZE determines the number of bytes used to store links
as offsets within the compiled regex. The default is 2, which allows for
compiled patterns up to 64K long. This covers the vast majority of cases.
However, PCRE2 can also be compiled to use 3 or 4 bytes instead. This
allows for longer patterns in extreme cases. */
compiled patterns up to 65535 code units long. This covers the vast
majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes
instead. This allows for longer patterns in extreme cases. */
#undef LINK_SIZE
/* Define to the sub-directory where libtool stores uninstalled libraries. */


@ -412,7 +412,7 @@ if (rws->next != NULL)
}
/* All sizes are in units of sizeof(int), except for mb->heaplimit, which is in
kilobytes. */
kibibytes. */
else
{


@ -247,7 +247,7 @@ not rely on this. */
pcre2_match() is allocated on the system stack, of this size (bytes). The size
must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a
multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends
on the number of capturing parentheses) so 20K handles quite a few frames. A
on the number of capturing parentheses) so 20KiB handles quite a few frames. A
larger vector on the heap is obtained for patterns that need more frames. The
maximum size of this can be limited. */
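The comments above describe two things: an initial frame vector of 20KiB, which is enough for roughly 100 small frames, and frames of "a few hundred bytes". The rough sketch below relates those numbers. The macro `EXAMPLE_START_FRAMES_SIZE`, the helper `frames_that_fit`, and the 200-byte frame size are illustrative assumptions. Real frame sizes depend on the number of capturing parentheses in the pattern.

```c
#include <stddef.h>

/* Hypothetical name for the 20KiB initial vector described above.
   20 * 1024 = 20480 bytes, which is a multiple of 8, as the comment
   recommends. */
#define EXAMPLE_START_FRAMES_SIZE (20u * 1024u)

/* Hypothetical helper: how many whole frames of frame_size bytes fit
   in a vector of vector_bytes bytes. */
static size_t frames_that_fit(size_t vector_bytes, size_t frame_size)
{
  return frame_size == 0 ? 0 : vector_bytes / frame_size;
}
```

Under the assumed 200-byte frame size, `frames_that_fit(EXAMPLE_START_FRAMES_SIZE, 200)` yields 102, which is consistent with the "about 100 frames" estimate.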


@ -6283,7 +6283,7 @@ mb->match_limit_depth = (mcontext->depth_limit < re->limit_depth)?
/* If a pattern has very many capturing parentheses, the frame size may be very
large. Ensure that there are at least 10 available frames by getting an initial
vector on the heap if necessary, except when the heap limit prevents this. Get
fewer if possible. (The heap limit is in kilobytes.) */
fewer if possible. (The heap limit is in kibibytes.) */
if (frame_size <= START_FRAMES_SIZE/10)
{


@ -416,7 +416,7 @@ static option_item optionlist[] = {
{ OP_NODATA, N_LBUFFER, NULL, "line-buffered", "use line buffering" },
{ OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" },
{ OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" },
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kilobytes)" },
{ OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kibibytes)" },
{ OP_U32NUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE2 match limit option" },
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "depth-limit=number", "set PCRE2 depth limit option" },
{ OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "recursion-limit=number", "obsolete synonym for depth-limit" },
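The heap limit is now documented in kibibytes (units of 1024 bytes) rather than kilobytes. The minimal sketch below shows how much that distinction matters for the default HEAP_LIMIT value of 20000000. Both helper names are hypothetical and are not part of PCRE2 or pcre2grep.

```c
#include <stdint.h>

/* Hypothetical helper: convert a limit given in kibibytes
   (1 KiB = 1024 bytes), as the HEAP_LIMIT comment specifies, into
   bytes. */
static uint64_t kib_to_bytes(uint64_t kib)
{
  return kib * UINT64_C(1024);
}

/* Hypothetical helper, for contrast: the same number read as decimal
   kilobytes (1 kB = 1000 bytes). */
static uint64_t kb_to_bytes(uint64_t kb)
{
  return kb * UINT64_C(1000);
}
```

For the default value, `kib_to_bytes(20000000)` is 20480000000 bytes and `kb_to_bytes(20000000)` is 20000000000 bytes. The two readings differ by 480000000 bytes, which is why the unit is worth stating exactly.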

6
testdata/grepinput vendored

@ -1,6 +1,6 @@
This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read() call to process it. New
that the pcregrep command is working correctly. The file must be more than
24KiB long so that it needs more than a single read() call to process it. New
features should be added at the end, because some of the tests involve the
output of line numbers, and we don't want these to change.
@ -9,7 +9,7 @@ In the middle of a line, PATTERN appears.
This pattern is in lower case.
Here follows a whole lot of stuff that makes the file over 24K long.
Here follows a whole lot of stuff that makes the file over 24KiB long.
-------------------------------------------------------------------------------
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the

31
testdata/grepoutput vendored

@ -346,7 +346,7 @@ RC=0
./testdata/grepinput-9-
./testdata/grepinput:10:This pattern is in lower case.
./testdata/grepinput-11-
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24K long.
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24KiB long.
./testdata/grepinput-13-
--
./testdata/grepinput:623:Check up on PATTERN near the end.
@ -379,6 +379,7 @@ RC=0
./testdata/grepinputx
RC=0
---------------------------- Test 37 -----------------------------
24KiB long so that it needs more than a single read() call to process it. New
aaaaa0
aaaaa2
010203040506
@ -465,11 +466,11 @@ fox jumps
This time it jumps and jumps and jumps.
RC=0
---------------------------- Test 53 ------------------------------
36972,6
36990,4
37024,4
37066,5
37083,4
36976,6
36994,4
37028,4
37070,5
37087,4
RC=0
---------------------------- Test 54 ------------------------------
595:15,6
@ -519,8 +520,8 @@ RC=0
pcre2grep: pcre2_match() gave error -47 while matching text that starts:
This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read
that the pcregrep command is working correctly. The file must be more than
24KiB long so that it needs more than a single re
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
pcre2grep: Check your regex for nested unlimited loops.
@ -529,8 +530,8 @@ RC=1
pcre2grep: pcre2_match() gave error -53 while matching text that starts:
This is a file of miscellaneous text that is used as test data for checking
that the pcregrep command is working correctly. The file must be more than 24K
long so that it needs more than a single read
that the pcregrep command is working correctly. The file must be more than
24KiB long so that it needs more than a single re
pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded.
pcre2grep: Check your regex for nested unlimited loops.
@ -814,11 +815,11 @@ RC=0
615:0,12
RC=0
---------------------------- Test 112 -----------------------------
37168,12
37180,12
37192,12
37204,12
37216,12
37172,12
37184,12
37196,12
37208,12
37220,12
RC=0
---------------------------- Test 113 -----------------------------
480