Remove references to the now-deleted pcre2stack man page.

Philip.Hazel 2017-04-01 09:38:58 +00:00
parent 66ec3fc62f
commit 0bf17d9974
12 changed files with 181 additions and 579 deletions

View File

@ -66,7 +66,7 @@ End
echo "Making pcre2.txt" echo "Making pcre2.txt"
for file in pcre2 pcre2api pcre2build pcre2callout pcre2compat pcre2jit \ for file in pcre2 pcre2api pcre2build pcre2callout pcre2compat pcre2jit \
pcre2limits pcre2matching pcre2partial pcre2pattern pcre2perform \ pcre2limits pcre2matching pcre2partial pcre2pattern pcre2perform \
pcre2posix pcre2sample pcre2serialize pcre2stack pcre2syntax \ pcre2posix pcre2sample pcre2serialize pcre2syntax \
pcre2unicode ; do pcre2unicode ; do
echo " Processing $file.3" echo " Processing $file.3"
nroff -c -man $file.3 >$file.rawtxt nroff -c -man $file.3 >$file.rawtxt
@ -146,7 +146,6 @@ for file in *.3 ; do
toc=-toc toc=-toc
if [ `expr $base : '.*_'` -ne 0 ] ; then toc="" ; fi if [ `expr $base : '.*_'` -ne 0 ] ; then toc="" ; fi
if [ "$base" = "pcre2sample" ] || \ if [ "$base" = "pcre2sample" ] || \
[ "$base" = "pcre2stack" ] || \
[ "$base" = "pcre2compat" ] || \ [ "$base" = "pcre2compat" ] || \
[ "$base" = "pcre2limits" ] || \ [ "$base" = "pcre2limits" ] || \
[ "$base" = "pcre2unicode" ] ; then [ "$base" = "pcre2unicode" ] ; then

View File

@ -167,7 +167,6 @@ listing), and the short pages for individual functions, are concatenated in
pcre2perform discussion of performance issues pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program pcre2sample discussion of the pcre2demo program
pcre2stack discussion of stack and memory usage
pcre2syntax quick syntax reference pcre2syntax quick syntax reference
pcre2test description of the <b>pcre2test</b> command pcre2test description of the <b>pcre2test</b> command
pcre2unicode discussion of Unicode and UTF support pcre2unicode discussion of Unicode and UTF support
@ -190,7 +189,7 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
</P> </P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br> <br><a name="SEC5" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 27 March 2017 Last updated: 01 April 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -3245,7 +3245,7 @@ fail, this error is given.
<P> <P>
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>, <b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>,
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3), <b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3). <b>pcre2sample</b>(3), <b>pcre2unicode</b>(3).
</P> </P>
<br><a name="SEC41" href="#TOC1">AUTHOR</a><br> <br><a name="SEC41" href="#TOC1">AUTHOR</a><br>
<P> <P>
@ -3258,7 +3258,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 27 March 2017 Last updated: 01 April 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -275,7 +275,7 @@ limit controls this; it defaults to the value that is set for
to the <b>configure</b> command. This value can also be overridden at run time. to the <b>configure</b> command. This value can also be overridden at run time.
As well as applying to <b>pcre2_match()</b>, this limit also controls the depth As well as applying to <b>pcre2_match()</b>, this limit also controls the depth
of recursive function calls in <b>pcre2_dfa_match()</b>. These are used for of recursive function calls in <b>pcre2_dfa_match()</b>. These are used for
lookaround assertions and recursion within patterns. lookaround assertions, atomic groups, and recursion within patterns.
</P> </P>
<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br> <br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<P> <P>
@ -530,7 +530,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br> <br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 29 March 2017 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -732,12 +732,12 @@ relying on the C I/O library to convert this to an appropriate sequence.
Many of the short and long forms of <b>pcre2grep</b>'s options are the same Many of the short and long forms of <b>pcre2grep</b>'s options are the same
as in the GNU <b>grep</b> program. Any long option of the form as in the GNU <b>grep</b> program. Any long option of the form
<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b> <b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
(PCRE2 terminology). However, the <b>--file-list</b>, <b>--file-offsets</b>, (PCRE2 terminology). However, the <b>--depth-limit</b>, <b>--file-list</b>,
<b>--include-dir</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>--file-offsets</b>, <b>--include-dir</b>, <b>--line-offsets</b>,
<b>-M</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>, <b>--multiline</b>, <b>-N</b>,
<b>--recursion-limit</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to <b>--newline</b>, <b>--om-separator</b>, <b>-u</b>, and <b>--utf-8</b> options are
<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a specific to <b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option
capturing parentheses number. with a capturing parentheses number.
</P> </P>
<P> <P>
Although most of the common options work the same way, a few are different in Although most of the common options work the same way, a few are different in
@ -867,7 +867,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br> <br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 March 2017 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -194,12 +194,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
that is no longer needed. (For the technically minded: the address space is that is no longer needed. (For the technically minded: the address space is
allocated by mmap or VirtualAlloc.) allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
</P> be more than enough for any pattern.
<P>
JIT uses far less memory for recursion than the interpretive code,
and a maximum stack size of 512K to 1M should be more than enough for any
pattern.
</P> </P>
<P> <P>
The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
@ -436,7 +432,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC13" href="#TOC1">REVISION</a><br> <br><a name="SEC13" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 30 March 2017 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -1,217 +0,0 @@
<html>
<head>
<title>pcre2stack specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2stack man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<br><b>
PCRE2 DISCUSSION OF STACK USAGE
</b><br>
<P>
When you call <b>pcre2_match()</b>, it makes use of an internal function called
<b>match()</b>. This calls itself recursively at branch points in the pattern,
in order to remember the state of the match so that it can back up and try a
different alternative after a failure. As matching proceeds deeper and deeper
into the tree of possibilities, the recursion depth increases. The
<b>match()</b> function is also called in other circumstances, for example,
whenever a parenthesized sub-pattern is entered, and in certain cases of
repetition.
</P>
<P>
Not all calls of <b>match()</b> increase the recursion depth; for an item such
as a* it may be called several times at the same level, after matching
different numbers of a's. Furthermore, in a number of cases where the result of
the recursive call would immediately be passed back as the result of the
current call (a "tail recursion"), the function is just restarted instead.
</P>
<P>
Each time the internal <b>match()</b> function is called recursively, it uses
memory from the process stack. For certain kinds of pattern and data, very
large amounts of stack may be needed, despite the recognition of "tail
recursion". Note that if PCRE2 is compiled with the -fsanitize=address option
of the GCC compiler, the stack requirements are greatly increased.
</P>
<P>
The above comments apply when <b>pcre2_match()</b> is run in its normal
interpretive manner. If the compiled pattern was processed by
<b>pcre2_jit_compile()</b>, and just-in-time compiling was successful, and the
options passed to <b>pcre2_match()</b> were not incompatible, the matching
process uses the JIT-compiled code instead of the <b>match()</b> function. In
this case, the memory requirements are handled entirely differently. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for details.
</P>
<P>
The <b>pcre2_dfa_match()</b> function operates in a different way to
<b>pcre2_match()</b>, and uses recursion only when there is a regular expression
recursion or subroutine call in the pattern. This includes the processing of
assertion and "once-only" subpatterns, which are handled like subroutine calls.
Normally, these are never very deep, and the limit on the complexity of
<b>pcre2_dfa_match()</b> is controlled by the amount of workspace it is given.
However, it is possible to write patterns with runaway infinite recursions;
such patterns will cause <b>pcre2_dfa_match()</b> to run out of stack unless a
limit is applied (see below).
</P>
<P>
The comments in the next three sections do not apply to
<b>pcre2_dfa_match()</b>; they are relevant only for <b>pcre2_match()</b> without
the JIT optimization.
</P>
<br><b>
Reducing <b>pcre2_match()</b>'s stack usage
</b><br>
<P>
You can often reduce the amount of recursion, and therefore the
amount of stack used, by modifying the pattern that is being matched. Consider,
for example, this pattern:
<pre>
([^&#60;]|&#60;(?!inet))+
</pre>
It matches from wherever it starts until it encounters "&#60;inet" or the end of
the data, and is the kind of pattern that might be used when processing an XML
file. Each iteration of the outer parentheses matches either one character that
is not "&#60;" or a "&#60;" that is not followed by "inet". However, each time a
parenthesis is processed, a recursion occurs, so this formulation uses a stack
frame for each matched character. For a long string, a lot of stack is
required. Consider now this rewritten pattern, which matches exactly the same
strings:
<pre>
([^&#60;]++|&#60;(?!inet))+
</pre>
This uses very much less stack, because runs of characters that do not contain
"&#60;" are "swallowed" in one item inside the parentheses. Recursion happens only
when a "&#60;" character that is not followed by "inet" is encountered (and we
assume this is relatively rare). A possessive quantifier is used to stop any
backtracking into the runs of non-"&#60;" characters, but that is not related to
stack usage.
</P>
<P>
This example shows that one way of avoiding stack problems when matching long
subject strings is to write repeated parenthesized subpatterns to match more
than one character whenever possible.
</P>
<br><b>
Compiling PCRE2 to use heap instead of stack for <b>pcre2_match()</b>
</b><br>
<P>
In environments where stack memory is constrained, you might want to compile
PCRE2 to use heap memory instead of stack for remembering back-up points when
<b>pcre2_match()</b> is running. This makes it run more slowly, however. Details
of how to do this are given in the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation. When built in this way, instead of using the stack, PCRE2
gets memory for remembering backup points from the heap. By default, the memory
is obtained by calling the system <b>malloc()</b> function, but you can arrange
to supply your own memory management function. For details, see the section
entitled
<a href="pcre2api.html#matchcontext">"The match context"</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation. Since the block sizes are always the same, it may be possible to
implement a customized memory handler that is more efficient than the standard
function. The memory blocks obtained for this purpose are retained and re-used
if possible while <b>pcre2_match()</b> is running. They are all freed just
before it exits.
</P>
<br><b>
Limiting <b>pcre2_match()</b>'s stack usage
</b><br>
<P>
You can set limits on the number of times the internal <b>match()</b> function
is called, both in total and recursively. If a limit is exceeded,
<b>pcre2_match()</b> returns an error code. Setting suitable limits should
prevent it from running out of stack. The default values of the limits are very
large, and unlikely ever to operate. They can be changed when PCRE2 is built,
and they can also be set when <b>pcre2_match()</b> is called. For details of
these interfaces, see the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation and the section entitled
<a href="pcre2api.html#matchcontext">"The match context"</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you should set
the limit at 16000 recursions. A 64Mb stack, on the other hand, can support
around 128000 recursions.
</P>
<P>
The <b>pcre2test</b> test program has a modifier called "find_limits" which, if
applied to a subject line, causes it to find the smallest limits that allow a a
pattern to match. This is done by calling <b>pcre2_match()</b> repeatedly with
different limits.
</P>
<br><b>
Limiting <b>pcre2_dfa_match()</b>'s stack usage
</b><br>
<P>
The recursion limit, as described above for <b>pcre2_match()</b>, also applies
to <b>pcre2_dfa_match()</b>, whose use of recursive function calls for
recursions in the pattern can lead to runaway stack usage. The non-recursive
match limit is not relevant for DFA matching, and is ignored.
</P>
<br><b>
Changing stack size in Unix-like systems
</b><br>
<P>
In Unix-like environments, there is not often a problem with the stack unless
very long strings are involved, though the default limit on stack size varies
from system to system. Values from 8Mb to 64Mb are common. You can find your
default limit by running the command:
<pre>
ulimit -s
</pre>
Unfortunately, the effect of running out of stack is often SIGSEGV, though
sometimes a more explicit error message is given. You can normally increase the
limit on stack size by code such as this:
<pre>
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = 100*1024*1024;
setrlimit(RLIMIT_STACK, &rlim);
</pre>
This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
do this before calling <b>pcre2_match()</b>.
</P>
<br><b>
Changing stack size in Mac OS X
</b><br>
<P>
Using <b>setrlimit()</b>, as described above, should also work on Mac OS X. It
is also possible to set a stack size when linking a program. There is a
discussion about stack sizes in Mac OS X at this web site:
<a href="http://developer.apple.com/qa/qa2005/qa1419.html">http://developer.apple.com/qa/qa2005/qa1419.html.</a>
</P>
<br><b>
AUTHOR
</b><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><b>
REVISION
</b><br>
<P>
Last updated: 23 December 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
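
Note on the removed page above: its setrlimit() fragment ignores return values and does not check the requested size against the hard limit. A minimal sketch with those checks added might look like the following. It assumes a POSIX system; the helper name raise_stack_soft_limit is hypothetical and is not part of PCRE2.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of PCRE2): try to make the soft stack
   limit at least `wanted` bytes, clamped to the hard limit.
   Returns 0 on success, -1 if getrlimit() or setrlimit() fails. */
int raise_stack_soft_limit(rlim_t wanted)
{
    struct rlimit rlim;

    /* Read the current soft (rlim_cur) and hard (rlim_max) limits. */
    if (getrlimit(RLIMIT_STACK, &rlim) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* Already unlimited or large enough: leave the limit alone. */
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;

    /* An unprivileged process cannot raise the soft limit above the
       hard limit, so clamp the request instead of failing. */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;

    rlim.rlim_cur = wanted;
    if (setrlimit(RLIMIT_STACK, &rlim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

As the removed text says, a call such as raise_stack_soft_limit((rlim_t)100 * 1024 * 1024) would have to be made before calling pcre2_match(). Because the request is clamped, a return value of 0 does not guarantee that the full requested size was granted.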

View File

@ -440,7 +440,7 @@ of the newline or \R options with similar syntax. More than one of them may
appear. appear.
<pre> <pre>
(*LIMIT_MATCH=d) set the match limit to d (decimal number) (*LIMIT_MATCH=d) set the match limit to d (decimal number)
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) (*LIMIT_DEPTH=d) set the backtracking limit to d (decimal number)
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching (*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
(*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS) (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
@ -450,11 +450,11 @@ appear.
(*UTF) set appropriate UTF mode for the library in use (*UTF) set appropriate UTF mode for the library in use
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc) (*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
</pre> </pre>
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the Note that LIMIT_MATCH and LIMIT_DEPTH can only reduce the value of the limits
limits set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>, not set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>, not
increase them. The application can lock out the use of (*UTF) and (*UCP) by increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at application can lock out the use of (*UTF) and (*UCP) by setting the
compile time. PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
</P> </P>
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br> <br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
<P> <P>
@ -596,9 +596,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br> <br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 23 December 2016 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -1,4 +1,4 @@
.TH PCRE2 3 "23 March 2017" "PCRE2 10.30" .TH PCRE2 3 "01 April 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH INTRODUCTION .SH INTRODUCTION
@ -164,7 +164,6 @@ listing), and the short pages for individual functions, are concatenated in
pcre2perform discussion of performance issues pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program pcre2sample discussion of the pcre2demo program
pcre2stack discussion of stack and memory usage
pcre2syntax quick syntax reference pcre2syntax quick syntax reference
pcre2test description of the \fBpcre2test\fP command pcre2test description of the \fBpcre2test\fP command
pcre2unicode discussion of Unicode and UTF support pcre2unicode discussion of Unicode and UTF support
@ -190,6 +189,6 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
.rs .rs
.sp .sp
.nf .nf
Last updated: 27 March 2017 Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi

View File

@ -146,7 +146,6 @@ USER DOCUMENTATION
pcre2perform discussion of performance issues pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program pcre2sample discussion of the pcre2demo program
pcre2stack discussion of stack and memory usage
pcre2syntax quick syntax reference pcre2syntax quick syntax reference
pcre2test description of the pcre2test command pcre2test description of the pcre2test command
pcre2unicode discussion of Unicode and UTF support pcre2unicode discussion of Unicode and UTF support
@ -168,7 +167,7 @@ AUTHOR
REVISION REVISION
Last updated: 27 March 2017 Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -3161,8 +3160,7 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
SEE ALSO SEE ALSO
pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3), pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3),
pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2stack(3), pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
pcre2unicode(3).
AUTHOR AUTHOR
@ -3174,7 +3172,7 @@ AUTHOR
REVISION REVISION
Last updated: 27 March 2017 Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -3425,7 +3423,8 @@ LIMITING PCRE2 RESOURCE USAGE
to the configure command. This value can also be overridden at run to the configure command. This value can also be overridden at run
time. As well as applying to pcre2_match(), this limit also controls time. As well as applying to pcre2_match(), this limit also controls
the depth of recursive function calls in pcre2_dfa_match(). These are the depth of recursive function calls in pcre2_dfa_match(). These are
used for lookaround assertions and recursion within patterns. used for lookaround assertions, atomic groups, and recursion within
patterns.
CREATING CHARACTER TABLES AT BUILD TIME CREATING CHARACTER TABLES AT BUILD TIME
@ -3687,7 +3686,7 @@ AUTHOR
REVISION REVISION
Last updated: 29 March 2017 Last updated: 31 March 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -4436,11 +4435,8 @@ CONTROLLING THE JIT STACK
It returns a pointer to an opaque structure of type pcre2_jit_stack, or It returns a pointer to an opaque structure of type pcre2_jit_stack, or
NULL if there is an error. The pcre2_jit_stack_free() function is used NULL if there is an error. The pcre2_jit_stack_free() function is used
to free a stack that is no longer needed. (For the technically minded: to free a stack that is no longer needed. (For the technically minded:
the address space is allocated by mmap or VirtualAlloc.) the address space is allocated by mmap or VirtualAlloc.) A maximum
stack size of 512K to 1M should be more than enough for any pattern.
JIT uses far less memory for recursion than the interpretive code, and
a maximum stack size of 512K to 1M should be more than enough for any
pattern.
The pcre2_jit_stack_assign() function specifies which stack JIT code The pcre2_jit_stack_assign() function specifies which stack JIT code
should use. Its arguments are as follows: should use. Its arguments are as follows:
@ -4664,7 +4660,7 @@ AUTHOR
REVISION REVISION
Last updated: 30 March 2017 Last updated: 31 March 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -9229,177 +9225,6 @@ REVISION
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2STACK(3) Library Functions Manual PCRE2STACK(3)
NAME
PCRE2 - Perl-compatible regular expressions (revised API)
PCRE2 DISCUSSION OF STACK USAGE
When you call pcre2_match(), it makes use of an internal function
called match(). This calls itself recursively at branch points in the
pattern, in order to remember the state of the match so that it can
back up and try a different alternative after a failure. As matching
proceeds deeper and deeper into the tree of possibilities, the recur-
sion depth increases. The match() function is also called in other cir-
cumstances, for example, whenever a parenthesized sub-pattern is
entered, and in certain cases of repetition.
Not all calls of match() increase the recursion depth; for an item such
as a* it may be called several times at the same level, after matching
different numbers of a's. Furthermore, in a number of cases where the
result of the recursive call would immediately be passed back as the
result of the current call (a "tail recursion"), the function is just
restarted instead.
Each time the internal match() function is called recursively, it uses
memory from the process stack. For certain kinds of pattern and data,
very large amounts of stack may be needed, despite the recognition of
"tail recursion". Note that if PCRE2 is compiled with the -fsani-
tize=address option of the GCC compiler, the stack requirements are
greatly increased.
The above comments apply when pcre2_match() is run in its normal inter-
pretive manner. If the compiled pattern was processed by pcre2_jit_com-
pile(), and just-in-time compiling was successful, and the options
passed to pcre2_match() were not incompatible, the matching process
uses the JIT-compiled code instead of the match() function. In this
case, the memory requirements are handled entirely differently. See the
pcre2jit documentation for details.
The pcre2_dfa_match() function operates in a different way to
pcre2_match(), and uses recursion only when there is a regular expres-
sion recursion or subroutine call in the pattern. This includes the
processing of assertion and "once-only" subpatterns, which are handled
like subroutine calls. Normally, these are never very deep, and the
limit on the complexity of pcre2_dfa_match() is controlled by the
amount of workspace it is given. However, it is possible to write pat-
terns with runaway infinite recursions; such patterns will cause
pcre2_dfa_match() to run out of stack unless a limit is applied (see
below).
The comments in the next three sections do not apply to
pcre2_dfa_match(); they are relevant only for pcre2_match() without the
JIT optimization.
Reducing pcre2_match()'s stack usage
You can often reduce the amount of recursion, and therefore the amount
of stack used, by modifying the pattern that is being matched. Con-
sider, for example, this pattern:
([^<]|<(?!inet))+
It matches from wherever it starts until it encounters "<inet" or the
end of the data, and is the kind of pattern that might be used when
processing an XML file. Each iteration of the outer parentheses matches
either one character that is not "<" or a "<" that is not followed by
"inet". However, each time a parenthesis is processed, a recursion
occurs, so this formulation uses a stack frame for each matched charac-
ter. For a long string, a lot of stack is required. Consider now this
rewritten pattern, which matches exactly the same strings:
([^<]++|<(?!inet))+
This uses very much less stack, because runs of characters that do not
contain "<" are "swallowed" in one item inside the parentheses. Recur-
sion happens only when a "<" character that is not followed by "inet"
is encountered (and we assume this is relatively rare). A possessive
quantifier is used to stop any backtracking into the runs of non-"<"
characters, but that is not related to stack usage.
This example shows that one way of avoiding stack problems when match-
ing long subject strings is to write repeated parenthesized subpatterns
to match more than one character whenever possible.
Compiling PCRE2 to use heap instead of stack for pcre2_match()
In environments where stack memory is constrained, you might want to
compile PCRE2 to use heap memory instead of stack for remembering back-
up points when pcre2_match() is running. This makes it run more slowly,
however. Details of how to do this are given in the pcre2build documen-
tation. When built in this way, instead of using the stack, PCRE2 gets
memory for remembering backup points from the heap. By default, the
memory is obtained by calling the system malloc() function, but you can
arrange to supply your own memory management function. For details, see
the section entitled "The match context" in the pcre2api documentation.
Since the block sizes are always the same, it may be possible to imple-
ment a customized memory handler that is more efficient than the stan-
dard function. The memory blocks obtained for this purpose are retained
and re-used if possible while pcre2_match() is running. They are all
freed just before it exits.
Limiting pcre2_match()'s stack usage
You can set limits on the number of times the internal match() function
is called, both in total and recursively. If a limit is exceeded,
pcre2_match() returns an error code. Setting suitable limits should
prevent it from running out of stack. The default values of the limits
are very large, and unlikely ever to operate. They can be changed when
PCRE2 is built, and they can also be set when pcre2_match() is called.
For details of these interfaces, see the pcre2build documentation and
the section entitled "The match context" in the pcre2api documentation.
As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you
should set the limit at 16000 recursions. A 64Mb stack, on the other
hand, can support around 128000 recursions.
The pcre2test test program has a modifier called "find_limits" which,
if applied to a subject line, causes it to find the smallest limits
that allow a a pattern to match. This is done by calling pcre2_match()
repeatedly with different limits.
Limiting pcre2_dfa_match()'s stack usage
The recursion limit, as described above for pcre2_match(), also applies
to pcre2_dfa_match(), whose use of recursive function calls for recur-
sions in the pattern can lead to runaway stack usage. The non-recursive
match limit is not relevant for DFA matching, and is ignored.
Changing stack size in Unix-like systems
In Unix-like environments, there is not often a problem with the stack
unless very long strings are involved, though the default limit on
stack size varies from system to system. Values from 8Mb to 64Mb are
common. You can find your default limit by running the command:
ulimit -s
Unfortunately, the effect of running out of stack is often SIGSEGV,
though sometimes a more explicit error message is given. You can nor-
mally increase the limit on stack size by code such as this:
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = 100*1024*1024;
setrlimit(RLIMIT_STACK, &rlim);
This reads the current limits (soft and hard) using getrlimit(), then
attempts to increase the soft limit to 100Mb using setrlimit(). You
must do this before calling pcre2_match().
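A slightly more defensive variant of the same idea is sketched below. The function name raise_stack_soft_limit is hypothetical. The sketch assumes a POSIX system and that clamping the request to the hard limit is acceptable to the caller. It checks the return values of getrlimit() and setrlimit(), and it leaves the limit alone if it is already large enough.

```c
#include <sys/resource.h>

/* Hypothetical helper: try to raise the soft stack limit to at least
   `wanted` bytes. Returns 0 on success (the request is clamped to the
   hard limit), or -1 if getrlimit() or setrlimit() fails. */
int raise_stack_soft_limit(rlim_t wanted)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;

    /* Nothing to do if the soft limit is already large enough. */
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;

    /* An unprivileged process cannot go above the hard limit. */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;

    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

A caller would invoke this once, for example as raise_stack_soft_limit(100*1024*1024), before the first call to pcre2_match().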
Changing stack size in Mac OS X
Using setrlimit(), as described above, should also work on Mac OS X. It
is also possible to set a stack size when linking a program. There is a
discussion about stack sizes in Mac OS X at this web site:
http://developer.apple.com/qa/qa2005/qa1419.html.
AUTHOR
Philip Hazel
University Computing Service
Cambridge, England.
REVISION
Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
------------------------------------------------------------------------------
@ -9686,7 +9511,7 @@ OPTION SETTING
one of them may appear.
(*LIMIT_MATCH=d) set the match limit to d (decimal number)
- (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
+ (*LIMIT_DEPTH=d) set the backtracking limit to d (decimal number)
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
(*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
@ -9696,11 +9521,12 @@ OPTION SETTING
(*UTF) set appropriate UTF mode for the library in use
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
- Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of
- the limits set by the caller of pcre2_match() or pcre2_dfa_match(), not
- increase them. The application can lock out the use of (*UTF) and
- (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options,
- respectively, at compile time.
+ Note that LIMIT_MATCH and LIMIT_DEPTH can only reduce the value of the
+ limits set by the caller of pcre2_match() or pcre2_dfa_match(), not
+ increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH.
+ The application can lock out the use of (*UTF) and (*UCP) by setting
+ the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at com-
+ pile time.
NEWLINE CONVENTION
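As an illustration of the start-of-pattern items in the hunk above, the following sketch builds a pattern string with (*LIMIT_MATCH=d) and (*LIMIT_DEPTH=d) prefixed. The function name add_limit_prefix is hypothetical and the code does not call PCRE2 itself. It only formats the string that would later be passed to the compile function. As the text notes, such items can only reduce the limits set by the caller, not increase them.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write `pattern` into `buf`, preceded by
   (*LIMIT_MATCH=d) and (*LIMIT_DEPTH=d) items.
   Returns 0 on success, -1 if `buf` is too small. */
int add_limit_prefix(char *buf, size_t size, uint32_t match_limit,
                     uint32_t depth_limit, const char *pattern)
{
    int n = snprintf(buf, size,
                     "(*LIMIT_MATCH=%" PRIu32 ")(*LIMIT_DEPTH=%" PRIu32 ")%s",
                     match_limit, depth_limit, pattern);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For releases that predate LIMIT_DEPTH, the obsolete synonym LIMIT_RECURSION would have to be used in the format string instead.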
@ -9841,8 +9667,8 @@ AUTHOR
REVISION
- Last updated: 23 December 2016
+ Last updated: 31 March 2017
- Copyright (c) 1997-2016 University of Cambridge.
+ Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

View File

@ -1,4 +1,4 @@
- .TH PCRE2API 3 "27 March 2017" "PCRE2 10.30"
+ .TH PCRE2API 3 "01 April 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -3292,7 +3292,7 @@ fail, this error is given.
.sp
\fBpcre2build\fP(3), \fBpcre2callout\fP(3), \fBpcre2demo(3)\fP,
\fBpcre2matching\fP(3), \fBpcre2partial\fP(3), \fBpcre2posix\fP(3),
- \fBpcre2sample\fP(3), \fBpcre2stack\fP(3), \fBpcre2unicode\fP(3).
+ \fBpcre2sample\fP(3), \fBpcre2unicode\fP(3).
.
.
.SH AUTHOR
@ -3309,6 +3309,6 @@ Cambridge, England.
.rs
.sp
.nf
- Last updated: 27 March 2017
+ Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi

View File

@ -721,9 +721,9 @@ OPTIONS COMPATIBILITY
Many of the short and long forms of pcre2grep's options are the same as
in the GNU grep program. Any long option of the form --xxx-regexp (GNU
terminology) is also available as --xxx-regex (PCRE2 terminology). How-
- ever, the --file-list, --file-offsets, --include-dir, --line-offsets,
- --locale, --match-limit, -M, --multiline, -N, --newline, --om-separa-
- tor, --recursion-limit, -u, and --utf-8 options are specific to
+ ever, the --depth-limit, --file-list, --file-offsets, --include-dir,
+ --line-offsets, --locale, --match-limit, -M, --multiline, -N, --new-
+ line, --om-separator, -u, and --utf-8 options are specific to
pcre2grep, as is the use of the --only-matching option with a capturing
parentheses number.
@ -857,5 +857,5 @@ AUTHOR
REVISION
- Last updated: 21 March 2017
+ Last updated: 31 March 2017
Copyright (c) 1997-2017 University of Cambridge.