Remove references to the now-deleted pcre2stack man page.

Philip.Hazel 2017-04-01 09:38:58 +00:00
parent 66ec3fc62f
commit 0bf17d9974
12 changed files with 181 additions and 579 deletions

View File

@ -66,7 +66,7 @@ End
echo "Making pcre2.txt" echo "Making pcre2.txt"
for file in pcre2 pcre2api pcre2build pcre2callout pcre2compat pcre2jit \ for file in pcre2 pcre2api pcre2build pcre2callout pcre2compat pcre2jit \
pcre2limits pcre2matching pcre2partial pcre2pattern pcre2perform \ pcre2limits pcre2matching pcre2partial pcre2pattern pcre2perform \
pcre2posix pcre2sample pcre2serialize pcre2stack pcre2syntax \ pcre2posix pcre2sample pcre2serialize pcre2syntax \
pcre2unicode ; do pcre2unicode ; do
echo " Processing $file.3" echo " Processing $file.3"
nroff -c -man $file.3 >$file.rawtxt nroff -c -man $file.3 >$file.rawtxt
@ -146,7 +146,6 @@ for file in *.3 ; do
toc=-toc toc=-toc
if [ `expr $base : '.*_'` -ne 0 ] ; then toc="" ; fi if [ `expr $base : '.*_'` -ne 0 ] ; then toc="" ; fi
if [ "$base" = "pcre2sample" ] || \ if [ "$base" = "pcre2sample" ] || \
[ "$base" = "pcre2stack" ] || \
[ "$base" = "pcre2compat" ] || \ [ "$base" = "pcre2compat" ] || \
[ "$base" = "pcre2limits" ] || \ [ "$base" = "pcre2limits" ] || \
[ "$base" = "pcre2unicode" ] ; then [ "$base" = "pcre2unicode" ] ; then

View File

@ -167,7 +167,6 @@ listing), and the short pages for individual functions, are concatenated in
pcre2perform discussion of performance issues pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program pcre2sample discussion of the pcre2demo program
pcre2stack discussion of stack and memory usage
pcre2syntax quick syntax reference pcre2syntax quick syntax reference
pcre2test description of the <b>pcre2test</b> command pcre2test description of the <b>pcre2test</b> command
pcre2unicode discussion of Unicode and UTF support pcre2unicode discussion of Unicode and UTF support
@ -190,7 +189,7 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
</P> </P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br> <br><a name="SEC5" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 27 March 2017 Last updated: 01 April 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -3245,7 +3245,7 @@ fail, this error is given.
<P> <P>
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>, <b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>,
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3), <b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3). <b>pcre2sample</b>(3), <b>pcre2unicode</b>(3).
</P> </P>
<br><a name="SEC41" href="#TOC1">AUTHOR</a><br> <br><a name="SEC41" href="#TOC1">AUTHOR</a><br>
<P> <P>
@ -3258,7 +3258,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 27 March 2017 Last updated: 01 April 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -275,7 +275,7 @@ limit controls this; it defaults to the value that is set for
to the <b>configure</b> command. This value can also be overridden at run time. to the <b>configure</b> command. This value can also be overridden at run time.
As well as applying to <b>pcre2_match()</b>, this limit also controls the depth As well as applying to <b>pcre2_match()</b>, this limit also controls the depth
of recursive function calls in <b>pcre2_dfa_match()</b>. These are used for of recursive function calls in <b>pcre2_dfa_match()</b>. These are used for
lookaround assertions and recursion within patterns. lookaround assertions, atomic groups, and recursion within patterns.
</P> </P>
<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br> <br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<P> <P>
@ -530,7 +530,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br> <br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 29 March 2017 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -732,12 +732,12 @@ relying on the C I/O library to convert this to an appropriate sequence.
Many of the short and long forms of <b>pcre2grep</b>'s options are the same Many of the short and long forms of <b>pcre2grep</b>'s options are the same
as in the GNU <b>grep</b> program. Any long option of the form as in the GNU <b>grep</b> program. Any long option of the form
<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b> <b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
(PCRE2 terminology). However, the <b>--file-list</b>, <b>--file-offsets</b>, (PCRE2 terminology). However, the <b>--depth-limit</b>, <b>--file-list</b>,
<b>--include-dir</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>--file-offsets</b>, <b>--include-dir</b>, <b>--line-offsets</b>,
<b>-M</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>, <b>--multiline</b>, <b>-N</b>,
<b>--recursion-limit</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to <b>--newline</b>, <b>--om-separator</b>, <b>-u</b>, and <b>--utf-8</b> options are
<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a specific to <b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option
capturing parentheses number. with a capturing parentheses number.
</P> </P>
<P> <P>
Although most of the common options work the same way, a few are different in Although most of the common options work the same way, a few are different in
@ -867,7 +867,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br> <br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 March 2017 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -194,12 +194,8 @@ allocation functions, or NULL for standard memory allocation). It returns a
pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
that is no longer needed. (For the technically minded: the address space is that is no longer needed. (For the technically minded: the address space is
allocated by mmap or VirtualAlloc.) allocated by mmap or VirtualAlloc.) A maximum stack size of 512K to 1M should
</P> be more than enough for any pattern.
<P>
JIT uses far less memory for recursion than the interpretive code,
and a maximum stack size of 512K to 1M should be more than enough for any
pattern.
</P> </P>
<P> <P>
The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code The <b>pcre2_jit_stack_assign()</b> function specifies which stack JIT code
@ -436,7 +432,7 @@ Cambridge, England.
</P> </P>
<br><a name="SEC13" href="#TOC1">REVISION</a><br> <br><a name="SEC13" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 30 March 2017 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -1,217 +0,0 @@
<html>
<head>
<title>pcre2stack specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2stack man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<br><b>
PCRE2 DISCUSSION OF STACK USAGE
</b><br>
<P>
When you call <b>pcre2_match()</b>, it makes use of an internal function called
<b>match()</b>. This calls itself recursively at branch points in the pattern,
in order to remember the state of the match so that it can back up and try a
different alternative after a failure. As matching proceeds deeper and deeper
into the tree of possibilities, the recursion depth increases. The
<b>match()</b> function is also called in other circumstances, for example,
whenever a parenthesized sub-pattern is entered, and in certain cases of
repetition.
</P>
<P>
Not all calls of <b>match()</b> increase the recursion depth; for an item such
as a* it may be called several times at the same level, after matching
different numbers of a's. Furthermore, in a number of cases where the result of
the recursive call would immediately be passed back as the result of the
current call (a "tail recursion"), the function is just restarted instead.
</P>
<P>
Each time the internal <b>match()</b> function is called recursively, it uses
memory from the process stack. For certain kinds of pattern and data, very
large amounts of stack may be needed, despite the recognition of "tail
recursion". Note that if PCRE2 is compiled with the -fsanitize=address option
of the GCC compiler, the stack requirements are greatly increased.
</P>
<P>
The above comments apply when <b>pcre2_match()</b> is run in its normal
interpretive manner. If the compiled pattern was processed by
<b>pcre2_jit_compile()</b>, and just-in-time compiling was successful, and the
options passed to <b>pcre2_match()</b> were not incompatible, the matching
process uses the JIT-compiled code instead of the <b>match()</b> function. In
this case, the memory requirements are handled entirely differently. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for details.
</P>
<P>
The <b>pcre2_dfa_match()</b> function operates in a different way to
<b>pcre2_match()</b>, and uses recursion only when there is a regular expression
recursion or subroutine call in the pattern. This includes the processing of
assertion and "once-only" subpatterns, which are handled like subroutine calls.
Normally, these are never very deep, and the limit on the complexity of
<b>pcre2_dfa_match()</b> is controlled by the amount of workspace it is given.
However, it is possible to write patterns with runaway infinite recursions;
such patterns will cause <b>pcre2_dfa_match()</b> to run out of stack unless a
limit is applied (see below).
</P>
<P>
The comments in the next three sections do not apply to
<b>pcre2_dfa_match()</b>; they are relevant only for <b>pcre2_match()</b> without
the JIT optimization.
</P>
<br><b>
Reducing <b>pcre2_match()</b>'s stack usage
</b><br>
<P>
You can often reduce the amount of recursion, and therefore the
amount of stack used, by modifying the pattern that is being matched. Consider,
for example, this pattern:
<pre>
([^&#60;]|&#60;(?!inet))+
</pre>
It matches from wherever it starts until it encounters "&#60;inet" or the end of
the data, and is the kind of pattern that might be used when processing an XML
file. Each iteration of the outer parentheses matches either one character that
is not "&#60;" or a "&#60;" that is not followed by "inet". However, each time a
parenthesis is processed, a recursion occurs, so this formulation uses a stack
frame for each matched character. For a long string, a lot of stack is
required. Consider now this rewritten pattern, which matches exactly the same
strings:
<pre>
([^&#60;]++|&#60;(?!inet))+
</pre>
This uses very much less stack, because runs of characters that do not contain
"&#60;" are "swallowed" in one item inside the parentheses. Recursion happens only
when a "&#60;" character that is not followed by "inet" is encountered (and we
assume this is relatively rare). A possessive quantifier is used to stop any
backtracking into the runs of non-"&#60;" characters, but that is not related to
stack usage.
</P>
<P>
This example shows that one way of avoiding stack problems when matching long
subject strings is to write repeated parenthesized subpatterns to match more
than one character whenever possible.
</P>
<br><b>
Compiling PCRE2 to use heap instead of stack for <b>pcre2_match()</b>
</b><br>
<P>
In environments where stack memory is constrained, you might want to compile
PCRE2 to use heap memory instead of stack for remembering back-up points when
<b>pcre2_match()</b> is running. This makes it run more slowly, however. Details
of how to do this are given in the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation. When built in this way, instead of using the stack, PCRE2
gets memory for remembering backup points from the heap. By default, the memory
is obtained by calling the system <b>malloc()</b> function, but you can arrange
to supply your own memory management function. For details, see the section
entitled
<a href="pcre2api.html#matchcontext">"The match context"</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation. Since the block sizes are always the same, it may be possible to
implement a customized memory handler that is more efficient than the standard
function. The memory blocks obtained for this purpose are retained and re-used
if possible while <b>pcre2_match()</b> is running. They are all freed just
before it exits.
</P>
<br><b>
Limiting <b>pcre2_match()</b>'s stack usage
</b><br>
<P>
You can set limits on the number of times the internal <b>match()</b> function
is called, both in total and recursively. If a limit is exceeded,
<b>pcre2_match()</b> returns an error code. Setting suitable limits should
prevent it from running out of stack. The default values of the limits are very
large, and unlikely ever to operate. They can be changed when PCRE2 is built,
and they can also be set when <b>pcre2_match()</b> is called. For details of
these interfaces, see the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation and the section entitled
<a href="pcre2api.html#matchcontext">"The match context"</a>
in the
<a href="pcre2api.html"><b>pcre2api</b></a>
documentation.
</P>
<P>
As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you should set
the limit at 16000 recursions. A 64Mb stack, on the other hand, can support
around 128000 recursions.
</P>
<P>
The <b>pcre2test</b> test program has a modifier called "find_limits" which, if
applied to a subject line, causes it to find the smallest limits that allow a a
pattern to match. This is done by calling <b>pcre2_match()</b> repeatedly with
different limits.
</P>
<br><b>
Limiting <b>pcre2_dfa_match()</b>'s stack usage
</b><br>
<P>
The recursion limit, as described above for <b>pcre2_match()</b>, also applies
to <b>pcre2_dfa_match()</b>, whose use of recursive function calls for
recursions in the pattern can lead to runaway stack usage. The non-recursive
match limit is not relevant for DFA matching, and is ignored.
</P>
<br><b>
Changing stack size in Unix-like systems
</b><br>
<P>
In Unix-like environments, there is not often a problem with the stack unless
very long strings are involved, though the default limit on stack size varies
from system to system. Values from 8Mb to 64Mb are common. You can find your
default limit by running the command:
<pre>
ulimit -s
</pre>
Unfortunately, the effect of running out of stack is often SIGSEGV, though
sometimes a more explicit error message is given. You can normally increase the
limit on stack size by code such as this:
<pre>
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = 100*1024*1024;
setrlimit(RLIMIT_STACK, &rlim);
</pre>
This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
do this before calling <b>pcre2_match()</b>.
</P>
<br><b>
Changing stack size in Mac OS X
</b><br>
<P>
Using <b>setrlimit()</b>, as described above, should also work on Mac OS X. It
is also possible to set a stack size when linking a program. There is a
discussion about stack sizes in Mac OS X at this web site:
<a href="http://developer.apple.com/qa/qa2005/qa1419.html">http://developer.apple.com/qa/qa2005/qa1419.html.</a>
</P>
<br><b>
AUTHOR
</b><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><b>
REVISION
</b><br>
<P>
Last updated: 23 December 2016
<br>
Copyright &copy; 1997-2016 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
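
The rule of thumb in the removed pcre2stack page above (about 500 bytes of stack per recursion) can be turned into a quick estimate from the value that `ulimit -s` reports. The following is a minimal sketch, not part of PCRE2: the function names are hypothetical, and it assumes a shell such as bash in which `ulimit -s` prints the soft limit in units of 1024 bytes. The page's own figures of 16000 and 128000 appear to treat 8Mb and 64Mb as round decimal values, so the exact results here come out slightly higher.

```shell
#!/bin/sh
# Sketch only: turns the removed page's rule of thumb (about 500 bytes of
# stack per recursion) into a recursion-limit estimate.
# The function names below are hypothetical, not part of PCRE2.

BYTES_PER_RECURSION=500

# Print how many recursions fit in a stack of $1 units of 1024 bytes.
recursion_limit_for_kb() {
  echo $(( $1 * 1024 / BYTES_PER_RECURSION ))
}

# Print an estimate for the current soft stack limit, or "unlimited".
# Assumes `ulimit -s` reports 1024-byte units, as bash does.
current_recursion_limit() {
  kb=$(ulimit -s)
  if [ "$kb" = "unlimited" ]; then
    echo "unlimited"
  else
    recursion_limit_for_kb "$kb"
  fi
}

recursion_limit_for_kb 8192    # prints 16777 (8Mb stack)
recursion_limit_for_kb 65536   # prints 134217 (64Mb stack)
current_recursion_limit        # result depends on the system
```

Since this commit deletes the page, the estimate is best read as describing the stack behaviour that the page documents, and as a rough starting point rather than a guaranteed safe limit.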

View File

@ -440,7 +440,7 @@ of the newline or \R options with similar syntax. More than one of them may
appear. appear.
<pre> <pre>
(*LIMIT_MATCH=d) set the match limit to d (decimal number) (*LIMIT_MATCH=d) set the match limit to d (decimal number)
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) (*LIMIT_DEPTH=d) set the backtracking limit to d (decimal number)
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching (*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
(*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS) (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
@ -450,11 +450,11 @@ appear.
(*UTF) set appropriate UTF mode for the library in use (*UTF) set appropriate UTF mode for the library in use
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc) (*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
</pre> </pre>
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the Note that LIMIT_MATCH and LIMIT_DEPTH can only reduce the value of the limits
limits set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>, not set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>, not
increase them. The application can lock out the use of (*UTF) and (*UCP) by increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at application can lock out the use of (*UTF) and (*UCP) by setting the
compile time. PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
</P> </P>
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br> <br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
<P> <P>
@ -596,9 +596,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br> <br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 23 December 2016 Last updated: 31 March 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -1,4 +1,4 @@
.TH PCRE2 3 "23 March 2017" "PCRE2 10.30" .TH PCRE2 3 "01 April 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH INTRODUCTION .SH INTRODUCTION
@ -164,7 +164,6 @@ listing), and the short pages for individual functions, are concatenated in
pcre2perform discussion of performance issues pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program pcre2sample discussion of the pcre2demo program
pcre2stack discussion of stack and memory usage
pcre2syntax quick syntax reference pcre2syntax quick syntax reference
pcre2test description of the \fBpcre2test\fP command pcre2test description of the \fBpcre2test\fP command
pcre2unicode discussion of Unicode and UTF support pcre2unicode discussion of Unicode and UTF support
@ -190,6 +189,6 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
.rs .rs
.sp .sp
.nf .nf
Last updated: 27 March 2017 Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi

View File

@ -146,7 +146,6 @@ USER DOCUMENTATION
pcre2perform discussion of performance issues pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program pcre2sample discussion of the pcre2demo program
pcre2stack discussion of stack and memory usage
pcre2syntax quick syntax reference pcre2syntax quick syntax reference
pcre2test description of the pcre2test command pcre2test description of the pcre2test command
pcre2unicode discussion of Unicode and UTF support pcre2unicode discussion of Unicode and UTF support
@ -168,7 +167,7 @@ AUTHOR
REVISION REVISION
Last updated: 27 March 2017 Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -3161,8 +3160,7 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
SEE ALSO SEE ALSO
pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3), pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3),
pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2stack(3), pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
pcre2unicode(3).
AUTHOR AUTHOR
@ -3174,7 +3172,7 @@ AUTHOR
REVISION REVISION
Last updated: 27 March 2017 Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -3425,52 +3423,53 @@ LIMITING PCRE2 RESOURCE USAGE
to the configure command. This value can also be overridden at run to the configure command. This value can also be overridden at run
time. As well as applying to pcre2_match(), this limit also controls time. As well as applying to pcre2_match(), this limit also controls
the depth of recursive function calls in pcre2_dfa_match(). These are the depth of recursive function calls in pcre2_dfa_match(). These are
used for lookaround assertions and recursion within patterns. used for lookaround assertions, atomic groups, and recursion within
patterns.
CREATING CHARACTER TABLES AT BUILD TIME CREATING CHARACTER TABLES AT BUILD TIME
PCRE2 uses fixed tables for processing characters whose code points are PCRE2 uses fixed tables for processing characters whose code points are
less than 256. By default, PCRE2 is built with a set of tables that are less than 256. By default, PCRE2 is built with a set of tables that are
distributed in the file src/pcre2_chartables.c.dist. These tables are distributed in the file src/pcre2_chartables.c.dist. These tables are
for ASCII codes only. If you add for ASCII codes only. If you add
--enable-rebuild-chartables --enable-rebuild-chartables
to the configure command, the distributed tables are no longer used. to the configure command, the distributed tables are no longer used.
Instead, a program called dftables is compiled and run. This outputs Instead, a program called dftables is compiled and run. This outputs
the source for new set of tables, created in the default locale of your the source for new set of tables, created in the default locale of your
C run-time system. This method of replacing the tables does not work if C run-time system. This method of replacing the tables does not work if
you are cross compiling, because dftables is run on the local host. If you are cross compiling, because dftables is run on the local host. If
you need to create alternative tables when cross compiling, you will you need to create alternative tables when cross compiling, you will
have to do so "by hand". have to do so "by hand".
USING EBCDIC CODE USING EBCDIC CODE
PCRE2 assumes by default that it will run in an environment where the PCRE2 assumes by default that it will run in an environment where the
character code is ASCII or Unicode, which is a superset of ASCII. This character code is ASCII or Unicode, which is a superset of ASCII. This
is the case for most computer operating systems. PCRE2 can, however, be is the case for most computer operating systems. PCRE2 can, however, be
compiled to run in an 8-bit EBCDIC environment by adding compiled to run in an 8-bit EBCDIC environment by adding
--enable-ebcdic --disable-unicode --enable-ebcdic --disable-unicode
to the configure command. This setting implies --enable-rebuild-charta- to the configure command. This setting implies --enable-rebuild-charta-
bles. You should only use it if you know that you are in an EBCDIC bles. You should only use it if you know that you are in an EBCDIC
environment (for example, an IBM mainframe operating system). environment (for example, an IBM mainframe operating system).
It is not possible to support both EBCDIC and UTF-8 codes in the same It is not possible to support both EBCDIC and UTF-8 codes in the same
version of the library. Consequently, --enable-unicode and --enable- version of the library. Consequently, --enable-unicode and --enable-
ebcdic are mutually exclusive. ebcdic are mutually exclusive.
The EBCDIC character that corresponds to an ASCII LF is assumed to have The EBCDIC character that corresponds to an ASCII LF is assumed to have
the value 0x15 by default. However, in some EBCDIC environments, 0x25 the value 0x15 by default. However, in some EBCDIC environments, 0x25
is used. In such an environment you should use is used. In such an environment you should use
--enable-ebcdic-nl25 --enable-ebcdic-nl25
as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR
has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and
0x25 is not chosen as LF is made to correspond to the Unicode NEL char- 0x25 is not chosen as LF is made to correspond to the Unicode NEL char-
acter (which, in Unicode, is 0x85). acter (which, in Unicode, is 0x85).
@ -3483,34 +3482,34 @@ PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS
By default, on non-Windows systems, pcre2grep supports the use of call- By default, on non-Windows systems, pcre2grep supports the use of call-
outs with string arguments within the patterns it is matching, in order outs with string arguments within the patterns it is matching, in order
to run external scripts. For details, see the pcre2grep documentation. to run external scripts. For details, see the pcre2grep documentation.
This support can be disabled by adding --disable-pcre2grep-callout to This support can be disabled by adding --disable-pcre2grep-callout to
the configure command. the configure command.
PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT
By default, pcre2grep reads all files as plain text. You can build it By default, pcre2grep reads all files as plain text. You can build it
so that it recognizes files whose names end in .gz or .bz2, and reads so that it recognizes files whose names end in .gz or .bz2, and reads
them with libz or libbz2, respectively, by adding one or both of them with libz or libbz2, respectively, by adding one or both of
--enable-pcre2grep-libz --enable-pcre2grep-libz
--enable-pcre2grep-libbz2 --enable-pcre2grep-libbz2
to the configure command. These options naturally require that the rel- to the configure command. These options naturally require that the rel-
evant libraries are installed on your system. Configuration will fail evant libraries are installed on your system. Configuration will fail
if they are not. if they are not.
PCRE2GREP BUFFER SIZE PCRE2GREP BUFFER SIZE
pcre2grep uses an internal buffer to hold a "window" on the file it is pcre2grep uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when scanning, in order to be able to output "before" and "after" lines when
it finds a match. The starting size of the buffer is controlled by a it finds a match. The starting size of the buffer is controlled by a
parameter whose default value is 20K. The buffer itself is three times parameter whose default value is 20K. The buffer itself is three times
this size, but because of the way it is used for holding "before" this size, but because of the way it is used for holding "before"
lines, the longest line that is guaranteed to be processable is the lines, the longest line that is guaranteed to be processable is the
parameter size. If a longer line is encountered, pcre2grep automati- parameter size. If a longer line is encountered, pcre2grep automati-
cally expands the buffer, up to a specified maximum size, whose default cally expands the buffer, up to a specified maximum size, whose default
is 1M or the starting size, whichever is the larger. You can change the is 1M or the starting size, whichever is the larger. You can change the
default parameter values by adding, for example, default parameter values by adding, for example,
@ -3518,8 +3517,8 @@ PCRE2GREP BUFFER SIZE
--with-pcre2grep-bufsize=51200 --with-pcre2grep-bufsize=51200
--with-pcre2grep-max-bufsize=2097152 --with-pcre2grep-max-bufsize=2097152
to the configure command. The caller of pcre2grep can override these to the configure command. The caller of pcre2grep can override these
values by using --buffer-size and --max-buffer-size on the command values by using --buffer-size and --max-buffer-size on the command
line. line.
@ -3530,26 +3529,26 @@ PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
--enable-pcre2test-libreadline --enable-pcre2test-libreadline
--enable-pcre2test-libedit --enable-pcre2test-libedit
to the configure command, pcre2test is linked with the libreadline to the configure command, pcre2test is linked with the libreadline
orlibedit library, respectively, and when its input is from a terminal, orlibedit library, respectively, and when its input is from a terminal,
it reads it using the readline() function. This provides line-editing it reads it using the readline() function. This provides line-editing
and history facilities. Note that libreadline is GPL-licensed, so if and history facilities. Note that libreadline is GPL-licensed, so if
you distribute a binary of pcre2test linked in this way, there may be you distribute a binary of pcre2test linked in this way, there may be
licensing issues. These can be avoided by linking instead with libedit, licensing issues. These can be avoided by linking instead with libedit,
which has a BSD licence. which has a BSD licence.
Setting --enable-pcre2test-libreadline causes the -lreadline option to Setting --enable-pcre2test-libreadline causes the -lreadline option to
be added to the pcre2test build. In many operating environments with a be added to the pcre2test build. In many operating environments with a
sytem-installed readline library this is sufficient. However, in some sytem-installed readline library this is sufficient. However, in some
environments (e.g. if an unmodified distribution version of readline is environments (e.g. if an unmodified distribution version of readline is
in use), some extra configuration may be necessary. The INSTALL file in use), some extra configuration may be necessary. The INSTALL file
for libreadline says this: for libreadline says this:
"Readline uses the termcap functions, but does not link with "Readline uses the termcap functions, but does not link with
the termcap or curses library itself, allowing applications the termcap or curses library itself, allowing applications
which link with readline the to choose an appropriate library." which link with readline the to choose an appropriate library."
If your environment has not been set up so that an appropriate library If your environment has not been set up so that an appropriate library
is automatically included, you may need to add something like is automatically included, you may need to add something like
LIBS="-lncurses" LIBS="-lncurses"
@ -3563,7 +3562,7 @@ INCLUDING DEBUGGING CODE
--enable-debug --enable-debug
to the configure command, additional debugging code is included in the to the configure command, additional debugging code is included in the
build. This feature is intended for use by the PCRE2 maintainers. build. This feature is intended for use by the PCRE2 maintainers.
@ -3573,15 +3572,15 @@ DEBUGGING WITH VALGRIND SUPPORT
--enable-valgrind --enable-valgrind
to the configure command, PCRE2 will use valgrind annotations to mark to the configure command, PCRE2 will use valgrind annotations to mark
certain memory regions as unaddressable. This allows it to detect certain memory regions as unaddressable. This allows it to detect
invalid memory accesses, and is mostly useful for debugging PCRE2 invalid memory accesses, and is mostly useful for debugging PCRE2
itself. itself.
CODE COVERAGE REPORTING CODE COVERAGE REPORTING
If your C compiler is gcc, you can build a version of PCRE2 that can If your C compiler is gcc, you can build a version of PCRE2 that can
generate a code coverage report for its test suite. To enable this, you generate a code coverage report for its test suite. To enable this, you
must install lcov version 1.6 or above. Then specify must install lcov version 1.6 or above. Then specify
@ -3590,20 +3589,20 @@ CODE COVERAGE REPORTING
to the configure command and build PCRE2 in the usual way. to the configure command and build PCRE2 in the usual way.
Note that using ccache (a caching C compiler) is incompatible with code Note that using ccache (a caching C compiler) is incompatible with code
coverage reporting. If you have configured ccache to run automatically coverage reporting. If you have configured ccache to run automatically
on your system, you must set the environment variable on your system, you must set the environment variable
CCACHE_DISABLE=1 CCACHE_DISABLE=1
before running make to build PCRE2, so that ccache is not used. before running make to build PCRE2, so that ccache is not used.
When --enable-coverage is used, the following additional targets are When --enable-coverage is used, the following additional targets are
added to the Makefile: added to the Makefile:
make coverage make coverage
This creates a fresh coverage report for the PCRE2 test suite. It is This creates a fresh coverage report for the PCRE2 test suite. It is
equivalent to running "make coverage-reset", "make coverage-baseline", equivalent to running "make coverage-reset", "make coverage-baseline",
"make check", and then "make coverage-report". "make check", and then "make coverage-report".
make coverage-reset make coverage-reset
@ -3620,56 +3619,56 @@ CODE COVERAGE REPORTING
make coverage-clean-report make coverage-clean-report
This removes the generated coverage report without cleaning the cover- This removes the generated coverage report without cleaning the cover-
age data itself. age data itself.
make coverage-clean-data make coverage-clean-data
This removes the captured coverage data without removing the coverage This removes the captured coverage data without removing the coverage
files created at compile time (*.gcno). files created at compile time (*.gcno).
make coverage-clean make coverage-clean
This cleans all coverage data including the generated coverage report. This cleans all coverage data including the generated coverage report.
For more information about code coverage, see the gcov and lcov docu- For more information about code coverage, see the gcov and lcov docu-
mentation. mentation.
SUPPORT FOR FUZZERS SUPPORT FOR FUZZERS
There is a special option for use by people who want to run fuzzing There is a special option for use by people who want to run fuzzing
tests on PCRE2: tests on PCRE2:
--enable-fuzz-support --enable-fuzz-support
At present this applies only to the 8-bit library. If set, it causes an At present this applies only to the 8-bit library. If set, it causes an
extra library called libpcre2-fuzzsupport.a to be built, but not extra library called libpcre2-fuzzsupport.a to be built, but not
installed. This contains a single function called LLVMFuzzerTestOneIn- installed. This contains a single function called LLVMFuzzerTestOneIn-
put() whose arguments are a pointer to a string and the length of the put() whose arguments are a pointer to a string and the length of the
string. When called, this function tries to compile the string as a string. When called, this function tries to compile the string as a
pattern, and if that succeeds, to match it. This is done both with no pattern, and if that succeeds, to match it. This is done both with no
options and with some random options bits that are generated from the options and with some random options bits that are generated from the
string. string.
Setting --enable-fuzz-support also causes a binary called pcre2fuz- Setting --enable-fuzz-support also causes a binary called pcre2fuz-
zcheck to be created. This is normally run under valgrind or used when zcheck to be created. This is normally run under valgrind or used when
PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing
function and outputs information about what it is doing. The input strings function and outputs information about what it is doing. The input strings
are specified by arguments: if an argument starts with "=" the rest of are specified by arguments: if an argument starts with "=" the rest of
it is a literal input string. Otherwise, it is assumed to be a file it is a literal input string. Otherwise, it is assumed to be a file
name, and the contents of the file are the test string. name, and the contents of the file are the test string.
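The argument convention described above can be sketched in a few lines of C. This is an illustration only, not the pcre2fuzzcheck source: the type fuzz_arg and the function classify_fuzz_arg() are hypothetical names, and reading the file contents is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type; not part of PCRE2. */
typedef struct {
  int is_literal;     /* 1 if the argument began with "=" */
  const char *text;   /* the literal input string, or the file name */
  size_t length;      /* length of text in bytes */
} fuzz_arg;

/* Classify one command-line argument: a leading "=" marks a literal
   input string; anything else is treated as a file name. */
fuzz_arg classify_fuzz_arg(const char *arg)
{
  fuzz_arg result;
  result.is_literal = (arg[0] == '=');
  result.text = result.is_literal ? arg + 1 : arg;
  result.length = strlen(result.text);
  return result;
}
```

A caller would pass the text and length of a literal argument straight to the fuzzing function, and would read the named file first in the other case.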
OBSOLETE OPTION OBSOLETE OPTION
In versions of PCRE2 prior to 10.30, there were two ways of handling In versions of PCRE2 prior to 10.30, there were two ways of handling
backtracking in the pcre2_match() function. The default was to use the backtracking in the pcre2_match() function. The default was to use the
system stack, but if system stack, but if
--disable-stack-for-recursion --disable-stack-for-recursion
was set, memory on the heap was used. From release 10.30 onwards this was set, memory on the heap was used. From release 10.30 onwards this
has changed (the stack is no longer used) and this option now does has changed (the stack is no longer used) and this option now does
nothing except give a warning. nothing except give a warning.
@ -3687,7 +3686,7 @@ AUTHOR
REVISION REVISION
Last updated: 29 March 2017 Last updated: 31 March 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -4436,13 +4435,10 @@ CONTROLLING THE JIT STACK
It returns a pointer to an opaque structure of type pcre2_jit_stack, or It returns a pointer to an opaque structure of type pcre2_jit_stack, or
NULL if there is an error. The pcre2_jit_stack_free() function is used NULL if there is an error. The pcre2_jit_stack_free() function is used
to free a stack that is no longer needed. (For the technically minded: to free a stack that is no longer needed. (For the technically minded:
the address space is allocated by mmap or VirtualAlloc.) the address space is allocated by mmap or VirtualAlloc.) A maximum
stack size of 512K to 1M should be more than enough for any pattern.
JIT uses far less memory for recursion than the interpretive code, and The pcre2_jit_stack_assign() function specifies which stack JIT code
a maximum stack size of 512K to 1M should be more than enough for any
pattern.
The pcre2_jit_stack_assign() function specifies which stack JIT code
should use. Its arguments are as follows: should use. Its arguments are as follows:
pcre2_match_context *mcontext pcre2_match_context *mcontext
@ -4451,7 +4447,7 @@ CONTROLLING THE JIT STACK
The first argument is a pointer to a match context. When this is subse- The first argument is a pointer to a match context. When this is subse-
quently passed to a matching function, its information determines which quently passed to a matching function, its information determines which
JIT stack is used. There are three cases for the values of the other JIT stack is used. There are three cases for the values of the other
two options: two options:
(1) If callback is NULL and data is NULL, an internal 32K block (1) If callback is NULL and data is NULL, an internal 32K block
@ -4469,34 +4465,34 @@ CONTROLLING THE JIT STACK
return value must be a valid JIT stack, the result of calling return value must be a valid JIT stack, the result of calling
pcre2_jit_stack_create(). pcre2_jit_stack_create().
A callback function is obeyed whenever JIT code is about to be run; it A callback function is obeyed whenever JIT code is about to be run; it
is not obeyed when pcre2_match() is called with options that are incom- is not obeyed when pcre2_match() is called with options that are incom-
patible for JIT matching. A callback function can therefore be used to patible for JIT matching. A callback function can therefore be used to
determine whether a match operation was executed by JIT or by the determine whether a match operation was executed by JIT or by the
interpreter. interpreter.
You may safely use the same JIT stack for more than one pattern (either You may safely use the same JIT stack for more than one pattern (either
by assigning directly or by callback), as long as the patterns are by assigning directly or by callback), as long as the patterns are
matched sequentially in the same thread. Currently, the only way to set matched sequentially in the same thread. Currently, the only way to set
up non-sequential matches in one thread is to use callouts: if a call- up non-sequential matches in one thread is to use callouts: if a call-
out function starts another match, that match must use a different JIT out function starts another match, that match must use a different JIT
stack to the one used for currently suspended match(es). stack to the one used for currently suspended match(es).
In a multithread application, if you do not specify a JIT stack, or if In a multithread application, if you do not specify a JIT stack, or if
you assign or pass back NULL from a callback, that is thread-safe, you assign or pass back NULL from a callback, that is thread-safe,
because each thread has its own machine stack. However, if you assign because each thread has its own machine stack. However, if you assign
or pass back a non-NULL JIT stack, this must be a different stack for or pass back a non-NULL JIT stack, this must be a different stack for
each thread so that the application is thread-safe. each thread so that the application is thread-safe.
Strictly speaking, even more is allowed. You can assign the same non- Strictly speaking, even more is allowed. You can assign the same non-
NULL stack to a match context that is used by any number of patterns, NULL stack to a match context that is used by any number of patterns,
as long as they are not used for matching by multiple threads at the as long as they are not used for matching by multiple threads at the
same time. For example, you could use the same stack in all compiled same time. For example, you could use the same stack in all compiled
patterns, with a global mutex in the callback to wait until the stack patterns, with a global mutex in the callback to wait until the stack
is available for use. However, this is an inefficient solution, and not is available for use. However, this is an inefficient solution, and not
recommended. recommended.
This is a suggestion for how a multithreaded program that needs to set This is a suggestion for how a multithreaded program that needs to set
up non-default JIT stacks might operate: up non-default JIT stacks might operate:
During thread initialization During thread initialization
@ -4508,7 +4504,7 @@ CONTROLLING THE JIT STACK
Use a one-line callback function Use a one-line callback function
return thread_local_var return thread_local_var
All the functions described in this section do nothing if JIT is not All the functions described in this section do nothing if JIT is not
available. available.
@ -4517,20 +4513,20 @@ JIT STACK FAQ
(1) Why do we need JIT stacks? (1) Why do we need JIT stacks?
PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack
where the local data of the current node is pushed before checking its where the local data of the current node is pushed before checking its
child nodes. Allocating real machine stack on some platforms is diffi- child nodes. Allocating real machine stack on some platforms is diffi-
cult. For example, the stack chain needs to be updated every time if we cult. For example, the stack chain needs to be updated every time if we
extend the stack on PowerPC. Although it is possible, its updating extend the stack on PowerPC. Although it is possible, its updating
time overhead decreases performance. So we do the recursion in memory. time overhead decreases performance. So we do the recursion in memory.
(2) Why don't we simply allocate blocks of memory with malloc()? (2) Why don't we simply allocate blocks of memory with malloc()?
Modern operating systems have a nice feature: they can reserve an Modern operating systems have a nice feature: they can reserve an
address space instead of allocating memory. We can safely allocate mem- address space instead of allocating memory. We can safely allocate mem-
ory pages inside this address space, so the stack could grow without ory pages inside this address space, so the stack could grow without
moving memory data (this is important because of pointers). Thus we can moving memory data (this is important because of pointers). Thus we can
allocate 1M address space, and use only a single memory page (usually allocate 1M address space, and use only a single memory page (usually
4K) if that is enough. However, we can still grow up to 1M anytime if 4K) if that is enough. However, we can still grow up to 1M anytime if
needed. needed.
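The reserve-then-commit idea in answer (2) can be sketched with mmap() and mprotect() on a POSIX-like system. This is a minimal illustration under stated assumptions, not the code that the JIT stack functions actually use: the function names are hypothetical, MAP_ANONYMOUS is assumed to be available, and the sizes are the 1M and single-page figures from the text.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc, to expose MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve address space without making it usable. Returns NULL on failure. */
void *reserve_region(size_t size)
{
  void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return (p == MAP_FAILED) ? NULL : p;
}

/* Make the first 'bytes' of the region readable and writable. The base
   address does not change, so pointers into the region remain valid. */
int commit_region(void *base, size_t bytes)
{
  return mprotect(base, bytes, PROT_READ | PROT_WRITE);
}

/* Reserve 1M, commit one page, touch it, then release the whole region.
   Returns 0 on success and -1 on failure. */
int demo_reserve_and_commit(void)
{
  size_t reserved = 1024 * 1024;
  size_t page = (size_t)sysconf(_SC_PAGESIZE);
  char *base = reserve_region(reserved);

  if (base == NULL) return -1;
  if (commit_region(base, page) != 0)
    {
    munmap(base, reserved);
    return -1;
    }
  base[0] = 1;           /* the committed page is now usable */
  base[page - 1] = 2;
  return munmap(base, reserved);
}
```

Growing the usable part later is another commit_region() call with a larger size; nothing has to be copied, which is the point made above about pointers.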
(3) Who "owns" a JIT stack? (3) Who "owns" a JIT stack?
@ -4538,8 +4534,8 @@ JIT STACK FAQ
The owner of the stack is the user program, not the JIT studied pattern The owner of the stack is the user program, not the JIT studied pattern
or anything else. The user program must ensure that if a stack is being or anything else. The user program must ensure that if a stack is being
used by pcre2_match(), (that is, it is assigned to a match context that used by pcre2_match(), (that is, it is assigned to a match context that
is passed to the pattern currently running), that stack must not be is passed to the pattern currently running), that stack must not be
used by any other threads (to avoid overwriting the same memory area). used by any other threads (to avoid overwriting the same memory area).
The best practice for multithreaded programs is to allocate a stack for The best practice for multithreaded programs is to allocate a stack for
each thread, and return this stack through the JIT callback function. each thread, and return this stack through the JIT callback function.
@ -4547,36 +4543,36 @@ JIT STACK FAQ
You can free a JIT stack at any time, as long as it will not be used by You can free a JIT stack at any time, as long as it will not be used by
pcre2_match() again. When you assign the stack to a match context, only pcre2_match() again. When you assign the stack to a match context, only
a pointer is set. There is no reference counting or any other magic. a pointer is set. There is no reference counting or any other magic.
You can free compiled patterns, contexts, and stacks in any order, any- You can free compiled patterns, contexts, and stacks in any order, any-
time. Just do not call pcre2_match() with a match context pointing to time. Just do not call pcre2_match() with a match context pointing to
an already freed stack, as that will cause SEGFAULT. (Also, do not free an already freed stack, as that will cause SEGFAULT. (Also, do not free
a stack currently used by pcre2_match() in another thread). You can a stack currently used by pcre2_match() in another thread). You can
also replace the stack in a context at any time when it is not in use. also replace the stack in a context at any time when it is not in use.
You should free the previous stack before assigning a replacement. You should free the previous stack before assigning a replacement.
(5) Should I allocate/free a stack every time before/after calling (5) Should I allocate/free a stack every time before/after calling
pcre2_match()? pcre2_match()?
No, because this is too costly in terms of resources. However, you No, because this is too costly in terms of resources. However, you
could implement some clever idea which releases the stack if it is not could implement some clever idea which releases the stack if it is not
used in let's say two minutes. The JIT callback can help to achieve used in let's say two minutes. The JIT callback can help to achieve
this without keeping a list of patterns. this without keeping a list of patterns.
(6) OK, the stack is for long term memory allocation. But what happens (6) OK, the stack is for long term memory allocation. But what happens
if a pattern causes stack overflow with a stack of 1M? Is that 1M kept if a pattern causes stack overflow with a stack of 1M? Is that 1M kept
until the stack is freed? until the stack is freed?
Especially on embedded systems, it might be a good idea to release Especially on embedded systems, it might be a good idea to release
memory sometimes without freeing the stack. There is no API for this at memory sometimes without freeing the stack. There is no API for this at
the moment. Probably a function call which returns with the currently the moment. Probably a function call which returns with the currently
allocated memory for any stack and another which allows releasing mem- allocated memory for any stack and another which allows releasing mem-
ory (shrinking the stack) would be a good idea if someone needs this. ory (shrinking the stack) would be a good idea if someone needs this.
(7) This is too much of a headache. Isn't there any better solution for (7) This is too much of a headache. Isn't there any better solution for
JIT stack handling? JIT stack handling?
No, thanks to Windows. If POSIX threads were used everywhere, we could No, thanks to Windows. If POSIX threads were used everywhere, we could
throw out this complicated API. throw out this complicated API.
@ -4585,18 +4581,18 @@ FREEING JIT SPECULATIVE MEMORY
void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);
The JIT executable allocator does not free all memory when it is possi- The JIT executable allocator does not free all memory when it is possi-
ble. It expects new allocations, and keeps some free memory around to ble. It expects new allocations, and keeps some free memory around to
improve allocation speed. However, in low memory conditions, it might improve allocation speed. However, in low memory conditions, it might
be better to free all possible memory. You can cause this to happen by be better to free all possible memory. You can cause this to happen by
calling pcre2_jit_free_unused_memory(). Its argument is a general con- calling pcre2_jit_free_unused_memory(). Its argument is a general con-
text, for custom memory management, or NULL for standard memory manage- text, for custom memory management, or NULL for standard memory manage-
ment. ment.
EXAMPLE CODE EXAMPLE CODE
This is a single-threaded example that specifies a JIT stack without This is a single-threaded example that specifies a JIT stack without
using a callback. A real program should include error checking after using a callback. A real program should include error checking after
all the function calls. all the function calls.
int rc; int rc;
@ -4624,29 +4620,29 @@ EXAMPLE CODE
JIT FAST PATH API JIT FAST PATH API
Because the API described above falls back to interpreted matching when Because the API described above falls back to interpreted matching when
JIT is not available, it is convenient for programs that are written JIT is not available, it is convenient for programs that are written
for general use in many environments. However, calling JIT via for general use in many environments. However, calling JIT via
pcre2_match() does have a performance impact. Programs that are written pcre2_match() does have a performance impact. Programs that are written
for use where JIT is known to be available, and which need the best for use where JIT is known to be available, and which need the best
possible performance, can instead use a "fast path" API to call JIT possible performance, can instead use a "fast path" API to call JIT
matching directly instead of calling pcre2_match() (obviously only for matching directly instead of calling pcre2_match() (obviously only for
patterns that have been successfully processed by pcre2_jit_compile()). patterns that have been successfully processed by pcre2_jit_compile()).
The fast path function is called pcre2_jit_match(), and it takes The fast path function is called pcre2_jit_match(), and it takes
exactly the same arguments as pcre2_match(). The return values are also exactly the same arguments as pcre2_match(). The return values are also
the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or
complete) is requested that was not compiled. Unsupported option bits complete) is requested that was not compiled. Unsupported option bits
(for example, PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT (for example, PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT
option. option.
When you call pcre2_match(), as well as testing for invalid options, a When you call pcre2_match(), as well as testing for invalid options, a
number of other sanity checks are performed on the arguments. For exam- number of other sanity checks are performed on the arguments. For exam-
ple, if the subject pointer is NULL, an immediate error is given. Also, ple, if the subject pointer is NULL, an immediate error is given. Also,
unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for
validity. In the interests of speed, these checks do not happen on the validity. In the interests of speed, these checks do not happen on the
JIT fast path, and if invalid data is passed, the result is undefined. JIT fast path, and if invalid data is passed, the result is undefined.
Bypassing the sanity checks and the pcre2_match() wrapping can give Bypassing the sanity checks and the pcre2_match() wrapping can give
speedups of more than 10%. speedups of more than 10%.
@ -4664,7 +4660,7 @@ AUTHOR
REVISION REVISION
Last updated: 30 March 2017 Last updated: 31 March 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -9229,177 +9225,6 @@ REVISION
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
PCRE2STACK(3) Library Functions Manual PCRE2STACK(3)
NAME
PCRE2 - Perl-compatible regular expressions (revised API)
PCRE2 DISCUSSION OF STACK USAGE
When you call pcre2_match(), it makes use of an internal function
called match(). This calls itself recursively at branch points in the
pattern, in order to remember the state of the match so that it can
back up and try a different alternative after a failure. As matching
proceeds deeper and deeper into the tree of possibilities, the recur-
sion depth increases. The match() function is also called in other cir-
cumstances, for example, whenever a parenthesized sub-pattern is
entered, and in certain cases of repetition.
Not all calls of match() increase the recursion depth; for an item such
as a* it may be called several times at the same level, after matching
different numbers of a's. Furthermore, in a number of cases where the
result of the recursive call would immediately be passed back as the
result of the current call (a "tail recursion"), the function is just
restarted instead.
Each time the internal match() function is called recursively, it uses
memory from the process stack. For certain kinds of pattern and data,
very large amounts of stack may be needed, despite the recognition of
"tail recursion". Note that if PCRE2 is compiled with the -fsani-
tize=address option of the GCC compiler, the stack requirements are
greatly increased.
The above comments apply when pcre2_match() is run in its normal inter-
pretive manner. If the compiled pattern was processed by pcre2_jit_com-
pile(), and just-in-time compiling was successful, and the options
passed to pcre2_match() were not incompatible, the matching process
uses the JIT-compiled code instead of the match() function. In this
case, the memory requirements are handled entirely differently. See the
pcre2jit documentation for details.
The pcre2_dfa_match() function operates in a different way to
pcre2_match(), and uses recursion only when there is a regular expres-
sion recursion or subroutine call in the pattern. This includes the
processing of assertion and "once-only" subpatterns, which are handled
like subroutine calls. Normally, these are never very deep, and the
limit on the complexity of pcre2_dfa_match() is controlled by the
amount of workspace it is given. However, it is possible to write pat-
terns with runaway infinite recursions; such patterns will cause
pcre2_dfa_match() to run out of stack unless a limit is applied (see
below).
The comments in the next three sections do not apply to
pcre2_dfa_match(); they are relevant only for pcre2_match() without the
JIT optimization.
Reducing pcre2_match()'s stack usage
You can often reduce the amount of recursion, and therefore the amount
of stack used, by modifying the pattern that is being matched. Con-
sider, for example, this pattern:
([^<]|<(?!inet))+
It matches from wherever it starts until it encounters "<inet" or the
end of the data, and is the kind of pattern that might be used when
processing an XML file. Each iteration of the outer parentheses matches
either one character that is not "<" or a "<" that is not followed by
"inet". However, each time a parenthesis is processed, a recursion
occurs, so this formulation uses a stack frame for each matched charac-
ter. For a long string, a lot of stack is required. Consider now this
rewritten pattern, which matches exactly the same strings:
([^<]++|<(?!inet))+
This uses very much less stack, because runs of characters that do not
contain "<" are "swallowed" in one item inside the parentheses. Recur-
sion happens only when a "<" character that is not followed by "inet"
is encountered (and we assume this is relatively rare). A possessive
quantifier is used to stop any backtracking into the runs of non-"<"
characters, but that is not related to stack usage.
This example shows that one way of avoiding stack problems when match-
ing long subject strings is to write repeated parenthesized subpatterns
to match more than one character whenever possible.
Compiling PCRE2 to use heap instead of stack for pcre2_match()
In environments where stack memory is constrained, you might want to
compile PCRE2 to use heap memory instead of stack for remembering back-
up points when pcre2_match() is running. This makes it run more slowly,
however. Details of how to do this are given in the pcre2build documen-
tation. When built in this way, instead of using the stack, PCRE2 gets
memory for remembering backup points from the heap. By default, the
memory is obtained by calling the system malloc() function, but you can
arrange to supply your own memory management function. For details, see
the section entitled "The match context" in the pcre2api documentation.
Since the block sizes are always the same, it may be possible to imple-
ment a customized memory handler that is more efficient than the stan-
dard function. The memory blocks obtained for this purpose are retained
and re-used if possible while pcre2_match() is running. They are all
freed just before it exits.
Limiting pcre2_match()'s stack usage
You can set limits on the number of times the internal match() function
is called, both in total and recursively. If a limit is exceeded,
pcre2_match() returns an error code. Setting suitable limits should
prevent it from running out of stack. The default values of the limits
are very large, and unlikely ever to operate. They can be changed when
PCRE2 is built, and they can also be set when pcre2_match() is called.
For details of these interfaces, see the pcre2build documentation and
the section entitled "The match context" in the pcre2api documentation.
As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you
should set the limit at 16000 recursions. A 64Mb stack, on the other
hand, can support around 128000 recursions.
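The rule of thumb is a single division, sketched here in C. The function name is hypothetical, and the 500-byte figure is the rough estimate from the text, not a measured value. The quoted results correspond to decimal megabytes (8,000,000 bytes); with 8*1024*1024 bytes the quotient is 16777.

```c
#include <assert.h>

/* Rough recursion limit for a given stack budget: the number of frames
   of about 'bytes_per_recursion' bytes that fit in 'stack_bytes'. */
unsigned long recursion_limit_for_stack(unsigned long stack_bytes,
  unsigned long bytes_per_recursion)
{
  assert(bytes_per_recursion > 0);   /* guard against division by zero */
  return stack_bytes / bytes_per_recursion;
}
```

For example, recursion_limit_for_stack(8000000UL, 500UL) gives 16000, matching the figure above.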
The pcre2test test program has a modifier called "find_limits" which,
if applied to a subject line, causes it to find the smallest limits
that allow a pattern to match. This is done by calling pcre2_match()
repeatedly with different limits.
Limiting pcre2_dfa_match()'s stack usage
The recursion limit, as described above for pcre2_match(), also applies
to pcre2_dfa_match(), whose use of recursive function calls for recur-
sions in the pattern can lead to runaway stack usage. The non-recursive
match limit is not relevant for DFA matching, and is ignored.
Changing stack size in Unix-like systems
In Unix-like environments, there is not often a problem with the stack
unless very long strings are involved, though the default limit on
stack size varies from system to system. Values from 8Mb to 64Mb are
common. You can find your default limit by running the command:
ulimit -s
Unfortunately, the effect of running out of stack is often SIGSEGV,
though sometimes a more explicit error message is given. You can nor-
mally increase the limit on stack size by code such as this:
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = 100*1024*1024;
setrlimit(RLIMIT_STACK, &rlim);
This reads the current limits (soft and hard) using getrlimit(), then
attempts to increase the soft limit to 100Mb using setrlimit(). You
must do this before calling pcre2_match().
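A slightly more defensive variant of the fragment above is sketched below. It is an illustration, not code from PCRE2: the function name is hypothetical, and clamping the request to the hard limit is an added assumption about what a caller would want, since an unprivileged process cannot raise the soft limit above the hard limit.

```c
#include <sys/resource.h>

/* Set the soft stack limit to 'wanted' bytes, clamped to the hard limit.
   Returns 0 on success and -1 on failure. */
int set_soft_stack_limit(rlim_t wanted)
{
  struct rlimit rlim;

  if (getrlimit(RLIMIT_STACK, &rlim) != 0) return -1;

  /* Do not ask for more than the hard limit allows. */
  if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
    wanted = rlim.rlim_max;

  rlim.rlim_cur = wanted;
  return setrlimit(RLIMIT_STACK, &rlim);
}
```

A program would call set_soft_stack_limit((rlim_t)100 * 1024 * 1024) early on and check the return value before relying on the larger stack.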
Changing stack size in Mac OS X
Using setrlimit(), as described above, should also work on Mac OS X. It
is also possible to set a stack size when linking a program. There is a
discussion about stack sizes in Mac OS X at this web site:
http://developer.apple.com/qa/qa2005/qa1419.html.
AUTHOR
Philip Hazel
University Computing Service
Cambridge, England.
REVISION
Last updated: 23 December 2016
Copyright (c) 1997-2016 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------
@ -9686,7 +9511,7 @@ OPTION SETTING
one of them may appear. one of them may appear.
(*LIMIT_MATCH=d) set the match limit to d (decimal number) (*LIMIT_MATCH=d) set the match limit to d (decimal number)
(*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) (*LIMIT_DEPTH=d) set the backtracking limit to d (decimal number)
(*NOTEMPTY) set PCRE2_NOTEMPTY when matching (*NOTEMPTY) set PCRE2_NOTEMPTY when matching
(*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
(*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS) (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
@ -9696,16 +9521,17 @@ OPTION SETTING
(*UTF) set appropriate UTF mode for the library in use (*UTF) set appropriate UTF mode for the library in use
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc) (*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of Note that LIMIT_MATCH and LIMIT_DEPTH can only reduce the value of the
the limits set by the caller of pcre2_match() or pcre2_dfa_match(), not limits set by the caller of pcre2_match() or pcre2_dfa_match(), not
increase them. The application can lock out the use of (*UTF) and increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH.
(*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, The application can lock out the use of (*UTF) and (*UCP) by setting
respectively, at compile time. the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at com-
pile time.
NEWLINE CONVENTION NEWLINE CONVENTION
These are recognized only at the very start of the pattern or after These are recognized only at the very start of the pattern or after
option settings with a similar syntax. option settings with a similar syntax.
(*CR) carriage return only (*CR) carriage return only
@ -9717,7 +9543,7 @@ NEWLINE CONVENTION
WHAT \R MATCHES WHAT \R MATCHES
These are recognized only at the very start of the pattern or after These are recognized only at the very start of the pattern or after
option setting with a similar syntax. option setting with a similar syntax.
(*BSR_ANYCRLF) CR, LF, or CRLF (*BSR_ANYCRLF) CR, LF, or CRLF
@ -9786,8 +9612,8 @@ CONDITIONAL PATTERNS
(?(VERSION[>]=n.m) test PCRE2 version (?(VERSION[>]=n.m) test PCRE2 version
(?(assert) assertion condition (?(assert) assertion condition
Note the ambiguity of (?(R) and (?(Rn) which might be named reference Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a conditions or recursion tests. Such a condition is interpreted as a
reference condition if the relevant named group exists. reference condition if the relevant named group exists.
@ -9799,7 +9625,7 @@ BACKTRACKING CONTROL
(*FAIL) force backtrack; synonym (*F) (*FAIL) force backtrack; synonym (*F)
(*MARK:NAME) set name to be passed back; synonym (*:NAME) (*MARK:NAME) set name to be passed back; synonym (*:NAME)
The following act only when a subsequent match failure causes a back- The following act only when a subsequent match failure causes a back-
track to reach them. They all force a match failure, but they differ in track to reach them. They all force a match failure, but they differ in
what happens afterwards. Those that advance the start-of-match point do what happens afterwards. Those that advance the start-of-match point do
so only if the pattern is not anchored. so only if the pattern is not anchored.
@ -9821,14 +9647,14 @@ CALLOUTS
(?C"text") callout with string data (?C"text") callout with string data
The allowed string delimiters are ` ' " ^ % # $ (which are the same for The allowed string delimiters are ` ' " ^ % # $ (which are the same for
the start and the end), and the starting delimiter { matched with the the start and the end), and the starting delimiter { matched with the
ending delimiter }. To encode the ending delimiter within the string, ending delimiter }. To encode the ending delimiter within the string,
double it. double it.
SEE ALSO SEE ALSO
pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3),
pcre2(3). pcre2(3).
@ -9841,8 +9667,8 @@ AUTHOR
REVISION REVISION
Last updated: 23 December 2016 Last updated: 31 March 2017
Copyright (c) 1997-2016 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -1,4 +1,4 @@
.TH PCRE2API 3 "27 March 2017" "PCRE2 10.30" .TH PCRE2API 3 "01 April 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -3292,7 +3292,7 @@ fail, this error is given.
.sp .sp
\fBpcre2build\fP(3), \fBpcre2callout\fP(3), \fBpcre2demo(3)\fP, \fBpcre2build\fP(3), \fBpcre2callout\fP(3), \fBpcre2demo(3)\fP,
\fBpcre2matching\fP(3), \fBpcre2partial\fP(3), \fBpcre2posix\fP(3), \fBpcre2matching\fP(3), \fBpcre2partial\fP(3), \fBpcre2posix\fP(3),
\fBpcre2sample\fP(3), \fBpcre2stack\fP(3), \fBpcre2unicode\fP(3). \fBpcre2sample\fP(3), \fBpcre2unicode\fP(3).
. .
. .
.SH AUTHOR .SH AUTHOR
@ -3309,6 +3309,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 27 March 2017 Last updated: 01 April 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi


@ -721,9 +721,9 @@ OPTIONS COMPATIBILITY
Many of the short and long forms of pcre2grep's options are the same as Many of the short and long forms of pcre2grep's options are the same as
in the GNU grep program. Any long option of the form --xxx-regexp (GNU in the GNU grep program. Any long option of the form --xxx-regexp (GNU
terminology) is also available as --xxx-regex (PCRE2 terminology). How- terminology) is also available as --xxx-regex (PCRE2 terminology). How-
ever, the --file-list, --file-offsets, --include-dir, --line-offsets, ever, the --depth-limit, --file-list, --file-offsets, --include-dir,
--locale, --match-limit, -M, --multiline, -N, --newline, --om-separa- --line-offsets, --locale, --match-limit, -M, --multiline, -N, --new-
tor, --recursion-limit, -u, and --utf-8 options are specific to line, --om-separator, -u, and --utf-8 options are specific to
pcre2grep, as is the use of the --only-matching option with a capturing pcre2grep, as is the use of the --only-matching option with a capturing
parentheses number. parentheses number.
@ -857,5 +857,5 @@ AUTHOR
REVISION REVISION
Last updated: 21 March 2017 Last updated: 31 March 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.