Documentation update.

Philip.Hazel 2017-03-24 16:53:38 +00:00
parent 32bab50c01
commit 3aeb812180
24 changed files with 1293 additions and 1357 deletions

@ -1,10 +1,6 @@
Building PCRE2 without using autotools Building PCRE2 without using autotools
-------------------------------------- --------------------------------------
This document has been converted from the PCRE1 document. I have removed a
number of sections about building in various environments, as they applied only
to PCRE1 and are probably out of date.
This document contains the following sections: This document contains the following sections:
General General
@ -183,21 +179,9 @@ can skip ahead to the CMake section.
STACK SIZE IN WINDOWS ENVIRONMENTS STACK SIZE IN WINDOWS ENVIRONMENTS
The default processor stack size of 1Mb in some Windows environments is too Prior to release 10.30 the default system stack size of 1Mb in some Windows
small for matching patterns that need much recursion. In particular, test 2 may environments caused issues with some tests. This should no longer be the case
fail because of this. Normally, running out of stack causes a crash, but there for 10.30 and later releases.
have been cases where the test program has just died silently. See your linker
documentation for how to increase stack size if you experience problems. If you
are using CMake (see "BUILDING PCRE2 ON WINDOWS WITH CMAKE" below) and the gcc
compiler, you can increase the stack size for pcre2test and pcre2grep by
setting the CMAKE_EXE_LINKER_FLAGS variable to "-Wl,--stack,8388608" (for
example). The Linux default of 8Mb is a reasonable choice for the stack, though
even that can be too small for some pattern/subject combinations.
PCRE2 has a compile configuration option to disable the use of stack for
recursion so that heap is used instead. However, pattern matching is
significantly slower when this is done. There is more about stack usage in the
"pcre2stack" documentation.
LINKING PROGRAMS IN WINDOWS ENVIRONMENTS LINKING PROGRAMS IN WINDOWS ENVIRONMENTS
@ -393,4 +377,4 @@ and executable, is in EBCDIC and native z/OS file formats and this is the
recommended download site. recommended download site.
============================= =============================
Last Updated: 13 October 2016 Last Updated: 17 March 2017

@ -15,8 +15,8 @@ subscribe or manage your subscription here:
https://lists.exim.org/mailman/listinfo/pcre-dev https://lists.exim.org/mailman/listinfo/pcre-dev
Please read the NEWS file if you are upgrading from a previous release. Please read the NEWS file if you are upgrading from a previous release. The
The contents of this README file are: contents of this README file are:
The PCRE2 APIs The PCRE2 APIs
Documentation for PCRE2 Documentation for PCRE2
@ -44,8 +44,8 @@ wrappers.
The distribution does contain a set of C wrapper functions for the 8-bit The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix library that are based on the POSIX regular expression API (see the pcre2posix
man page). These can be found in a library called libpcre2-posix. Note that this man page). These can be found in a library called libpcre2-posix. Note that
just provides a POSIX calling interface to PCRE2; the regular expressions this just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted, themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities. and does not give full access to all of PCRE2's facilities.
@ -95,10 +95,9 @@ PCRE2 documentation is supplied in two other forms:
Building PCRE2 on non-Unix-like systems Building PCRE2 on non-Unix-like systems
--------------------------------------- ---------------------------------------
For a non-Unix-like system, please read the comments in the file For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and your system supports the use of "configure" and "make" you may be able to build
"make" you may be able to build PCRE2 using autotools in the same way as for PCRE2 using autotools in the same way as for many Unix-like systems.
many Unix-like systems.
PCRE2 can also be configured using CMake, which can be run in various ways PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file (command line, GUI, etc). This creates Makefiles, solution files, etc. The file
@ -174,19 +173,19 @@ library. They are also documented in the pcre2build man page.
architectures. If you try to enable it on an unsupported architecture, there architectures. If you try to enable it on an unsupported architecture, there
will be a compile time error. will be a compile time error.
. If you do not want to make use of the support for UTF-8 Unicode character . If you do not want to make use of the default support for UTF-8 Unicode
strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit character strings in the 8-bit library, UTF-16 Unicode character strings in
library, or UTF-32 Unicode character strings in the 32-bit library, you can the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
add --disable-unicode to the "configure" command. This reduces the size of library, you can add --disable-unicode to the "configure" command. This
the libraries. It is not possible to configure one library with Unicode reduces the size of the libraries. It is not possible to configure one
support, and another without, in the same configuration. library with Unicode support, and another without, in the same configuration.
It is also not possible to use --enable-ebcdic (see below) with Unicode
support, so if this option is set, you must also use --disable-unicode.
When Unicode support is available, the use of a UTF encoding still has to be When Unicode support is available, the use of a UTF encoding still has to be
enabled by setting the PCRE2_UTF option at run time or starting a pattern enabled by setting the PCRE2_UTF option at run time or starting a pattern
with (*UTF). When PCRE2 is compiled with Unicode support, its input can only with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
not possible to use both --enable-unicode and --enable-ebcdic at the same
time.
As well as supporting UTF strings, Unicode support includes support for the As well as supporting UTF strings, Unicode support includes support for the
\P, \p, and \X sequences that recognize Unicode character properties. \P, \p, and \X sequences that recognize Unicode character properties.
@ -232,18 +231,18 @@ library. They are also documented in the pcre2build man page.
--with-match-limit=500000 --with-match-limit=500000
on the "configure" command. This is just the default; individual calls to on the "configure" command. This is just the default; individual calls to
pcre2_match() can supply their own value. There is more discussion on the pcre2_match() can supply their own value. There is more discussion in the
pcre2api man page. pcre2api man page (search for pcre2_set_match_limit).
. There is a separate counter that limits the depth of recursive function calls . There is a separate counter that limits the depth of nested backtracking
during a matching process. This also has a default of ten million, which is during a matching process, which in turn limits the amount of memory that is
essentially "unlimited". You can change the default by setting, for example, used. This also has a default of ten million, which is essentially
"unlimited". You can change the default by setting, for example,
--with-match-limit-recursion=500000 --with-match-limit-depth=5000
Recursive function calls use up the runtime stack; running out of stack can There is more discussion in the pcre2api man page (search for
cause programs to crash in strange ways. There is a discussion about stack pcre2_set_depth_limit).
sizes in the pcre2stack man page.
. In the 8-bit library, the default maximum compiled pattern size is around . In the 8-bit library, the default maximum compiled pattern size is around
64K bytes. You can increase this by adding --with-link-size=3 to the 64K bytes. You can increase this by adding --with-link-size=3 to the
@ -254,20 +253,6 @@ library. They are also documented in the pcre2build man page.
performance in the 8-bit and 16-bit libraries. In the 32-bit library, the performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
link size setting is ignored, as 4-byte offsets are always used. link size setting is ignored, as 4-byte offsets are always used.
. You can build PCRE2 so that its internal match() function that is called from
pcre2_match() does not call itself recursively. Instead, it uses memory
blocks obtained from the heap to save data that would otherwise be saved on
the stack. To build PCRE2 like this, use
--disable-stack-for-recursion
on the "configure" command. PCRE2 runs more slowly in this mode, but it may
be necessary in environments with limited stack sizes. This applies only to
the normal execution of the pcre2_match() function; if JIT support is being
successfully used, it is not relevant. Equally, it does not apply to
pcre2_dfa_match(), which does not use deeply nested recursion. There is a
discussion about stack sizes in the pcre2stack man page.
. For speed, PCRE2 uses four tables for manipulating and identifying characters . For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, it uses a set of whose code point values are less than 256. By default, it uses a set of
tables for ASCII encoding that is part of the distribution. If you specify tables for ASCII encoding that is part of the distribution. If you specify
@ -389,6 +374,13 @@ library. They are also documented in the pcre2build man page.
string. Otherwise, it is assumed to be a file name, and the contents of the string. Otherwise, it is assumed to be a file name, and the contents of the
file are the test string. file are the test string.
. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
which caused pcre2_match() to use individual blocks on the heap for
backtracking instead of recursive function calls (which use the stack). This
is now obsolete since pcre2_match() was refactored always to use the heap (in
a much more efficient way than before). This option is retained for backwards
compatibility, but has no effect other than to output a warning.
The "configure" script builds the following files for the basic C library: The "configure" script builds the following files for the basic C library:
. Makefile the makefile that builds the library . Makefile the makefile that builds the library
@ -662,25 +654,32 @@ Unicode support is enabled.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in 16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair is for general cases and Unicode support, respectively. 8-bit mode. Each pair is for general cases and Unicode support, respectively.
Test 13 checks the handling of non-UTF characters greater than 255 by Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes. pcre2_dfa_match() in 16-bit and 32-bit modes.
Test 14 contains a number of tests that must not be run with JIT. They check, Test 14 contains some special UTF and UCP tests that give different output for
the different widths.
Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive among other non-JIT things, the match-limiting features of the interpretive
matcher. matcher.
Test 15 is run only when JIT support is not available. It checks that an Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour. attempt to use JIT has the expected behaviour.
Test 16 is run only when JIT support is available. It checks JIT complete and Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features. partial modes, match-limiting under JIT, and other JIT-specific features.
Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively. the 8-bit library, without and with Unicode support, respectively.
Test 19 checks the serialization functions by writing a set of compiled Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them. patterns to a file, and then reloading and checking them.
Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.
Character tables Character tables
---------------- ----------------
@ -866,4 +865,4 @@ The distribution should contain the files listed below.
Philip Hazel Philip Hazel
Email local part: ph10 Email local part: ph10
Email domain: cam.ac.uk Email domain: cam.ac.uk
Last updated: 01 November 2016 Last updated: 17 March 2017

@ -109,7 +109,7 @@ lose performance.
One way of guarding against this possibility is to use the One way of guarding against this possibility is to use the
<b>pcre2_pattern_info()</b> function to check the compiled pattern's options for <b>pcre2_pattern_info()</b> function to check the compiled pattern's options for
PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling
<b>pcre2_compile()</b>. This causes an compile time error if a pattern contains <b>pcre2_compile()</b>. This causes a compile time error if the pattern contains
a UTF-setting sequence. a UTF-setting sequence.
</P> </P>
<P> <P>
@ -137,7 +137,8 @@ large search tree against a string that will never match. Nested unlimited
repeats in a pattern are a common example. PCRE2 provides some protection repeats in a pattern are a common example. PCRE2 provides some protection
against this: see the <b>pcre2_set_match_limit()</b> function in the against this: see the <b>pcre2_set_match_limit()</b> function in the
<a href="pcre2api.html"><b>pcre2api</b></a> <a href="pcre2api.html"><b>pcre2api</b></a>
page. page. There is a similar function called <b>pcre2_set_depth_limit()</b> that can
be used to restrict the amount of memory that is used.
</P> </P>
<br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br> <br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br>
<P> <P>
@ -166,7 +167,7 @@ listing), and the short pages for individual functions, are concatenated in
pcre2perform discussion of performance issues pcre2perform discussion of performance issues
pcre2posix the POSIX-compatible C API for the 8-bit library pcre2posix the POSIX-compatible C API for the 8-bit library
pcre2sample discussion of the pcre2demo program pcre2sample discussion of the pcre2demo program
pcre2stack discussion of stack usage pcre2stack discussion of stack and memory usage
pcre2syntax quick syntax reference pcre2syntax quick syntax reference
pcre2test description of the <b>pcre2test</b> command pcre2test description of the <b>pcre2test</b> command
pcre2unicode discussion of Unicode and UTF support pcre2unicode discussion of Unicode and UTF support
@ -189,9 +190,9 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
</P> </P>
<br><a name="SEC5" href="#TOC1">REVISION</a><br> <br><a name="SEC5" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 16 October 2015 Last updated: 27 March 2017
<br> <br>
Copyright &copy; 1997-2015 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.
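
The hunk above describes two kinds of protection against runaway matching: a limit on the amount of work (pcre2_set_match_limit()) and a limit on backtracking depth (pcre2_set_depth_limit()). The following is a toy sketch of that idea only. It does not call PCRE2 and does not reproduce PCRE2's matching algorithm; the names toy_limits and match_here are hypothetical and exist just for this illustration. It implements a tiny backtracking matcher for literals and "c*", and gives up when either a call counter or a nesting depth exceeds its limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit bookkeeping for this sketch; not a PCRE2 structure. */
typedef struct {
    unsigned long calls;        /* match_here() calls made so far */
    unsigned long call_limit;   /* plays the role of a match limit */
    unsigned long depth_limit;  /* plays the role of a depth limit */
    int limit_hit;              /* 0 = none, 1 = call limit, 2 = depth limit */
} toy_limits;

/* Match re against the whole of s. re may contain literal characters and
   "c*" (zero or more of the literal c). Returns 1 on a match, otherwise 0.
   When a limit is exceeded, lim->limit_hit is set and 0 is returned. */
int match_here(const char *re, const char *s, unsigned long depth,
               toy_limits *lim)
{
    assert(re != NULL && s != NULL && lim != NULL);

    if (lim->limit_hit != 0) return 0;
    if (++lim->calls > lim->call_limit) { lim->limit_hit = 1; return 0; }
    if (depth > lim->depth_limit) { lim->limit_hit = 2; return 0; }

    if (re[0] == '\0') return s[0] == '\0';

    if (re[1] == '*') {
        const char *t = s;
        while (*t == re[0]) t++;      /* greedy: take as many as possible */
        for (;;) {                    /* then give them back one at a time */
            if (match_here(re + 2, t, depth + 1, lim)) return 1;
            if (lim->limit_hit != 0 || t == s) return 0;
            t--;
        }
    }

    if (s[0] != '\0' && s[0] == re[0])
        return match_here(re + 1, s + 1, depth + 1, lim);
    return 0;
}
```

With the nested-repeat pattern "a*a*a*a*a*b" and a subject of twenty "a" characters (which can never match), a call limit of 1000 stops the search early and sets limit_hit to 1; a depth limit of 3 with a generous call limit stops it with limit_hit set to 2.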

@ -36,20 +36,21 @@ for success and non-zero otherwise. The arguments are:
<i>callout_data</i> User data that is passed to the callback <i>callout_data</i> User data that is passed to the callback
</pre> </pre>
The <i>callback()</i> function is passed a pointer to a data block containing The <i>callback()</i> function is passed a pointer to a data block containing
the following fields: the following fields (not necessarily in this order):
<pre> <pre>
<i>version</i> Block version number uint32_t <i>version</i> Block version number
<i>pattern_position</i> Offset to next item in pattern uint32_t <i>callout_number</i> Number for numbered callouts
<i>next_item_length</i> Length of next item in pattern PCRE2_SIZE <i>pattern_position</i> Offset to next item in pattern
<i>callout_number</i> Number for numbered callouts PCRE2_SIZE <i>next_item_length</i> Length of next item in pattern
<i>callout_string_offset</i> Offset to string within pattern PCRE2_SIZE <i>callout_string_offset</i> Offset to string within pattern
<i>callout_string_length</i> Length of callout string PCRE2_SIZE <i>callout_string_length</i> Length of callout string
<i>callout_string</i> Points to callout string or is NULL PCRE2_SPTR <i>callout_string</i> Points to callout string or is NULL
</pre> </pre>
The second argument is the callout data that was passed to The second argument passed to the <b>callback()</b> function is the callout data
<b>pcre2_callout_enumerate()</b>. The <b>callback()</b> function must return zero that was passed to <b>pcre2_callout_enumerate()</b>. The <b>callback()</b>
for success. Any other value causes the pattern scan to stop, with the value function must return zero for success. Any other value causes the pattern scan
being passed back as the result of <b>pcre2_callout_enumerate()</b>. to stop, with the value being passed back as the result of
<b>pcre2_callout_enumerate()</b>.
</P> </P>
<P> <P>
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the

@ -26,7 +26,9 @@ DESCRIPTION
</b><br> </b><br>
<P> <P>
This function frees the memory used for a compiled pattern, including any This function frees the memory used for a compiled pattern, including any
memory used by the JIT compiler. memory used by the JIT compiler. If the compiled pattern was created by a call
to <b>pcre2_code_copy_with_tables()</b>, the memory for the character tables is
also freed.
</P> </P>
<P> <P>
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the

@ -37,19 +37,24 @@ arguments are:
<i>erroffset</i> Where to put an error offset <i>erroffset</i> Where to put an error offset
<i>ccontext</i> Pointer to a compile context or NULL <i>ccontext</i> Pointer to a compile context or NULL
</pre> </pre>
The length of the string and any error offset that is returned are in code The length of the pattern and any error offset that is returned are in code
units, not characters. A compile context is needed only if you want to change units, not characters. A compile context is needed only if you want to provide
custom memory allocation functions, or to provide an external function for
system stack size checking, or to change one or more of these parameters:
<pre> <pre>
What \R matches (Unicode newlines or CR, LF, CRLF only) What \R matches (Unicode newlines, or CR, LF, CRLF only);
PCRE2's character tables PCRE2's character tables;
The newline character sequence The newline character sequence;
The compile time nested parentheses limit The compile time nested parentheses limit;
The maximum pattern length (in code units) that is allowed.
</pre> </pre>
or provide an external function for stack size checking. The option bits are: The option bits are:
<pre> <pre>
PCRE2_ANCHORED Force pattern anchoring PCRE2_ANCHORED Force pattern anchoring
PCRE2_ALLOW_EMPTY_CLASS Allow empty classes
PCRE2_ALT_BSUX Alternative handling of \u, \U, and \x PCRE2_ALT_BSUX Alternative handling of \u, \U, and \x
PCRE2_ALT_CIRCUMFLEX Alternative handling of ^ in multiline mode PCRE2_ALT_CIRCUMFLEX Alternative handling of ^ in multiline mode
PCRE2_ALT_VERBNAMES Process backslashes in verb names
PCRE2_AUTO_CALLOUT Compile automatic callouts PCRE2_AUTO_CALLOUT Compile automatic callouts
PCRE2_CASELESS Do caseless matching PCRE2_CASELESS Do caseless matching
PCRE2_DOLLAR_ENDONLY $ not to match newline at end PCRE2_DOLLAR_ENDONLY $ not to match newline at end
@ -71,19 +76,21 @@ or provide an external function for stack size checking. The option bits are:
(only relevant if PCRE2_UTF is set) (only relevant if PCRE2_UTF is set)
PCRE2_UCP Use Unicode properties for \d, \w, etc. PCRE2_UCP Use Unicode properties for \d, \w, etc.
PCRE2_UNGREEDY Invert greediness of quantifiers PCRE2_UNGREEDY Invert greediness of quantifiers
PCRE2_USE_OFFSET_LIMIT Enable offset limit for unanchored matching
PCRE2_UTF Treat pattern and subjects as UTF strings PCRE2_UTF Treat pattern and subjects as UTF strings
</pre> </pre>
PCRE2 must be built with Unicode support in order to use PCRE2_UTF, PCRE2_UCP PCRE2 must be built with Unicode support (the default) in order to use
and related options. PCRE2_UTF, PCRE2_UCP and related options.
</P> </P>
<P> <P>
The yield of the function is a pointer to a private data structure that The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected. contains the compiled pattern, or NULL if an error was detected.
</P> </P>
<P> <P>
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API, with more detail on
each option, in the
<a href="pcre2api.html"><b>pcre2api</b></a> <a href="pcre2api.html"><b>pcre2api</b></a>
page and a description of the POSIX API in the page, and a description of the POSIX API in the
<a href="pcre2posix.html"><b>pcre2posix</b></a> <a href="pcre2posix.html"><b>pcre2posix</b></a>
page. page.
<p> <p>
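
The pcre2_compile() text above notes that the pattern length and any error offset are in code units, not characters. As a small illustration of the difference for the 8-bit library, where a code unit is one byte, the sketch below counts both for a UTF-8 string. It is standalone C that does not call PCRE2; the function names utf8_code_units and utf8_characters are made up for this example, and the character count assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of 8-bit code units (bytes) in a zero-terminated UTF-8 string. */
size_t utf8_code_units(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Number of characters (code points), assuming s is valid UTF-8: every
   byte that is not a continuation byte (10xxxxxx) starts a new character. */
size_t utf8_characters(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}
```

For the string "caf\xc3\xa9" (the word "café" encoded as UTF-8), utf8_code_units() gives 5 and utf8_characters() gives 4, so an offset of 3 in code units points at the first byte of the final character.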

@ -45,10 +45,9 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_CONFIG_BSR Indicates what \R matches by default: PCRE2_CONFIG_BSR Indicates what \R matches by default:
PCRE2_BSR_UNICODE PCRE2_BSR_UNICODE
PCRE2_BSR_ANYCRLF PCRE2_BSR_ANYCRLF
PCRE2_CONFIG_JIT Availability of just-in-time compiler PCRE2_CONFIG_DEPTHLIMIT Default backtracking depth limit
support (1=yes 0=no) PCRE2_CONFIG_JIT Availability of just-in-time compiler support (1=yes 0=no)
PCRE2_CONFIG_JITTARGET Information about the target archi- PCRE2_CONFIG_JITTARGET Information (a string) about the target architecture for the JIT compiler
tecture for the JIT compiler
PCRE2_CONFIG_LINKSIZE Configured internal link size (2, 3, 4) PCRE2_CONFIG_LINKSIZE Configured internal link size (2, 3, 4)
PCRE2_CONFIG_MATCHLIMIT Default internal resource limit PCRE2_CONFIG_MATCHLIMIT Default internal resource limit
PCRE2_CONFIG_NEWLINE Code for the default newline sequence: PCRE2_CONFIG_NEWLINE Code for the default newline sequence:
@ -58,11 +57,9 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_NEWLINE_ANY PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF PCRE2_NEWLINE_ANYCRLF
PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit
PCRE2_CONFIG_RECURSIONLIMIT Internal recursion depth limit PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
PCRE2_CONFIG_STACKRECURSE Recursion implementation (1=stack PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0
0=heap) PCRE2_CONFIG_UNICODE Availability of Unicode support (1=yes 0=no)
PCRE2_CONFIG_UNICODE Availability of Unicode support (1=yes
0=no)
PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string) PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string)
PCRE2_CONFIG_VERSION The PCRE2 version (a string) PCRE2_CONFIG_VERSION The PCRE2 version (a string)
</pre> </pre>

@ -31,8 +31,9 @@ DESCRIPTION
<P> <P>
This function matches a compiled regular expression against a given subject This function matches a compiled regular expression against a given subject
string, using an alternative matching algorithm that scans the subject string string, using an alternative matching algorithm that scans the subject string
just once (<i>not</i> Perl-compatible). (The Perl-compatible matching function just once (except when processing lookaround assertions). This function is
is <b>pcre2_match()</b>.) The arguments for this function are: <i>not</i> Perl-compatible (the Perl-compatible matching function is
<b>pcre2_match()</b>). The arguments for this function are:
<pre> <pre>
<i>code</i> Points to the compiled pattern <i>code</i> Points to the compiled pattern
<i>subject</i> Points to the subject string <i>subject</i> Points to the subject string
@ -45,22 +46,18 @@ is <b>pcre2_match()</b>.) The arguments for this function are:
<i>wscount</i> Number of elements in the vector <i>wscount</i> Number of elements in the vector
</pre> </pre>
For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
up a callout function or specify the recursion limit. The <i>length</i> and up a callout function or specify the recursion depth limit. The <i>length</i>
<i>startoffset</i> values are code units, not characters. The options are: and <i>startoffset</i> values are code units, not characters. The options are:
<pre> <pre>
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_NOTBOL Subject is not the beginning of a line PCRE2_NOTBOL Subject is not the beginning of a line
PCRE2_NOTEOL Subject is not the end of a line PCRE2_NOTEOL Subject is not the end of a line
PCRE2_NOTEMPTY An empty string is not a valid match PCRE2_NOTEMPTY An empty string is not a valid match
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject is not a valid match
is not a valid match PCRE2_NO_UTF_CHECK Do not check the subject for UTF validity (only relevant if PCRE2_UTF
PCRE2_NO_UTF_CHECK Do not check the subject for UTF
validity (only relevant if PCRE2_UTF
was set at compile time) was set at compile time)
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match
match if no full matches are found PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial match if no full matches are found
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match
even if there is a full match as well
PCRE2_DFA_RESTART Restart after a partial match PCRE2_DFA_RESTART Restart after a partial match
PCRE2_DFA_SHORTEST Return only the shortest match PCRE2_DFA_SHORTEST Return only the shortest match
</pre> </pre>

@ -34,11 +34,11 @@ errors are negative numbers. The arguments are:
<i>buffer</i> where to put the message <i>buffer</i> where to put the message
<i>bufflen</i> the length of the buffer (code units) <i>bufflen</i> the length of the buffer (code units)
</pre> </pre>
The function returns the length of the message, excluding the trailing zero, or The function returns the length of the message in code units, excluding the
the negative error code PCRE2_ERROR_NOMEMORY if the buffer is too small. In trailing zero, or the negative error code PCRE2_ERROR_NOMEMORY if the buffer is
this case, the returned message is truncated (but still with a trailing zero). too small. In this case, the returned message is truncated (but still with a
If <i>errorcode</i> does not contain a recognized error code number, the trailing zero). If <i>errorcode</i> does not contain a recognized error code
negative value PCRE2_ERROR_BADDATA is returned. number, the negative value PCRE2_ERROR_BADDATA is returned.
</P> </P>
<P> <P>
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
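
The return-value contract described in the pcre2_get_error_message() hunk above (message length excluding the trailing zero on success; a negative code plus a truncated, still zero-terminated message when the buffer is too small) can be modelled in a few lines. This is only a standalone sketch of that contract for 8-bit code units, not PCRE2's implementation: copy_message and SKETCH_NOMEMORY are hypothetical names, and the value -1 is a stand-in rather than the real PCRE2_ERROR_NOMEMORY value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NOMEMORY (-1)   /* stand-in for a negative "buffer too small" code */

/* Copy msg into buf, which holds buflen code units. Returns the message
   length (excluding the trailing zero) if it fits; otherwise copies as much
   as fits, keeps the trailing zero, and returns SKETCH_NOMEMORY. */
int copy_message(const char *msg, char *buf, size_t buflen)
{
    size_t len;
    assert(msg != NULL && buf != NULL);

    if (buflen == 0) return SKETCH_NOMEMORY;   /* no room even for the zero */

    len = strlen(msg);
    if (len < buflen) {
        memcpy(buf, msg, len + 1);             /* message plus trailing zero */
        return (int)len;
    }

    memcpy(buf, msg, buflen - 1);              /* truncated copy */
    buf[buflen - 1] = '\0';
    return SKETCH_NOMEMORY;
}
```

A caller following this contract checks for a negative result before using the returned length, and can still print the truncated text in the buffer when the result is the "too small" code.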

@ -32,10 +32,9 @@ maximum size to which it is allowed to grow. The final argument is a general
context, for memory allocation functions, or NULL for standard memory context, for memory allocation functions, or NULL for standard memory
allocation. The result can be passed to the JIT run-time code by calling allocation. The result can be passed to the JIT run-time code by calling
<b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern, <b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
which can then be processed by <b>pcre2_match()</b>. If the "fast path" JIT which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
matcher, <b>pcre2_jit_match()</b> is used, the stack can be passed directly as A maximum stack size of 512K to 1M should be more than enough for any pattern.
an argument. A maximum stack size of 512K to 1M should be more than enough for For more details, see the
any pattern. For more details, see the
<a href="pcre2jit.html"><b>pcre2jit</b></a> <a href="pcre2jit.html"><b>pcre2jit</b></a>
page. page.
</P> </P>

@ -25,10 +25,10 @@ SYNOPSIS
DESCRIPTION DESCRIPTION
</b><br> </b><br>
<P> <P>
This function builds a set of character tables for character values less than This function builds a set of character tables for character code points that
256. These can be passed to <b>pcre2_compile()</b> in a compile context in order are less than 256. These can be passed to <b>pcre2_compile()</b> in a compile
to override the internal, built-in tables (which were either defaulted or made context in order to override the internal, built-in tables (which were either
by <b>pcre2_maketables()</b> when PCRE2 was compiled). See the defaulted or made by <b>pcre2_maketables()</b> when PCRE2 was compiled). See the
<a href="pcre2_set_character_tables.html"><b>pcre2_set_character_tables()</b></a> <a href="pcre2_set_character_tables.html"><b>pcre2_set_character_tables()</b></a>
page. You might want to do this if you are using a non-standard locale. page. You might want to do this if you are using a non-standard locale.
</P> </P>

@ -2575,8 +2575,8 @@ The internal recursion limit was reached.
A text message for an error code from any PCRE2 function (compile, match, or A text message for an error code from any PCRE2 function (compile, match, or
auxiliary) can be obtained by calling <b>pcre2_get_error_message()</b>. The code auxiliary) can be obtained by calling <b>pcre2_get_error_message()</b>. The code
is passed as the first argument, with the remaining two arguments specifying a is passed as the first argument, with the remaining two arguments specifying a
code unit buffer and its length, into which the text message is placed. Note code unit buffer and its length in code units, into which the text message is
that the message is returned in code units of the appropriate width for the placed. The message is returned in code units of the appropriate width for the
library that is being used. library that is being used.
</P> </P>
<P> <P>
@ -3265,9 +3265,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC41" href="#TOC1">REVISION</a><br> <br><a name="SEC41" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 23 December 2016 Last updated: 21 March 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -280,6 +280,10 @@ operating systems the effect of reading a directory like this is an immediate
end-of-file; in others it may provoke an error. end-of-file; in others it may provoke an error.
</P> </P>
<P> <P>
<b>--depth-limit</b>=<i>number</i>
See <b>--match-limit</b> below.
</P>
<P>
<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i> <b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i>
Specify a pattern to be matched. This option can be used multiple times in Specify a pattern to be matched. This option can be used multiple times in
order to specify several patterns. It can also be used as a way of specifying a order to specify several patterns. It can also be used as a way of specifying a
@ -498,29 +502,22 @@ used. There is no short form for this option.
</P> </P>
<P> <P>
<b>--match-limit</b>=<i>number</i> <b>--match-limit</b>=<i>number</i>
Processing some regular expression patterns can require a very large amount of Processing some regular expression patterns may take a very long time to search
memory, leading in some cases to a program crash if not enough is available. for all possible matching strings. Others may require a very large amount of
Other patterns may take a very long time to search for all possible matching memory. There are two options that set resource limits for matching.
strings. The <b>pcre2_match()</b> function that is called by <b>pcre2grep</b> to
do the matching has two parameters that can limit the resources that it uses.
<br> <br>
<br> <br>
The <b>--match-limit</b> option provides a means of limiting resource usage The <b>--match-limit</b> option provides a means of limiting computing resource
when processing patterns that are not going to match, but which have a very usage when processing patterns that are not going to match, but which have a
large number of possibilities in their search trees. The classic example is a very large number of possibilities in their search trees. The classic example
pattern that uses nested unlimited repeats. Internally, PCRE2 uses a function is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a
called <b>match()</b> which it calls repeatedly (sometimes recursively). The counter that is incremented each time around its main processing loop. If the
limit set by <b>--match-limit</b> is imposed on the number of times this value set by <b>--match-limit</b> is reached, an error occurs.
function is called during a match, which has the effect of limiting the amount
of backtracking that can take place.
<br> <br>
<br> <br>
The <b>--recursion-limit</b> option is similar to <b>--match-limit</b>, but The <b>--depth-limit</b> option limits the depth of nested backtracking points,
instead of limiting the total number of times that <b>match()</b> is called, it which in turn limits the amount of memory that is used. This limit is of use
limits the depth of recursive calls, which in turn limits the amount of memory only if it is set smaller than <b>--match-limit</b>.
that can be used. The recursion depth is a smaller number than the total number
of calls, because not all calls to <b>match()</b> are recursive. This limit is
of use only if it is set smaller than <b>--match-limit</b>.
<br> <br>
<br> <br>
There are no short forms for these options. The default settings are specified There are no short forms for these options. The default settings are specified
@ -843,9 +840,9 @@ there are more than 20 such errors, <b>pcre2grep</b> gives up.
</P> </P>
<P> <P>
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the
overall resource limit; there is a second option called <b>--recursion-limit</b> overall resource limit; there is a second option called <b>--depth-limit</b>
that sets a limit on the amount of memory (usually stack) that is used (see the that sets a limit on the amount of memory that is used (see the discussion of
discussion of these options above). these options above).
</P> </P>
<br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br> <br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br>
<P> <P>
@ -870,9 +867,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br> <br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 31 December 2016 Last updated: 21 March 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -170,20 +170,24 @@ the application to apply the JIT optimization by calling
<b>pcre2_jit_compile()</b> is ignored. <b>pcre2_jit_compile()</b> is ignored.
</P> </P>
<br><b> <br><b>
Setting match and recursion limits Setting match and backtracking depth limits
</b><br> </b><br>
<P> <P>
The caller of <b>pcre2_match()</b> can set a limit on the number of times the The <b>pcre2_match()</b> function contains a counter that is incremented every time it
internal <b>match()</b> function is called and on the maximum depth of goes round its main loop. The caller of <b>pcre2_match()</b> can set a limit on
recursive calls. These facilities are provided to catch runaway matches that this counter, which therefore limits the amount of computing resource used for
are provoked by patterns with huge matching trees (a typical example is a a match. The maximum depth of nested backtracking can also be limited, and this
pattern with nested unlimited repeats) and to avoid running out of system stack restricts the amount of heap memory that is used.
by too much recursion. When one of these limits is reached, <b>pcre2_match()</b> </P>
gives an error return. The limits can also be set by items at the start of the <P>
pattern of the form These facilities are provided to catch runaway matches that are provoked by
patterns with huge matching trees (a typical example is a pattern with nested
unlimited repeats applied to a long string that does not match). When one of
these limits is reached, <b>pcre2_match()</b> gives an error return. The limits
can also be set by items at the start of the pattern of the form
<pre> <pre>
(*LIMIT_MATCH=d) (*LIMIT_MATCH=d)
(*LIMIT_RECURSION=d) (*LIMIT_DEPTH=d)
</pre> </pre>
where d is any number of decimal digits. However, the value of the setting must where d is any number of decimal digits. However, the value of the setting must
be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b> be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
@ -192,10 +196,15 @@ limits set by the programmer, but not raise them. If there is more than one
setting of one of these limits, the lower value is used. setting of one of these limits, the lower value is used.
</P> </P>
<P> <P>
Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is
still recognized for backwards compatibility.
</P>
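The rule that a pattern can lower, but never raise, a limit set by the caller can be modelled in a few lines. The following Python sketch is illustrative only: effective_limits() is a hypothetical helper, not part of PCRE2, and it looks only at limit items at the very start of the pattern, ignoring any other leading (*...) items that PCRE2 accepts.

```python
import re

# Recognise the limit items described above; LIMIT_RECURSION is the
# pre-10.30 name for LIMIT_DEPTH.
_LIMIT_ITEM = re.compile(r"\(\*(LIMIT_MATCH|LIMIT_DEPTH|LIMIT_RECURSION)=(\d+)\)")


def effective_limits(pattern, match_limit, depth_limit):
    """Return (match_limit, depth_limit, rest_of_pattern).

    match_limit and depth_limit are the values set (or defaulted) by the
    caller. A leading item can only reduce a limit, and when an item is
    repeated the lowest value wins.
    """
    limits = {"LIMIT_MATCH": match_limit, "LIMIT_DEPTH": depth_limit}
    pos = 0
    while True:
        item = _LIMIT_ITEM.match(pattern, pos)
        if item is None:
            break
        name = item.group(1)
        if name == "LIMIT_RECURSION":
            name = "LIMIT_DEPTH"
        # min() means a value larger than the current limit has no effect.
        limits[name] = min(limits[name], int(item.group(2)))
        pos = item.end()
    return limits["LIMIT_MATCH"], limits["LIMIT_DEPTH"], pattern[pos:]
```

For instance, with a caller limit of 1000, a pattern that starts with (*LIMIT_MATCH=99999999) leaves the match limit at 1000 in this model.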
<P>
The match limit is used (but in a different way) when JIT is being used, but it The match limit is used (but in a different way) when JIT is being used, but it
is not relevant, and is ignored, when matching with <b>pcre2_dfa_match()</b>. is not relevant, and is ignored, when matching with <b>pcre2_dfa_match()</b>.
However, the recursion limit is relevant for DFA matching, which does use some However, the depth limit is relevant for DFA matching, which uses function
function recursion, in particular, for recursions within the pattern. recursion for recursions within the pattern. In this case, the depth limit
controls the amount of system stack that is used.
<a name="newlines"></a></P> <a name="newlines"></a></P>
<br><b> <br><b>
Newline conventions Newline conventions
@ -235,8 +244,8 @@ The newline convention affects where the circumflex and dollar assertions are
true. It also affects the interpretation of the dot metacharacter when true. It also affects the interpretation of the dot metacharacter when
PCRE2_DOTALL is not set, and the behaviour of \N. However, it does not affect PCRE2_DOTALL is not set, and the behaviour of \N. However, it does not affect
what the \R escape sequence matches. By default, this is any Unicode newline what the \R escape sequence matches. By default, this is any Unicode newline
sequence, for Perl compatibility. However, this can be changed; see the sequence, for Perl compatibility. However, this can be changed; see the next
description of \R in the section entitled section and the description of \R in the section entitled
<a href="#newlineseq">"Newline sequences"</a> <a href="#newlineseq">"Newline sequences"</a>
below. A change of \R setting can be combined with a change of newline below. A change of \R setting can be combined with a change of newline
convention. convention.
@ -254,7 +263,7 @@ corresponding to PCRE2_BSR_UNICODE.
<br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br> <br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br>
<P> <P>
PCRE2 can be compiled to run in an environment that uses EBCDIC as its PCRE2 can be compiled to run in an environment that uses EBCDIC as its
character code rather than ASCII or Unicode (typically a mainframe system). In character code instead of ASCII or Unicode (typically a mainframe system). In
the sections below, character code values are ASCII or Unicode; in an EBCDIC the sections below, character code values are ASCII or Unicode; in an EBCDIC
environment these characters may have different code values, and there are no environment these characters may have different code values, and there are no
code points greater than 255. code points greater than 255.
@ -318,11 +327,11 @@ that character may have. This use of backslash as an escape character applies
both inside and outside character classes. both inside and outside character classes.
</P> </P>
<P> <P>
For example, if you want to match a * character, you write \* in the pattern. For example, if you want to match a * character, you must write \* in the
This escaping action applies whether or not the following character would pattern. This escaping action applies whether or not the following character
otherwise be interpreted as a metacharacter, so it is always safe to precede a would otherwise be interpreted as a metacharacter, so it is always safe to
non-alphanumeric with backslash to specify that it stands for itself. In precede a non-alphanumeric with backslash to specify that it stands for itself.
particular, if you want to match a backslash, you write \\. In particular, if you want to match a backslash, you write \\.
</P> </P>
<P> <P>
In a UTF mode, only ASCII numbers and letters have any special meaning after a In a UTF mode, only ASCII numbers and letters have any special meaning after a
@ -353,7 +362,7 @@ An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
by \E later in the pattern, the literal interpretation continues to the end of by \E later in the pattern, the literal interpretation continues to the end of
the pattern (that is, \E is assumed at the end). If the isolated \Q is inside the pattern (that is, \E is assumed at the end). If the isolated \Q is inside
a character class, this causes an error, because the character class is not a character class, this causes an error, because the character class is not
terminated. terminated by a closing square bracket.
<a name="digitsafterbackslash"></a></P> <a name="digitsafterbackslash"></a></P>
<br><b> <br><b>
Non-printing characters Non-printing characters
@ -476,9 +485,9 @@ a hexadecimal digit appears between \x{ and }, or if there is no terminating
<P> <P>
If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just If the PCRE2_ALT_BSUX option is set, the interpretation of \x is as just
described only when it is followed by two hexadecimal digits. Otherwise, it described only when it is followed by two hexadecimal digits. Otherwise, it
matches a literal "x" character. In this mode mode, support for code points matches a literal "x" character. In this mode, support for code points greater
greater than 256 is provided by \u, which must be followed by four hexadecimal than 256 is provided by \u, which must be followed by four hexadecimal digits;
digits; otherwise it matches a literal "u" character. otherwise it matches a literal "u" character.
</P> </P>
<P> <P>
Characters whose value is less than 256 can be defined by either of the two Characters whose value is less than 256 can be defined by either of the two
@ -493,12 +502,10 @@ Constraints on character values
Characters that are specified using octal or hexadecimal numbers are Characters that are specified using octal or hexadecimal numbers are
limited to certain values, as follows: limited to certain values, as follows:
<pre> <pre>
8-bit non-UTF mode less than 0x100 8-bit non-UTF mode no greater than 0xff
8-bit UTF-8 mode less than 0x10ffff and a valid codepoint 16-bit non-UTF mode no greater than 0xffff
16-bit non-UTF mode less than 0x10000 32-bit non-UTF mode no greater than 0xffffffff
16-bit UTF-16 mode less than 0x10ffff and a valid codepoint All UTF modes no greater than 0x10ffff and a valid codepoint
32-bit non-UTF mode less than 0x100000000
32-bit UTF-32 mode less than 0x10ffff and a valid codepoint
</pre> </pre>
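The limits in the new form of the table above can be written as a small check. This Python sketch is an illustration, not PCRE2 code: max_code_point() and is_allowed() are hypothetical names, and the validity test covers only the surrogate range 0xd800 to 0xdfff, so any other exclusion is not modelled.

```python
# Upper bounds for non-UTF modes, keyed by code unit width in bits.
_NON_UTF_MAX = {8: 0xFF, 16: 0xFFFF, 32: 0xFFFFFFFF}

# Surrogate code points, which are not valid in any UTF mode.
_SURROGATES = range(0xD800, 0xE000)


def max_code_point(width, utf):
    """Largest value that may be given in octal or hexadecimal."""
    if utf:
        return 0x10FFFF  # the same bound applies to every UTF mode
    return _NON_UTF_MAX[width]


def is_allowed(value, width, utf):
    """True if value is within the limit for the given library and mode."""
    if value < 0 or value > max_code_point(width, utf):
        return False
    if utf and value in _SURROGATES:
        return False
    return True
```

Note that a surrogate value such as 0xd800 passes the check in 16-bit non-UTF mode, because validity is tested only in UTF modes.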
Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called
"surrogate" codepoints), and 0xffef. "surrogate" codepoints), and 0xffef.
@ -525,7 +532,7 @@ In Perl, the sequences \l, \L, \u, and \U are recognized by its string
handler and used to modify the case of following characters. By default, PCRE2 handler and used to modify the case of following characters. By default, PCRE2
does not support these escape sequences. However, if the PCRE2_ALT_BSUX option does not support these escape sequences. However, if the PCRE2_ALT_BSUX option
is set, \U matches a "U" character, and \u can be used to define a character is set, \U matches a "U" character, and \u can be used to define a character
by code point, as described in the previous section. by code point, as described above.
</P> </P>
<br><b> <br><b>
Absolute and relative back references Absolute and relative back references
@ -714,7 +721,9 @@ When PCRE2 is built with Unicode support (the default), three additional escape
sequences that match characters with specific properties are available. In sequences that match characters with specific properties are available. In
8-bit non-UTF-8 mode, these sequences are of course limited to testing 8-bit non-UTF-8 mode, these sequences are of course limited to testing
characters whose codepoints are less than 256, but they do work in this mode. characters whose codepoints are less than 256, but they do work in this mode.
The extra escape sequences are: In 32-bit non-UTF mode, codepoints greater than 0x10ffff (the Unicode limit)
may be encountered. These are all treated as being in the Common script and
with an unassigned type. The extra escape sequences are:
<pre> <pre>
\p{<i>xx</i>} a character with the <i>xx</i> property \p{<i>xx</i>} a character with the <i>xx</i> property
\P{<i>xx</i>} a character without the <i>xx</i> property \P{<i>xx</i>} a character without the <i>xx</i> property
@ -2214,16 +2223,8 @@ except that it does not cause the current matching position to be changed.
Assertion subpatterns are not capturing subpatterns. If such an assertion Assertion subpatterns are not capturing subpatterns. If such an assertion
contains capturing subpatterns within it, these are counted for the purposes of contains capturing subpatterns within it, these are counted for the purposes of
numbering the capturing subpatterns in the whole pattern. However, substring numbering the capturing subpatterns in the whole pattern. However, substring
capturing is carried out only for positive assertions. (Perl sometimes, but not capturing is normally carried out only for positive assertions (but see the
always, does do capturing in negative assertions.) discussion of conditional subpatterns below).
</P>
<P>
WARNING: If a positive assertion containing one or more capturing subpatterns
succeeds, but failure to match later in the pattern causes backtracking over
this assertion, the captures within the assertion are reset only if no higher
numbered captures are already set. This is, unfortunately, a fundamental
limitation of the current implementation; it may get removed in a future
reworking.
</P> </P>
<P> <P>
For compatibility with Perl, most assertion subpatterns may be repeated; though For compatibility with Perl, most assertion subpatterns may be repeated; though
@ -2601,6 +2602,12 @@ presence of at least one letter in the subject. If a letter is found, the
subject is matched against the first alternative; otherwise it is matched subject is matched against the first alternative; otherwise it is matched
against the second. This pattern matches strings in one of the two forms against the second. This pattern matches strings in one of the two forms
dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
</P>
<P>
For Perl compatibility, if an assertion that is a condition contains capturing
subpatterns, any capturing that occurs is retained afterwards, for both
positive and negative assertions. (Compare non-conditional assertions, when
captures are retained only for positive assertions.)
<a name="comments"></a></P> <a name="comments"></a></P>
<br><a name="SEC22" href="#TOC1">COMMENTS</a><br> <br><a name="SEC22" href="#TOC1">COMMENTS</a><br>
<P> <P>
@ -2773,93 +2780,57 @@ is the actual recursive call.
Differences in recursion processing between PCRE2 and Perl Differences in recursion processing between PCRE2 and Perl
</b><br> </b><br>
<P> <P>
Recursion processing in PCRE2 differs from Perl in two important ways. In PCRE2 Some former differences between PCRE2 and Perl no longer exist.
(like Python, but unlike Perl), a recursive subpattern call is always treated
as an atomic group. That is, once it has matched some of the subject string, it
is never re-entered, even if it contains untried alternatives and there is a
subsequent matching failure. This can be illustrated by the following pattern,
which purports to match a palindromic string that contains an odd number of
characters (for example, "a", "aba", "abcba", "abcdcba"):
<pre>
^(.|(.)(?1)\2)$
</pre>
The idea is that it either matches a single character, or two identical
characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE2
it does not if the pattern is longer than three characters. Consider the
subject string "abcba":
</P> </P>
<P> <P>
At the top level, the first character is matched, but as it is not at the end Before release 10.30, recursion processing in PCRE2 differed from Perl in that
of the string, the first alternative fails; the second alternative is taken a recursive subpattern call was always treated as an atomic group. That is,
and the recursion kicks in. The recursive call to subpattern 1 successfully once it had matched some of the subject string, it was never re-entered, even
matches the next character ("b"). (Note that the beginning and end of line if it contained untried alternatives and there was a subsequent matching
tests are not part of the recursion). failure. (Historical note: PCRE implemented recursion before Perl did.)
</P> </P>
<P> <P>
Back at the top level, the next character ("c") is compared with what Starting with release 10.30, recursive subroutine calls are no longer treated
subpattern 2 matched, which was "a". This fails. Because the recursion is as atomic. That is, they can be re-entered to try unused alternatives if there
treated as an atomic group, there are now no backtracking points, and so the is a matching failure later in the pattern. This is now compatible with the way
entire match fails. (Perl is able, at this point, to re-enter the recursion and Perl works. If you want a subroutine call to be atomic, you must explicitly
try the second alternative.) However, if the pattern is written with the enclose it in an atomic group.
alternatives in the other order, things are different:
<pre>
^((.)(?1)\2|.)$
</pre>
This time, the recursing alternative is tried first, and continues to recurse
until it runs out of characters, at which point the recursion fails. But this
time we do have another alternative to try at the higher level. That is the big
difference: in the previous case the remaining alternative is at a deeper
recursion level, which PCRE2 cannot use.
</P> </P>
<P> <P>
To change the pattern so that it matches all palindromic strings, not just Supporting backtracking into recursions simplifies certain types of recursive
those with an odd number of characters, it is tempting to change the pattern to pattern. For example, this pattern matches palindromic strings:
this:
<pre> <pre>
^((.)(?1)\2|.?)$ ^((.)(?1)\2|.?)$
</pre> </pre>
Again, this works in Perl, but not in PCRE2, and for the same reason. When a The second branch in the group matches a single central character in the
deeper recursion has matched a single character, it cannot be entered again in palindrome when there are an odd number of characters, or nothing when there
order to match an empty string. The solution is to separate the two cases, and are an even number of characters, but in order to work it has to be able to try
write out the odd and even cases as alternatives at the higher level: the second case when the rest of the pattern match fails. If you want to match
typical palindromic phrases, the pattern has to ignore all non-word characters,
which can be done like this:
<pre> <pre>
^(?:((.)(?1)\2|)|((.)(?3)\4|.)) ^\W*+((.)\W*+(?1)\W*+\2|\W*+.?)\W*+$
</pre>
If you want to match typical palindromic phrases, the pattern has to ignore all
non-word characters, which can be done like this:
<pre>
^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
</pre> </pre>
If run with the PCRE2_CASELESS option, this pattern matches phrases such as "A If run with the PCRE2_CASELESS option, this pattern matches phrases such as "A
man, a plan, a canal: Panama!" and it works in both PCRE2 and Perl. Note the man, a plan, a canal: Panama!". Note the use of the possessive quantifier *+ to
use of the possessive quantifier *+ to avoid backtracking into sequences of avoid backtracking into sequences of non-word characters. Without this, PCRE2
non-word characters. Without this, PCRE2 takes a great deal longer (ten times takes a great deal longer (ten times or more) to match typical phrases, and
or more) to match typical phrases, and Perl takes so long that you think it has Perl takes so long that you think it has gone into a loop.
gone into a loop.
</P> </P>
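As a cross-check on what the phrase pattern above is meant to accept, here is a plain-Python model of the same set of strings. It does not use PCRE2 (Python's standard re module has no recursion syntax). is_palindromic_phrase() is a hypothetical helper that assumes ASCII subjects: it removes everything \W would match, folds case as PCRE2_CASELESS would for ASCII letters, and compares the result with its reverse.

```python
import re


def is_palindromic_phrase(subject):
    """Model of the caseless phrase-palindrome pattern for ASCII subjects."""
    # Keep only word characters (letters, digits, underscore); re.ASCII
    # makes \W follow ASCII rules rather than Unicode ones.
    word_chars = re.sub(r"\W", "", subject, flags=re.ASCII)
    folded = word_chars.lower()
    # A palindrome reads the same forwards and backwards.
    return folded == folded[::-1]
```

Such a model is handy in a test suite: any subject for which the regular expression and the model disagree points at a mistake in one of them.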
<P> <P>
<b>WARNING</b>: The palindrome-matching patterns above work only if the subject Another way in which PCRE2 and Perl used to differ in their recursion
string does not start with a palindrome that is shorter than the entire string. processing was in the handling of captured values. Formerly in Perl, when a
For example, although "abcba" is correctly matched, if the subject is "ababa", subpattern was called recursively or as a subroutine (see the next section), it
PCRE2 finds the palindrome "aba" at the start, then fails at top level because had no access to any values that were captured outside the recursion, whereas
the end of the string does not follow. Once again, it cannot jump back into the in PCRE2 these values can be referenced. Consider this pattern:
recursion to try other alternatives, so the entire match fails.
</P>
<P>
The second way in which PCRE2 and Perl differ in their recursion processing is
in the handling of captured values. In Perl, when a subpattern is called
recursively or as a subpattern (see the next section), it has no access to any
values that were captured outside the recursion, whereas in PCRE2 these values
can be referenced. Consider this pattern:
<pre> <pre>
^(.)(\1|a(?2)) ^(.)(\1|a(?2))
</pre> </pre>
In PCRE2, this pattern matches "bab". The first capturing parentheses match "b", This pattern matches "bab". The first capturing parentheses match "b", then in
then in the second group, when the back reference \1 fails to match "b", the the second group, when the back reference \1 fails to match "b", the second
second alternative matches "a" and then recurses. In the recursion, \1 does alternative matches "a" and then recurses. In the recursion, \1 does now match
now match "b" and so the whole match succeeds. In Perl, the pattern fails to "b" and so the whole match succeeds. This match used to fail in Perl, but in
match because inside the recursive call \1 cannot access the externally set later versions (I tried 5.024) it now works.
value.
<a name="subpatternsassubroutines"></a></P> <a name="subpatternsassubroutines"></a></P>
<br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br> <br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
<P> <P>
@ -2886,11 +2857,10 @@ is used, it does match "sense and responsibility" as well as the other two
strings. Another example is given in the discussion of DEFINE above. strings. Another example is given in the discussion of DEFINE above.
</P> </P>
<P> <P>
All subroutine calls, whether recursive or not, are always treated as atomic Like recursions, subroutine calls used to be treated as atomic, but this
groups. That is, once a subroutine has matched some of the subject string, it changed at PCRE2 release 10.30, so backtracking into subroutine calls can now
is never re-entered, even if it contains untried alternatives and there is a occur. However, any capturing parentheses that are set during the subroutine
subsequent matching failure. Any capturing parentheses that are set during the call revert to their previous values afterwards.
subroutine call revert to their previous values afterwards.
</P> </P>
<P> <P>
Processing options such as case-independence are fixed when a subpattern is Processing options such as case-independence are fixed when a subpattern is
@ -2998,17 +2968,10 @@ The doubling is removed before the string is passed to the callout function.
<a name="backtrackcontrol"></a></P> <a name="backtrackcontrol"></a></P>
<br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br> <br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P> <P>
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which There are a number of special "Backtracking Control Verbs" (to use Perl's
are still described in the Perl documentation as "experimental and subject to terminology) that modify the behaviour of backtracking during matching. They
change or removal in a future version of Perl". It goes on to say: "Their usage are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form,
in production code should be noted to avoid problems during upgrades." The same possibly behaving differently depending on whether or not a name is present.
remarks apply to the PCRE2 features described in this section.
</P>
<P>
The new verbs make use of what was previously invalid syntax: an opening
parenthesis followed by an asterisk. They are generally of the form (*VERB) or
(*VERB:NAME). Some verbs take either form, possibly behaving differently
depending on whether or not a name is present.
</P> </P>
<P> <P>
By default, for compatibility with Perl, a name is any sequence of characters By default, for compatibility with Perl, a name is any sequence of characters
@ -3040,7 +3003,7 @@ not there. Any number of these verbs may occur in a pattern.
<P> <P>
Since these verbs are specifically related to backtracking, most of them can be Since these verbs are specifically related to backtracking, most of them can be
used only when the pattern is to be matched using the traditional matching used only when the pattern is to be matched using the traditional matching
function, because these use a backtracking algorithm. With the exception of function, because that uses a backtracking algorithm. With the exception of
(*FAIL), which behaves like a failing negative assertion, the backtracking (*FAIL), which behaves like a failing negative assertion, the backtracking
control verbs cause an error if encountered by the DFA matching function. control verbs cause an error if encountered by the DFA matching function.
</P> </P>
@ -3178,11 +3141,11 @@ Verbs that act after backtracking
The following verbs do nothing when they are encountered. Matching continues The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to with what follows, but if there is no subsequent match, causing a backtrack to
the verb, a failure is forced. That is, backtracking cannot pass to the left of the verb, a failure is forced. That is, backtracking cannot pass to the left of
the verb. However, when one of these verbs appears inside an atomic group the verb. However, when one of these verbs appears inside an atomic group or in
(which includes any group that is called as a subroutine) or in an assertion an assertion that is true, its effect is confined to that group, because once
that is true, its effect is confined to that group, because once the group has the group has been matched, there is never any backtracking into it. In this
been matched, there is never any backtracking into it. In this situation, situation, backtracking has to jump to the left of the entire atomic group or
backtracking has to jump to the left of the entire atomic group or assertion. assertion.
</P> </P>
<P> <P>
These verbs differ in exactly what kind of failure occurs when backtracking These verbs differ in exactly what kind of failure occurs when backtracking
@ -3246,8 +3209,8 @@ expressed in any other way. In an anchored pattern (*PRUNE) has the same effect
as (*COMMIT). as (*COMMIT).
</P> </P>
<P> <P>
The behaviour of (*PRUNE:NAME) is the not the same as (*MARK:NAME)(*PRUNE). The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is
It is like (*MARK:NAME) in that the name is remembered for passing back to the like (*MARK:NAME) in that the name is remembered for passing back to the
caller. However, (*SKIP:NAME) searches only for names set with (*MARK), caller. However, (*SKIP:NAME) searches only for names set with (*MARK),
ignoring those set by (*PRUNE) or (*THEN). ignoring those set by (*PRUNE) or (*THEN).
<pre> <pre>
@ -3452,9 +3415,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br> <br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 27 December 2016 Last updated: 18 March 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

View File

@ -55,7 +55,10 @@ The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to within individual applications. As such, the data supplied to
<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from <b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency checking, not arbitrary external sources. There is only some simple consistency checking, not
complete validation of what is being re-loaded. complete validation of what is being re-loaded. Corrupted data may cause
undefined results. For example, if the length field of a pattern in the
serialized data is corrupted, the deserializing code may read beyond the end of
the byte stream that is passed to it.
</P> </P>
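To illustrate the kind of failure described above, here is a minimal C sketch of walking a length-prefixed byte stream with bounds checks. The record layout and the function name count_records_checked are hypothetical and do not reflect the real PCRE2 serialization format; the sketch only shows why an unchecked, corrupted length field leads to a read beyond the end of the stream.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, NOT the real PCRE2 serialization format:
   a sequence of records, each a 4-byte native-endian length followed
   by that many payload bytes. Returns the number of records if every
   length field stays inside the stream, or -1 if one would run past
   the end (the situation that causes an out-of-bounds read when no
   check is made). */
int count_records_checked(const uint8_t *stream, size_t size)
{
  size_t pos = 0;
  int count = 0;

  while (pos < size) {
    uint32_t len;
    if (size - pos < sizeof len) return -1;   /* truncated length field */
    memcpy(&len, stream + pos, sizeof len);
    pos += sizeof len;
    if (len > size - pos) return -1;          /* payload overruns stream */
    pos += len;
    count++;
  }
  return count;
}
```

Because pcre2_serialize_decode() is documented as doing only simple consistency checking, an application that cannot fully trust its stored data would need its own integrity check before decoding.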
<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br> <br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
<P> <P>
@ -190,9 +193,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br> <br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 24 May 2016 Last updated: 21 March 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.


@ -126,12 +126,13 @@ character values up to 0x7fffffff. Each character is placed in one 16-bit or
to occur). to occur).
</P> </P>
<P> <P>
UTF-8 is not capable of encoding values greater than 0x7fffffff, but such UTF-8 (in its original definition) is not capable of encoding values greater
values can be handled by the 32-bit library. When testing this library in than 0x7fffffff, but such values can be handled by the 32-bit library. When
non-UTF mode with <b>utf8_input</b> set, if any character is preceded by the testing this library in non-UTF mode with <b>utf8_input</b> set, if any
byte 0xff (which is an illegal byte in UTF-8) 0x80000000 is added to the character is preceded by the byte 0xff (which is an illegal byte in UTF-8)
character's value. This is the only way of passing such code points in a 0x80000000 is added to the character's value. This is the only way of passing
pattern string. For subject strings, using an escape sequence is preferable. such code points in a pattern string. For subject strings, using an escape
sequence is preferable.
</P> </P>
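As a rough illustration of the 0xff convention described above, the following C sketch decodes one character using the original UTF-8 scheme (1 to 6 bytes, values up to 0x7fffffff) and adds 0x80000000 when the character is preceded by the byte 0xff. The function name decode_with_ff_prefix and its interface are hypothetical; this is not the code used by pcre2test, and it does not validate continuation bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2 or pcre2test.
   Decodes one character starting at p (original UTF-8, 1 to 6 bytes),
   stores its value in *value, and returns the number of bytes consumed.
   A leading 0xff byte adds 0x80000000 to the decoded value.
   Returns 0 if the input is truncated or the lead byte is not valid. */
size_t decode_with_ff_prefix(const uint8_t *p, size_t len, uint32_t *value)
{
  size_t used = 0;
  size_t more;
  uint32_t extra = 0;
  uint32_t c;

  if (len > 0 && p[0] == 0xff) { extra = 0x80000000u; p++; len--; used++; }
  if (len == 0) return 0;

  c = p[0];
  if (c < 0x80) more = 0;
  else if ((c & 0xe0) == 0xc0) { c &= 0x1f; more = 1; }
  else if ((c & 0xf0) == 0xe0) { c &= 0x0f; more = 2; }
  else if ((c & 0xf8) == 0xf0) { c &= 0x07; more = 3; }
  else if ((c & 0xfc) == 0xf8) { c &= 0x03; more = 4; }
  else if ((c & 0xfe) == 0xfc) { c &= 0x01; more = 5; }
  else return 0;                 /* continuation byte, 0xfe, or 0xff */

  if (len < more + 1) return 0;  /* not enough bytes for this character */
  for (size_t i = 1; i <= more; i++) c = (c << 6) | (p[i] & 0x3f);

  *value = c + extra;            /* c <= 0x7fffffff, so this cannot wrap */
  return used + more + 1;
}
```

With this sketch, the two bytes 0xff 0x41 decode to the value 0x80000041, which can only be held in a 32-bit code unit.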
<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br> <br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
<P> <P>
@ -602,6 +603,7 @@ about the pattern:
/B bincode show binary code without lengths /B bincode show binary code without lengths
callout_info show callout information callout_info show callout information
debug same as info,fullbincode debug same as info,fullbincode
framesize show matching frame size
fullbincode show binary code with lengths fullbincode show binary code with lengths
/I info show info about compiled pattern /I info show info about compiled pattern
hex unquoted characters are hexadecimal hex unquoted characters are hexadecimal
@ -689,6 +691,11 @@ not necessarily the last character. These lines are omitted if no starting or
ending code units are recorded. ending code units are recorded.
</P> </P>
<P> <P>
The <b>framesize</b> modifier shows the size, in bytes, of the storage frames
used by <b>pcre2_match()</b> for handling backtracking. The size depends on the
number of capturing parentheses in the pattern.
</P>
<P>
The <b>callout_info</b> modifier requests information about all the callouts in The <b>callout_info</b> modifier requests information about all the callouts in
the pattern. A list of them is output at the end of any other information that the pattern. A list of them is output at the end of any other information that
is requested. For each callout, either its number or string is given, followed is requested. For each callout, either its number or string is given, followed
@ -1073,6 +1080,7 @@ pattern.
callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure callout_fail=&#60;n&#62;[:&#60;m&#62;] control callout failure
callout_none do not supply a callout function callout_none do not supply a callout function
copy=&#60;number or name&#62; copy captured substring copy=&#60;number or name&#62; copy captured substring
depth_limit=&#60;n&#62; set a depth limit
dfa use <b>pcre2_dfa_match()</b> dfa use <b>pcre2_dfa_match()</b>
find_limits find match and recursion limits find_limits find match and recursion limits
get=&#60;number or name&#62; extract captured substring get=&#60;number or name&#62; extract captured substring
@ -1086,7 +1094,7 @@ pattern.
offset=&#60;n&#62; set starting offset offset=&#60;n&#62; set starting offset
offset_limit=&#60;n&#62; set offset limit offset_limit=&#60;n&#62; set offset limit
ovector=&#60;n&#62; set size of output vector ovector=&#60;n&#62; set size of output vector
recursion_limit=&#60;n&#62; set a recursion limit recursion_limit=&#60;n&#62; obsolete synonym for depth_limit
replace=&#60;string&#62; specify a replacement string replace=&#60;string&#62; specify a replacement string
startchar show startchar when relevant startchar show startchar when relevant
startoffset=&#60;n&#62; same as offset=&#60;n&#62; startoffset=&#60;n&#62; same as offset=&#60;n&#62;
@ -1320,10 +1328,10 @@ stack that is larger than the default 32K is necessary only for very
complicated patterns. complicated patterns.
</P> </P>
<br><b> <br><b>
Setting match and recursion limits Setting match and depth limits
</b><br> </b><br>
<P> <P>
The <b>match_limit</b> and <b>recursion_limit</b> modifiers set the appropriate The <b>match_limit</b> and <b>depth_limit</b> modifiers set the appropriate
limits in the match context. These values are ignored when the limits in the match context. These values are ignored when the
<b>find_limits</b> modifier is specified. <b>find_limits</b> modifier is specified.
</P> </P>
@ -1333,23 +1341,23 @@ Finding minimum limits
<P> <P>
If the <b>find_limits</b> modifier is present, <b>pcre2test</b> calls If the <b>find_limits</b> modifier is present, <b>pcre2test</b> calls
<b>pcre2_match()</b> several times, setting different values in the match <b>pcre2_match()</b> several times, setting different values in the match
context via <b>pcre2_set_match_limit()</b> and <b>pcre2_set_recursion_limit()</b> context via <b>pcre2_set_match_limit()</b> and <b>pcre2_set_depth_limit()</b>
until it finds the minimum values for each parameter that allow until it finds the minimum values for each parameter that allow
<b>pcre2_match()</b> to complete without error. <b>pcre2_match()</b> to complete without error.
</P> </P>
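The search for a minimum limit can be pictured with the following C sketch. It shows one plausible strategy (doubling, then bisection) and assumes that a match which completes with a given limit also completes with any larger limit. The names match_ok_fn, needs_at_least, and find_min_limit are hypothetical; the callback stands in for setting a limit in the match context and calling pcre2_match(). The code in pcre2test may well differ.

```c
#include <stdint.h>

/* Hypothetical callback: returns nonzero if the match attempt completes
   without a limit error when run with the given limit. In a real program
   this would set the limit in a match context and call pcre2_match(). */
typedef int (*match_ok_fn)(uint32_t limit, void *user);

/* Stand-in callback for demonstration: "succeeds" once the limit reaches
   the threshold pointed to by user. */
int needs_at_least(uint32_t limit, void *user)
{
  return limit >= *(const uint32_t *)user;
}

/* Returns the smallest limit in 1..max for which ok() succeeds, or 0 if
   even max is not enough. Requires max >= 1. */
uint32_t find_min_limit(match_ok_fn ok, void *user, uint32_t max)
{
  uint32_t lo = 0;   /* largest limit known to fail (0 = none tried yet) */
  uint32_t hi = 1;   /* current candidate */

  /* Phase 1: double the candidate until an attempt succeeds. */
  while (!ok(hi, user)) {
    if (hi >= max) return 0;
    lo = hi;
    hi = (hi > max / 2) ? max : hi * 2;
  }

  /* Phase 2: bisect between the last failure and the first success. */
  while (hi - lo > 1) {
    uint32_t mid = lo + (hi - lo) / 2;
    if (ok(mid, user)) hi = mid; else lo = mid;
  }
  return hi;
}
```

The same routine would be run once per parameter, holding the other limit at a value large enough not to interfere.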
<P> <P>
If JIT is being used, only the match limit is relevant. If DFA matching is If JIT is being used, only the match limit is relevant. If DFA matching is
being used, neither limit is relevant, and this modifier is ignored (with a being used, only the depth limit is relevant, but at present this modifier is
warning message). ignored (with a warning message).
</P> </P>
<P> <P>
The <i>match_limit</i> number is a measure of the amount of backtracking The <i>match_limit</i> number is a measure of the amount of backtracking
that takes place, and learning the minimum value can be instructive. For most that takes place, and learning the minimum value can be instructive. For most
simple matches, the number is quite small, but for patterns with very large simple matches, the number is quite small, but for patterns with very large
numbers of matching possibilities, it can become large very quickly with numbers of matching possibilities, it can become large very quickly with
increasing length of subject string. The <i>match_limit_recursion</i> number is increasing length of subject string. The <i>depth_limit</i> number is
a measure of how much stack (or, if PCRE2 is compiled with NO_RECURSE, how much a measure of how much memory for recording backtracking points is needed to
heap) memory is needed to complete the match attempt. complete the match attempt.
</P> </P>
<br><b> <br><b>
Showing MARK names Showing MARK names
@ -1466,7 +1474,7 @@ code unit offset of the start of the failing character is also output. Here is
an example of an interactive <b>pcre2test</b> run. an example of an interactive <b>pcre2test</b> run.
<pre> <pre>
$ pcre2test $ pcre2test
PCRE2 version 9.00 2014-05-10 PCRE2 version 10.22 2016-07-29
re&#62; /^abc(\d+)/ re&#62; /^abc(\d+)/
data&#62; abc123 data&#62; abc123
@ -1779,9 +1787,9 @@ Cambridge, England.
</P> </P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br> <br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 28 December 2016 Last updated: 21 March 2017
<br> <br>
Copyright &copy; 1997-2016 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>
<p> <p>
Return to the <a href="index.html">PCRE2 index page</a>. Return to the <a href="index.html">PCRE2 index page</a>.

File diff suppressed because it is too large


@ -1,4 +1,4 @@
.TH PCRE2_CONFIG 3 "20 April 2014" "PCRE2 10.0" .TH PCRE2_CONFIG 3 "24 March 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -31,10 +31,13 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_CONFIG_BSR Indicates what \eR matches by default: PCRE2_CONFIG_BSR Indicates what \eR matches by default:
PCRE2_BSR_UNICODE PCRE2_BSR_UNICODE
PCRE2_BSR_ANYCRLF PCRE2_BSR_ANYCRLF
PCRE2_CONFIG_DEPTHLIMIT Default backtracking depth limit
.\" JOIN
PCRE2_CONFIG_JIT Availability of just-in-time compiler PCRE2_CONFIG_JIT Availability of just-in-time compiler
support (1=yes 0=no) support (1=yes 0=no)
PCRE2_CONFIG_JITTARGET Information about the target archi- .\" JOIN
tecture for the JIT compiler PCRE2_CONFIG_JITTARGET Information (a string) about the target
architecture for the JIT compiler
PCRE2_CONFIG_LINKSIZE Configured internal link size (2, 3, 4) PCRE2_CONFIG_LINKSIZE Configured internal link size (2, 3, 4)
PCRE2_CONFIG_MATCHLIMIT Default internal resource limit PCRE2_CONFIG_MATCHLIMIT Default internal resource limit
PCRE2_CONFIG_NEWLINE Code for the default newline sequence: PCRE2_CONFIG_NEWLINE Code for the default newline sequence:
@ -44,9 +47,9 @@ point to a uint32_t integer variable. The available codes are:
PCRE2_NEWLINE_ANY PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF PCRE2_NEWLINE_ANYCRLF
PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit
PCRE2_CONFIG_RECURSIONLIMIT Internal recursion depth limit PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
PCRE2_CONFIG_STACKRECURSE Recursion implementation (1=stack PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0
0=heap) .\" JOIN
PCRE2_CONFIG_UNICODE Availability of Unicode support (1=yes PCRE2_CONFIG_UNICODE Availability of Unicode support (1=yes
0=no) 0=no)
PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string) PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string)


@ -1,4 +1,4 @@
.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23" .TH PCRE2_DFA_MATCH 3 "24 March 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -19,8 +19,9 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
This function matches a compiled regular expression against a given subject This function matches a compiled regular expression against a given subject
string, using an alternative matching algorithm that scans the subject string string, using an alternative matching algorithm that scans the subject string
just once (\fInot\fP Perl-compatible). (The Perl-compatible matching function just once (except when processing lookaround assertions). This function is
is \fBpcre2_match()\fP.) The arguments for this function are: \fInot\fP Perl-compatible (the Perl-compatible matching function is
\fBpcre2_match()\fP). The arguments for this function are:
.sp .sp
\fIcode\fP Points to the compiled pattern \fIcode\fP Points to the compiled pattern
\fIsubject\fP Points to the subject string \fIsubject\fP Points to the subject string
@ -33,22 +34,26 @@ is \fBpcre2_match()\fP.) The arguments for this function are:
\fIwscount\fP Number of elements in the vector \fIwscount\fP Number of elements in the vector
.sp .sp
For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
up a callout function or specify the recursion limit. The \fIlength\fP and up a callout function or specify the recursion depth limit. The \fIlength\fP
\fIstartoffset\fP values are code units, not characters. The options are: and \fIstartoffset\fP values are code units, not characters. The options are:
.sp .sp
PCRE2_ANCHORED Match only at the first position PCRE2_ANCHORED Match only at the first position
PCRE2_NOTBOL Subject is not the beginning of a line PCRE2_NOTBOL Subject is not the beginning of a line
PCRE2_NOTEOL Subject is not the end of a line PCRE2_NOTEOL Subject is not the end of a line
PCRE2_NOTEMPTY An empty string is not a valid match PCRE2_NOTEMPTY An empty string is not a valid match
.\" JOIN
PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject
is not a valid match is not a valid match
.\" JOIN
PCRE2_NO_UTF_CHECK Do not check the subject for UTF PCRE2_NO_UTF_CHECK Do not check the subject for UTF
validity (only relevant if PCRE2_UTF validity (only relevant if PCRE2_UTF
was set at compile time) was set at compile time)
.\" JOIN
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial
match even if there is a full match
.\" JOIN
PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial
match if no full matches are found match if no full matches are found
PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial match
even if there is a full match as well
PCRE2_DFA_RESTART Restart after a partial match PCRE2_DFA_RESTART Restart after a partial match
PCRE2_DFA_SHORTEST Return only the shortest match PCRE2_DFA_SHORTEST Return only the shortest match
.sp .sp


@ -1,4 +1,4 @@
.TH PCRE2_GET_ERROR_MESSAGE 3 "17 June 2016" "PCRE2 10.22" .TH PCRE2_GET_ERROR_MESSAGE 3 "24 March 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -22,11 +22,11 @@ errors are negative numbers. The arguments are:
\fIbuffer\fP where to put the message \fIbuffer\fP where to put the message
\fIbufflen\fP the length of the buffer (code units) \fIbufflen\fP the length of the buffer (code units)
.sp .sp
The function returns the length of the message, excluding the trailing zero, or The function returns the length of the message in code units, excluding the
the negative error code PCRE2_ERROR_NOMEMORY if the buffer is too small. In trailing zero, or the negative error code PCRE2_ERROR_NOMEMORY if the buffer is
this case, the returned message is truncated (but still with a trailing zero). too small. In this case, the returned message is truncated (but still with a
If \fIerrorcode\fP does not contain a recognized error code number, the trailing zero). If \fIerrorcode\fP does not contain a recognized error code
negative value PCRE2_ERROR_BADDATA is returned. number, the negative value PCRE2_ERROR_BADDATA is returned.
.P .P
There is a complete description of the PCRE2 native API in the There is a complete description of the PCRE2 native API in the
.\" HREF .\" HREF


@ -1,4 +1,4 @@
.TH PCRE2_JIT_STACK_CREATE 3 "03 November 2014" "PCRE2 10.00" .TH PCRE2_JIT_STACK_CREATE 3 "24 March 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -20,10 +20,9 @@ maximum size to which it is allowed to grow. The final argument is a general
context, for memory allocation functions, or NULL for standard memory context, for memory allocation functions, or NULL for standard memory
allocation. The result can be passed to the JIT run-time code by calling allocation. The result can be passed to the JIT run-time code by calling
\fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern, \fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
which can then be processed by \fBpcre2_match()\fP. If the "fast path" JIT which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
matcher, \fBpcre2_jit_match()\fP is used, the stack can be passed directly as A maximum stack size of 512K to 1M should be more than enough for any pattern.
an argument. A maximum stack size of 512K to 1M should be more than enough for For more details, see the
any pattern. For more details, see the
.\" HREF .\" HREF
\fBpcre2jit\fP \fBpcre2jit\fP
.\" .\"


@ -1,4 +1,4 @@
.TH PCRE2_MAKETABLES 3 "21 October 2014" "PCRE2 10.00" .TH PCRE2_MAKETABLES 3 "24 March 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS .SH SYNOPSIS
@ -12,10 +12,10 @@ PCRE2 - Perl-compatible regular expressions (revised API)
.SH DESCRIPTION .SH DESCRIPTION
.rs .rs
.sp .sp
This function builds a set of character tables for character values less than This function builds a set of character tables for character code points that
256. These can be passed to \fBpcre2_compile()\fP in a compile context in order are less than 256. These can be passed to \fBpcre2_compile()\fP in a compile
to override the internal, built-in tables (which were either defaulted or made context in order to override the internal, built-in tables (which were either
by \fBpcre2_maketables()\fP when PCRE2 was compiled). See the defaulted or made by \fBpcre2_maketables()\fP when PCRE2 was compiled). See the
.\" HREF .\" HREF
\fBpcre2_set_character_tables()\fP \fBpcre2_set_character_tables()\fP
.\" .\"


@ -255,6 +255,9 @@ OPTIONS
directory like this is an immediate end-of-file; in others it directory like this is an immediate end-of-file; in others it
may provoke an error. may provoke an error.
--depth-limit=number
See --match-limit below.
-e pattern, --regex=pattern, --regexp=pattern -e pattern, --regex=pattern, --regexp=pattern
Specify a pattern to be matched. This option can be used mul- Specify a pattern to be matched. This option can be used mul-
tiple times in order to specify several patterns. It can also tiple times in order to specify several patterns. It can also
@ -477,32 +480,24 @@ OPTIONS
no short form for this option. no short form for this option.
--match-limit=number --match-limit=number
Processing some regular expression patterns can require a Processing some regular expression patterns may take a very
very large amount of memory, leading in some cases to a pro- long time to search for all possible matching strings. Others
gram crash if not enough is available. Other patterns may may require a very large amount of memory. There are two
take a very long time to search for all possible matching options that set resource limits for matching.
strings. The pcre2_match() function that is called by
pcre2grep to do the matching has two parameters that can
limit the resources that it uses.
The --match-limit option provides a means of limiting The --match-limit option provides a means of limiting comput-
resource usage when processing patterns that are not going to ing resource usage when processing patterns that are not
match, but which have a very large number of possibilities in going to match, but which have a very large number of possi-
their search trees. The classic example is a pattern that bilities in their search trees. The classic example is a pat-
uses nested unlimited repeats. Internally, PCRE2 uses a func- tern that uses nested unlimited repeats. Internally, PCRE2
tion called match() which it calls repeatedly (sometimes has a counter that is incremented each time around its main
recursively). The limit set by --match-limit is imposed on processing loop. If the value set by --match-limit is
the number of times this function is called during a match, reached, an error occurs.
which has the effect of limiting the amount of backtracking
that can take place.
The --recursion-limit option is similar to --match-limit, but The --depth-limit option limits the depth of nested back-
instead of limiting the total number of times that match() is tracking points, which in turn limits the amount of memory
called, it limits the depth of recursive calls, which in turn that is used. This limit is of use only if it is set smaller
limits the amount of memory that can be used. The recursion than --match-limit.
depth is a smaller number than the total number of calls,
because not all calls to match() are recursive. This limit is
of use only if it is set smaller than --match-limit.
There are no short forms for these options. The default set- There are no short forms for these options. The default set-
tings are specified when the PCRE2 library is compiled, with tings are specified when the PCRE2 library is compiled, with
@ -834,9 +829,9 @@ MATCHING ERRORS
such errors, pcre2grep gives up. such errors, pcre2grep gives up.
The --match-limit option of pcre2grep can be used to set the overall The --match-limit option of pcre2grep can be used to set the overall
resource limit; there is a second option called --recursion-limit that resource limit; there is a second option called --depth-limit that sets
sets a limit on the amount of memory (usually stack) that is used (see a limit on the amount of memory that is used (see the discussion of
the discussion of these options above). these options above).
DIAGNOSTICS DIAGNOSTICS
@ -862,5 +857,5 @@ AUTHOR
REVISION REVISION
Last updated: 31 December 2016 Last updated: 21 March 2017
Copyright (c) 1997-2016 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.


@ -91,13 +91,13 @@ INPUT ENCODING
ter is placed in one 16-bit or 32-bit code unit (in the 16-bit case, ter is placed in one 16-bit or 32-bit code unit (in the 16-bit case,
values greater than 0xffff cause an error to occur). values greater than 0xffff cause an error to occur).
UTF-8 is not capable of encoding values greater than 0x7fffffff, but UTF-8 (in its original definition) is not capable of encoding values
such values can be handled by the 32-bit library. When testing this greater than 0x7fffffff, but such values can be handled by the 32-bit
library in non-UTF mode with utf8_input set, if any character is pre- library. When testing this library in non-UTF mode with utf8_input set,
ceded by the byte 0xff (which is an illegal byte in UTF-8) 0x80000000 if any character is preceded by the byte 0xff (which is an illegal byte
is added to the character's value. This is the only way of passing such in UTF-8) 0x80000000 is added to the character's value. This is the
code points in a pattern string. For subject strings, using an escape only way of passing such code points in a pattern string. For subject
sequence is preferable. strings, using an escape sequence is preferable.
COMMAND LINE OPTIONS COMMAND LINE OPTIONS
@ -544,6 +544,7 @@ PATTERN MODIFIERS
/B bincode show binary code without lengths /B bincode show binary code without lengths
callout_info show callout information callout_info show callout information
debug same as info,fullbincode debug same as info,fullbincode
framesize show matching frame size
fullbincode show binary code with lengths fullbincode show binary code with lengths
/I info show info about compiled pattern /I info show info about compiled pattern
hex unquoted characters are hexadecimal hex unquoted characters are hexadecimal
@ -624,6 +625,10 @@ PATTERN MODIFIERS
last character. These lines are omitted if no starting or ending code last character. These lines are omitted if no starting or ending code
units are recorded. units are recorded.
The framesize modifier shows the size, in bytes, of the storage frames
used by pcre2_match() for handling backtracking. The size depends on
the number of capturing parentheses in the pattern.
The callout_info modifier requests information about all the callouts The callout_info modifier requests information about all the callouts
in the pattern. A list of them is output at the end of any other infor- in the pattern. A list of them is output at the end of any other infor-
mation that is requested. For each callout, either its number or string mation that is requested. For each callout, either its number or string
@ -959,6 +964,7 @@ SUBJECT MODIFIERS
callout_fail=<n>[:<m>] control callout failure callout_fail=<n>[:<m>] control callout failure
callout_none do not supply a callout function callout_none do not supply a callout function
copy=<number or name> copy captured substring copy=<number or name> copy captured substring
depth_limit=<n> set a depth limit
dfa use pcre2_dfa_match() dfa use pcre2_dfa_match()
find_limits find match and recursion limits find_limits find match and recursion limits
get=<number or name> extract captured substring get=<number or name> extract captured substring
@ -972,7 +978,7 @@ SUBJECT MODIFIERS
offset=<n> set starting offset offset=<n> set starting offset
offset_limit=<n> set offset limit offset_limit=<n> set offset limit
ovector=<n> set size of output vector ovector=<n> set size of output vector
recursion_limit=<n> set a recursion limit recursion_limit=<n> obsolete synonym for depth_limit
replace=<string> specify a replacement string replace=<string> specify a replacement string
startchar show startchar when relevant startchar show startchar when relevant
startoffset=<n> same as offset=<n> startoffset=<n> same as offset=<n>
@ -1188,133 +1194,132 @@ SUBJECT MODIFIERS
Providing a stack that is larger than the default 32K is necessary only Providing a stack that is larger than the default 32K is necessary only
for very complicated patterns. for very complicated patterns.
Setting match and recursion limits Setting match and depth limits
The match_limit and recursion_limit modifiers set the appropriate lim- The match_limit and depth_limit modifiers set the appropriate limits in
its in the match context. These values are ignored when the find_limits the match context. These values are ignored when the find_limits modi-
modifier is specified. fier is specified.
Finding minimum limits Finding minimum limits
If the find_limits modifier is present, pcre2test calls pcre2_match() If the find_limits modifier is present, pcre2test calls pcre2_match()
several times, setting different values in the match context via several times, setting different values in the match context via
pcre2_set_match_limit() and pcre2_set_recursion_limit() until it finds pcre2_set_match_limit() and pcre2_set_depth_limit() until it finds the
the minimum values for each parameter that allow pcre2_match() to com- minimum values for each parameter that allow pcre2_match() to complete
plete without error. without error.
If JIT is being used, only the match limit is relevant. If DFA matching If JIT is being used, only the match limit is relevant. If DFA matching
is being used, neither limit is relevant, and this modifier is ignored is being used, only the depth limit is relevant, but at present this
(with a warning message). modifier is ignored (with a warning message).
The match_limit number is a measure of the amount of backtracking that The match_limit number is a measure of the amount of backtracking that
takes place, and learning the minimum value can be instructive. For takes place, and learning the minimum value can be instructive. For
most simple matches, the number is quite small, but for patterns with most simple matches, the number is quite small, but for patterns with
very large numbers of matching possibilities, it can become large very very large numbers of matching possibilities, it can become large very
quickly with increasing length of subject string. The quickly with increasing length of subject string. The depth_limit num-
match_limit_recursion number is a measure of how much stack (or, if ber is a measure of how much memory for recording backtracking points
PCRE2 is compiled with NO_RECURSE, how much heap) memory is needed to is needed to complete the match attempt.
complete the match attempt.
Showing MARK names Showing MARK names
The mark modifier causes the names from backtracking control verbs that The mark modifier causes the names from backtracking control verbs that
are returned from calls to pcre2_match() to be displayed. If a mark is are returned from calls to pcre2_match() to be displayed. If a mark is
returned for a match, non-match, or partial match, pcre2test shows it. returned for a match, non-match, or partial match, pcre2test shows it.
For a match, it is on a line by itself, tagged with "MK:". Otherwise, For a match, it is on a line by itself, tagged with "MK:". Otherwise,
it is added to the non-match message. it is added to the non-match message.
Showing memory usage Showing memory usage
The memory modifier causes pcre2test to log all memory allocation and The memory modifier causes pcre2test to log all memory allocation and
freeing calls that occur during a match operation. freeing calls that occur during a match operation.
Setting a starting offset Setting a starting offset
The offset modifier sets an offset in the subject string at which The offset modifier sets an offset in the subject string at which
matching starts. Its value is a number of code units, not characters. matching starts. Its value is a number of code units, not characters.
Setting an offset limit Setting an offset limit
The offset_limit modifier sets a limit for unanchored matches. If a The offset_limit modifier sets a limit for unanchored matches. If a
match cannot be found starting at or before this offset in the subject, match cannot be found starting at or before this offset in the subject,
a "no match" return is given. The data value is a number of code units, a "no match" return is given. The data value is a number of code units,
not characters. When this modifier is used, the use_offset_limit modi- not characters. When this modifier is used, the use_offset_limit modi-
fier must have been set for the pattern; if not, an error is generated. fier must have been set for the pattern; if not, an error is generated.
Setting the size of the output vector Setting the size of the output vector
The ovector modifier applies only to the subject line in which it The ovector modifier applies only to the subject line in which it
appears, though of course it can also be used to set a default in a appears, though of course it can also be used to set a default in a
#subject command. It specifies the number of pairs of offsets that are #subject command. It specifies the number of pairs of offsets that are
available for storing matching information. The default is 15. available for storing matching information. The default is 15.
A value of zero is useful when testing the POSIX API because it causes A value of zero is useful when testing the POSIX API because it causes
regexec() to be called with a NULL capture vector. When not testing the regexec() to be called with a NULL capture vector. When not testing the
POSIX API, a value of zero is used to cause pcre2_match_data_cre- POSIX API, a value of zero is used to cause pcre2_match_data_cre-
ate_from_pattern() to be called, in order to create a match block of ate_from_pattern() to be called, in order to create a match block of
exactly the right size for the pattern. (It is not possible to create a exactly the right size for the pattern. (It is not possible to create a
match block with a zero-length ovector; there is always at least one match block with a zero-length ovector; there is always at least one
pair of offsets.) pair of offsets.)
Passing the subject as zero-terminated Passing the subject as zero-terminated
By default, the subject string is passed to a native API matching func- By default, the subject string is passed to a native API matching func-
tion with its correct length. In order to test the facility for passing tion with its correct length. In order to test the facility for passing
a zero-terminated string, the zero_terminate modifier is provided. It a zero-terminated string, the zero_terminate modifier is provided. It
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
via the POSIX interface, this modifier has no effect, as there is no via the POSIX interface, this modifier has no effect, as there is no
facility for passing a length.) facility for passing a length.)
When testing pcre2_substitute(), this modifier also has the effect of When testing pcre2_substitute(), this modifier also has the effect of
passing the replacement string as zero-terminated. passing the replacement string as zero-terminated.
Passing a NULL context Passing a NULL context
Normally, pcre2test passes a context block to pcre2_match(), Normally, pcre2test passes a context block to pcre2_match(),
pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
set, however, NULL is passed. This is for testing that the matching set, however, NULL is passed. This is for testing that the matching
functions behave correctly in this case (they use default values). This functions behave correctly in this case (they use default values). This
modifier cannot be used with the find_limits modifier or when testing modifier cannot be used with the find_limits modifier or when testing
the substitution function. the substitution function.
THE ALTERNATIVE MATCHING FUNCTION THE ALTERNATIVE MATCHING FUNCTION
By default, pcre2test uses the standard PCRE2 matching function, By default, pcre2test uses the standard PCRE2 matching function,
pcre2_match() to match each subject line. PCRE2 also supports an alter- pcre2_match() to match each subject line. PCRE2 also supports an alter-
native matching function, pcre2_dfa_match(), which operates in a dif- native matching function, pcre2_dfa_match(), which operates in a dif-
ferent way, and has some restrictions. The differences between the two ferent way, and has some restrictions. The differences between the two
functions are described in the pcre2matching documentation. functions are described in the pcre2matching documentation.
If the dfa modifier is set, the alternative matching function is used. If the dfa modifier is set, the alternative matching function is used.
This function finds all possible matches at a given point in the sub- This function finds all possible matches at a given point in the sub-
ject. If, however, the dfa_shortest modifier is set, processing stops ject. If, however, the dfa_shortest modifier is set, processing stops
after the first match is found. This is always the shortest possible after the first match is found. This is always the shortest possible
match. match.
DEFAULT OUTPUT FROM pcre2test DEFAULT OUTPUT FROM pcre2test
This section describes the output when the normal matching function, This section describes the output when the normal matching function,
pcre2_match(), is being used. pcre2_match(), is being used.
When a match succeeds, pcre2test outputs the list of captured sub- When a match succeeds, pcre2test outputs the list of captured sub-
strings, starting with number 0 for the string that matched the whole strings, starting with number 0 for the string that matched the whole
pattern. Otherwise, it outputs "No match" when the return is pattern. Otherwise, it outputs "No match" when the return is
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
this is the entire substring that was inspected during the partial this is the entire substring that was inspected during the partial
match; it may include characters before the actual match start if a match; it may include characters before the actual match start if a
lookbehind assertion, \K, \b, or \B was involved.) lookbehind assertion, \K, \b, or \B was involved.)
For any other return, pcre2test outputs the PCRE2 negative error number For any other return, pcre2test outputs the PCRE2 negative error number
and a short descriptive phrase. If the error is a failed UTF string and a short descriptive phrase. If the error is a failed UTF string
check, the code unit offset of the start of the failing character is check, the code unit offset of the start of the failing character is
also output. Here is an example of an interactive pcre2test run. also output. Here is an example of an interactive pcre2test run.
$ pcre2test $ pcre2test
PCRE2 version 9.00 2014-05-10 PCRE2 version 10.22 2016-07-29
re> /^abc(\d+)/ re> /^abc(\d+)/
data> abc123 data> abc123
@ -1326,8 +1331,8 @@ DEFAULT OUTPUT FROM pcre2test
Unset capturing substrings that are not followed by one that is set are Unset capturing substrings that are not followed by one that is set are
not shown by pcre2test unless the allcaptures modifier is specified. In not shown by pcre2test unless the allcaptures modifier is specified. In
the following example, there are two capturing substrings, but when the the following example, there are two capturing substrings, but when the
first data line is matched, the second, unset substring is not shown. first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second An "internal" unset substring is shown as "<unset>", as for the second
data line. data line.
re> /(a)|(b)/ re> /(a)|(b)/
@ -1339,11 +1344,11 @@ DEFAULT OUTPUT FROM pcre2test
1: <unset> 1: <unset>
2: b 2: b
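The display rule for unset substrings can be sketched as follows. This is a minimal illustration that uses Python's re module as a stand-in for PCRE2; the helper name capture_lines and the two-column number format are assumptions made for the example, not part of pcre2test.

```python
import re

def capture_lines(match, allcaptures=False):
    """Format captured substrings roughly as the text above describes."""
    groups = [match.group(0), *match.groups()]
    if not allcaptures:
        # Drop unset groups that are not followed by a set one.
        # Group 0 is always set, so the loop terminates.
        while groups[-1] is None:
            groups.pop()
    # An "internal" unset group is kept and shown as <unset>.
    return [f"{n:2d}: {'<unset>' if g is None else g}"
            for n, g in enumerate(groups)]

pattern = re.compile(r"(a)|(b)")
for subject in ("a", "b"):
    print("\n".join(capture_lines(pattern.match(subject))))
```

For the subject "a" the trailing unset group 2 is omitted; for "b" the unset group 1 is shown because group 2 follows it.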
If the strings contain any non-printing characters, they are output as If the strings contain any non-printing characters, they are output as
\xhh escapes if the value is less than 256 and UTF mode is not set. \xhh escapes if the value is less than 256 and UTF mode is not set.
Otherwise they are output as \x{hh...} escapes. See below for the defi- Otherwise they are output as \x{hh...} escapes. See below for the defi-
nition of non-printing characters. If the aftertext modifier is set, nition of non-printing characters. If the aftertext modifier is set,
the output for substring 0 is followed by the rest of the subject the output for substring 0 is followed by the rest of the subject
string, identified by "0+" like this: string, identified by "0+" like this:
re> /cat/aftertext re> /cat/aftertext
@ -1351,7 +1356,7 @@ DEFAULT OUTPUT FROM pcre2test
0: cat 0: cat
0+ aract 0+ aract
If global matching is requested, the results of successive matching If global matching is requested, the results of successive matching
attempts are output in sequence, like this: attempts are output in sequence, like this:
re> /\Bi(\w\w)/g re> /\Bi(\w\w)/g
@ -1363,8 +1368,8 @@ DEFAULT OUTPUT FROM pcre2test
0: ipp 0: ipp
1: pp 1: pp
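The way successive global matches are listed can be imitated with a short sketch. It assumes Python's re.finditer as a stand-in for repeated PCRE2 match attempts, and the subject "Mississippi" is chosen here only for illustration; the function name global_match_lines is hypothetical.

```python
import re

def global_match_lines(pattern, subject):
    """List every successive match, each followed by its captured groups."""
    lines = []
    for m in re.finditer(pattern, subject):
        lines.append(f" 0: {m.group(0)}")
        lines.extend(f"{n:2d}: {g}" for n, g in enumerate(m.groups(), start=1))
    # "No match" appears only when the very first attempt fails.
    return lines or ["No match"]

print("\n".join(global_match_lines(r"\Bi(\w\w)", "Mississippi")))
```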
"No match" is output only if the first match attempt fails. Here is an "No match" is output only if the first match attempt fails. Here is an
example of a failure message (the offset 4 that is specified by the example of a failure message (the offset 4 that is specified by the
offset modifier is past the end of the subject string): offset modifier is past the end of the subject string):
re> /xyz/ re> /xyz/
@ -1372,7 +1377,7 @@ DEFAULT OUTPUT FROM pcre2test
Error -24 (bad offset value) Error -24 (bad offset value)
Note that whereas patterns can be continued over several lines (a plain Note that whereas patterns can be continued over several lines (a plain
">" prompt is used for continuations), subject lines may not. However, ">" prompt is used for continuations), subject lines may not. However,
newlines can be included in a subject by means of the \n escape (or \r, newlines can be included in a subject by means of the \n escape (or \r,
\r\n, etc., depending on the newline sequence setting). \r\n, etc., depending on the newline sequence setting).
@ -1380,7 +1385,7 @@ DEFAULT OUTPUT FROM pcre2test
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
When the alternative matching function, pcre2_dfa_match(), is used, the When the alternative matching function, pcre2_dfa_match(), is used, the
output consists of a list of all the matches that start at the first output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example: point in the subject where there is at least one match. For example:
re> /(tang|tangerine|tan)/ re> /(tang|tangerine|tan)/
@ -1389,11 +1394,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tang 1: tang
2: tan 2: tan
Using the normal matching function on this data finds only "tang". The Using the normal matching function on this data finds only "tang". The
longest matching string is always given first (and numbered zero). longest matching string is always given first (and numbered zero).
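The difference between the two listings can be sketched for the simple case of literal alternatives. This is not how pcre2_dfa_match() works internally; it only reproduces the ordering described above. The function name dfa_style_matches and the subject string are assumptions for the example.

```python
import re

def dfa_style_matches(alternatives, subject, shortest=False):
    """All literal alternatives that match at the first point in the
    subject where at least one matches, longest first."""
    for start in range(len(subject) + 1):
        found = sorted(
            {alt for alt in alternatives if subject.startswith(alt, start)},
            key=len, reverse=True)
        if found:
            # With shortest=True, mimic dfa_shortest: only the shortest match.
            return [found[-1]] if shortest else found
    return []

alts = ["tang", "tangerine", "tan"]
subject = "yellow tangerine"
for n, text in enumerate(dfa_style_matches(alts, subject)):
    print(f"{n:2d}: {text}")

# A backtracking matcher stops at the first alternative that succeeds.
print(re.search("tang|tangerine|tan", subject).group(0))
```

The last line prints only "tang", because a backtracking matcher tries the alternatives in order and accepts the first that succeeds.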
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. Note that this is the followed by the partially matching substring. Note that this is the
entire substring that was inspected during the partial match; it may entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind asser- include characters before the actual match start if a lookbehind asser-
tion, \b, or \B was involved. (\K is not supported for DFA matching.) tion, \b, or \B was involved. (\K is not supported for DFA matching.)
@ -1409,16 +1414,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1: tan 1: tan
0: tan 0: tan
The alternative matching function does not support substring capture, The alternative matching function does not support substring capture,
so the modifiers that are concerned with captured substrings are not so the modifiers that are concerned with captured substrings are not
relevant. relevant.
RESTARTING AFTER A PARTIAL MATCH RESTARTING AFTER A PARTIAL MATCH
When the alternative matching function has given the PCRE2_ERROR_PAR- When the alternative matching function has given the PCRE2_ERROR_PAR-
TIAL return, indicating that the subject partially matched the pattern, TIAL return, indicating that the subject partially matched the pattern,
you can restart the match with additional subject data by means of the you can restart the match with additional subject data by means of the
dfa_restart modifier. For example: dfa_restart modifier. For example:
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
@ -1427,45 +1432,45 @@ RESTARTING AFTER A PARTIAL MATCH
data> n05\=dfa,dfa_restart data> n05\=dfa,dfa_restart
0: n05 0: n05
For further information about partial matching, see the pcre2partial For further information about partial matching, see the pcre2partial
documentation. documentation.
CALLOUTS CALLOUTS
If the pattern contains any callout requests, pcre2test's callout func- If the pattern contains any callout requests, pcre2test's callout func-
tion is called during matching unless callout_none is specified. This tion is called during matching unless callout_none is specified. This
works with both matching functions. works with both matching functions.
The callout function in pcre2test returns zero (carry on matching) by The callout function in pcre2test returns zero (carry on matching) by
default, but you can use a callout_fail modifier in a subject line (as default, but you can use a callout_fail modifier in a subject line (as
described above) to change this and other parameters of the callout. described above) to change this and other parameters of the callout.
Inserting callouts can be helpful when using pcre2test to check compli- Inserting callouts can be helpful when using pcre2test to check compli-
cated regular expressions. For further information about callouts, see cated regular expressions. For further information about callouts, see
the pcre2callout documentation. the pcre2callout documentation.
The output for callouts with numerical arguments and those with string The output for callouts with numerical arguments and those with string
arguments is slightly different. arguments is slightly different.
Callouts with numerical arguments Callouts with numerical arguments
By default, the callout function displays the callout number, the start By default, the callout function displays the callout number, the start
and current positions in the subject text at the callout time, and the and current positions in the subject text at the callout time, and the
next pattern item to be tested. For example: next pattern item to be tested. For example:
--->pqrabcdef --->pqrabcdef
0 ^ ^ \d 0 ^ ^ \d
This output indicates that callout number 0 occurred for a match This output indicates that callout number 0 occurred for a match
attempt starting at the fourth character of the subject string, when attempt starting at the fourth character of the subject string, when
the pointer was at the seventh character, and when the next pattern the pointer was at the seventh character, and when the next pattern
item was \d. Just one circumflex is output if the start and current item was \d. Just one circumflex is output if the start and current
positions are the same, or if the current position precedes the start positions are the same, or if the current position precedes the start
position, which can happen if the callout is in a lookbehind assertion. position, which can happen if the callout is in a lookbehind assertion.
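The placement of the circumflexes can be sketched as below. The exact column layout is an assumption (a four-character label under the "--->" prefix), and callout_lines is a hypothetical helper, not pcre2test's own code; placing the single circumflex at the start position is likewise assumed.

```python
def callout_lines(subject, number, start, current, next_item):
    """Return the two display lines for one numbered callout (assumed layout)."""
    prefix = "--->"
    # Only one circumflex when the positions coincide or when the
    # current position precedes the start position.
    positions = {start} if current <= start else {start, current}
    marks = "".join("^" if i in positions else " "
                    for i in range(len(subject) + 1))
    label = f"{number:>3} "  # same width as the "--->" prefix
    return [prefix + subject, f"{label}{marks} {next_item}"]

# Start at the fourth character (offset 3), pointer at the seventh (offset 6).
for line in callout_lines("pqrabcdef", 0, 3, 6, r"\d"):
    print(line)
```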
Callouts numbered 255 are assumed to be automatic callouts, inserted as Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /auto_callout pattern modifier. In this case, instead a result of the /auto_callout pattern modifier. In this case, instead
of showing the callout number, the offset in the pattern, preceded by a of showing the callout number, the offset in the pattern, preceded by a
plus, is output. For example: plus, is output. For example:
@ -1479,7 +1484,7 @@ CALLOUTS
0: E* 0: E*
If a pattern contains (*MARK) items, an additional line is output when- If a pattern contains (*MARK) items, an additional line is output when-
ever a change of latest mark is passed to the callout function. For ever a change of latest mark is passed to the callout function. For
example: example:
re> /a(*MARK:X)bc/auto_callout re> /a(*MARK:X)bc/auto_callout
@ -1493,17 +1498,17 @@ CALLOUTS
+12 ^ ^ +12 ^ ^
0: abc 0: abc
The mark changes between matching "a" and "b", but stays the same for The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is backtracking, the mark reverts to being unset, the text "<unset>" is
output. output.
Callouts with string arguments Callouts with string arguments
The output for a callout with a string argument is similar, except that The output for a callout with a string argument is similar, except that
instead of outputting a callout number before the position indicators, instead of outputting a callout number before the position indicators,
the callout string and its offset in the pattern string are output the callout string and its offset in the pattern string are output
before the reflection of the subject string, and the subject string is before the reflection of the subject string, and the subject string is
reflected for each callout. For example: reflected for each callout. For example:
re> /^ab(?C'first')cd(?C"second")ef/ re> /^ab(?C'first')cd(?C"second")ef/
@ -1520,43 +1525,43 @@ CALLOUTS
NON-PRINTING CHARACTERS NON-PRINTING CHARACTERS
When pcre2test is outputting text in the compiled version of a pattern, When pcre2test is outputting text in the compiled version of a pattern,
bytes other than 32-126 are always treated as non-printing characters bytes other than 32-126 are always treated as non-printing characters
and are therefore shown as hex escapes. and are therefore shown as hex escapes.
When pcre2test is outputting text that is a matched part of a subject When pcre2test is outputting text that is a matched part of a subject
string, it behaves in the same way, unless a different locale has been string, it behaves in the same way, unless a different locale has been
set for the pattern (using the locale modifier). In this case, the set for the pattern (using the locale modifier). In this case, the
isprint() function is used to distinguish printing and non-printing isprint() function is used to distinguish printing and non-printing
characters. characters.
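The default escaping rule (no locale set) can be sketched as follows. The function name show_text is hypothetical, lowercase hex digits are an assumption, and the locale case that relies on isprint() is deliberately left out.

```python
def show_text(code_points, utf=False):
    """Render code points, escaping everything outside 32-126."""
    out = []
    for cp in code_points:
        if 32 <= cp <= 126:
            out.append(chr(cp))           # printing character, shown as is
        elif cp < 256 and not utf:
            out.append(f"\\x{cp:02x}")    # \xhh form
        else:
            out.append(f"\\x{{{cp:x}}}")  # \x{hh...} form
    return "".join(out)

print(show_text([97, 9, 98]))          # a tab between two letters
print(show_text([0xE9], utf=True))     # value below 256, but UTF mode is set
print(show_text([0x100]))              # value of 256 or more
```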
SAVING AND RESTORING COMPILED PATTERNS SAVING AND RESTORING COMPILED PATTERNS
It is possible to save compiled patterns on disc or elsewhere, and It is possible to save compiled patterns on disc or elsewhere, and
reload them later, subject to a number of restrictions. JIT data cannot reload them later, subject to a number of restrictions. JIT data cannot
be saved. The host on which the patterns are reloaded must be running be saved. The host on which the patterns are reloaded must be running
the same version of PCRE2, with the same code unit width, and must also the same version of PCRE2, with the same code unit width, and must also
have the same endianness, pointer width and PCRE2_SIZE type. Before have the same endianness, pointer width and PCRE2_SIZE type. Before
compiled patterns can be saved they must be serialized, that is, con- compiled patterns can be saved they must be serialized, that is, con-
verted to a stream of bytes. A single byte stream may contain any num- verted to a stream of bytes. A single byte stream may contain any num-
ber of compiled patterns, but they must all use the same character ber of compiled patterns, but they must all use the same character
tables. A single copy of the tables is included in the byte stream (its tables. A single copy of the tables is included in the byte stream (its
size is 1088 bytes). size is 1088 bytes).
The functions whose names begin with pcre2_serialize_ are used for The functions whose names begin with pcre2_serialize_ are used for
serializing and de-serializing. They are described in the pcre2serial- serializing and de-serializing. They are described in the pcre2serial-
ize documentation. In this section we describe the features of ize documentation. In this section we describe the features of
pcre2test that can be used to test these functions. pcre2test that can be used to test these functions.
When a pattern with the push modifier is successfully compiled, it is When a pattern with the push modifier is successfully compiled, it is
pushed onto a stack of compiled patterns, and pcre2test expects the pushed onto a stack of compiled patterns, and pcre2test expects the
next line to contain a new pattern (or command) instead of a subject next line to contain a new pattern (or command) instead of a subject
line. By contrast, the pushcopy modifier causes a copy of the compiled line. By contrast, the pushcopy modifier causes a copy of the compiled
pattern to be stacked, leaving the original available for immediate pattern to be stacked, leaving the original available for immediate
matching. By using push and/or pushcopy, a number of patterns can be matching. By using push and/or pushcopy, a number of patterns can be
compiled and retained. These modifiers are incompatible with posix, and compiled and retained. These modifiers are incompatible with posix, and
control modifiers that act at match time are ignored (with a message) control modifiers that act at match time are ignored (with a message)
for the stacked patterns. The jitverify modifier applies only at com- for the stacked patterns. The jitverify modifier applies only at com-
pile time. pile time.
The command The command
@ -1564,21 +1569,21 @@ SAVING AND RESTORING COMPILED PATTERNS
#save <filename> #save <filename>
causes all the stacked patterns to be serialized and the result written causes all the stacked patterns to be serialized and the result written
to the named file. Afterwards, all the stacked patterns are freed. The to the named file. Afterwards, all the stacked patterns are freed. The
command command
#load <filename> #load <filename>
reads the data in the file, and then arranges for it to be de-serial- reads the data in the file, and then arranges for it to be de-serial-
ized, with the resulting compiled patterns added to the pattern stack. ized, with the resulting compiled patterns added to the pattern stack.
The pattern on the top of the stack can be retrieved by the #pop com- The pattern on the top of the stack can be retrieved by the #pop com-
mand, which must be followed by lines of subjects that are to be mand, which must be followed by lines of subjects that are to be
matched with the pattern, terminated as usual by an empty line or end matched with the pattern, terminated as usual by an empty line or end
of file. This command may be followed by a modifier list containing of file. This command may be followed by a modifier list containing
only control modifiers that act after a pattern has been compiled. In only control modifiers that act after a pattern has been compiled. In
particular, hex, posix, posix_nosub, push, and pushcopy are not particular, hex, posix, posix_nosub, push, and pushcopy are not
allowed, nor are any option-setting modifiers. The JIT modifiers are, allowed, nor are any option-setting modifiers. The JIT modifiers are,
however, permitted. Here is an example that saves and reloads two pat- however, permitted. Here is an example that saves and reloads two pat-
terns. terns.
/abc/push /abc/push
@ -1591,10 +1596,10 @@ SAVING AND RESTORING COMPILED PATTERNS
#pop jit,bincode #pop jit,bincode
abc abc
If jitverify is used with #pop, it does not automatically imply jit, If jitverify is used with #pop, it does not automatically imply jit,
which is different behaviour from when it is used on a pattern. which is different behaviour from when it is used on a pattern.
The #popcopy command is analogous to the pushcopy modifier in that it The #popcopy command is analogous to the pushcopy modifier in that it
makes current a copy of the topmost stack pattern, leaving the original makes current a copy of the topmost stack pattern, leaving the original
still on the stack. still on the stack.
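The stack behaviour of push, #save, #load, #pop and #popcopy can be modelled with a small sketch. Patterns are represented by plain strings and JSON stands in for PCRE2 serialization; the class PatternStack and the assumption that #load restores patterns in their saved order are illustrative only.

```python
import json

class PatternStack:
    """Toy model of the stack of compiled patterns."""

    def __init__(self):
        self.items = []

    def push(self, pattern):
        self.items.append(pattern)

    def save(self):
        # Serialize every stacked pattern, then free the stack.
        data = json.dumps(self.items).encode()
        self.items.clear()
        return data

    def load(self, data):
        # De-serialize and add the patterns to the stack (order assumed kept).
        self.items.extend(json.loads(data.decode()))

    def pop(self):
        return self.items.pop()      # removes the topmost pattern

    def popcopy(self):
        return self.items[-1]        # original stays on the stack

stack = PatternStack()
stack.push("abc")
stack.push("def")
blob = stack.save()
stack.load(blob)
print(stack.popcopy(), stack.pop(), stack.pop())
```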
@ -1614,5 +1619,5 @@ AUTHOR
REVISION REVISION
Last updated: 28 December 2016 Last updated: 21 March 2017
Copyright (c) 1997-2016 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.