Documentation update.

Philip.Hazel 2017-03-28 16:34:29 +00:00
parent 447d1b3083
commit 6c7fa44939
5 changed files with 1206 additions and 1232 deletions


@ -46,7 +46,7 @@ A match context is needed only if you want to:
Set a matching offset limit
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management in the match context
Set custom memory management specifically for the match
</pre>
The <i>length</i> and <i>startoffset</i> values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a


@ -23,37 +23,38 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
<li><a name="TOC14" href="#SEC14">NEWLINES</a>
<li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
<li><a name="TOC24" href="#SEC24">SERIALIZATION AND PRECOMPILING</a>
<li><a name="TOC25" href="#SEC25">THE MATCH DATA BLOCK</a>
<li><a name="TOC26" href="#SEC26">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
<li><a name="TOC27" href="#SEC27">NEWLINE HANDLING WHEN MATCHING</a>
<li><a name="TOC28" href="#SEC28">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
<li><a name="TOC29" href="#SEC29">OTHER INFORMATION ABOUT A MATCH</a>
<li><a name="TOC30" href="#SEC30">ERROR RETURNS FROM <b>pcre2_match()</b></a>
<li><a name="TOC31" href="#SEC31">OBTAINING A TEXTUAL ERROR MESSAGE</a>
<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
<li><a name="TOC33" href="#SEC33">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
<li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
<li><a name="TOC35" href="#SEC35">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
<li><a name="TOC36" href="#SEC36">DUPLICATE SUBPATTERN NAMES</a>
<li><a name="TOC37" href="#SEC37">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
<li><a name="TOC38" href="#SEC38">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
<li><a name="TOC39" href="#SEC39">SEE ALSO</a>
<li><a name="TOC40" href="#SEC40">AUTHOR</a>
<li><a name="TOC41" href="#SEC41">REVISION</a>
<li><a name="TOC11" href="#SEC11">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a>
<li><a name="TOC12" href="#SEC12">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
<li><a name="TOC13" href="#SEC13">PCRE2 API OVERVIEW</a>
<li><a name="TOC14" href="#SEC14">STRING LENGTHS AND OFFSETS</a>
<li><a name="TOC15" href="#SEC15">NEWLINES</a>
<li><a name="TOC16" href="#SEC16">MULTITHREADING</a>
<li><a name="TOC17" href="#SEC17">PCRE2 CONTEXTS</a>
<li><a name="TOC18" href="#SEC18">CHECKING BUILD-TIME OPTIONS</a>
<li><a name="TOC19" href="#SEC19">COMPILING A PATTERN</a>
<li><a name="TOC20" href="#SEC20">COMPILATION ERROR CODES</a>
<li><a name="TOC21" href="#SEC21">JUST-IN-TIME (JIT) COMPILATION</a>
<li><a name="TOC22" href="#SEC22">LOCALE SUPPORT</a>
<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A COMPILED PATTERN</a>
<li><a name="TOC24" href="#SEC24">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SERIALIZATION AND PRECOMPILING</a>
<li><a name="TOC26" href="#SEC26">THE MATCH DATA BLOCK</a>
<li><a name="TOC27" href="#SEC27">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
<li><a name="TOC28" href="#SEC28">NEWLINE HANDLING WHEN MATCHING</a>
<li><a name="TOC29" href="#SEC29">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
<li><a name="TOC30" href="#SEC30">OTHER INFORMATION ABOUT A MATCH</a>
<li><a name="TOC31" href="#SEC31">ERROR RETURNS FROM <b>pcre2_match()</b></a>
<li><a name="TOC32" href="#SEC32">OBTAINING A TEXTUAL ERROR MESSAGE</a>
<li><a name="TOC33" href="#SEC33">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
<li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
<li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
<li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
<li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a>
<li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
<li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
<li><a name="TOC40" href="#SEC40">SEE ALSO</a>
<li><a name="TOC41" href="#SEC41">AUTHOR</a>
<li><a name="TOC42" href="#SEC42">REVISION</a>
</ul>
<P>
<b>#include &#60;pcre2.h&#62;</b>
@ -177,22 +178,16 @@ document for an overview of all the PCRE2 documentation.
<b> void *<i>callout_data</i>);</b>
<br>
<br>
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_recursion_memory_management(</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
</P>
<br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
<P>
@ -314,7 +309,24 @@ document for an overview of all the PCRE2 documentation.
<br>
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
</P>
<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
<br><a name="SEC11" href="#TOC1">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a><br>
<P>
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_recursion_memory_management(</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br>
<br>
These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is
replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
no longer has any effect (it always returns zero).
</P>
<br><a name="SEC12" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
<P>
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
@ -368,14 +380,14 @@ When using multiple libraries in an application, you must take care when
processing any particular pattern to use only functions from a single library.
For example, if you want to run a match using a pattern that was compiled with
<b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
<b>pcre2_match_8()</b>.
<b>pcre2_match_8()</b> or <b>pcre2_match_32()</b>.
</P>
<P>
In the function summaries above, and in the rest of this document and other
PCRE2 documents, functions and data types are described using their generic
names, without the 8, 16, or 32 suffix.
</P>
<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
<br><a name="SEC13" href="#TOC1">PCRE2 API OVERVIEW</a><br>
<P>
PCRE2 has its own native API, which is described in this document. There are
also some wrapper functions for the 8-bit library that correspond to the
@ -397,7 +409,7 @@ against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
<b>pcre2.h</b>.
</P>
<P>
The functions <b>pcre2_compile()</b>, and <b>pcre2_match()</b> are used for
The functions <b>pcre2_compile()</b> and <b>pcre2_match()</b> are used for
compiling and matching regular expressions in a Perl-compatible manner. A
sample program that demonstrates the simplest way of using them is provided in
the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
@ -408,10 +420,17 @@ documentation, and the
documentation describes how to compile and run it.
</P>
<P>
Just-in-time compiler support is an optional feature of PCRE2 that can be built
in appropriate hardware environments. It greatly speeds up the matching
The compiling and matching functions recognize various options that are passed
as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts.
</P>
<P>
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
built in appropriate hardware environments. It greatly speeds up the matching
performance of many patterns. Programs can request that it be used if
available, by calling <b>pcre2_jit_compile()</b> after a pattern has been
available by calling <b>pcre2_jit_compile()</b> after a pattern has been
successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
support is not available.
</P>
@ -423,8 +442,8 @@ More complicated programs might need to make use of the specialist functions
<P>
JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance. The JIT-specific functions are
discussed in the
matching, which gives improved performance at the expense of less sanity
checking. The JIT-specific functions are discussed in the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation.
</P>
@ -433,7 +452,7 @@ A second matching function, <b>pcre2_dfa_match()</b>, which is not
Perl-compatible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a given
point in the subject), and scans the subject just once (unless there are
lookbehind assertions). However, this algorithm does not return captured
lookaround assertions). However, this algorithm does not return captured
substrings. A description of the two matching algorithms and their advantages
and disadvantages is given in the
<a href="pcre2matching.html"><b>pcre2matching</b></a>
@ -476,7 +495,7 @@ Functions with names ending with <b>_free()</b> are used for freeing memory
blocks of various sorts. In all cases, if one of these functions is called with
a NULL argument, it does nothing.
</P>
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
<br><a name="SEC14" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
<P>
The PCRE2 API uses string lengths and offsets into strings of code units in
several places. These values are always of type PCRE2_SIZE, which is an
@ -486,7 +505,7 @@ as a special indicator for zero-terminated strings and unset offsets.
Therefore, the longest string that can be handled is one less than this
maximum.
<a name="newlines"></a></P>
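As an illustration of the difference between code units and characters, the following is a minimal sketch for the 8-bit library, where each code unit is one byte. It assumes the subject is valid, zero-terminated UTF-8. The helper names are hypothetical and are not part of the PCRE2 API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of 8-bit code units in a zero-terminated
   string. This is the kind of value a length or offset refers to. */
size_t utf8_code_units(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: number of characters (code points) in a valid
   UTF-8 string. Continuation bytes have the form 10xxxxxx and are skipped. */
size_t utf8_characters(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}
```

For the string "caf\xC3\xA9" (the word "café" in UTF-8), the first function gives 5 and the second gives 4. It is the code unit count, not the character count, that corresponds to a length or offset.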
<br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
<br><a name="SEC15" href="#TOC1">NEWLINES</a><br>
<P>
PCRE2 supports five different conventions for indicating line breaks in
strings: a single CR (carriage return) character, a single LF (linefeed)
@ -521,7 +540,7 @@ The choice of newline convention does not affect the interpretation of
the \n or \r escape sequences, nor does it affect what \R matches; this has
its own separate convention.
</P>
<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
<br><a name="SEC16" href="#TOC1">MULTITHREADING</a><br>
<P>
In a multithreaded application it is important to keep thread-specific data
separate from data that can be shared between threads. The PCRE2 library code
@ -543,8 +562,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
that is, the same compiled pattern can be used by more than one thread
simultaneously. For example, an application can compile all its patterns at the
start, before forking off multiple threads that use them. However, if the
just-in-time optimization feature is being used, it needs separate memory stack
areas for each thread. See the
just-in-time (JIT) optimization feature is being used, it needs separate memory
stack areas for each thread. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details.
</P>
@ -596,12 +615,12 @@ thread-specific copy.
Match blocks
</b><br>
<P>
The matching functions need a block of memory for working space and for storing
the results of a match. This includes details of what was matched, as well as
additional information such as the name of a (*MARK) setting. Each thread must
provide its own copy of this memory.
The matching functions need a block of memory for storing the results of a
match. This includes details of what was matched, as well as additional
information such as the name of a (*MARK) setting. Each thread must provide its
own copy of this memory.
</P>
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
<br><a name="SEC17" href="#TOC1">PCRE2 CONTEXTS</a><br>
<P>
Some PCRE2 functions have a lot of parameters, many of which are used only by
specialist applications, for example, those that use custom memory management
@ -663,15 +682,15 @@ The memory used for a general context should be freed by calling:
The compile context
</b><br>
<P>
A compile context is required if you want to change the default values of any
of the following compile-time parameters:
A compile context is required if you want to provide an external function for
stack checking during compilation or to change the default values of any of the
following compile-time parameters:
<pre>
What \R matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables
The newline character sequence
The compile time nested parentheses limit
The maximum length of the pattern string
An external function for stack checking
</pre>
A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of
@ -713,11 +732,11 @@ in the current locale.
<b> PCRE2_SIZE <i>value</i>);</b>
<br>
<br>
This sets a maximum length, in code units, for the pattern string that is to be
compiled. If the pattern is longer, an error is generated. This facility is
provided so that applications that accept patterns from external sources can
limit their size. The default is the largest number that a PCRE2_SIZE variable
can hold, which is effectively unlimited.
This sets a maximum length, in code units, for any pattern string that is
compiled with this context. If the pattern is longer, an error is generated.
This facility is provided so that applications that accept patterns from
external sources can limit their size. The default is the largest number that a
PCRE2_SIZE variable can hold, which is effectively unlimited.
<b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
@ -729,8 +748,14 @@ sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
</P>
<P>
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
parameter affects the recognition of white space and the end of internal
A pattern can override the value set in the compile context by starting with a
sequence such as (*CRLF). See the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page for details.
</P>
<P>
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
convention affects the recognition of white space and the end of internal
comments starting with #. The value is saved with the compiled pattern for
subsequent use by the JIT compiler and by the two interpreted matching
functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
@ -764,15 +789,14 @@ zero if all is well, or non-zero to force an error.
The match context
</b><br>
<P>
A match context is required if you want to change the default values of any
of the following match-time parameters:
A match context is required if you want to:
<pre>
A callout function
The offset limit for matching an unanchored pattern
The limit for calling <b>match()</b> (see below)
The limit for calling <b>match()</b> recursively
Set up a callout function
Set an offset limit for matching an unanchored pattern
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match
</pre>
A match context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of
<b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
</P>
@ -797,7 +821,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
<b> void *<i>callout_data</i>);</b>
<br>
<br>
This sets up a "callout" function, which PCRE2 will call at specified points
This sets up a "callout" function for PCRE2 to call at specified points
during a matching operation. Details are given in the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
@ -816,8 +840,8 @@ A match can never be found if the <i>startoffset</i> argument of
limit.
</P>
<P>
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
<b>pcre2_compile()</b> so that when JIT is in use, different code can be
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
calling <b>pcre2_compile()</b> so that when JIT is in use, different code can be
compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
</P>
@ -837,10 +861,10 @@ which have a very large number of possibilities in their search trees. The
classic example is a pattern that uses nested unlimited repeats.
</P>
<P>
Internally, <b>pcre2_match()</b> uses a function called <b>match()</b>, which it
calls repeatedly (sometimes recursively). The limit set by <i>match_limit</i> is
imposed on the number of times this function is called during a match, which
has the effect of limiting the amount of backtracking that can take place. For
There is an internal counter in <b>pcre2_match()</b> that is incremented each
time round its main matching loop. If this value reaches the match limit,
<b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
which ignores it.
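The effect of a step counter on a pattern with nested unlimited repeats can be sketched with a toy matcher. This is only an illustration under stated assumptions: it is not PCRE2's implementation, it uses recursion for brevity, it handles only the anchored pattern (a+)+b, and all names and return codes in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this toy; these are not PCRE2's values. */
enum { TOY_MATCH = 1, TOY_NOMATCH = -1, TOY_MATCHLIMIT = -2 };

typedef struct {
    uint32_t count;  /* matching steps taken so far */
    uint32_t limit;  /* step limit, analogous to the match limit */
} toy_state;

/* Toy anchored matcher for (a+)+b. It tries every way of splitting the
   leading run of 'a's into groups, longest group first, and gives up
   when the step counter reaches the limit. */
int toy_match(const char *s, toy_state *st)
{
    size_t run = 0;
    size_t len;

    if (++st->count >= st->limit) return TOY_MATCHLIMIT;

    while (s[run] == 'a') run++;
    for (len = run; len >= 1; len--) {
        int rc;
        if (s[len] == 'b') return TOY_MATCH;
        rc = toy_match(s + len, st);
        if (rc != TOY_NOMATCH) return rc;  /* propagate a match or the limit */
    }
    return TOY_NOMATCH;
}
```

With a limit of 1000, the subject "aaab" matches on the first step, whereas 24 'a' characters followed by '!' would need millions of steps to try every split, so the toy stops early and reports that the limit was reached.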
@ -855,8 +879,7 @@ matching can continue.
</P>
<P>
The default value for the limit can be set when PCRE2 is built; the default
default is 10 million, which handles all but the most extreme cases. If the
limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_MATCHLIMIT. A value
default is 10 million, which handles all but the most extreme cases. A value
for the match limit may also be supplied by an item at the start of a pattern
of the form
<pre>
@ -865,64 +888,38 @@ of the form
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default.
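The rule for a limit given at the start of a pattern can be written as a small worked example. This is a sketch of the rule as stated above, not code taken from PCRE2; the function name is hypothetical.

```c
#include <stdint.h>

/* base_limit:    the limit set by the caller, or the build-time default
                  if the caller set none.
   pattern_limit: the ddd value given at the start of the pattern.
   The pattern can lower the limit but can never raise it. */
uint32_t effective_match_limit(uint32_t base_limit, uint32_t pattern_limit)
{
    return (pattern_limit < base_limit) ? pattern_limit : base_limit;
}
```

For instance, with the default of 10 million and a pattern value of 5000, the result is 5000. With a caller limit of 1000 and the same pattern value, the pattern setting is ignored and the result is 1000.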
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
The <i>recursion_limit</i> parameter is similar to <i>match_limit</i>, but
instead of limiting the total number of times that <b>match()</b> is called, it
limits the depth of recursion. The recursion depth is a smaller number than the
total number of calls, because not all calls to <b>match()</b> are recursive.
This limit is of use only if it is set smaller than <i>match_limit</i>.
This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
Each time a nested backtracking point is passed, a new memory "frame" is used
to remember the state of matching at that point. Thus, this parameter
indirectly limits the amount of memory that is used in a match.
</P>
<P>
Limiting the recursion depth limits the amount of system stack that can be
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
and is ignored, when matching is done using JIT compiled code. However, it is
supported by <b>pcre2_dfa_match()</b>, which uses recursive function calls less
frequently than <b>pcre2_match()</b>, but which can be caused to use a lot of
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
This limit is not relevant, and is ignored, when matching is done using JIT
compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which uses
it to limit the depth of internal recursive function calls that implement
lookaround assertions and pattern recursions. This is, therefore, an indirect
limit on the amount of system stack that is used. A recursive pattern such as
/(.)(?1)/, when matched to a very long string using <b>pcre2_dfa_match()</b>,
can use a great deal of stack.
</P>
<P>
The default value for <i>recursion_limit</i> can be set when PCRE2 is built; the
default default is the same value as the default for <i>match_limit</i>. If the
limit is exceeded, <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> return
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
supplied by an item at the start of a pattern of the form
The default value for the depth limit can be set when PCRE2 is built; the
default default is the same value as the default for the match limit. If the
limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
item at the start of a pattern of the form
<pre>
(*LIMIT_RECURSION=ddd)
(*LIMIT_DEPTH=ddd)
</pre>
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
<b>int pcre2_set_recursion_memory_management(</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br>
<br>
This function sets up two additional custom memory management functions for use
by <b>pcre2_match()</b> when PCRE2 is compiled to use the heap for remembering
backtracking data, instead of recursive function calls that use the system
stack. There is a discussion about PCRE2's stack usage in the
<a href="pcre2stack.html"><b>pcre2stack</b></a>
documentation. See the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation for details of how to build PCRE2.
</P>
<P>
Using the heap for recursion is a non-standard way of building PCRE2, for use
in environments that have limited stacks. Because of the greater use of memory
management, <b>pcre2_match()</b> runs more slowly. Functions that are different
to the general custom memory functions are provided so that special-purpose
external code can be used for this case, because the memory blocks are all the
same size. The blocks are retained by <b>pcre2_match()</b> until it is about to
exit so that they can be re-used when possible during the match. In the absence
of these functions, the normal custom memory management functions are used, if
supplied, otherwise the system functions.
</P>
<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
<br><a name="SEC18" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
<P>
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
</P>
@ -954,6 +951,13 @@ sequences the \R escape sequence matches by default. A value of
PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
default can be overridden when a pattern is compiled.
<pre>
PCRE2_CONFIG_DEPTHLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
<b>pcre2_set_depth_limit()</b> above.
<pre>
PCRE2_CONFIG_JIT
</pre>
@ -989,9 +993,9 @@ be compiled by those two libraries, but at the expense of slower matching.
<pre>
PCRE2_CONFIG_MATCHLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the number of
internal matching function calls in a <b>pcre2_match()</b> execution. Further
details are given with <b>pcre2_match()</b> below.
The output is a uint32_t integer that gives the default match limit for
<b>pcre2_match()</b>. Further details are given with
<b>pcre2_set_match_limit()</b> above.
<pre>
PCRE2_CONFIG_NEWLINE
</pre>
@ -1015,20 +1019,11 @@ amount of system stack used when a pattern is compiled. It is specified when
PCRE2 is built; the default is 250. This limit does not take into account the
stack that may already be used by the calling application. For finer control
over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
<pre>
PCRE2_CONFIG_RECURSIONLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the depth of
recursion when calling the internal matching function in a <b>pcre2_match()</b>
execution. Further details are given with <b>pcre2_match()</b> below.
<pre>
PCRE2_CONFIG_STACKRECURSE
</pre>
The output is a uint32_t integer that is set to one if internal recursion when
running <b>pcre2_match()</b> is implemented by recursive function calls that use
the system stack to remember their state. This is the usual way that PCRE2 is
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
heap instead of recursive function calls.
This parameter is obsolete and should not be used in new code. The output is a
uint32_t integer that is always set to zero.
<pre>
PCRE2_CONFIG_UNICODE_VERSION
</pre>
@ -1047,14 +1042,14 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
<pre>
PCRE2_CONFIG_VERSION
</pre>
The <i>where</i> argument should point to a buffer that is at least 12 code
The <i>where</i> argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling
<b>pcre2_config()</b> with <i>where</i> set to NULL.) The buffer is filled with
the PCRE2 version string, zero-terminated. The number of code units used is
returned. This is the length of the string plus one unit for the terminating
zero.
<a name="compiling"></a></P>
<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
<br><a name="SEC19" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
@ -1240,13 +1235,14 @@ option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-comments are
recognized, exactly as in the rest of the pattern.
recognized in this mode, exactly as in the rest of the pattern.
<pre>
PCRE2_AUTO_CALLOUT
</pre>
If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
all with number 255, before each pattern item, except immediately before or
after a callout in the pattern. For discussion of the callout facility, see the
after an explicit callout in the pattern. For discussion of the callout
facility, see the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation.
<pre>
@ -1472,9 +1468,8 @@ and
<a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
document.
If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a negative
error code.
document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
negative error code.
</P>
<P>
If you know that your pattern is valid, and you want to skip this check for
@ -1495,7 +1490,7 @@ in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page. If you set PCRE2_UCP, matching one of the items it affects takes much
longer. The option is available only if PCRE2 has been compiled with Unicode
support.
support (which is the default).
<pre>
PCRE2_UNGREEDY
</pre>
@ -1525,9 +1520,9 @@ the behaviour of PCRE2 are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page.
</P>
<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
<P>
There are over 80 positive error codes that <b>pcre2_compile()</b> may return
There are nearly 100 positive error codes that <b>pcre2_compile()</b> may return
(via <i>errorcode</i>) if it finds an error in the pattern. There are also some
negative error codes that are used for invalid UTF strings. These are the same
as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
@ -1538,7 +1533,7 @@ error message"
<a href="#geterrormessage">below)</a>
can be called to obtain a textual error message from any error code.
<a name="jitcompiling"></a></P>
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
<br><a name="SEC21" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
<P>
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
<br>
@ -1574,18 +1569,18 @@ documentation.
JIT compilation is a heavyweight optimization. It can take some time for
patterns to be analyzed, and for one-off matches and simple patterns the
benefit of faster execution might be offset by a much slower compilation time.
Most, but not all patterns can be optimized by the JIT compiler.
Most (but not all) patterns can be optimized by the JIT compiler.
<a name="localesupport"></a></P>
<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
<br><a name="SEC22" href="#TOC1">LOCALE SUPPORT</a><br>
<P>
PCRE2 handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character code
point. This applies only to characters whose code points are less than 256. By
default, higher-valued code points never match escapes such as \w or \d.
However, if PCRE2 is built with UTF support, all characters can be tested with
\p and \P, or, alternatively, the PCRE2_UCP option can be set when a pattern
is compiled; this causes \w and friends to use Unicode property support
instead of the built-in tables.
However, if PCRE2 is built with Unicode support, all characters can be tested
with \p and \P, or, alternatively, the PCRE2_UCP option can be set when a
pattern is compiled; this causes \w and friends to use Unicode property
support instead of the built-in tables.
</P>
<P>
The use of locales with Unicode is discouraged. If you are handling characters
@ -1629,10 +1624,10 @@ available for as long as it is needed.
The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
is saved with the compiled pattern, and the same tables are used by
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>. Thus, for any single pattern,
compilation, and matching all happen in the same locale, but different patterns
compilation and matching both happen in the same locale, but different patterns
can be processed in different locales.
<a name="infoaboutpattern"></a></P>
<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
<P>
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
</P>
@ -1645,7 +1640,7 @@ pattern. The second argument specifies which piece of information is required,
and the third argument is a pointer to a variable to receive the data. If the
third argument is NULL, the first argument is ignored, and the function returns
the size in bytes of the variable that is required for the information
requested. Otherwise, The yield of the function is zero for success, or one of
requested. Otherwise, the yield of the function is zero for success, or one of
the following negative numbers:
<pre>
PCRE2_ERROR_NULL the argument <i>code</i> was NULL
@ -1698,8 +1693,8 @@ following are true:
.* is not in an atomic group
.* is not in a capturing group that is the subject of a back reference
PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
PCRE2_NO_DOTSTAR_ANCHOR is not set.
Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set
</pre>
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
options returned for PCRE2_INFO_ALLOPTIONS.
@ -1726,6 +1721,13 @@ matches only CR, LF, or CRLF.
Return the highest capturing subpattern number in the pattern. In patterns
where (?| is not used, this is also the total number of capturing subpatterns.
The third argument should point to a <b>uint32_t</b> variable.
<pre>
PCRE2_INFO_DEPTHLIMIT
</pre>
If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
<pre>
PCRE2_INFO_FIRSTBITMAP
</pre>
@ -1757,6 +1759,14 @@ argument should point to an <b>uint32_t</b> variable. In the 8-bit library, the
value is always less than 256. In the 16-bit library the value can be up to
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
and up to 0xffffffff when not using UTF-32 mode.
<pre>
PCRE2_INFO_FRAMESIZE
</pre>
Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
without the use of JIT. The third argument should point to a <b>size_t</b>
variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
<pre>
PCRE2_INFO_HASBACKSLASHC
</pre>
@ -1767,7 +1777,8 @@ argument should point to an <b>uint32_t</b> variable.
</pre>
Return 1 if the pattern contains any explicit matches for CR or LF characters,
otherwise 0. The third argument should point to a <b>uint32_t</b> variable. An
explicit match is either a literal CR or LF character, or \r or \n.
explicit match is either a literal CR or LF character, or \r or \n or one of
the equivalent hexadecimal or octal escape sequences.
<pre>
PCRE2_INFO_JCHANGED
</pre>
@ -1904,7 +1915,7 @@ different for each compiled pattern.
<pre>
PCRE2_INFO_NEWLINE
</pre>
The output is a <b>uint32_t</b> with one of the following values:
The output is one of the following <b>uint32_t</b> values:
<pre>
PCRE2_NEWLINE_CR Carriage return (CR)
PCRE2_NEWLINE_LF Linefeed (LF)
@ -1912,15 +1923,8 @@ The output is a <b>uint32_t</b> with one of the following values:
PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
</pre>
This specifies the default character sequence that will be recognized as
meaning "newline" while matching.
<pre>
PCRE2_INFO_RECURSIONLIMIT
</pre>
If the pattern set a recursion limit by including an item of the form
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value has been
set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
This identifies the character sequence that will be recognized as meaning
"newline" while matching.
<pre>
PCRE2_INFO_SIZE
</pre>
@ -1933,7 +1937,7 @@ value returned by this option, because there are cases where the code that
calculates the size has to over-estimate. Processing a pattern with the JIT
compiler does not alter the value returned by this option.
<a name="infoaboutcallouts"></a></P>
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
<br><a name="SEC24" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
<P>
<b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
<b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
@ -1952,7 +1956,7 @@ contents of the callout enumeration block are described in the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation, which also gives further details about callouts.
</P>
<br><a name="SEC24" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
<br><a name="SEC25" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
<P>
It is possible to save compiled patterns on disc or elsewhere, and reload them
later, subject to a number of restrictions. The functions whose names begin
@ -1961,7 +1965,7 @@ the
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
documentation.
<a name="matchdatablock"></a></P>
<br><a name="SEC25" href="#TOC1">THE MATCH DATA BLOCK</a><br>
<br><a name="SEC26" href="#TOC1">THE MATCH DATA BLOCK</a><br>
<P>
<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
<b> pcre2_general_context *<i>gcontext</i>);</b>
@ -1986,9 +1990,9 @@ Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
the creation functions above. For <b>pcre2_match_data_create()</b>, the first
argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
offsets is required to identify the string that matched the whole pattern, with
another pair for each captured substring. For example, a value of 4 creates
enough space to record the matched portion of the subject plus three captured
substrings. A minimum of at least 1 pair is imposed by
an additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus three
captured substrings. A minimum of 1 pair is imposed by
<b>pcre2_match_data_create()</b>, so it is always possible to return the overall
matched string.
</P>
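<P>
The sizing rule above can be sketched in plain C. This is only an illustration
of the arithmetic, not part of the PCRE2 API: the helper names are hypothetical,
and <i>capture_count</i> is assumed to be a value such as the one reported for
PCRE2_INFO_CAPTURECOUNT.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pairs needed to record the whole match plus every
   captured substring. One pair for the overall match, one per group. */
uint32_t ovector_pairs_needed(uint32_t capture_count)
{
    assert(capture_count < UINT32_MAX);   /* guard against wrap-around */
    return capture_count + 1;
}

/* Hypothetical helper: mirrors the documented minimum of one pair, so a
   request for zero pairs is treated as a request for one. */
uint32_t effective_pairs(uint32_t requested_pairs)
{
    return requested_pairs == 0 ? 1 : requested_pairs;
}

/* Each pair is a start offset and an end offset, so the ovector holds
   twice as many elements as there are pairs. */
size_t ovector_elements(uint32_t pairs)
{
    return (size_t)pairs * 2;
}
```

<P>
For a pattern with three capturing groups, ovector_pairs_needed(3) gives the
value 4 used in the example above, and that value is what would be passed as
<i>ovecsize</i>.
</P>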
@ -2032,7 +2036,7 @@ match data block (for that match) have taken place.
When a match data block itself is no longer needed, it should be freed by
calling <b>pcre2_match_data_free()</b>.
</P>
<br><a name="SEC26" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
<br><a name="SEC27" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
<P>
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@ -2126,9 +2130,11 @@ character is CR followed by LF, advance the starting offset by two characters
instead of one.
</P>
<P>
If a non-zero starting offset is passed when the pattern is anchored, one
If a non-zero starting offset is passed when the pattern is anchored, a single
attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject.
pattern does not require the match to be at the start of the subject. In other
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
<a name="matchoptions"></a></P>
<br><b>
Option bits for <b>pcre2_match()</b>
@ -2142,9 +2148,9 @@ described below.
</P>
<P>
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
compiler. If it is set, JIT matching is disabled and the normal interpretive
code in <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the
remaining options are supported for JIT matching.
compiler. If it is set, JIT matching is disabled and the interpretive code in
<b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the remaining
options are supported for JIT matching.
<pre>
PCRE2_ANCHORED
</pre>
@ -2229,13 +2235,13 @@ page.
If you know that your subject is valid, and you want to skip these checks for
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
<b>pcre2_match()</b>. You might want to do this for the second and subsequent
calls to <b>pcre2_match()</b> if you are making repeated calls to find all the
matches in a single subject string.
calls to <b>pcre2_match()</b> if you are making repeated calls to find other
matches in the same subject string.
</P>
<P>
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
as a subject, or an invalid value of <i>startoffset</i>, is undefined. Your
program may crash or loop indefinitely.
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
string as a subject, or an invalid value of <i>startoffset</i>, is undefined.
Your program may crash or loop indefinitely.
<pre>
PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT
@ -2262,7 +2268,7 @@ examples, in the
<a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation.
</P>
<br><a name="SEC27" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
<br><a name="SEC28" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
<P>
When PCRE2 is built, a default newline convention is set; this is usually the
standard convention for the operating system. The default can be overridden in
@ -2294,15 +2300,15 @@ reference, and so advances only by one character after the first failure.
</P>
<P>
An explicit match for CR or LF is either a literal appearance of one of those
characters in the pattern, or one of the \r or \n escape sequences. Implicit
matches such as [^X] do not count, nor does \s, even though it includes CR and
LF in the characters that it matches.
characters in the pattern, or one of the \r or \n or equivalent octal or
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
does \s, even though it includes CR and LF in the characters that it matches.
</P>
<P>
Notwithstanding the above, anomalous effects may still occur when CRLF is a
valid newline sequence and explicit \r or \n escapes appear in the pattern.
<a name="matchedstrings"></a></P>
<br><a name="SEC28" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
<br><a name="SEC29" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
<P>
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
<br>
@ -2352,12 +2358,12 @@ identify the part of the subject that was partially matched. See the
documentation for details of partial matching.
</P>
<P>
After a successful match, the first pair of offsets identifies the portion of
the subject string that was matched by the entire pattern. The next pair is
used for the first capturing subpattern, and so on. The value returned by
After a fully successful match, the first pair of offsets identifies the
portion of the subject string that was matched by the entire pattern. The next
pair is used for the first captured substring, and so on. The value returned by
<b>pcre2_match()</b> is one more than the highest numbered pair that has been
set. For example, if two substrings have been captured, the returned value is
3. If there are no capturing subpatterns, the return value from a successful
3. If there are no captured substrings, the return value from a successful
match is 1, indicating that just the first pair of offsets has been set.
</P>
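<P>
As a rough sketch of how the offset pairs and the return value fit together,
the following plain C function extracts substring <i>n</i> from a subject. It
does not call PCRE2: <b>size_t</b> stands in for PCRE2_SIZE, SKETCH_UNSET stands
in for PCRE2_UNSET, and the function name is hypothetical. The <i>ovector</i>
and <i>rc</i> arguments are assumed to come from a successful match.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET, which pcre2.h defines as (~(PCRE2_SIZE)0). */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: copy captured substring n into buf as a
   zero-terminated string. rc is the positive value returned by the match,
   that is, one more than the highest numbered pair that was set.
   Returns the substring length, or -1 if the substring is unavailable
   or does not fit in buf. */
long copy_substring(const char *subject, const size_t *ovector, int rc,
                    int n, char *buf, size_t buflen)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);

    if (rc <= 0 || n < 0 || n >= rc)
        return -1;                        /* pair n was not set */

    size_t start = ovector[2 * n];        /* first code unit of substring */
    size_t end = ovector[2 * n + 1];      /* one past the last code unit */

    if (start == SKETCH_UNSET || end < start)
        return -1;                        /* group did not participate */

    size_t len = end - start;
    if (len + 1 > buflen)
        return -1;                        /* no room for the terminator */

    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (long)len;
}
```

<P>
With the subject "abcdef", a return value of 3, and an ovector holding the
pairs (0,6), (0,3), and (3,6), substring 2 is "def" and a request for
substring 3 is rejected because only pairs 0 to 2 were set.
</P>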
<P>
@ -2375,11 +2381,7 @@ returned.
If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, <b>pcre2_match()</b> may be called with a match
data block whose ovector is of minimum length (that is, one pair). However, if
the pattern contains back references and the <i>ovector</i> is not big enough to
remember the related substrings, PCRE2 has to get additional memory for use
during matching. Thus it is usually advisable to set up a match data block
containing an ovector of reasonable size.
data block whose ovector is of minimum length (that is, one pair).
</P>
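<P>
One way to act on a zero return value is sketched below in plain C. The helper
name is hypothetical and the function does not call PCRE2. It assumes only what
is stated above: a positive value counts the pairs that were set, and zero
means the ovector was too small and was filled as far as possible.
</P>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many ovector pairs hold usable offsets, given
   the value returned by the match and the number of pairs in the ovector.
   Negative return values (errors, no match) are treated as "no pairs";
   partial matching, which also sets the first pair, is not handled. */
uint32_t pairs_filled(int rc, uint32_t ovector_pairs)
{
    assert(ovector_pairs >= 1);           /* at least one pair always exists */

    if (rc < 0)
        return 0;                         /* match failed */
    if (rc == 0)
        return ovector_pairs;             /* too small: every pair was used */
    return (uint32_t)rc;                  /* pairs 0 .. rc-1 were set */
}
```

<P>
A caller that sees zero can either work with the pairs that fit or retry with
a larger match data block.
</P>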
<P>
It is possible for capturing subpattern number <i>n+1</i> to match some part of
@ -2405,7 +2407,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
<b>pcre2_match()</b>. The other elements retain whatever values they previously
had.
<a name="matchotherdata"></a></P>
<br><a name="SEC29" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
<br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
<P>
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
<br>
@ -2455,7 +2457,7 @@ the code unit offset of the invalid UTF character. Details are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page.
<a name="errorlist"></a></P>
<br><a name="SEC30" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
<br><a name="SEC31" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
<P>
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
converted to a text string by calling the <b>pcre2_get_error_message()</b>
@ -2487,8 +2489,9 @@ returned when the magic number is not present.
<pre>
PCRE2_ERROR_BADMODE
</pre>
This error is given when a pattern that was compiled by the 8-bit library is
passed to a 16-bit or 32-bit library function, or vice versa.
This error is given when a compiled pattern is passed to a function in a
library of a different code unit width, for example, when a pattern compiled by
the 8-bit library is passed to a 16-bit or 32-bit library function.
<pre>
PCRE2_ERROR_BADOFFSET
</pre>
@ -2512,20 +2515,15 @@ use by callout functions that want to cause <b>pcre2_match()</b> or
<b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
<a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation for details.
<pre>
PCRE2_ERROR_DEPTHLIMIT
</pre>
The nested backtracking depth limit was reached.
<pre>
PCRE2_ERROR_INTERNAL
</pre>
An unexpected internal error has occurred. This error could be caused by a bug
in PCRE2 or by overwriting of the compiled pattern.
<pre>
PCRE2_ERROR_JIT_BADOPTION
</pre>
This error is returned when a pattern that was successfully studied using JIT
is being matched, but the matching mode (partial or complete match) does not
correspond to any JIT compilation mode. When the JIT fast path function is
used, this error may be also given for invalid options. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details.
<pre>
PCRE2_ERROR_JIT_STACKLIMIT
</pre>
@ -2537,15 +2535,13 @@ documentation for more details.
<pre>
PCRE2_ERROR_MATCHLIMIT
</pre>
The backtracking limit was reached.
The backtracking match limit was reached.
<pre>
PCRE2_ERROR_NOMEMORY
</pre>
If a pattern contains back references, but the ovector is not big enough to
remember the referenced substrings, PCRE2 gets a block of memory at the start
of matching to use for this purpose. There are some other special cases where
extra memory is needed during matching. This error is given when memory cannot
be obtained.
If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default
or custom) fails.
<pre>
PCRE2_ERROR_NULL
</pre>
@ -2561,12 +2557,8 @@ in the subject string. Some simple patterns that might do this are detected and
faulted at compile time, but more complicated cases, in particular mutual
recursions between two different subpatterns, cannot be detected until matching
is attempted.
<pre>
PCRE2_ERROR_RECURSIONLIMIT
</pre>
The internal recursion limit was reached.
<a name="geterrormessage"></a></P>
<br><a name="SEC31" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
<br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
<P>
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
<b> PCRE2_SIZE <i>bufflen</i>);</b>
@ -2587,7 +2579,7 @@ returned. If the buffer is too small, the message is truncated (but still with
a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
None of the messages are very long; a buffer size of 120 code units is ample.
<a name="extractbynumber"></a></P>
<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
<br><a name="SEC33" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
<P>
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
@ -2684,7 +2676,7 @@ The substring did not participate in the match. For example, if the pattern is
(abc)|(def) and the subject is "def", and the ovector contains at least two
capturing slots, substring number 1 is unset.
</P>
<br><a name="SEC33" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
<br><a name="SEC34" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
<P>
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
@ -2723,7 +2715,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
<a name="extractbyname"></a></P>
<br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
<br><a name="SEC35" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
<P>
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>);</b>
@ -2755,8 +2747,8 @@ calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
that name. Given the number, you can extract the substring directly, or use one
of the functions described above.
that name. Given the number, you can extract the substring directly from the
ovector, or use one of the "bynumber" functions described above.
</P>
<P>
For convenience, there are also "byname" functions that correspond to the
@ -2783,7 +2775,7 @@ names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for subpatterns of the
same number causes an error at compile time.
</P>
<br><a name="SEC35" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
<P>
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@ -2990,7 +2982,7 @@ obtained by calling the <b>pcre2_get_error_message()</b> function (see
"Obtaining a textual error message"
<a href="#geterrormessage">above).</a>
</P>
<br><a name="SEC36" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
<P>
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
@ -3035,7 +3027,7 @@ in the section entitled <i>Information about a pattern</i>. Given all the
relevant entries for the name, you can extract each of their numbers, and hence
the captured data.
</P>
<br><a name="SEC37" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
<br><a name="SEC38" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
<P>
The traditional matching function uses a similar algorithm to Perl, which stops
when it finds the first match at a given point in the subject. If you want to
@ -3053,7 +3045,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
other alternatives. Ultimately, when it runs out of matches,
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
<a name="dfamatch"></a></P>
<br><a name="SEC38" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
<br><a name="SEC39" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
<P>
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@ -3064,11 +3056,12 @@ other alternatives. Ultimately, when it runs out of matches,
<P>
The function <b>pcre2_dfa_match()</b> is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the subject
string just once, and does not backtrack. This has different characteristics to
the normal algorithm, and is not compatible with Perl. Some of the features of
PCRE2 patterns are not supported. Nevertheless, there are times when this kind
of matching can be useful. For a discussion of the two matching algorithms, and
a list of features that <b>pcre2_dfa_match()</b> does not support, see the
string just once (not counting lookaround assertions), and does not backtrack.
This has different characteristics to the normal algorithm, and is not
compatible with Perl. Some of the features of PCRE2 patterns are not supported.
Nevertheless, there are times when this kind of matching can be useful. For a
discussion of the two matching algorithms, and a list of features that
<b>pcre2_dfa_match()</b> does not support, see the
<a href="pcre2matching.html"><b>pcre2matching</b></a>
documentation.
</P>
@ -3248,13 +3241,13 @@ some plausibility checks are made on the contents of the workspace, which
should contain data about the previous partial match. If any of these checks
fail, this error is given.
</P>
<br><a name="SEC39" href="#TOC1">SEE ALSO</a><br>
<br><a name="SEC40" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
</P>
<br><a name="SEC40" href="#TOC1">AUTHOR</a><br>
<br><a name="SEC41" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@ -3263,9 +3256,9 @@ University Computing Service
Cambridge, England.
<br>
</P>
<br><a name="SEC41" href="#TOC1">REVISION</a><br>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
Last updated: 21 March 2017
Last updated: 27 March 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>

@ -281,19 +281,14 @@ PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
int (*callout_function)(pcre2_callout_block *, void *),
void *callout_data);
int pcre2_set_match_limit(pcre2_match_context *mcontext,
uint32_t value);
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
PCRE2_SIZE value);
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
int pcre2_set_match_limit(pcre2_match_context *mcontext,
uint32_t value);
int pcre2_set_recursion_memory_management(
pcre2_match_context *mcontext,
void *(*private_malloc)(PCRE2_SIZE, void *),
void (*private_free)(void *, void *), void *memory_data);
int pcre2_set_depth_limit(pcre2_match_context *mcontext,
uint32_t value);
PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS
@ -397,6 +392,22 @@ PCRE2 NATIVE API AUXILIARY FUNCTIONS
int pcre2_config(uint32_t what, void *where);
PCRE2 NATIVE API OBSOLETE FUNCTIONS
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
uint32_t value);
int pcre2_set_recursion_memory_management(
pcre2_match_context *mcontext,
void *(*private_malloc)(PCRE2_SIZE, void *),
void (*private_free)(void *, void *), void *memory_data);
These functions became obsolete at release 10.30 and are retained only
for backward compatibility. They should not be used in new code. The
first is replaced by pcre2_set_depth_limit(); the second is no longer
needed and no longer has any effect (it always returns zero).
PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit
@ -449,7 +460,7 @@ PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
when processing any particular pattern to use only functions from a
single library. For example, if you want to run a match using a pat-
tern that was compiled with pcre2_compile_16(), you must do so with
pcre2_match_16(), not pcre2_match_8().
pcre2_match_16(), not pcre2_match_8() or pcre2_match_32().
In the function summaries above, and in the rest of this document and
other PCRE2 documents, functions and data types are described using
@ -474,19 +485,26 @@ PCRE2 API OVERVIEW
program against a non-dll PCRE2 library, you must define PCRE2_STATIC
before including pcre2.h.
The functions pcre2_compile(), and pcre2_match() are used for compiling
The functions pcre2_compile() and pcre2_match() are used for compiling
and matching regular expressions in a Perl-compatible manner. A sample
program that demonstrates the simplest way of using them is provided in
the file called pcre2demo.c in the PCRE2 source distribution. A listing
of this program is given in the pcre2demo documentation, and the
pcre2sample documentation describes how to compile and run it.
Just-in-time compiler support is an optional feature of PCRE2 that can
be built in appropriate hardware environments. It greatly speeds up the
matching performance of many patterns. Programs can request that it be
used if available, by calling pcre2_jit_compile() after a pattern has
been successfully compiled by pcre2_compile(). This does nothing if JIT
support is not available.
The compiling and matching functions recognize various options that are
passed as bits in an options argument. There are also some more compli-
cated parameters such as custom memory management functions and
resource limits that are passed in "contexts" (which are just memory
blocks, described below). Simple applications do not need to make use
of contexts.
Just-in-time (JIT) compiler support is an optional feature of PCRE2
that can be built in appropriate hardware environments. It greatly
speeds up the matching performance of many patterns. Programs can
request that it be used if available by calling pcre2_jit_compile()
after a pattern has been successfully compiled by pcre2_compile(). This
does nothing if JIT support is not available.
More complicated programs might need to make use of the specialist
functions pcre2_jit_stack_create(), pcre2_jit_stack_free(), and
@ -495,14 +513,15 @@ PCRE2 API OVERVIEW
JIT matching is automatically used by pcre2_match() if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface
for JIT matching, which gives improved performance. The JIT-specific
functions are discussed in the pcre2jit documentation.
for JIT matching, which gives improved performance at the expense of
less sanity checking. The JIT-specific functions are discussed in the
pcre2jit documentation.
A second matching function, pcre2_dfa_match(), which is not Perl-com-
patible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a
given point in the subject), and scans the subject just once (unless
there are lookbehind assertions). However, this algorithm does not
there are lookaround assertions). However, this algorithm does not
return captured substrings. A description of the two matching algo-
rithms and their advantages and disadvantages is given in the
pcre2matching documentation. There is no JIT support for
@ -603,9 +622,9 @@ MULTITHREADING
is thread-safe, that is, the same compiled pattern can be used by more
than one thread simultaneously. For example, an application can compile
all its patterns at the start, before forking off multiple threads that
use them. However, if the just-in-time optimization feature is being
used, it needs separate memory stack areas for each thread. See the
pcre2jit documentation for more details.
use them. However, if the just-in-time (JIT) optimization feature is
being used, it needs separate memory stack areas for each thread. See
the pcre2jit documentation for more details.
In a more complicated situation, where patterns are compiled only when
they are first needed, but are still shared between threads, pointers
@ -650,10 +669,10 @@ MULTITHREADING
Match blocks
The matching functions need a block of memory for working space and for
storing the results of a match. This includes details of what was
matched, as well as additional information such as the name of a
(*MARK) setting. Each thread must provide its own copy of this memory.
The matching functions need a block of memory for storing the results
of a match. This includes details of what was matched, as well as addi-
tional information such as the name of a (*MARK) setting. Each thread
must provide its own copy of this memory.
PCRE2 CONTEXTS
@ -718,15 +737,15 @@ PCRE2 CONTEXTS
The compile context
A compile context is required if you want to change the default values
of any of the following compile-time parameters:
A compile context is required if you want to provide an external func-
tion for stack checking during compilation or to change the default
values of any of the following compile-time parameters:
What \R matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables
The newline character sequence
The compile time nested parentheses limit
The maximum length of the pattern string
An external function for stack checking
A compile context is also required if you are using custom memory man-
agement. If none of these apply, just pass NULL as the context argu-
@ -766,12 +785,12 @@ PCRE2 CONTEXTS
int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
PCRE2_SIZE value);
This sets a maximum length, in code units, for the pattern string that
is to be compiled. If the pattern is longer, an error is generated.
This facility is provided so that applications that accept patterns
from external sources can limit their size. The default is the largest
number that a PCRE2_SIZE variable can hold, which is effectively unlim-
ited.
This sets a maximum length, in code units, for any pattern string that
is compiled with this context. If the pattern is longer, an error is
generated. This facility is provided so that applications that accept
patterns from external sources can limit their size. The default is the
largest number that a PCRE2_SIZE variable can hold, which is effec-
tively unlimited.
int pcre2_set_newline(pcre2_compile_context *ccontext,
uint32_t value);
@ -782,11 +801,14 @@ PCRE2 CONTEXTS
two-character sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any
of the above), or PCRE2_NEWLINE_ANY (any Unicode newline sequence).
When a pattern is compiled with the PCRE2_EXTENDED option, the value of
this parameter affects the recognition of white space and the end of
internal comments starting with #. The value is saved with the compiled
pattern for subsequent use by the JIT compiler and by the two inter-
preted matching functions, pcre2_match() and pcre2_dfa_match().
A pattern can override the value set in the compile context by starting
with a sequence such as (*CRLF). See the pcre2pattern page for details.
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
convention affects the recognition of white space and the end of inter-
nal comments starting with #. The value is saved with the compiled pat-
tern for subsequent use by the JIT compiler and by the two interpreted
matching functions, pcre2_match() and pcre2_dfa_match().
int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
uint32_t value);
@ -815,17 +837,16 @@ PCRE2 CONTEXTS
The match context
A match context is required if you want to change the default values of
any of the following match-time parameters:
A match context is required if you want to:
A callout function
The offset limit for matching an unanchored pattern
The limit for calling match() (see below)
The limit for calling match() recursively
Set up a callout function
Set an offset limit for matching an unanchored pattern
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match
A match context is also required if you are using custom memory manage-
ment. If none of these apply, just pass NULL as the context argument
of pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
If none of these apply, just pass NULL as the context argument of
pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
A match context is created, copied, and freed by the following func-
tions:
@ -846,9 +867,9 @@ PCRE2 CONTEXTS
int (*callout_function)(pcre2_callout_block *, void *),
void *callout_data);
This sets up a "callout" function, which PCRE2 will call at specified
points during a matching operation. Details are given in the pcre2call-
out documentation.
This sets up a "callout" function for PCRE2 to call at specified points
during a matching operation. Details are given in the pcre2callout doc-
umentation.
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
PCRE2_SIZE value);
@ -863,10 +884,11 @@ PCRE2 CONTEXTS
argument of pcre2_match() or pcre2_dfa_match() is greater than the off-
set limit.
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when
calling pcre2_compile() so that when JIT is in use, different code can
be compiled. If a match is started with a non-default match limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT
option when calling pcre2_compile() so that when JIT is in use, differ-
ent code can be compiled. If a match is started with a non-default
offset limit when PCRE2_USE_OFFSET_LIMIT is not set, an error is gener-
ated.
The offset limit facility can be used to track progress when searching
large subject strings. See also the PCRE2_FIRSTLINE option, which
@ -884,13 +906,13 @@ PCRE2 CONTEXTS
search trees. The classic example is a pattern that uses nested unlim-
ited repeats.
Internally, pcre2_match() uses a function called match(), which it
calls repeatedly (sometimes recursively). The limit set by match_limit
is imposed on the number of times this function is called during a
match, which has the effect of limiting the amount of backtracking that
can take place. For patterns that are not anchored, the count restarts
from zero for each position in the subject string. This limit is not
relevant to pcre2_dfa_match(), which ignores it.
There is an internal counter in pcre2_match() that is incremented each
time round its main matching loop. If this value reaches the match
limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT.
This has the effect of limiting the amount of backtracking that can
take place. For patterns that are not anchored, the count restarts from
zero for each position in the subject string. This limit is not rele-
vant to pcre2_dfa_match(), which ignores it.
When pcre2_match() is called with a pattern that was successfully pro-
cessed by pcre2_jit_compile(), the way in which matching is executed is
@ -901,9 +923,8 @@ PCRE2 CONTEXTS
The default value for the limit can be set when PCRE2 is built; the
default default is 10 million, which handles all but the most extreme
cases. If the limit is exceeded, pcre2_match() returns
PCRE2_ERROR_MATCHLIMIT. A value for the match limit may also be sup-
plied by an item at the start of a pattern of the form
cases. A value for the match limit may also be supplied by an item at
the start of a pattern of the form
(*LIMIT_MATCH=ddd)
@ -911,59 +932,35 @@ PCRE2 CONTEXTS
unless ddd is less than the limit set by the caller of pcre2_match()
or, if no such limit is set, less than the default.
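The override rule for (*LIMIT_MATCH=ddd) can be summarized in a few lines of C. This is only a sketch of the rule as described above, not PCRE2 code: effective_match_limit() and its parameter names are hypothetical, and base_limit stands for the caller's limit or, if none was set, the build-time default.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a PCRE2 function: models the documented rule that
   a (*LIMIT_MATCH=ddd) item can lower the match limit but never raise it.
   base_limit is the caller's limit or, if none was set, the build default. */
uint32_t effective_match_limit(uint32_t base_limit, int pattern_sets_limit,
                               uint32_t pattern_limit)
{
    if (pattern_sets_limit && pattern_limit < base_limit)
        return pattern_limit;   /* the pattern tightens the limit */
    return base_limit;          /* otherwise the setting is ignored */
}

/* Worked cases, using the 10 million "default default" as the base. */
void effective_match_limit_examples(void)
{
    assert(effective_match_limit(10000000u, 1, 5000u) == 5000u);
    assert(effective_match_limit(10000000u, 1, 20000000u) == 10000000u);
    assert(effective_match_limit(1000u, 0, 0u) == 1000u);
}
```

The same shape of rule is described for the depth limit and (*LIMIT_DEPTH=ddd). A pattern from an untrusted source can therefore make matching stricter, but cannot loosen a limit that the application has imposed.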
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
int pcre2_set_depth_limit(pcre2_match_context *mcontext,
uint32_t value);
The recursion_limit parameter is similar to match_limit, but instead of
limiting the total number of times that match() is called, it limits
the depth of recursion. The recursion depth is a smaller number than
the total number of calls, because not all calls to match() are recur-
sive. This limit is of use only if it is set smaller than match_limit.
This parameter limits the depth of nested backtracking in
pcre2_match(). Each time a nested backtracking point is passed, a new
memory "frame" is used to remember the state of matching at that point.
Thus, this parameter indirectly limits the amount of memory that is
used in a match.
Limiting the recursion depth limits the amount of system stack that can
be used, or, when PCRE2 has been compiled to use memory on the heap
instead of the stack, the amount of heap memory that can be used. This
limit is not relevant, and is ignored, when matching is done using JIT
compiled code. However, it is supported by pcre2_dfa_match(), which
uses recursive function calls less frequently than pcre2_match(), but
which can be caused to use a lot of stack by a recursive pattern such
as /(.)(?1)/ matched to a very long string.
This limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by pcre2_dfa_match(), which
uses it to limit the depth of internal recursive function calls that
implement lookaround assertions and pattern recursions. This is, there-
fore, an indirect limit on the amount of system stack that is used. A
recursive pattern such as /(.)(?1)/, when matched to a very long string
using pcre2_dfa_match(), can use a great deal of stack.
The default value for recursion_limit can be set when PCRE2 is built;
the default default is the same value as the default for match_limit.
If the limit is exceeded, pcre2_match() and pcre2_dfa_match() return
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
The default value for the depth limit can be set when PCRE2 is built;
the default default is the same value as the default for the match
limit. If the limit is exceeded, pcre2_match() or pcre2_dfa_match()
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
supplied by an item at the start of a pattern of the form
(*LIMIT_RECURSION=ddd)
(*LIMIT_DEPTH=ddd)
where ddd is a decimal number. However, such a setting is ignored
unless ddd is less than the limit set by the caller of pcre2_match() or
pcre2_dfa_match() or, if no such limit is set, less than the default.
int pcre2_set_recursion_memory_management(
pcre2_match_context *mcontext,
void *(*private_malloc)(PCRE2_SIZE, void *),
void (*private_free)(void *, void *), void *memory_data);
This function sets up two additional custom memory management functions
for use by pcre2_match() when PCRE2 is compiled to use the heap for
remembering backtracking data, instead of recursive function calls that
use the system stack. There is a discussion about PCRE2's stack usage
in the pcre2stack documentation. See the pcre2build documentation for
details of how to build PCRE2.
Using the heap for recursion is a non-standard way of building PCRE2,
for use in environments that have limited stacks. Because of the
greater use of memory management, pcre2_match() runs more slowly. Func-
tions that are different to the general custom memory functions are
provided so that special-purpose external code can be used for this
case, because the memory blocks are all the same size. The blocks are
retained by pcre2_match() until it is about to exit so that they can be
re-used when possible during the match. In the absence of these func-
tions, the normal custom memory management functions are used, if sup-
plied, otherwise the system functions.
CHECKING BUILD-TIME OPTIONS
@ -996,6 +993,13 @@ CHECKING BUILD-TIME OPTIONS
sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
LF, or CRLF. The default can be overridden when a pattern is compiled.
PCRE2_CONFIG_DEPTHLIMIT
The output is a uint32_t integer that gives the default limit for the
depth of nested backtracking in pcre2_match() or the depth of nested
recursions and lookarounds in pcre2_dfa_match(). Further details are
given with pcre2_set_depth_limit() above.
PCRE2_CONFIG_JIT
The output is a uint32_t integer that is set to one if support for
@ -1030,9 +1034,9 @@ CHECKING BUILD-TIME OPTIONS
PCRE2_CONFIG_MATCHLIMIT
The output is a uint32_t integer that gives the default limit for the
number of internal matching function calls in a pcre2_match() execu-
tion. Further details are given with pcre2_match() below.
The output is a uint32_t integer that gives the default match limit for
pcre2_match(). Further details are given with pcre2_set_match_limit()
above.
PCRE2_CONFIG_NEWLINE
@ -1059,21 +1063,10 @@ CHECKING BUILD-TIME OPTIONS
application. For finer control over compilation stack usage, see
pcre2_set_compile_recursion_guard().
PCRE2_CONFIG_RECURSIONLIMIT
The output is a uint32_t integer that gives the default limit for the
depth of recursion when calling the internal matching function in a
pcre2_match() execution. Further details are given with pcre2_match()
below.
PCRE2_CONFIG_STACKRECURSE
The output is a uint32_t integer that is set to one if internal recur-
sion when running pcre2_match() is implemented by recursive function
calls that use the system stack to remember their state. This is the
usual way that PCRE2 is compiled. The output is zero if PCRE2 was com-
piled to use blocks of data on the heap instead of recursive function
calls.
This parameter is obsolete and should not be used in new code. The out-
put is a uint32_t integer that is always set to zero.
PCRE2_CONFIG_UNICODE_VERSION
@ -1093,7 +1086,7 @@ CHECKING BUILD-TIME OPTIONS
PCRE2_CONFIG_VERSION
The where argument should point to a buffer that is at least 12 code
The where argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling
pcre2_config() with where set to NULL.) The buffer is filled with the
PCRE2 version string, zero-terminated. The number of code units used is
@ -1267,14 +1260,15 @@ COMPILING A PATTERN
parenthesis terminates the name. A closing parenthesis can be included
in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-com-
ments are recognized, exactly as in the rest of the pattern.
ments are recognized in this mode, exactly as in the rest of the pat-
tern.
PCRE2_AUTO_CALLOUT
If this bit is set, pcre2_compile() automatically inserts callout
items, all with number 255, before each pattern item, except immedi-
ately before or after a callout in the pattern. For discussion of the
callout facility, see the pcre2callout documentation.
ately before or after an explicit callout in the pattern. For discus-
sion of the callout facility, see the pcre2callout documentation.
PCRE2_CASELESS
@ -1517,7 +1511,7 @@ COMPILING A PATTERN
section on generic character types in the pcre2pattern page. If you set
PCRE2_UCP, matching one of the items it affects takes much longer. The
option is available only if PCRE2 has been compiled with Unicode sup-
port.
port (which is the default).
PCRE2_UNGREEDY
@ -1548,13 +1542,13 @@ COMPILING A PATTERN
COMPILATION ERROR CODES
There are over 80 positive error codes that pcre2_compile() may return
(via errorcode) if it finds an error in the pattern. There are also
some negative error codes that are used for invalid UTF strings. These
are the same as given by pcre2_match() and pcre2_dfa_match(), and are
described in the pcre2unicode page. The pcre2_get_error_message() func-
tion (see "Obtaining a textual error message" below) can be called to
obtain a textual error message from any error code.
There are nearly 100 positive error codes that pcre2_compile() may
return (via errorcode) if it finds an error in the pattern. There are
also some negative error codes that are used for invalid UTF strings.
These are the same as given by pcre2_match() and pcre2_dfa_match(), and
are described in the pcre2unicode page. The pcre2_get_error_message()
function (see "Obtaining a textual error message" below) can be called
to obtain a textual error message from any error code.
JUST-IN-TIME (JIT) COMPILATION
@ -1585,7 +1579,7 @@ JUST-IN-TIME (JIT) COMPILATION
JIT compilation is a heavyweight optimization. It can take some time
for patterns to be analyzed, and for one-off matches and simple pat-
terns the benefit of faster execution might be offset by a much slower
compilation time. Most, but not all patterns can be optimized by the
compilation time. Most (but not all) patterns can be optimized by the
JIT compiler.
@ -1595,8 +1589,8 @@ LOCALE SUPPORT
letters, digits, or whatever, by reference to a set of tables, indexed
by character code point. This applies only to characters whose code
points are less than 256. By default, higher-valued code points never
match escapes such as \w or \d. However, if PCRE2 is built with UTF
support, all characters can be tested with \p and \P, or, alterna-
match escapes such as \w or \d. However, if PCRE2 is built with Uni-
code support, all characters can be tested with \p and \P, or, alterna-
tively, the PCRE2_UCP option can be set when a pattern is compiled;
this causes \w and friends to use Unicode property support instead of
the built-in tables.
@ -1639,7 +1633,7 @@ LOCALE SUPPORT
The pointer that is passed (via the compile context) to pcre2_compile()
is saved with the compiled pattern, and the same tables are used by
pcre2_match() and pcre2_dfa_match(). Thus, for any single pattern, com-
pilation, and matching all happen in the same locale, but different
pilation and matching both happen in the same locale, but different
patterns can be processed in different locales.
@ -1654,7 +1648,7 @@ INFORMATION ABOUT A COMPILED PATTERN
is required, and the third argument is a pointer to a variable to
receive the data. If the third argument is NULL, the first argument is
ignored, and the function returns the size in bytes of the variable
that is required for the information requested. Otherwise, The yield of
that is required for the information requested. Otherwise, the yield of
the function is zero for success, or one of the following negative num-
bers:
@ -1710,8 +1704,8 @@ INFORMATION ABOUT A COMPILED PATTERN
.* is not in a capturing group that is the subject
of a back reference
PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
PCRE2_NO_DOTSTAR_ANCHOR is not set.
Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in
the options returned for PCRE2_INFO_ALLOPTIONS.
@ -1740,6 +1734,14 @@ INFORMATION ABOUT A COMPILED PATTERN
terns where (?| is not used, this is also the total number of capturing
subpatterns. The third argument should point to a uint32_t variable.
PCRE2_INFO_DEPTHLIMIT
If the pattern set a backtracking depth limit by including an item of
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
third argument should point to an unsigned 32-bit integer. If no such
value has been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET.
PCRE2_INFO_FIRSTBITMAP
In the absence of a single first code unit for a non-anchored pattern,
@ -1772,6 +1774,15 @@ INFORMATION ABOUT A COMPILED PATTERN
value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32
mode.
PCRE2_INFO_FRAMESIZE
Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by pcre2_match()
without the use of JIT. The third argument should point to a size_t
variable. The frame size depends on the number of capturing parentheses
in the pattern. Each additional capturing group adds two PCRE2_SIZE
variables.
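The frame size arithmetic can be illustrated as follows. This is a rough sketch, not PCRE2 code: both function names are hypothetical, size_t stands in for PCRE2_SIZE, and the product of frame size and depth limit is an assumed back-of-envelope bound on frame memory rather than a figure that PCRE2 guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: extra bytes per frame when a pattern gains
   extra_groups capturing groups (two PCRE2_SIZE variables per group;
   size_t stands in for PCRE2_SIZE here). */
size_t frame_growth_bytes(size_t extra_groups)
{
    return extra_groups * 2 * sizeof(size_t);
}

/* Hypothetical helper: rough estimate of frame memory if every level up to
   the depth limit needs one frame. Saturates instead of overflowing. */
size_t frame_memory_estimate(size_t frame_size, size_t depth_limit)
{
    if (frame_size != 0 && depth_limit > (size_t)-1 / frame_size)
        return (size_t)-1;
    return frame_size * depth_limit;
}

/* Worked cases. */
void frame_size_examples(void)
{
    assert(frame_growth_bytes(3) == 6 * sizeof(size_t));
    assert(frame_memory_estimate(128, 1000) == 128000);
}
```

In a real program the frame size would come from pcre2_pattern_info() with PCRE2_INFO_FRAMESIZE, and the estimate could guide the choice of a value for pcre2_set_depth_limit().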
PCRE2_INFO_HASBACKSLASHC
Return 1 if the pattern contains any instances of \C, otherwise 0. The
@ -1782,7 +1793,8 @@ INFORMATION ABOUT A COMPILED PATTERN
Return 1 if the pattern contains any explicit matches for CR or LF
characters, otherwise 0. The third argument should point to a uint32_t
variable. An explicit match is either a literal CR or LF character, or
\r or \n.
\r or \n or one of the equivalent hexadecimal or octal escape
sequences.
PCRE2_INFO_JCHANGED
@ -1918,7 +1930,7 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_INFO_NEWLINE
The output is a uint32_t with one of the following values:
The output is one of the following uint32_t values:
PCRE2_NEWLINE_CR Carriage return (CR)
PCRE2_NEWLINE_LF Linefeed (LF)
@ -1926,16 +1938,8 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
This specifies the default character sequence that will be recognized
as meaning "newline" while matching.
PCRE2_INFO_RECURSIONLIMIT
If the pattern set a recursion limit by including an item of the form
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value
has been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET.
This identifies the character sequence that will be recognized as mean-
ing "newline" while matching.
PCRE2_INFO_SIZE
@ -1998,8 +2002,8 @@ THE MATCH DATA BLOCK
you must create a match data block by calling one of the creation func-
tions above. For pcre2_match_data_create(), the first argument is the
number of pairs of offsets in the ovector. One pair of offsets is
required to identify the string that matched the whole pattern, with
another pair for each captured substring. For example, a value of 4
required to identify the string that matched the whole pattern, with an
additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus
three captured substrings. A minimum of 1 pair is imposed by
pcre2_match_data_create(), so it is always possible to return the over-
@ -2124,9 +2128,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
ing offset by two characters instead of one.
If a non-zero starting offset is passed when the pattern is anchored,
one attempt to match at the given offset is made. This can only succeed
if the pattern does not require the match to be at the start of the
subject.
a single attempt to match at the given offset is made. This can only
succeed if the pattern does not require the match to be at the start of
the subject. In other words, the anchoring must be the result of set-
ting the PCRE2_ANCHORED option or the use of .* with PCRE2_DOTALL, not
the result of starting the pattern with ^ or \A.
Option bits for pcre2_match()
@ -2138,9 +2144,8 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
Setting PCRE2_ANCHORED at match time is not supported by the just-in-
time (JIT) compiler. If it is set, JIT matching is disabled and the
normal interpretive code in pcre2_match() is run. Apart from
PCRE2_NO_JIT (obviously), the remaining options are supported for JIT
matching.
interpretive code in pcre2_match() is run. Apart from PCRE2_NO_JIT
(obviously), the remaining options are supported for JIT matching.
PCRE2_ANCHORED
@ -2221,11 +2226,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
checks for performance reasons, you can set the PCRE2_NO_UTF_CHECK
option when calling pcre2_match(). You might want to do this for the
second and subsequent calls to pcre2_match() if you are making repeated
calls to find all the matches in a single subject string.
calls to find other matches in the same subject string.
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
string as a subject, or an invalid value of startoffset, is undefined.
Your program may crash or loop indefinitely.
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an
invalid string as a subject, or an invalid value of startoffset, is
undefined. Your program may crash or loop indefinitely.
PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT
@ -2278,9 +2283,10 @@ NEWLINE HANDLING WHEN MATCHING
acter after the first failure.
An explicit match for CR or LF is either a literal appearance of one of
those characters in the pattern, or one of the \r or \n escape
sequences. Implicit matches such as [^X] do not count, nor does \s,
even though it includes CR and LF in the characters that it matches.
those characters in the pattern, or one of the \r or \n or equivalent
octal or hexadecimal escape sequences. Implicit matches such as [^X] do
not count, nor does \s, even though it includes CR and LF in the char-
acters that it matches.
Notwithstanding the above, anomalous effects may still occur when CRLF
is a valid newline sequence and explicit \r or \n escapes appear in the
@ -2325,14 +2331,14 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
They identify the part of the subject that was partially matched. See
the pcre2partial documentation for details of partial matching.
After a successful match, the first pair of offsets identifies the por-
tion of the subject string that was matched by the entire pattern. The
next pair is used for the first capturing subpattern, and so on. The
value returned by pcre2_match() is one more than the highest numbered
pair that has been set. For example, if two substrings have been cap-
tured, the returned value is 3. If there are no capturing subpatterns,
the return value from a successful match is 1, indicating that just the
first pair of offsets has been set.
After a fully successful match, the first pair of offsets identifies
the portion of the subject string that was matched by the entire pat-
tern. The next pair is used for the first captured substring, and so
on. The value returned by pcre2_match() is one more than the highest
numbered pair that has been set. For example, if two substrings have
been captured, the returned value is 3. If there are no captured sub-
strings, the return value from a successful match is 1, indicating that
just the first pair of offsets has been set.
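The relationship between the return value and the offset pairs can be sketched in plain C. The helper below is hypothetical and does not call PCRE2: it takes an array laid out like the ovector (pair n at indexes 2n and 2n+1), uses size_t in place of PCRE2_SIZE, and defines SKETCH_UNSET as a stand-in for PCRE2_UNSET.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PCRE2_UNSET, which pcre2.h defines as (~(PCRE2_SIZE)0). */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: report the span of pair n (0 is the whole match).
   rc is the value returned by the matching function. Returns 1 and fills
   *start and *length if the pair was set, otherwise returns 0. A return
   value of zero (ovector too small) or a negative error code is treated
   as "nothing available" in this sketch. */
int group_span(const size_t *ovector, int rc, int n,
               size_t *start, size_t *length)
{
    assert(ovector != NULL && start != NULL && length != NULL && n >= 0);
    if (n >= rc)
        return 0;                      /* beyond the highest pair set */
    size_t s = ovector[2 * n];
    size_t e = ovector[2 * n + 1];
    if (s == SKETCH_UNSET || e == SKETCH_UNSET)
        return 0;                      /* group did not participate */
    if (e < s)
        return 0;                      /* possible when \K is used in an assertion */
    *start = s;
    *length = e - s;
    return 1;
}
```

With two captured substrings the return value is 3, so pairs 0, 1, and 2 are valid and any higher pair number is rejected. In a real program the array would be obtained from pcre2_get_ovector_pointer().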
If a pattern uses the \K escape sequence within a positive assertion,
the reported start of a successful match can be greater than the end of
@ -2347,11 +2353,7 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
as much as possible is filled in, and the function returns a value of
zero. If captured substrings are not of interest, pcre2_match() may be
called with a match data block whose ovector is of minimum length (that
is, one pair). However, if the pattern contains back references and the
ovector is not big enough to remember the related substrings, PCRE2 has
to get additional memory for use during matching. Thus it is usually
advisable to set up a match data block containing an ovector of reason-
able size.
is, one pair).
It is possible for capturing subpattern number n+1 to match some part
of the subject when subpattern n has not been used at all. For example,
@ -2450,9 +2452,10 @@ ERROR RETURNS FROM pcre2_match()
PCRE2_ERROR_BADMODE
This error is given when a pattern that was compiled by the 8-bit
library is passed to a 16-bit or 32-bit library function, or vice
versa.
This error is given when a compiled pattern is passed to a function in
a library of a different code unit width, for example, when a pattern
compiled by the 8-bit library is passed to a 16-bit or 32-bit library
function.
PCRE2_ERROR_BADOFFSET
@ -2476,19 +2479,15 @@ ERROR RETURNS FROM pcre2_match()
pcre2_callout_enumerate() to return a distinctive error code. See the
pcre2callout documentation for details.
PCRE2_ERROR_DEPTHLIMIT
The nested backtracking depth limit was reached.
PCRE2_ERROR_INTERNAL
An unexpected internal error has occurred. This error could be caused
by a bug in PCRE2 or by overwriting of the compiled pattern.
PCRE2_ERROR_JIT_BADOPTION
This error is returned when a pattern that was successfully studied
using JIT is being matched, but the matching mode (partial or complete
match) does not correspond to any JIT compilation mode. When the JIT
fast path function is used, this error may be also given for invalid
options. See the pcre2jit documentation for more details.
PCRE2_ERROR_JIT_STACKLIMIT
This error is returned when a pattern that was successfully studied
@ -2498,15 +2497,13 @@ ERROR RETURNS FROM pcre2_match()
PCRE2_ERROR_MATCHLIMIT
The backtracking limit was reached.
The backtracking match limit was reached.
PCRE2_ERROR_NOMEMORY
If a pattern contains back references, but the ovector is not big
enough to remember the referenced substrings, PCRE2 gets a block of
memory at the start of matching to use for this purpose. There are some
other special cases where extra memory is needed during matching. This
error is given when memory cannot be obtained.
If a pattern contains many nested backtracking points, heap memory is
used to remember them. This error is given when the memory allocation
function (default or custom) fails.
PCRE2_ERROR_NULL
@ -2522,10 +2519,6 @@ ERROR RETURNS FROM pcre2_match()
plicated cases, in particular mutual recursions between two different
subpatterns, cannot be detected until matching is attempted.
PCRE2_ERROR_RECURSIONLIMIT
The internal recursion limit was reached.
OBTAINING A TEXTUAL ERROR MESSAGE
@ -2703,8 +2696,8 @@ EXTRACTING CAPTURED SUBSTRINGS BY NAME
the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there
is no subpattern of that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if
there is more than one subpattern of that name. Given the number, you
can extract the substring directly, or use one of the functions
described above.
can extract the substring directly from the ovector, or use one of the
"bynumber" functions described above.
For convenience, there are also "byname" functions that correspond to
the "bynumber" functions, the only difference being that the second
@ -2991,13 +2984,13 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
The function pcre2_dfa_match() is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the
subject string just once, and does not backtrack. This has different
characteristics to the normal algorithm, and is not compatible with
Perl. Some of the features of PCRE2 patterns are not supported. Never-
theless, there are times when this kind of matching can be useful. For
a discussion of the two matching algorithms, and a list of features
that pcre2_dfa_match() does not support, see the pcre2matching documen-
tation.
subject string just once (not counting lookaround assertions), and does
not backtrack. This has different characteristics to the normal algo-
rithm, and is not compatible with Perl. Some of the features of PCRE2
patterns are not supported. Nevertheless, there are times when this
kind of matching can be useful. For a discussion of the two matching
algorithms, and a list of features that pcre2_dfa_match() does not sup-
port, see the pcre2matching documentation.
The arguments for the pcre2_dfa_match() function are the same as for
pcre2_match(), plus two extras. The ovector within the match data block
@ -3181,7 +3174,7 @@ AUTHOR
REVISION
Last updated: 21 March 2017
Last updated: 27 March 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

@ -34,7 +34,7 @@ A match context is needed only if you want to:
Set a matching offset limit
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management in the match context
Set custom memory management specifically for the match
.sp
The \fIlength\fP and \fIstartoffset\fP values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a

@ -1,4 +1,4 @@
.TH PCRE2API 3 "21 March 2017" "PCRE2 10.30"
.TH PCRE2API 3 "27 March 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@ -120,19 +120,14 @@ document for an overview of all the PCRE2 documentation.
.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.sp
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
.B " PCRE2_SIZE \fIvalue\fP);"
.sp
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.fi
.
.
@ -252,6 +247,25 @@ document for an overview of all the PCRE2 documentation.
.fi
.
.
.SH "PCRE2 NATIVE API OBSOLETE FUNCTIONS"
.rs
.sp
.nf
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi
.sp
These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
no longer has any effect (it always returns zero).
.
.
.SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES"
.rs
.sp
@ -302,7 +316,7 @@ When using multiple libraries in an application, you must take care when
processing any particular pattern to use only functions from a single library.
For example, if you want to run a match using a pattern that was compiled with
\fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not
\fBpcre2_match_8()\fP.
\fBpcre2_match_8()\fP or \fBpcre2_match_32()\fP.
.P
In the function summaries above, and in the rest of this document and other
PCRE2 documents, functions and data types are described using their generic
@ -331,7 +345,7 @@ In a Windows environment, if you want to statically link an application program
against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
\fBpcre2.h\fP.
.P
The functions \fBpcre2_compile()\fP, and \fBpcre2_match()\fP are used for
The functions \fBpcre2_compile()\fP and \fBpcre2_match()\fP are used for
compiling and matching regular expressions in a Perl-compatible manner. A
sample program that demonstrates the simplest way of using them is provided in
the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing
@ -345,10 +359,16 @@ documentation, and the
.\"
documentation describes how to compile and run it.
.P
Just-in-time compiler support is an optional feature of PCRE2 that can be built
in appropriate hardware environments. It greatly speeds up the matching
The compiling and matching functions recognize various options that are passed
as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts.
.P
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
built in appropriate hardware environments. It greatly speeds up the matching
performance of many patterns. Programs can request that it be used if
available, by calling \fBpcre2_jit_compile()\fP after a pattern has been
available by calling \fBpcre2_jit_compile()\fP after a pattern has been
successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT
support is not available.
.P
@ -358,8 +378,8 @@ More complicated programs might need to make use of the specialist functions
.P
JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance. The JIT-specific functions are
discussed in the
matching, which gives improved performance at the expense of less sanity
checking. The JIT-specific functions are discussed in the
.\" HREF
\fBpcre2jit\fP
.\"
@ -369,7 +389,7 @@ A second matching function, \fBpcre2_dfa_match()\fP, which is not
Perl-compatible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a given
point in the subject), and scans the subject just once (unless there are
lookbehind assertions). However, this algorithm does not return captured
lookaround assertions). However, this algorithm does not return captured
substrings. A description of the two matching algorithms and their advantages
and disadvantages is given in the
.\" HREF
@ -484,8 +504,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
that is, the same compiled pattern can be used by more than one thread
simultaneously. For example, an application can compile all its patterns at the
start, before forking off multiple threads that use them. However, if the
just-in-time optimization feature is being used, it needs separate memory stack
areas for each thread. See the
just-in-time (JIT) optimization feature is being used, it needs separate memory
stack areas for each thread. See the
.\" HREF
\fBpcre2jit\fP
.\"
@ -536,10 +556,10 @@ thread-specific copy.
.SS "Match blocks"
.rs
.sp
The matching functions need a block of memory for working space and for storing
the results of a match. This includes details of what was matched, as well as
additional information such as the name of a (*MARK) setting. Each thread must
provide its own copy of this memory.
The matching functions need a block of memory for storing the results of a
match. This includes details of what was matched, as well as additional
information such as the name of a (*MARK) setting. Each thread must provide its
own copy of this memory.
.
.
.SH "PCRE2 CONTEXTS"
@ -611,15 +631,15 @@ The memory used for a general context should be freed by calling:
.SS "The compile context"
.rs
.sp
A compile context is required if you want to change the default values of any
of the following compile-time parameters:
A compile context is required if you want to provide an external function for
stack checking during compilation or to change the default values of any of the
following compile-time parameters:
.sp
What \eR matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables
The newline character sequence
The compile time nested parentheses limit
The maximum length of the pattern string
An external function for stack checking
.sp
A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of
@ -666,11 +686,11 @@ in the current locale.
.B " PCRE2_SIZE \fIvalue\fP);"
.fi
.sp
This sets a maximum length, in code units, for the pattern string that is to be
compiled. If the pattern is longer, an error is generated. This facility is
provided so that applications that accept patterns from external sources can
limit their size. The default is the largest number that a PCRE2_SIZE variable
can hold, which is effectively unlimited.
This sets a maximum length, in code units, for any pattern string that is
compiled with this context. If the pattern is longer, an error is generated.
This facility is provided so that applications that accept patterns from
external sources can limit their size. The default is the largest number that a
PCRE2_SIZE variable can hold, which is effectively unlimited.
.sp
.nf
.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP,
@ -683,8 +703,15 @@ PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
.P
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
parameter affects the recognition of white space and the end of internal
A pattern can override the value set in the compile context by starting with a
sequence such as (*CRLF). See the
.\" HREF
\fBpcre2pattern\fP
.\"
page for details.
.P
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
convention affects the recognition of white space and the end of internal
comments starting with #. The value is saved with the compiled pattern for
subsequent use by the JIT compiler and by the two interpreted matching
functions, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP.
@ -722,15 +749,14 @@ zero if all is well, or non-zero to force an error.
.SS "The match context"
.rs
.sp
A match context is required if you want to change the default values of any
of the following match-time parameters:
A match context is required if you want to:
.sp
A callout function
The offset limit for matching an unanchored pattern
The limit for calling \fBmatch()\fP (see below)
The limit for calling \fBmatch()\fP recursively
Set up a callout function
Set an offset limit for matching an unanchored pattern
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match
.sp
A match context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of
\fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP.
.P
@ -756,7 +782,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
.B " void *\fIcallout_data\fP);"
.fi
.sp
This sets up a "callout" function, which PCRE2 will call at specified points
This sets up a "callout" function for PCRE2 to call at specified points
during a matching operation. Details are given in the
.\" HREF
\fBpcre2callout\fP
@ -778,8 +804,8 @@ A match can never be found if the \fIstartoffset\fP argument of
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is greater than the offset
limit.
.P
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
\fBpcre2_compile()\fP so that when JIT is in use, different code can be
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
.P
@ -799,10 +825,10 @@ up too many resources when processing patterns that are not going to match, but
which have a very large number of possibilities in their search trees. The
classic example is a pattern that uses nested unlimited repeats.
.P
Internally, \fBpcre2_match()\fP uses a function called \fBmatch()\fP, which it
calls repeatedly (sometimes recursively). The limit set by \fImatch_limit\fP is
imposed on the number of times this function is called during a match, which
has the effect of limiting the amount of backtracking that can take place. For
There is an internal counter in \fBpcre2_match()\fP that is incremented each
time round its main matching loop. If this value reaches the match limit,
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
which ignores it.
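.P
As a rough illustration of how a counter of this kind bounds backtracking, the
following self-contained C sketch implements a toy matcher that understands
only literal characters and "c*" items. It is not PCRE2 code: the names
\fBtoy_match()\fP, \fBtoy_step()\fP, and TOY_ERROR_MATCHLIMIT are invented for
this example, and charging one unit per call of the step function is an
assumption made for simplicity.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOMATCH 0
#define TOY_MATCH 1
#define TOY_ERROR_MATCHLIMIT (-1)

/* One step of a toy backtracking matcher. The pattern language is invented
   for this sketch: literal characters, and "c*" meaning zero or more c.
   Every call counts one unit against the limit. */
static int toy_step(const char *pat, const char *subj,
                    unsigned long *count, unsigned long limit)
{
    if (++*count > limit) return TOY_ERROR_MATCHLIMIT;
    if (pat[0] == 0) return TOY_MATCH;          /* whole pattern consumed */

    if (pat[1] == '*') {
        size_t n = 0;
        while (subj[n] != 0 && subj[n] == pat[0]) n++;   /* greedy */
        for (;;) {
            int rc = toy_step(pat + 2, subj + n, count, limit);
            if (rc != TOY_NOMATCH) return rc;   /* match, or limit reached */
            if (n == 0) return TOY_NOMATCH;
            n--;                                /* backtrack: give one back */
        }
    }

    if (subj[0] != 0 && subj[0] == pat[0])
        return toy_step(pat + 1, subj + 1, count, limit);
    return TOY_NOMATCH;
}

/* Unanchored search: the counter restarts from zero at each start position. */
int toy_match(const char *pat, const char *subj, unsigned long limit)
{
    size_t start;
    assert(pat != NULL && subj != NULL);
    for (start = 0; ; start++) {
        unsigned long count = 0;
        int rc = toy_step(pat, subj + start, &count, limit);
        if (rc != TOY_NOMATCH) return rc;
        if (subj[start] == 0) return TOY_NOMATCH;
    }
}
```
.fi
.P
In this toy model, nested unlimited repeats such as "a*a*a*b" applied to a long
run of "a" characters exhaust a small limit quickly, whereas a pattern such as
"ab" can be found late in a subject with a small limit, because the count is
reset at each starting position.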
@ -815,8 +841,7 @@ is also used in this case (but in a different way) to limit how long the
matching can continue.
.P
The default value for the limit can be set when PCRE2 is built; the default
default is 10 million, which handles all but the most extreme cases. If the
limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_MATCHLIMIT. A value
default is 10 million, which handles all but the most extreme cases. A value
for the match limit may also be supplied by an item at the start of a pattern
of the form
.sp
@ -827,65 +852,34 @@ less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
limit is set, less than the default.
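.P
The rule for combining the three possible sources of a match limit can be
summarized by the following C sketch. The function
\fBeffective_match_limit()\fP and its parameters are hypothetical names used
only for illustration; they are not part of the PCRE2 API, and the sketch makes
no claim about how the library implements the rule internally.
.sp
.nf
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API.
   build_default  the limit chosen when the library was built
   caller_set     non-zero if the caller set a limit in a match context
   pattern_set    non-zero if the pattern starts with a limit item */
uint32_t effective_match_limit(uint32_t build_default,
                               int caller_set, uint32_t caller_limit,
                               int pattern_set, uint32_t pattern_limit)
{
    uint32_t base = caller_set ? caller_limit : build_default;
    uint32_t limit = base;

    /* A limit given in the pattern is honoured only if it lowers the limit. */
    if (pattern_set && pattern_limit < limit) limit = pattern_limit;

    assert(limit <= base);   /* a pattern can never raise the limit */
    return limit;
}
```
.fi
.P
The point of this design is that a pattern author can tighten the limit but
cannot loosen one imposed by the application.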
.sp
.nf
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.fi
.sp
The \fIrecursion_limit\fP parameter is similar to \fImatch_limit\fP, but
instead of limiting the total number of times that \fBmatch()\fP is called, it
limits the depth of recursion. The recursion depth is a smaller number than the
total number of calls, because not all calls to \fBmatch()\fP are recursive.
This limit is of use only if it is set smaller than \fImatch_limit\fP.
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
Each time a nested backtracking point is passed, a new memory "frame" is used
to remember the state of matching at that point. Thus, this parameter
indirectly limits the amount of memory that is used in a match.
.P
Limiting the recursion depth limits the amount of system stack that can be
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
and is ignored, when matching is done using JIT compiled code. However, it is
supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
This limit is not relevant, and is ignored, when matching is done using JIT
compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which uses
it to limit the depth of internal recursive function calls that implement
lookaround assertions and pattern recursions. This is, therefore, an indirect
limit on the amount of system stack that is used. A recursive pattern such as
/(.)(?1)/, when matched to a very long string using \fBpcre2_dfa_match()\fP,
can use a great deal of stack.
.P
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
default default is the same value as the default for \fImatch_limit\fP. If the
limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
supplied by an item at the start of a pattern of the form
The default value for the depth limit can be set when PCRE2 is built; the
default default is the same value as the default for the match limit. If the
limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
item at the start of a pattern of the form
.sp
(*LIMIT_RECURSION=ddd)
(*LIMIT_DEPTH=ddd)
.sp
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
.sp
.nf
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi
.sp
This function sets up two additional custom memory management functions for use
by \fBpcre2_match()\fP when PCRE2 is compiled to use the heap for remembering
backtracking data, instead of recursive function calls that use the system
stack. There is a discussion about PCRE2's stack usage in the
.\" HREF
\fBpcre2stack\fP
.\"
documentation. See the
.\" HREF
\fBpcre2build\fP
.\"
documentation for details of how to build PCRE2.
.P
Using the heap for recursion is a non-standard way of building PCRE2, for use
in environments that have limited stacks. Because of the greater use of memory
management, \fBpcre2_match()\fP runs more slowly. Functions that are different
to the general custom memory functions are provided so that special-purpose
external code can be used for this case, because the memory blocks are all the
same size. The blocks are retained by \fBpcre2_match()\fP until it is about to
exit so that they can be re-used when possible during the match. In the absence
of these functions, the normal custom memory management functions are used, if
supplied, otherwise the system functions.
.
.
.SH "CHECKING BUILD-TIME OPTIONS"
@ -920,6 +914,13 @@ sequences the \eR escape sequence matches by default. A value of
PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a
value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The
default can be overridden when a pattern is compiled.
.sp
PCRE2_CONFIG_DEPTHLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
\fBpcre2_set_depth_limit()\fP above.
.sp
PCRE2_CONFIG_JIT
.sp
@ -954,9 +955,9 @@ be compiled by those two libraries, but at the expense of slower matching.
.sp
PCRE2_CONFIG_MATCHLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the number of
internal matching function calls in a \fBpcre2_match()\fP execution. Further
details are given with \fBpcre2_match()\fP below.
The output is a uint32_t integer that gives the default match limit for
\fBpcre2_match()\fP. Further details are given with
\fBpcre2_set_match_limit()\fP above.
.sp
PCRE2_CONFIG_NEWLINE
.sp
@ -980,20 +981,11 @@ amount of system stack used when a pattern is compiled. It is specified when
PCRE2 is built; the default is 250. This limit does not take into account the
stack that may already be used by the calling application. For finer control
over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP.
.sp
PCRE2_CONFIG_RECURSIONLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the depth of
recursion when calling the internal matching function in a \fBpcre2_match()\fP
execution. Further details are given with \fBpcre2_match()\fP below.
.sp
PCRE2_CONFIG_STACKRECURSE
.sp
The output is a uint32_t integer that is set to one if internal recursion when
running \fBpcre2_match()\fP is implemented by recursive function calls that use
the system stack to remember their state. This is the usual way that PCRE2 is
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
heap instead of recursive function calls.
This parameter is obsolete and should not be used in new code. The output is a
uint32_t integer that is always set to zero.
.sp
PCRE2_CONFIG_UNICODE_VERSION
.sp
@ -1012,7 +1004,7 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
.sp
PCRE2_CONFIG_VERSION
.sp
The \fIwhere\fP argument should point to a buffer that is at least 12 code
The \fIwhere\fP argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling
\fBpcre2_config()\fP with \fIwhere\fP set to NULL.) The buffer is filled with
the PCRE2 version string, zero-terminated. The number of code units used is
@ -1208,13 +1200,14 @@ option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-comments are
recognized, exactly as in the rest of the pattern.
recognized in this mode, exactly as in the rest of the pattern.
.sp
PCRE2_AUTO_CALLOUT
.sp
If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
all with number 255, before each pattern item, except immediately before or
after a callout in the pattern. For discussion of the callout facility, see the
after an explicit callout in the pattern. For discussion of the callout
facility, see the
.\" HREF
\fBpcre2callout\fP
.\"
@ -1452,9 +1445,8 @@ in the
.\" HREF
\fBpcre2unicode\fP
.\"
document.
If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a negative
error code.
document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
negative error code.
.P
If you know that your pattern is valid, and you want to skip this check for
performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set,
@ -1479,7 +1471,7 @@ in the
.\"
page. If you set PCRE2_UCP, matching one of the items it affects takes much
longer. The option is available only if PCRE2 has been compiled with Unicode
support.
support (which is the default).
.sp
PCRE2_UNGREEDY
.sp
@ -1518,7 +1510,7 @@ page.
.SH "COMPILATION ERROR CODES"
.rs
.sp
There are over 80 positive error codes that \fBpcre2_compile()\fP may return
There are nearly 100 positive error codes that \fBpcre2_compile()\fP may return
(via \fIerrorcode\fP) if it finds an error in the pattern. There are also some
negative error codes that are used for invalid UTF strings. These are the same
as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and are described
@ -1570,7 +1562,7 @@ documentation.
JIT compilation is a heavyweight optimization. It can take some time for
patterns to be analyzed, and for one-off matches and simple patterns the
benefit of faster execution might be offset by a much slower compilation time.
Most, but not all patterns can be optimized by the JIT compiler.
Most (but not all) patterns can be optimized by the JIT compiler.
.
.
.\" HTML <a name="localesupport"></a>
@ -1581,10 +1573,10 @@ PCRE2 handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character code
point. This applies only to characters whose code points are less than 256. By
default, higher-valued code points never match escapes such as \ew or \ed.
However, if PCRE2 is built with UTF support, all characters can be tested with
\ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a pattern
is compiled; this causes \ew and friends to use Unicode property support
instead of the built-in tables.
However, if PCRE2 is built with Unicode support, all characters can be tested
with \ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a
pattern is compiled; this causes \ew and friends to use Unicode property
support instead of the built-in tables.
.P
The use of locales with Unicode is discouraged. If you are handling characters
with code points greater than 128, you should either use Unicode support, or
@ -1623,7 +1615,7 @@ available for as long as it is needed.
The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
is saved with the compiled pattern, and the same tables are used by
\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. Thus, for any single pattern,
compilation, and matching all happen in the same locale, but different patterns
compilation and matching both happen in the same locale, but different patterns
can be processed in different locales.
.
.
@ -1646,7 +1638,7 @@ pattern. The second argument specifies which piece of information is required,
and the third argument is a pointer to a variable to receive the data. If the
third argument is NULL, the first argument is ignored, and the function returns
the size in bytes of the variable that is required for the information
requested. Otherwise, The yield of the function is zero for success, or one of
requested. Otherwise, the yield of the function is zero for success, or one of
the following negative numbers:
.sp
PCRE2_ERROR_NULL the argument \fIcode\fP was NULL
@ -1699,8 +1691,8 @@ following are true:
.* is not in a capturing group that is the subject
of a back reference
PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
PCRE2_NO_DOTSTAR_ANCHOR is not set.
Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set
.sp
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
options returned for PCRE2_INFO_ALLOPTIONS.
@ -1727,6 +1719,13 @@ matches only CR, LF, or CRLF.
Return the highest capturing subpattern number in the pattern. In patterns
where (?| is not used, this is also the total number of capturing subpatterns.
The third argument should point to an \fBuint32_t\fP variable.
.sp
PCRE2_INFO_DEPTHLIMIT
.sp
If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
.sp
PCRE2_INFO_FIRSTBITMAP
.sp
@ -1758,6 +1757,14 @@ argument should point to an \fBuint32_t\fP variable. In the 8-bit library, the
value is always less than 256. In the 16-bit library the value can be up to
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
and up to 0xffffffff when not using UTF-32 mode.
.sp
PCRE2_INFO_FRAMESIZE
.sp
Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
without the use of JIT. The third argument should point to a \fBsize_t\fP
variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
.sp
PCRE2_INFO_HASBACKSLASHC
.sp
@ -1768,7 +1775,8 @@ argument should point to an \fBuint32_t\fP variable.
.sp
Return 1 if the pattern contains any explicit matches for CR or LF characters,
otherwise 0. The third argument should point to an \fBuint32_t\fP variable. An
explicit match is either a literal CR or LF character, or \er or \en.
explicit match is either a literal CR or LF character, or \er or \en or one of
the equivalent hexadecimal or octal escape sequences.
.sp
PCRE2_INFO_JCHANGED
.sp
@ -1907,7 +1915,7 @@ different for each compiled pattern.
.sp
PCRE2_INFO_NEWLINE
.sp
The output is a \fBuint32_t\fP with one of the following values:
The output is one of the following \fBuint32_t\fP values:
.sp
PCRE2_NEWLINE_CR Carriage return (CR)
PCRE2_NEWLINE_LF Linefeed (LF)
@ -1915,15 +1923,8 @@ The output is a \fBuint32_t\fP with one of the following values:
PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
.sp
This specifies the default character sequence that will be recognized as
meaning "newline" while matching.
.sp
PCRE2_INFO_RECURSIONLIMIT
.sp
If the pattern set a recursion limit by including an item of the form
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value has been
set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
This identifies the character sequence that will be recognized as meaning
"newline" while matching.
.sp
PCRE2_INFO_SIZE
.sp
@ -2000,9 +2001,9 @@ Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
the creation functions above. For \fBpcre2_match_data_create()\fP, the first
argument is the number of pairs of offsets in the \fIovector\fP. One pair of
offsets is required to identify the string that matched the whole pattern, with
another pair for each captured substring. For example, a value of 4 creates
enough space to record the matched portion of the subject plus three captured
substrings. A minimum of at least 1 pair is imposed by
an additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus three
captured substrings. A minimum of 1 pair is imposed by
\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
matched string.
.P
@ -2145,9 +2146,11 @@ newline convention recognizes CRLF as a newline, and if so, and the current
character is CR followed by LF, advance the starting offset by two characters
instead of one.
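.P
The advance rule just described might be coded along the following lines. This
is a minimal sketch under stated assumptions: the subject is an 8-bit, non-UTF
string (so one character is one code unit), and the function name
\fBnext_start_offset()\fP and its \fIcrlf_is_newline\fP flag are hypothetical,
standing in for whatever the application has learned about the newline
convention.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the offset at which to retry a match.
   Assumes an 8-bit, non-UTF subject and offset < length. */
size_t next_start_offset(const unsigned char *subject, size_t length,
                         size_t offset, int crlf_is_newline)
{
    assert(offset < length);

    /* 0x0D is CR and 0x0A is LF; skip both when CRLF counts as a newline. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == 0x0D && subject[offset + 1] == 0x0A)
        return offset + 2;

    return offset + 1;
}
```
.fi
.P
In UTF mode the single-step case would have to advance by a whole character
rather than by one code unit, which this sketch does not attempt.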
.P
If a non-zero starting offset is passed when the pattern is anchored, one
If a non-zero starting offset is passed when the pattern is anchored, a single
attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject.
pattern does not require the match to be at the start of the subject. In other
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
.
.
.\" HTML <a name="matchoptions"></a>
@ -2161,9 +2164,9 @@ PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is
described below.
.P
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
compiler. If it is set, JIT matching is disabled and the normal interpretive
code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the
remaining options are supported for JIT matching.
compiler. If it is set, JIT matching is disabled and the interpretive code in
\fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the remaining
options are supported for JIT matching.
.sp
PCRE2_ANCHORED
.sp
@ -2257,12 +2260,12 @@ page.
If you know that your subject is valid, and you want to skip these checks for
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
\fBpcre2_match()\fP. You might want to do this for the second and subsequent
calls to \fBpcre2_match()\fP if you are making repeated calls to find all the
matches in a single subject string.
calls to \fBpcre2_match()\fP if you are making repeated calls to find other
matches in the same subject string.
.P
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
as a subject, or an invalid value of \fIstartoffset\fP, is undefined. Your
program may crash or loop indefinitely.
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
Your program may crash or loop indefinitely.
.sp
PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT
@ -2329,9 +2332,9 @@ start, it skips both the CR and the LF before retrying. However, the pattern
reference, and so advances only by one character after the first failure.
.P
An explicit match for CR or LF is either a literal appearance of one of those
characters in the pattern, or one of the \er or \en escape sequences. Implicit
matches such as [^X] do not count, nor does \es, even though it includes CR and
LF in the characters that it matches.
characters in the pattern, or one of the \er or \en or equivalent octal or
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
does \es, even though it includes CR and LF in the characters that it matches.
.P
Notwithstanding the above, anomalous effects may still occur when CRLF is a
valid newline sequence and explicit \er or \en escapes appear in the pattern.
@ -2395,12 +2398,12 @@ identify the part of the subject that was partially matched. See the
.\"
documentation for details of partial matching.
.P
After a successful match, the first pair of offsets identifies the portion of
the subject string that was matched by the entire pattern. The next pair is
used for the first capturing subpattern, and so on. The value returned by
After a fully successful match, the first pair of offsets identifies the
portion of the subject string that was matched by the entire pattern. The next
pair is used for the first captured substring, and so on. The value returned by
\fBpcre2_match()\fP is one more than the highest numbered pair that has been
set. For example, if two substrings have been captured, the returned value is
3. If there are no capturing subpatterns, the return value from a successful
3. If there are no captured substrings, the return value from a successful
match is 1, indicating that just the first pair of offsets has been set.
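.P
The relationship between the return value and the offset pairs can be sketched
in C as follows. The sketch models the ovector as a plain array of
\fBsize_t\fP values rather than using the PCRE2 types, and the helper
\fBpair_length()\fP is a hypothetical name, not a library function. It assumes
that every pair below the returned count has been set and that each start
offset is not greater than its end offset.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the substring described by pair n, or -1
   if pair n lies at or beyond the count returned by the match (rc).
   Pair n occupies ovector[2*n] (start) and ovector[2*n + 1] (end). */
long pair_length(const size_t *ovector, int rc, int n)
{
    assert(ovector != NULL);
    if (n < 0 || n >= rc) return -1;
    return (long)(ovector[2 * n + 1] - ovector[2 * n]);
}
```
.fi
.P
With a return value of 3, pairs 0, 1, and 2 are available: pair 0 describes
the whole match and pairs 1 and 2 describe the two captured substrings.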
.P
If a pattern uses the \eK escape sequence within a positive assertion, the
@ -2415,11 +2418,7 @@ returned.
If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, \fBpcre2_match()\fP may be called with a match
data block whose ovector is of minimum length (that is, one pair). However, if
the pattern contains back references and the \fIovector\fP is not big enough to
remember the related substrings, PCRE2 has to get additional memory for use
during matching. Thus it is usually advisable to set up a match data block
containing an ovector of reasonable size.
data block whose ovector is of minimum length (that is, one pair).
.P
It is possible for capturing subpattern number \fIn+1\fP to match some part of
the subject when subpattern \fIn\fP has not been used at all. For example, if
@ -2535,8 +2534,9 @@ returned when the magic number is not present.
.sp
PCRE2_ERROR_BADMODE
.sp
This error is given when a pattern that was compiled by the 8-bit library is
passed to a 16-bit or 32-bit library function, or vice versa.
This error is given when a compiled pattern is passed to a function in a
library of a different code unit width, for example, a pattern compiled by
the 8-bit library is passed to a 16-bit or 32-bit library function.
.sp
PCRE2_ERROR_BADOFFSET
.sp
@ -2562,22 +2562,15 @@ use by callout functions that want to cause \fBpcre2_match()\fP or
\fBpcre2callout\fP
.\"
documentation for details.
.sp
PCRE2_ERROR_DEPTHLIMIT
.sp
The nested backtracking depth limit was reached.
.sp
PCRE2_ERROR_INTERNAL
.sp
An unexpected internal error has occurred. This error could be caused by a bug
in PCRE2 or by overwriting of the compiled pattern.
.sp
PCRE2_ERROR_JIT_BADOPTION
.sp
This error is returned when a pattern that was successfully studied using JIT
is being matched, but the matching mode (partial or complete match) does not
correspond to any JIT compilation mode. When the JIT fast path function is
used, this error may be also given for invalid options. See the
.\" HREF
\fBpcre2jit\fP
.\"
documentation for more details.
.sp
PCRE2_ERROR_JIT_STACKLIMIT
.sp
@ -2591,15 +2584,13 @@ documentation for more details.
.sp
PCRE2_ERROR_MATCHLIMIT
.sp
The backtracking limit was reached.
The backtracking match limit was reached.
.sp
PCRE2_ERROR_NOMEMORY
.sp
If a pattern contains back references, but the ovector is not big enough to
remember the referenced substrings, PCRE2 gets a block of memory at the start
of matching to use for this purpose. There are some other special cases where
extra memory is needed during matching. This error is given when memory cannot
be obtained.
If a pattern contains many nested backtracking points, heap memory is used to
remember them. This error is given when the memory allocation function (default
or custom) fails.
.sp
PCRE2_ERROR_NULL
.sp
@ -2615,10 +2606,6 @@ in the subject string. Some simple patterns that might do this are detected and
faulted at compile time, but more complicated cases, in particular mutual
recursions between two different subpatterns, cannot be detected until matching
is attempted.
.sp
PCRE2_ERROR_RECURSIONLIMIT
.sp
The internal recursion limit was reached.
.
.
.\" HTML <a name="geterrormessage"></a>
@ -2808,8 +2795,8 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
that name. Given the number, you can extract the substring directly, or use one
of the functions described above.
that name. Given the number, you can extract the substring directly from the
ovector, or use one of the "bynumber" functions described above.
.P
For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a
@ -3113,11 +3100,12 @@ other alternatives. Ultimately, when it runs out of matches,
.P
The function \fBpcre2_dfa_match()\fP is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the subject
string just once, and does not backtrack. This has different characteristics to
the normal algorithm, and is not compatible with Perl. Some of the features of
PCRE2 patterns are not supported. Nevertheless, there are times when this kind
of matching can be useful. For a discussion of the two matching algorithms, and
a list of features that \fBpcre2_dfa_match()\fP does not support, see the
string just once (not counting lookaround assertions), and does not backtrack.
This has different characteristics to the normal algorithm, and is not
compatible with Perl. Some of the features of PCRE2 patterns are not supported.
Nevertheless, there are times when this kind of matching can be useful. For a
discussion of the two matching algorithms, and a list of features that
\fBpcre2_dfa_match()\fP does not support, see the
.\" HREF
\fBpcre2matching\fP
.\"
@ -3321,6 +3309,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 21 March 2017
Last updated: 27 March 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi