Documentation update.

Philip.Hazel 2017-03-28 16:34:29 +00:00
parent 447d1b3083
commit 6c7fa44939
5 changed files with 1206 additions and 1232 deletions

@ -46,7 +46,7 @@ A match context is needed only if you want to:
Set a matching offset limit Set a matching offset limit
Change the backtracking match limit Change the backtracking match limit
Change the backtracking depth limit Change the backtracking depth limit
Set custom memory management in the match context Set custom memory management specifically for the match
</pre> </pre>
The <i>length</i> and <i>startoffset</i> values are code The <i>length</i> and <i>startoffset</i> values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a

@ -23,37 +23,38 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a> <li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a> <li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a> <li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a> <li><a name="TOC11" href="#SEC11">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a>
<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a> <li><a name="TOC12" href="#SEC12">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a> <li><a name="TOC13" href="#SEC13">PCRE2 API OVERVIEW</a>
<li><a name="TOC14" href="#SEC14">NEWLINES</a> <li><a name="TOC14" href="#SEC14">STRING LENGTHS AND OFFSETS</a>
<li><a name="TOC15" href="#SEC15">MULTITHREADING</a> <li><a name="TOC15" href="#SEC15">NEWLINES</a>
<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a> <li><a name="TOC16" href="#SEC16">MULTITHREADING</a>
<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a> <li><a name="TOC17" href="#SEC17">PCRE2 CONTEXTS</a>
<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a> <li><a name="TOC18" href="#SEC18">CHECKING BUILD-TIME OPTIONS</a>
<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a> <li><a name="TOC19" href="#SEC19">COMPILING A PATTERN</a>
<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a> <li><a name="TOC20" href="#SEC20">COMPILATION ERROR CODES</a>
<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a> <li><a name="TOC21" href="#SEC21">JUST-IN-TIME (JIT) COMPILATION</a>
<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a> <li><a name="TOC22" href="#SEC22">LOCALE SUPPORT</a>
<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A PATTERN'S CALLOUTS</a> <li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A COMPILED PATTERN</a>
<li><a name="TOC24" href="#SEC24">SERIALIZATION AND PRECOMPILING</a> <li><a name="TOC24" href="#SEC24">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">THE MATCH DATA BLOCK</a> <li><a name="TOC25" href="#SEC25">SERIALIZATION AND PRECOMPILING</a>
<li><a name="TOC26" href="#SEC26">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a> <li><a name="TOC26" href="#SEC26">THE MATCH DATA BLOCK</a>
<li><a name="TOC27" href="#SEC27">NEWLINE HANDLING WHEN MATCHING</a> <li><a name="TOC27" href="#SEC27">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
<li><a name="TOC28" href="#SEC28">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a> <li><a name="TOC28" href="#SEC28">NEWLINE HANDLING WHEN MATCHING</a>
<li><a name="TOC29" href="#SEC29">OTHER INFORMATION ABOUT A MATCH</a> <li><a name="TOC29" href="#SEC29">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
<li><a name="TOC30" href="#SEC30">ERROR RETURNS FROM <b>pcre2_match()</b></a> <li><a name="TOC30" href="#SEC30">OTHER INFORMATION ABOUT A MATCH</a>
<li><a name="TOC31" href="#SEC31">OBTAINING A TEXTUAL ERROR MESSAGE</a> <li><a name="TOC31" href="#SEC31">ERROR RETURNS FROM <b>pcre2_match()</b></a>
<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a> <li><a name="TOC32" href="#SEC32">OBTAINING A TEXTUAL ERROR MESSAGE</a>
<li><a name="TOC33" href="#SEC33">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a> <li><a name="TOC33" href="#SEC33">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
<li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a> <li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
<li><a name="TOC35" href="#SEC35">CREATING A NEW STRING WITH SUBSTITUTIONS</a> <li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
<li><a name="TOC36" href="#SEC36">DUPLICATE SUBPATTERN NAMES</a> <li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
<li><a name="TOC37" href="#SEC37">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a> <li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a>
<li><a name="TOC38" href="#SEC38">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a> <li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
<li><a name="TOC39" href="#SEC39">SEE ALSO</a> <li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
<li><a name="TOC40" href="#SEC40">AUTHOR</a> <li><a name="TOC40" href="#SEC40">SEE ALSO</a>
<li><a name="TOC41" href="#SEC41">REVISION</a> <li><a name="TOC41" href="#SEC41">AUTHOR</a>
<li><a name="TOC42" href="#SEC42">REVISION</a>
</ul> </ul>
<P> <P>
<b>#include &#60;pcre2.h&#62;</b> <b>#include &#60;pcre2.h&#62;</b>
@ -177,22 +178,16 @@ document for an overview of all the PCRE2 documentation.
<b> void *<i>callout_data</i>);</b> <b> void *<i>callout_data</i>);</b>
<br> <br>
<br> <br>
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> PCRE2_SIZE <i>value</i>);</b> <b> PCRE2_SIZE <i>value</i>);</b>
<br> <br>
<br> <br>
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b> <b> uint32_t <i>value</i>);</b>
<br> <br>
<br> <br>
<b>int pcre2_set_recursion_memory_management(</b> <b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> pcre2_match_context *<i>mcontext</i>,</b> <b> uint32_t <i>value</i>);</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
</P> </P>
<br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br> <br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
<P> <P>
@ -314,7 +309,24 @@ document for an overview of all the PCRE2 documentation.
<br> <br>
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b> <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
</P> </P>
<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br> <br><a name="SEC11" href="#TOC1">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a><br>
<P>
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b>
<br>
<br>
<b>int pcre2_set_recursion_memory_management(</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br>
<br>
These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is
replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
no longer has any effect (it always returns zero).
</P>
<br><a name="SEC12" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
<P> <P>
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
units, respectively. However, there is just one header file, <b>pcre2.h</b>. units, respectively. However, there is just one header file, <b>pcre2.h</b>.
@ -368,14 +380,14 @@ When using multiple libraries in an application, you must take care when
processing any particular pattern to use only functions from a single library. processing any particular pattern to use only functions from a single library.
For example, if you want to run a match using a pattern that was compiled with For example, if you want to run a match using a pattern that was compiled with
<b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not <b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
<b>pcre2_match_8()</b>. <b>pcre2_match_8()</b> or <b>pcre2_match_32()</b>.
</P> </P>
<P> <P>
In the function summaries above, and in the rest of this document and other In the function summaries above, and in the rest of this document and other
PCRE2 documents, functions and data types are described using their generic PCRE2 documents, functions and data types are described using their generic
names, without the 8, 16, or 32 suffix. names, without the 8, 16, or 32 suffix.
</P> </P>
<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br> <br><a name="SEC13" href="#TOC1">PCRE2 API OVERVIEW</a><br>
<P> <P>
PCRE2 has its own native API, which is described in this document. There are PCRE2 has its own native API, which is described in this document. There are
also some wrapper functions for the 8-bit library that correspond to the also some wrapper functions for the 8-bit library that correspond to the
@ -397,7 +409,7 @@ against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
<b>pcre2.h</b>. <b>pcre2.h</b>.
</P> </P>
<P> <P>
The functions <b>pcre2_compile()</b>, and <b>pcre2_match()</b> are used for The functions <b>pcre2_compile()</b> and <b>pcre2_match()</b> are used for
compiling and matching regular expressions in a Perl-compatible manner. A compiling and matching regular expressions in a Perl-compatible manner. A
sample program that demonstrates the simplest way of using them is provided in sample program that demonstrates the simplest way of using them is provided in
the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
@ -408,10 +420,17 @@ documentation, and the
documentation describes how to compile and run it. documentation describes how to compile and run it.
</P> </P>
<P> <P>
Just-in-time compiler support is an optional feature of PCRE2 that can be built The compiling and matching functions recognize various options that are passed
in appropriate hardware environments. It greatly speeds up the matching as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts.
</P>
<P>
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
built in appropriate hardware environments. It greatly speeds up the matching
performance of many patterns. Programs can request that it be used if performance of many patterns. Programs can request that it be used if
available, by calling <b>pcre2_jit_compile()</b> after a pattern has been available by calling <b>pcre2_jit_compile()</b> after a pattern has been
successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
support is not available. support is not available.
</P> </P>
@ -423,8 +442,8 @@ More complicated programs might need to make use of the specialist functions
<P> <P>
JIT matching is automatically used by <b>pcre2_match()</b> if it is available, JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance. The JIT-specific functions are matching, which gives improved performance at the expense of less sanity
discussed in the checking. The JIT-specific functions are discussed in the
<a href="pcre2jit.html"><b>pcre2jit</b></a> <a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation. documentation.
</P> </P>
@ -433,7 +452,7 @@ A second matching function, <b>pcre2_dfa_match()</b>, which is not
Perl-compatible, is also provided. This uses a different algorithm for the Perl-compatible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a given matching. The alternative algorithm finds all possible matches (at a given
point in the subject), and scans the subject just once (unless there are point in the subject), and scans the subject just once (unless there are
lookbehind assertions). However, this algorithm does not return captured lookaround assertions). However, this algorithm does not return captured
substrings. A description of the two matching algorithms and their advantages substrings. A description of the two matching algorithms and their advantages
and disadvantages is given in the and disadvantages is given in the
<a href="pcre2matching.html"><b>pcre2matching</b></a> <a href="pcre2matching.html"><b>pcre2matching</b></a>
@ -476,7 +495,7 @@ Functions with names ending with <b>_free()</b> are used for freeing memory
blocks of various sorts. In all cases, if one of these functions is called with blocks of various sorts. In all cases, if one of these functions is called with
a NULL argument, it does nothing. a NULL argument, it does nothing.
</P> </P>
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br> <br><a name="SEC14" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
<P> <P>
The PCRE2 API uses string lengths and offsets into strings of code units in The PCRE2 API uses string lengths and offsets into strings of code units in
several places. These values are always of type PCRE2_SIZE, which is an several places. These values are always of type PCRE2_SIZE, which is an
@ -486,7 +505,7 @@ as a special indicator for zero-terminated strings and unset offsets.
Therefore, the longest string that can be handled is one less than this Therefore, the longest string that can be handled is one less than this
maximum. maximum.
<a name="newlines"></a></P> <a name="newlines"></a></P>
<br><a name="SEC14" href="#TOC1">NEWLINES</a><br> <br><a name="SEC15" href="#TOC1">NEWLINES</a><br>
<P> <P>
PCRE2 supports five different conventions for indicating line breaks in PCRE2 supports five different conventions for indicating line breaks in
strings: a single CR (carriage return) character, a single LF (linefeed) strings: a single CR (carriage return) character, a single LF (linefeed)
@ -521,7 +540,7 @@ The choice of newline convention does not affect the interpretation of
the \n or \r escape sequences, nor does it affect what \R matches; this has the \n or \r escape sequences, nor does it affect what \R matches; this has
its own separate convention. its own separate convention.
</P> </P>
<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br> <br><a name="SEC16" href="#TOC1">MULTITHREADING</a><br>
<P> <P>
In a multithreaded application it is important to keep thread-specific data In a multithreaded application it is important to keep thread-specific data
separate from data that can be shared between threads. The PCRE2 library code separate from data that can be shared between threads. The PCRE2 library code
@ -543,8 +562,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
that is, the same compiled pattern can be used by more than one thread that is, the same compiled pattern can be used by more than one thread
simultaneously. For example, an application can compile all its patterns at the simultaneously. For example, an application can compile all its patterns at the
start, before forking off multiple threads that use them. However, if the start, before forking off multiple threads that use them. However, if the
just-in-time optimization feature is being used, it needs separate memory stack just-in-time (JIT) optimization feature is being used, it needs separate memory
areas for each thread. See the stack areas for each thread. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a> <a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details. documentation for more details.
</P> </P>
@ -596,12 +615,12 @@ thread-specific copy.
Match blocks Match blocks
</b><br> </b><br>
<P> <P>
The matching functions need a block of memory for working space and for storing The matching functions need a block of memory for storing the results of a
the results of a match. This includes details of what was matched, as well as match. This includes details of what was matched, as well as additional
additional information such as the name of a (*MARK) setting. Each thread must information such as the name of a (*MARK) setting. Each thread must provide its
provide its own copy of this memory. own copy of this memory.
</P> </P>
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br> <br><a name="SEC17" href="#TOC1">PCRE2 CONTEXTS</a><br>
<P> <P>
Some PCRE2 functions have a lot of parameters, many of which are used only by Some PCRE2 functions have a lot of parameters, many of which are used only by
specialist applications, for example, those that use custom memory management specialist applications, for example, those that use custom memory management
@ -663,15 +682,15 @@ The memory used for a general context should be freed by calling:
The compile context The compile context
</b><br> </b><br>
<P> <P>
A compile context is required if you want to change the default values of any A compile context is required if you want to provide an external function for
of the following compile-time parameters: stack checking during compilation or to change the default values of any of the
following compile-time parameters:
<pre> <pre>
What \R matches (Unicode newlines or CR, LF, CRLF only) What \R matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables PCRE2's character tables
The newline character sequence The newline character sequence
The compile time nested parentheses limit The compile time nested parentheses limit
The maximum length of the pattern string The maximum length of the pattern string
An external function for stack checking
</pre> </pre>
A compile context is also required if you are using custom memory management. A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of If none of these apply, just pass NULL as the context argument of
@ -713,11 +732,11 @@ in the current locale.
<b> PCRE2_SIZE <i>value</i>);</b> <b> PCRE2_SIZE <i>value</i>);</b>
<br> <br>
<br> <br>
This sets a maximum length, in code units, for the pattern string that is to be This sets a maximum length, in code units, for any pattern string that is
compiled. If the pattern is longer, an error is generated. This facility is compiled with this context. If the pattern is longer, an error is generated.
provided so that applications that accept patterns from external sources can This facility is provided so that applications that accept patterns from
limit their size. The default is the largest number that a PCRE2_SIZE variable external sources can limit their size. The default is the largest number that a
can hold, which is effectively unlimited. PCRE2_SIZE variable can hold, which is effectively unlimited.
<b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b> <b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
<b> uint32_t <i>value</i>);</b> <b> uint32_t <i>value</i>);</b>
<br> <br>
@ -729,8 +748,14 @@ sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
PCRE2_NEWLINE_ANY (any Unicode newline sequence). PCRE2_NEWLINE_ANY (any Unicode newline sequence).
</P> </P>
<P> <P>
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this A pattern can override the value set in the compile context by starting with a
parameter affects the recognition of white space and the end of internal sequence such as (*CRLF). See the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page for details.
</P>
<P>
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
convention affects the recognition of white space and the end of internal
comments starting with #. The value is saved with the compiled pattern for comments starting with #. The value is saved with the compiled pattern for
subsequent use by the JIT compiler and by the two interpreted matching subsequent use by the JIT compiler and by the two interpreted matching
functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>. functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
@ -764,15 +789,14 @@ zero if all is well, or non-zero to force an error.
The match context The match context
</b><br> </b><br>
<P> <P>
A match context is required if you want to change the default values of any A match context is required if you want to:
of the following match-time parameters:
<pre> <pre>
A callout function Set up a callout function
The offset limit for matching an unanchored pattern Set an offset limit for matching an unanchored pattern
The limit for calling <b>match()</b> (see below) Change the backtracking match limit
The limit for calling <b>match()</b> recursively Change the backtracking depth limit
Set custom memory management specifically for the match
</pre> </pre>
A match context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of If none of these apply, just pass NULL as the context argument of
<b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>. <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
</P> </P>
@ -797,7 +821,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
<b> void *<i>callout_data</i>);</b> <b> void *<i>callout_data</i>);</b>
<br> <br>
<br> <br>
This sets up a "callout" function, which PCRE2 will call at specified points This sets up a "callout" function for PCRE2 to call at specified points
during a matching operation. Details are given in the during a matching operation. Details are given in the
<a href="pcre2callout.html"><b>pcre2callout</b></a> <a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation. documentation.
@ -816,8 +840,8 @@ A match can never be found if the <i>startoffset</i> argument of
limit. limit.
</P> </P>
<P> <P>
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
<b>pcre2_compile()</b> so that when JIT is in use, different code can be calling <b>pcre2_compile()</b> so that when JIT is in use, different code can be
compiled. If a match is started with a non-default offset limit when compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated. PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
</P> </P>
@ -837,10 +861,10 @@ which have a very large number of possibilities in their search trees. The
classic example is a pattern that uses nested unlimited repeats. classic example is a pattern that uses nested unlimited repeats.
</P> </P>
<P> <P>
Internally, <b>pcre2_match()</b> uses a function called <b>match()</b>, which it There is an internal counter in <b>pcre2_match()</b> that is incremented each
calls repeatedly (sometimes recursively). The limit set by <i>match_limit</i> is time round its main matching loop. If this value reaches the match limit,
imposed on the number of times this function is called during a match, which <b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
has the effect of limiting the amount of backtracking that can take place. For the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>, in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
which ignores it. which ignores it.
@ -855,8 +879,7 @@ matching can continue.
</P> </P>
<P> <P>
The default value for the limit can be set when PCRE2 is built; the default The default value for the limit can be set when PCRE2 is built; the default
default is 10 million, which handles all but the most extreme cases. If the default is 10 million, which handles all but the most extreme cases. A value
limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_MATCHLIMIT. A value
for the match limit may also be supplied by an item at the start of a pattern for the match limit may also be supplied by an item at the start of a pattern
of the form of the form
<pre> <pre>
@ -865,64 +888,38 @@ of the form
where ddd is a decimal number. However, such a setting is ignored unless ddd is where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
limit is set, less than the default. limit is set, less than the default.
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b> <b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
<b> uint32_t <i>value</i>);</b> <b> uint32_t <i>value</i>);</b>
<br> <br>
<br> <br>
The <i>recursion_limit</i> parameter is similar to <i>match_limit</i>, but This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
instead of limiting the total number of times that <b>match()</b> is called, it Each time a nested backtracking point is passed, a new memory "frame" is used
limits the depth of recursion. The recursion depth is a smaller number than the to remember the state of matching at that point. Thus, this parameter
total number of calls, because not all calls to <b>match()</b> are recursive. indirectly limits the amount of memory that is used in a match.
This limit is of use only if it is set smaller than <i>match_limit</i>.
</P> </P>
<P> <P>
Limiting the recursion depth limits the amount of system stack that can be This limit is not relevant, and is ignored, when matching is done using JIT
used, or, when PCRE2 has been compiled to use memory on the heap instead of the compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which uses
stack, the amount of heap memory that can be used. This limit is not relevant, it to limit the depth of internal recursive function calls that implement
and is ignored, when matching is done using JIT compiled code. However, it is lookaround assertions and pattern recursions. This is, therefore, an indirect
supported by <b>pcre2_dfa_match()</b>, which uses recursive function calls less limit on the amount of system stack that is used. A recursive pattern such as
frequently than <b>pcre2_match()</b>, but which can be caused to use a lot of /(.)(?1)/, when matched to a very long string using <b>pcre2_dfa_match()</b>,
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string. can use a great deal of stack.
</P> </P>
<P> <P>
The default value for <i>recursion_limit</i> can be set when PCRE2 is built; the The default value for the depth limit can be set when PCRE2 is built; the
default default is the same value as the default for <i>match_limit</i>. If the default default is the same value as the default for the match limit. If the
limit is exceeded, <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> return limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
supplied by an item at the start of a pattern of the form item at the start of a pattern of the form
<pre> <pre>
(*LIMIT_RECURSION=ddd) (*LIMIT_DEPTH=ddd)
</pre> </pre>
where ddd is a decimal number. However, such a setting is ignored unless ddd is where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of <b>pcre2_match()</b> or less than the limit set by the caller of <b>pcre2_match()</b> or
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default. <b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
<b>int pcre2_set_recursion_memory_management(</b>
<b> pcre2_match_context *<i>mcontext</i>,</b>
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
<br>
<br>
This function sets up two additional custom memory management functions for use
by <b>pcre2_match()</b> when PCRE2 is compiled to use the heap for remembering
backtracking data, instead of recursive function calls that use the system
stack. There is a discussion about PCRE2's stack usage in the
<a href="pcre2stack.html"><b>pcre2stack</b></a>
documentation. See the
<a href="pcre2build.html"><b>pcre2build</b></a>
documentation for details of how to build PCRE2.
</P> </P>
<P> <br><a name="SEC18" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
Using the heap for recursion is a non-standard way of building PCRE2, for use
in environments that have limited stacks. Because of the greater use of memory
management, <b>pcre2_match()</b> runs more slowly. Functions that are different
to the general custom memory functions are provided so that special-purpose
external code can be used for this case, because the memory blocks are all the
same size. The blocks are retained by <b>pcre2_match()</b> until it is about to
exit so that they can be re-used when possible during the match. In the absence
of these functions, the normal custom memory management functions are used, if
supplied, otherwise the system functions.
</P>
<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
<P> <P>
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b> <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
</P> </P>
@ -954,6 +951,13 @@ sequences the \R escape sequence matches by default. A value of
PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
default can be overridden when a pattern is compiled. default can be overridden when a pattern is compiled.
<pre>
PCRE2_CONFIG_DEPTHLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
<b>pcre2_set_depth_limit()</b> above.
<pre> <pre>
PCRE2_CONFIG_JIT PCRE2_CONFIG_JIT
</pre> </pre>
@ -989,9 +993,9 @@ be compiled by those two libraries, but at the expense of slower matching.
<pre> <pre>
PCRE2_CONFIG_MATCHLIMIT PCRE2_CONFIG_MATCHLIMIT
</pre> </pre>
The output is a uint32_t integer that gives the default limit for the number of The output is a uint32_t integer that gives the default match limit for
internal matching function calls in a <b>pcre2_match()</b> execution. Further <b>pcre2_match()</b>. Further details are given with
details are given with <b>pcre2_match()</b> below. <b>pcre2_set_match_limit()</b> above.
<pre> <pre>
PCRE2_CONFIG_NEWLINE PCRE2_CONFIG_NEWLINE
</pre> </pre>
@ -1015,20 +1019,11 @@ amount of system stack used when a pattern is compiled. It is specified when
PCRE2 is built; the default is 250. This limit does not take into account the PCRE2 is built; the default is 250. This limit does not take into account the
stack that may already be used by the calling application. For finer control stack that may already be used by the calling application. For finer control
over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>. over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
<pre>
PCRE2_CONFIG_RECURSIONLIMIT
</pre>
The output is a uint32_t integer that gives the default limit for the depth of
recursion when calling the internal matching function in a <b>pcre2_match()</b>
execution. Further details are given with <b>pcre2_match()</b> below.
<pre> <pre>
PCRE2_CONFIG_STACKRECURSE PCRE2_CONFIG_STACKRECURSE
</pre> </pre>
The output is a uint32_t integer that is set to one if internal recursion when This parameter is obsolete and should not be used in new code. The output is a
running <b>pcre2_match()</b> is implemented by recursive function calls that use uint32_t integer that is always set to zero.
the system stack to remember their state. This is the usual way that PCRE2 is
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
heap instead of recursive function calls.
<pre> <pre>
PCRE2_CONFIG_UNICODE_VERSION PCRE2_CONFIG_UNICODE_VERSION
</pre> </pre>
@ -1047,14 +1042,14 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
<pre> <pre>
PCRE2_CONFIG_VERSION PCRE2_CONFIG_VERSION
</pre> </pre>
The <i>where</i> argument should point to a buffer that is at least 12 code The <i>where</i> argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling units long. (The exact length required can be found by calling
<b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with <b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with
the PCRE2 version string, zero-terminated. The number of code units used is the PCRE2 version string, zero-terminated. The number of code units used is
returned. This is the length of the string plus one unit for the terminating returned. This is the length of the string plus one unit for the terminating
zero. zero.
<a name="compiling"></a></P> <a name="compiling"></a></P>
<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br> <br><a name="SEC19" href="#TOC1">COMPILING A PATTERN</a><br>
<P> <P>
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b> <b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b> <b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
@ -1240,13 +1235,14 @@ option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name. A closing parenthesis can be unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-comments are option is set, unescaped whitespace in verb names is skipped and #-comments are
recognized, exactly as in the rest of the pattern. recognized in this mode, exactly as in the rest of the pattern.
<pre> <pre>
PCRE2_AUTO_CALLOUT PCRE2_AUTO_CALLOUT
</pre> </pre>
If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items, If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
all with number 255, before each pattern item, except immediately before or all with number 255, before each pattern item, except immediately before or
after a callout in the pattern. For discussion of the callout facility, see the after an explicit callout in the pattern. For discussion of the callout
facility, see the
<a href="pcre2callout.html"><b>pcre2callout</b></a> <a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation. documentation.
<pre> <pre>
@ -1472,9 +1468,8 @@ and
<a href="pcre2unicode.html#utf32strings">UTF-32 strings</a> <a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
in the in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a> <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
document. document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a negative negative error code.
error code.
</P> </P>
<P> <P>
If you know that your pattern is valid, and you want to skip this check for If you know that your pattern is valid, and you want to skip this check for
@ -1495,7 +1490,7 @@ in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a> <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
page. If you set PCRE2_UCP, matching one of the items it affects takes much page. If you set PCRE2_UCP, matching one of the items it affects takes much
longer. The option is available only if PCRE2 has been compiled with Unicode longer. The option is available only if PCRE2 has been compiled with Unicode
support. support (which is the default).
<pre> <pre>
PCRE2_UNGREEDY PCRE2_UNGREEDY
</pre> </pre>
@ -1525,9 +1520,9 @@ the behaviour of PCRE2 are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a> <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page. page.
</P> </P>
<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br> <br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
<P> <P>
There are over 80 positive error codes that <b>pcre2_compile()</b> may return There are nearly 100 positive error codes that <b>pcre2_compile()</b> may return
(via <i>errorcode</i>) if it finds an error in the pattern. There are also some (via <i>errorcode</i>) if it finds an error in the pattern. There are also some
negative error codes that are used for invalid UTF strings. These are the same negative error codes that are used for invalid UTF strings. These are the same
as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
@ -1538,7 +1533,7 @@ error message"
<a href="#geterrormessage">below)</a> <a href="#geterrormessage">below)</a>
can be called to obtain a textual error message from any error code. can be called to obtain a textual error message from any error code.
<a name="jitcompiling"></a></P> <a name="jitcompiling"></a></P>
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br> <br><a name="SEC21" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
<P> <P>
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b> <b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
<br> <br>
@ -1574,18 +1569,18 @@ documentation.
JIT compilation is a heavyweight optimization. It can take some time for JIT compilation is a heavyweight optimization. It can take some time for
patterns to be analyzed, and for one-off matches and simple patterns the patterns to be analyzed, and for one-off matches and simple patterns the
benefit of faster execution might be offset by a much slower compilation time. benefit of faster execution might be offset by a much slower compilation time.
Most, but not all patterns can be optimized by the JIT compiler. Most (but not all) patterns can be optimized by the JIT compiler.
<a name="localesupport"></a></P> <a name="localesupport"></a></P>
<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br> <br><a name="SEC22" href="#TOC1">LOCALE SUPPORT</a><br>
<P> <P>
PCRE2 handles caseless matching, and determines whether characters are letters, PCRE2 handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character code digits, or whatever, by reference to a set of tables, indexed by character code
point. This applies only to characters whose code points are less than 256. By point. This applies only to characters whose code points are less than 256. By
default, higher-valued code points never match escapes such as \w or \d. default, higher-valued code points never match escapes such as \w or \d.
However, if PCRE2 is built with UTF support, all characters can be tested with However, if PCRE2 is built with Unicode support, all characters can be tested
\p and \P, or, alternatively, the PCRE2_UCP option can be set when a pattern with \p and \P, or, alternatively, the PCRE2_UCP option can be set when a
is compiled; this causes \w and friends to use Unicode property support pattern is compiled; this causes \w and friends to use Unicode property
instead of the built-in tables. support instead of the built-in tables.
</P> </P>
<P> <P>
The use of locales with Unicode is discouraged. If you are handling characters The use of locales with Unicode is discouraged. If you are handling characters
@ -1629,10 +1624,10 @@ available for as long as it is needed.
The pointer that is passed (via the compile context) to <b>pcre2_compile()</b> The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
is saved with the compiled pattern, and the same tables are used by is saved with the compiled pattern, and the same tables are used by
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>. Thus, for any single pattern, <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>. Thus, for any single pattern,
compilation, and matching all happen in the same locale, but different patterns compilation and matching both happen in the same locale, but different patterns
can be processed in different locales. can be processed in different locales.
<a name="infoaboutpattern"></a></P> <a name="infoaboutpattern"></a></P>
<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br> <br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
<P> <P>
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b> <b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
</P> </P>
@ -1645,7 +1640,7 @@ pattern. The second argument specifies which piece of information is required,
and the third argument is a pointer to a variable to receive the data. If the and the third argument is a pointer to a variable to receive the data. If the
third argument is NULL, the first argument is ignored, and the function returns third argument is NULL, the first argument is ignored, and the function returns
the size in bytes of the variable that is required for the information the size in bytes of the variable that is required for the information
requested. Otherwise, The yield of the function is zero for success, or one of requested. Otherwise, the yield of the function is zero for success, or one of
the following negative numbers: the following negative numbers:
<pre> <pre>
PCRE2_ERROR_NULL the argument <i>code</i> was NULL PCRE2_ERROR_NULL the argument <i>code</i> was NULL
@ -1698,8 +1693,8 @@ following are true:
.* is not in an atomic group .* is not in an atomic group
.* is not in a capturing group that is the subject of a back reference .* is not in a capturing group that is the subject of a back reference
PCRE2_DOTALL is in force for .* PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern. Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set. PCRE2_NO_DOTSTAR_ANCHOR is not set
</pre> </pre>
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
options returned for PCRE2_INFO_ALLOPTIONS. options returned for PCRE2_INFO_ALLOPTIONS.
@ -1726,6 +1721,13 @@ matches only CR, LF, or CRLF.
Return the highest capturing subpattern number in the pattern. In patterns Return the highest capturing subpattern number in the pattern. In patterns
where (?| is not used, this is also the total number of capturing subpatterns. where (?| is not used, this is also the total number of capturing subpatterns.
The third argument should point to an <b>uint32_t</b> variable. The third argument should point to an <b>uint32_t</b> variable.
<pre>
PCRE2_INFO_DEPTHLIMIT
</pre>
If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
<pre> <pre>
PCRE2_INFO_FIRSTBITMAP PCRE2_INFO_FIRSTBITMAP
</pre> </pre>
@ -1757,6 +1759,14 @@ argument should point to an <b>uint32_t</b> variable. In the 8-bit library, the
value is always less than 256. In the 16-bit library the value can be up to value is always less than 256. In the 16-bit library the value can be up to
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff, 0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
and up to 0xffffffff when not using UTF-32 mode. and up to 0xffffffff when not using UTF-32 mode.
<pre>
PCRE2_INFO_FRAMESIZE
</pre>
Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
without the use of JIT. The third argument should point to a <b>size_t</b>
variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
<pre> <pre>
PCRE2_INFO_HASBACKSLASHC PCRE2_INFO_HASBACKSLASHC
</pre> </pre>
@ -1767,7 +1777,8 @@ argument should point to an <b>uint32_t</b> variable.
</pre> </pre>
Return 1 if the pattern contains any explicit matches for CR or LF characters, Return 1 if the pattern contains any explicit matches for CR or LF characters,
otherwise 0. The third argument should point to an <b>uint32_t</b> variable. An otherwise 0. The third argument should point to an <b>uint32_t</b> variable. An
explicit match is either a literal CR or LF character, or \r or \n. explicit match is either a literal CR or LF character, or \r or \n or one of
the equivalent hexadecimal or octal escape sequences.
<pre> <pre>
PCRE2_INFO_JCHANGED PCRE2_INFO_JCHANGED
</pre> </pre>
@ -1904,7 +1915,7 @@ different for each compiled pattern.
<pre> <pre>
PCRE2_INFO_NEWLINE PCRE2_INFO_NEWLINE
</pre> </pre>
The output is a <b>uint32_t</b> with one of the following values: The output is one of the following <b>uint32_t</b> values:
<pre> <pre>
PCRE2_NEWLINE_CR Carriage return (CR) PCRE2_NEWLINE_CR Carriage return (CR)
PCRE2_NEWLINE_LF Linefeed (LF) PCRE2_NEWLINE_LF Linefeed (LF)
@ -1912,15 +1923,8 @@ The output is a <b>uint32_t</b> with one of the following values:
PCRE2_NEWLINE_ANY Any Unicode line ending PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
</pre> </pre>
This specifies the default character sequence that will be recognized as This identifies the character sequence that will be recognized as meaning
meaning "newline" while matching. "newline" while matching.
<pre>
PCRE2_INFO_RECURSIONLIMIT
</pre>
If the pattern set a recursion limit by including an item of the form
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value has been
set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
<pre> <pre>
PCRE2_INFO_SIZE PCRE2_INFO_SIZE
</pre> </pre>
@ -1933,7 +1937,7 @@ value returned by this option, because there are cases where the code that
calculates the size has to over-estimate. Processing a pattern with the JIT calculates the size has to over-estimate. Processing a pattern with the JIT
compiler does not alter the value returned by this option. compiler does not alter the value returned by this option.
<a name="infoaboutcallouts"></a></P> <a name="infoaboutcallouts"></a></P>
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br> <br><a name="SEC24" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
<P> <P>
<b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b> <b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
<b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b> <b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
@ -1952,7 +1956,7 @@ contents of the callout enumeration block are described in the
<a href="pcre2callout.html"><b>pcre2callout</b></a> <a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation, which also gives further details about callouts. documentation, which also gives further details about callouts.
</P> </P>
<br><a name="SEC24" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br> <br><a name="SEC25" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
<P> <P>
It is possible to save compiled patterns on disc or elsewhere, and reload them It is possible to save compiled patterns on disc or elsewhere, and reload them
later, subject to a number of restrictions. The functions whose names begin later, subject to a number of restrictions. The functions whose names begin
@ -1961,7 +1965,7 @@ the
<a href="pcre2serialize.html"><b>pcre2serialize</b></a> <a href="pcre2serialize.html"><b>pcre2serialize</b></a>
documentation. documentation.
<a name="matchdatablock"></a></P> <a name="matchdatablock"></a></P>
<br><a name="SEC25" href="#TOC1">THE MATCH DATA BLOCK</a><br> <br><a name="SEC26" href="#TOC1">THE MATCH DATA BLOCK</a><br>
<P> <P>
<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b> <b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
<b> pcre2_general_context *<i>gcontext</i>);</b> <b> pcre2_general_context *<i>gcontext</i>);</b>
@ -1986,9 +1990,9 @@ Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
the creation functions above. For <b>pcre2_match_data_create()</b>, the first the creation functions above. For <b>pcre2_match_data_create()</b>, the first
argument is the number of pairs of offsets in the <i>ovector</i>. One pair of argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
offsets is required to identify the string that matched the whole pattern, with offsets is required to identify the string that matched the whole pattern, with
another pair for each captured substring. For example, a value of 4 creates an additional pair for each captured substring. For example, a value of 4
enough space to record the matched portion of the subject plus three captured creates enough space to record the matched portion of the subject plus three
substrings. A minimum of at least 1 pair is imposed by captured substrings. A minimum of at least 1 pair is imposed by
<b>pcre2_match_data_create()</b>, so it is always possible to return the overall <b>pcre2_match_data_create()</b>, so it is always possible to return the overall
matched string. matched string.
</P> </P>
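<P>
The sizing arithmetic described above can be sketched as follows. This is a
minimal illustration, not PCRE2 code: the helper names are hypothetical, and
the capture count is assumed to come from a call such as
<b>pcre2_pattern_info()</b> with PCRE2_INFO_CAPTURECOUNT.
</P>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of ovector pairs needed to record the
   overall match plus every captured substring. */
uint32_t ovector_pairs_needed(uint32_t capture_count)
{
    assert(capture_count < UINT32_MAX);   /* avoid wrap-around on +1 */
    return capture_count + 1u;            /* pair 0 is the whole match */
}

/* Hypothetical helper: models the minimum of one pair that the text says
   pcre2_match_data_create() imposes on the requested size. */
uint32_t effective_pairs(uint32_t requested)
{
    return requested < 1u ? 1u : requested;
}
```

<P>
For a pattern with three capturing groups, <i>ovector_pairs_needed(3)</i> gives
4, which matches the example in the text.
</P>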
@ -2032,7 +2036,7 @@ match data block (for that match) have taken place.
When a match data block itself is no longer needed, it should be freed by When a match data block itself is no longer needed, it should be freed by
calling <b>pcre2_match_data_free()</b>. calling <b>pcre2_match_data_free()</b>.
</P> </P>
<br><a name="SEC26" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br> <br><a name="SEC27" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
<P> <P>
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b> <b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b> <b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@ -2126,9 +2130,11 @@ character is CR followed by LF, advance the starting offset by two characters
instead of one. instead of one.
</P> </P>
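<P>
The CRLF rule in the fragment above can be sketched as a small helper. This is
an illustration under stated assumptions: the function name and the
<i>crlf_is_newline</i> flag are hypothetical, the subject is treated as 8-bit
code units in non-UTF mode (so one character is one code unit), and the rule is
assumed to apply when the caller steps the starting offset forward by one
position between match attempts.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the starting offset for the next attempt.
   crlf_is_newline is nonzero when the newline convention in force
   recognizes CRLF as a newline. Assumes non-UTF 8-bit code units. */
size_t next_start_offset(const char *subject, size_t length, size_t offset,
                         int crlf_is_newline)
{
    assert(subject != NULL && offset <= length);
    if (offset == length)
        return length;                 /* end of subject: caller should stop */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;             /* step over CR and LF together */
    return offset + 1;                 /* ordinary single-unit advance */
}
```

<P>
In UTF mode the single-unit advance would instead have to skip a whole
character, which this sketch does not attempt.
</P>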
<P> <P>
If a non-zero starting offset is passed when the pattern is anchored, one If a non-zero starting offset is passed when the pattern is anchored, a single
attempt to match at the given offset is made. This can only succeed if the attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject. pattern does not require the match to be at the start of the subject. In other
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
<a name="matchoptions"></a></P> <a name="matchoptions"></a></P>
<br><b> <br><b>
Option bits for <b>pcre2_match()</b> Option bits for <b>pcre2_match()</b>
@ -2142,9 +2148,9 @@ described below.
</P> </P>
<P> <P>
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT) Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
compiler. If it is set, JIT matching is disabled and the normal interpretive compiler. If it is set, JIT matching is disabled and the interpretive code in
code in <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the remaining
remaining options are supported for JIT matching. options are supported for JIT matching.
<pre> <pre>
PCRE2_ANCHORED PCRE2_ANCHORED
</pre> </pre>
@ -2229,13 +2235,13 @@ page.
If you know that your subject is valid, and you want to skip these checks for If you know that your subject is valid, and you want to skip these checks for
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
<b>pcre2_match()</b>. You might want to do this for the second and subsequent <b>pcre2_match()</b>. You might want to do this for the second and subsequent
calls to <b>pcre2_match()</b> if you are making repeated calls to find all the calls to <b>pcre2_match()</b> if you are making repeated calls to find other
matches in a single subject string. matches in the same subject string.
</P> </P>
<P> <P>
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
as a subject, or an invalid value of <i>startoffset</i>, is undefined. Your string as a subject, or an invalid value of <i>startoffset</i>, is undefined.
program may crash or loop indefinitely. Your program may crash or loop indefinitely.
<pre> <pre>
PCRE2_PARTIAL_HARD PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT PCRE2_PARTIAL_SOFT
@ -2262,7 +2268,7 @@ examples, in the
<a href="pcre2partial.html"><b>pcre2partial</b></a> <a href="pcre2partial.html"><b>pcre2partial</b></a>
documentation. documentation.
</P> </P>
<br><a name="SEC27" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br> <br><a name="SEC28" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
<P> <P>
When PCRE2 is built, a default newline convention is set; this is usually the When PCRE2 is built, a default newline convention is set; this is usually the
standard convention for the operating system. The default can be overridden in standard convention for the operating system. The default can be overridden in
@ -2294,15 +2300,15 @@ reference, and so advances only by one character after the first failure.
</P> </P>
<P> <P>
An explicit match for CR or LF is either a literal appearance of one of those An explicit match for CR or LF is either a literal appearance of one of those
characters in the pattern, or one of the \r or \n escape sequences. Implicit characters in the pattern, or one of the \r or \n or equivalent octal or
matches such as [^X] do not count, nor does \s, even though it includes CR and hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
LF in the characters that it matches. does \s, even though it includes CR and LF in the characters that it matches.
</P> </P>
<P> <P>
Notwithstanding the above, anomalous effects may still occur when CRLF is a Notwithstanding the above, anomalous effects may still occur when CRLF is a
valid newline sequence and explicit \r or \n escapes appear in the pattern. valid newline sequence and explicit \r or \n escapes appear in the pattern.
<a name="matchedstrings"></a></P> <a name="matchedstrings"></a></P>
<br><a name="SEC28" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br> <br><a name="SEC29" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
<P> <P>
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b> <b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
<br> <br>
@ -2352,12 +2358,12 @@ identify the part of the subject that was partially matched. See the
documentation for details of partial matching. documentation for details of partial matching.
</P> </P>
<P> <P>
After a successful match, the first pair of offsets identifies the portion of After a fully successful match, the first pair of offsets identifies the
the subject string that was matched by the entire pattern. The next pair is portion of the subject string that was matched by the entire pattern. The next
used for the first capturing subpattern, and so on. The value returned by pair is used for the first captured substring, and so on. The value returned by
<b>pcre2_match()</b> is one more than the highest numbered pair that has been <b>pcre2_match()</b> is one more than the highest numbered pair that has been
set. For example, if two substrings have been captured, the returned value is set. For example, if two substrings have been captured, the returned value is
3. If there are no capturing subpatterns, the return value from a successful 3. If there are no captured substrings, the return value from a successful
match is 1, indicating that just the first pair of offsets has been set. match is 1, indicating that just the first pair of offsets has been set.
</P> </P>
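<P>
The layout of the offset pairs can be illustrated without calling the library.
The sketch below assumes an 8-bit subject and an ovector that has already been
filled in by a successful match; the function name is hypothetical, and
SKETCH_UNSET is a local stand-in for the value that marks an unset offset
(assumed here to be all bits set in a size_t).
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for the "unset" offset marker (assumption). */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: copy substring number n (0 = whole match) into out.
   rc is the positive value returned by the match, that is, one more than
   the highest numbered pair that was set. Returns the substring length,
   or -1 if the pair was not set or out is too small. */
long copy_group(const char *subject, const size_t *ovector, int rc,
                int n, char *out, size_t outlen)
{
    assert(subject != NULL && ovector != NULL && out != NULL);
    if (n < 0 || n >= rc)
        return -1;                          /* pair n is beyond those set */
    size_t start = ovector[2 * n];          /* first offset of pair n */
    size_t end = ovector[2 * n + 1];        /* second offset of pair n */
    if (start == SKETCH_UNSET || end == SKETCH_UNSET || end < start)
        return -1;                          /* group did not participate */
    size_t len = end - start;
    if (len + 1 > outlen)
        return -1;                          /* no room for terminating zero */
    memcpy(out, subject + start, len);
    out[len] = '\0';
    return (long)len;
}
```

<P>
With two captured substrings the match returns 3, so pairs 0, 1, and 2 may be
read and any higher pair number is rejected.
</P>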
<P> <P>
@ -2375,11 +2381,7 @@ returned.
If the ovector is too small to hold all the captured substring offsets, as much If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If captured as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, <b>pcre2_match()</b> may be called with a match substrings are not of interest, <b>pcre2_match()</b> may be called with a match
data block whose ovector is of minimum length (that is, one pair). However, if data block whose ovector is of minimum length (that is, one pair).
the pattern contains back references and the <i>ovector</i> is not big enough to
remember the related substrings, PCRE2 has to get additional memory for use
during matching. Thus it is usually advisable to set up a match data block
containing an ovector of reasonable size.
</P> </P>
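<P>
The relationship between the ovector size and the value returned after a
successful match can be modelled as below. This is a model of the behaviour
described in the text, not the library's implementation, and the function name
is hypothetical.
</P>

```c
#include <assert.h>

/* Model only: the value a successful match would return, given the highest
   numbered group that captured (0 if none) and the number of pairs in the
   ovector. */
int modelled_return(unsigned highest_captured_group, unsigned ovector_pairs)
{
    assert(ovector_pairs >= 1);        /* a minimum of one pair always exists */
    unsigned needed = highest_captured_group + 1;
    if (needed > ovector_pairs)
        return 0;                      /* ovector too small for all offsets */
    return (int)needed;                /* one more than highest pair set */
}
```

<P>
A return of zero therefore still means that the pattern matched; only the
captured substring offsets are incomplete.
</P>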
<P> <P>
It is possible for capturing subpattern number <i>n+1</i> to match some part of It is possible for capturing subpattern number <i>n+1</i> to match some part of
@ -2405,7 +2407,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
<b>pcre2_match()</b>. The other elements retain whatever values they previously <b>pcre2_match()</b>. The other elements retain whatever values they previously
had. had.
<a name="matchotherdata"></a></P> <a name="matchotherdata"></a></P>
<br><a name="SEC29" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br> <br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
<P> <P>
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b> <b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
<br> <br>
@ -2455,7 +2457,7 @@ the code unit offset of the invalid UTF character. Details are given in the
<a href="pcre2unicode.html"><b>pcre2unicode</b></a> <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
page. page.
<a name="errorlist"></a></P> <a name="errorlist"></a></P>
<br><a name="SEC30" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br> <br><a name="SEC31" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
<P> <P>
If <b>pcre2_match()</b> fails, it returns a negative number. This can be If <b>pcre2_match()</b> fails, it returns a negative number. This can be
converted to a text string by calling the <b>pcre2_get_error_message()</b> converted to a text string by calling the <b>pcre2_get_error_message()</b>
@ -2487,8 +2489,9 @@ returned when the magic number is not present.
<pre> <pre>
PCRE2_ERROR_BADMODE PCRE2_ERROR_BADMODE
</pre> </pre>
This error is given when a pattern that was compiled by the 8-bit library is This error is given when a compiled pattern is passed to a function in a
passed to a 16-bit or 32-bit library function, or vice versa. library of a different code unit width, for example, a pattern compiled by
the 8-bit library is passed to a 16-bit or 32-bit library function.
<pre> <pre>
PCRE2_ERROR_BADOFFSET PCRE2_ERROR_BADOFFSET
</pre> </pre>
@ -2512,20 +2515,15 @@ use by callout functions that want to cause <b>pcre2_match()</b> or
<b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the <b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
<a href="pcre2callout.html"><b>pcre2callout</b></a> <a href="pcre2callout.html"><b>pcre2callout</b></a>
documentation for details. documentation for details.
<pre>
PCRE2_ERROR_DEPTHLIMIT
</pre>
The nested backtracking depth limit was reached.
<pre> <pre>
PCRE2_ERROR_INTERNAL PCRE2_ERROR_INTERNAL
</pre> </pre>
An unexpected internal error has occurred. This error could be caused by a bug An unexpected internal error has occurred. This error could be caused by a bug
in PCRE2 or by overwriting of the compiled pattern. in PCRE2 or by overwriting of the compiled pattern.
<pre>
PCRE2_ERROR_JIT_BADOPTION
</pre>
This error is returned when a pattern that was successfully studied using JIT
is being matched, but the matching mode (partial or complete match) does not
correspond to any JIT compilation mode. When the JIT fast path function is
used, this error may be also given for invalid options. See the
<a href="pcre2jit.html"><b>pcre2jit</b></a>
documentation for more details.
<pre> <pre>
PCRE2_ERROR_JIT_STACKLIMIT PCRE2_ERROR_JIT_STACKLIMIT
</pre> </pre>
@ -2537,15 +2535,13 @@ documentation for more details.
<pre> <pre>
PCRE2_ERROR_MATCHLIMIT PCRE2_ERROR_MATCHLIMIT
</pre> </pre>
The backtracking limit was reached. The backtracking match limit was reached.
<pre> <pre>
PCRE2_ERROR_NOMEMORY PCRE2_ERROR_NOMEMORY
</pre> </pre>
If a pattern contains back references, but the ovector is not big enough to If a pattern contains many nested backtracking points, heap memory is used to
remember the referenced substrings, PCRE2 gets a block of memory at the start remember them. This error is given when the memory allocation function (default
of matching to use for this purpose. There are some other special cases where or custom) fails.
extra memory is needed during matching. This error is given when memory cannot
be obtained.
<pre> <pre>
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
</pre> </pre>
@ -2561,12 +2557,8 @@ in the subject string. Some simple patterns that might do this are detected and
faulted at compile time, but more complicated cases, in particular mutual faulted at compile time, but more complicated cases, in particular mutual
recursions between two different subpatterns, cannot be detected until matching recursions between two different subpatterns, cannot be detected until matching
is attempted. is attempted.
<pre>
PCRE2_ERROR_RECURSIONLIMIT
</pre>
The internal recursion limit was reached.
<a name="geterrormessage"></a></P> <a name="geterrormessage"></a></P>
<br><a name="SEC31" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br> <br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
<P> <P>
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b> <b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
<b> PCRE2_SIZE <i>bufflen</i>);</b> <b> PCRE2_SIZE <i>bufflen</i>);</b>
@ -2587,7 +2579,7 @@ returned. If the buffer is too small, the message is truncated (but still with
a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned. a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
None of the messages are very long; a buffer size of 120 code units is ample. None of the messages are very long; a buffer size of 120 code units is ample.
<a name="extractbynumber"></a></P> <a name="extractbynumber"></a></P>
<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br> <br><a name="SEC33" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
<P> <P>
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b> <b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b> <b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
@ -2684,7 +2676,7 @@ The substring did not participate in the match. For example, if the pattern is
(abc)|(def) and the subject is "def", and the ovector contains at least two (abc)|(def) and the subject is "def", and the ovector contains at least two
capturing slots, substring number 1 is unset. capturing slots, substring number 1 is unset.
</P> </P>
<br><a name="SEC33" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br> <br><a name="SEC34" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
<P> <P>
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b> <b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b> <b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
@ -2723,7 +2715,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
appropriate offset in the ovector, which contains PCRE2_UNSET for unset appropriate offset in the ovector, which contains PCRE2_UNSET for unset
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>. substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
<a name="extractbyname"></a></P> <a name="extractbyname"></a></P>
<br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br> <br><a name="SEC35" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
<P> <P>
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b> <b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>);</b> <b> PCRE2_SPTR <i>name</i>);</b>
@ -2755,8 +2747,8 @@ calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
that name. Given the number, you can extract the substring directly, or use one that name. Given the number, you can extract the substring directly from the
of the functions described above. ovector, or use one of the "bynumber" functions described above.
</P> </P>
<P> <P>
For convenience, there are also "byname" functions that correspond to the For convenience, there are also "byname" functions that correspond to the
@ -2783,7 +2775,7 @@ names are not included in the compiled code. The matching process uses only
numbers. For this reason, the use of different names for subpatterns of the numbers. For this reason, the use of different names for subpatterns of the
same number causes an error at compile time. same number causes an error at compile time.
</P> </P>
<br><a name="SEC35" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br> <br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
<P> <P>
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b> <b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b> <b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@ -2990,7 +2982,7 @@ obtained by calling the <b>pcre2_get_error_message()</b> function (see
"Obtaining a textual error message" "Obtaining a textual error message"
<a href="#geterrormessage">above).</a> <a href="#geterrormessage">above).</a>
</P> </P>
<br><a name="SEC36" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br> <br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
<P> <P>
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b> <b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b> <b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
@ -3035,7 +3027,7 @@ in the section entitled <i>Information about a pattern</i>. Given all the
relevant entries for the name, you can extract each of their numbers, and hence relevant entries for the name, you can extract each of their numbers, and hence
the captured data. the captured data.
</P> </P>
<br><a name="SEC37" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br> <br><a name="SEC38" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
<P> <P>
The traditional matching function uses a similar algorithm to Perl, which stops The traditional matching function uses a similar algorithm to Perl, which stops
when it finds the first match at a given point in the subject. If you want to when it finds the first match at a given point in the subject. If you want to
@ -3053,7 +3045,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
other alternatives. Ultimately, when it runs out of matches, other alternatives. Ultimately, when it runs out of matches,
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH. <b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
<a name="dfamatch"></a></P> <a name="dfamatch"></a></P>
<br><a name="SEC38" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br> <br><a name="SEC39" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
<P> <P>
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b> <b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b> <b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@ -3064,11 +3056,12 @@ other alternatives. Ultimately, when it runs out of matches,
<P> <P>
The function <b>pcre2_dfa_match()</b> is called to match a subject string The function <b>pcre2_dfa_match()</b> is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the subject against a compiled pattern, using a matching algorithm that scans the subject
string just once, and does not backtrack. This has different characteristics to string just once (not counting lookaround assertions), and does not backtrack.
the normal algorithm, and is not compatible with Perl. Some of the features of This has different characteristics to the normal algorithm, and is not
PCRE2 patterns are not supported. Nevertheless, there are times when this kind compatible with Perl. Some of the features of PCRE2 patterns are not supported.
of matching can be useful. For a discussion of the two matching algorithms, and Nevertheless, there are times when this kind of matching can be useful. For a
a list of features that <b>pcre2_dfa_match()</b> does not support, see the discussion of the two matching algorithms, and a list of features that
<b>pcre2_dfa_match()</b> does not support, see the
<a href="pcre2matching.html"><b>pcre2matching</b></a> <a href="pcre2matching.html"><b>pcre2matching</b></a>
documentation. documentation.
</P> </P>
@ -3248,13 +3241,13 @@ some plausibility checks are made on the contents of the workspace, which
should contain data about the previous partial match. If any of these checks should contain data about the previous partial match. If any of these checks
fail, this error is given. fail, this error is given.
</P> </P>
<br><a name="SEC39" href="#TOC1">SEE ALSO</a><br> <br><a name="SEC40" href="#TOC1">SEE ALSO</a><br>
<P> <P>
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3), <b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3), <b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3). <b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
</P> </P>
<br><a name="SEC40" href="#TOC1">AUTHOR</a><br> <br><a name="SEC41" href="#TOC1">AUTHOR</a><br>
<P> <P>
Philip Hazel Philip Hazel
<br> <br>
@ -3263,9 +3256,9 @@ University Computing Service
Cambridge, England. Cambridge, England.
<br> <br>
</P> </P>
<br><a name="SEC41" href="#TOC1">REVISION</a><br> <br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P> <P>
Last updated: 21 March 2017 Last updated: 27 March 2017
<br> <br>
Copyright &copy; 1997-2017 University of Cambridge. Copyright &copy; 1997-2017 University of Cambridge.
<br> <br>

View File

@ -281,19 +281,14 @@ PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
int (*callout_function)(pcre2_callout_block *, void *), int (*callout_function)(pcre2_callout_block *, void *),
void *callout_data); void *callout_data);
int pcre2_set_match_limit(pcre2_match_context *mcontext,
uint32_t value);
int pcre2_set_offset_limit(pcre2_match_context *mcontext, int pcre2_set_offset_limit(pcre2_match_context *mcontext,
PCRE2_SIZE value); PCRE2_SIZE value);
int pcre2_set_recursion_limit(pcre2_match_context *mcontext, int pcre2_set_match_limit(pcre2_match_context *mcontext,
uint32_t value); uint32_t value);
int pcre2_set_recursion_memory_management( int pcre2_set_depth_limit(pcre2_match_context *mcontext,
pcre2_match_context *mcontext, uint32_t value);
void *(*private_malloc)(PCRE2_SIZE, void *),
void (*private_free)(void *, void *), void *memory_data);
PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS
@ -397,6 +392,22 @@ PCRE2 NATIVE API AUXILIARY FUNCTIONS
int pcre2_config(uint32_t what, void *where); int pcre2_config(uint32_t what, void *where);
PCRE2 NATIVE API OBSOLETE FUNCTIONS
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
uint32_t value);
int pcre2_set_recursion_memory_management(
pcre2_match_context *mcontext,
void *(*private_malloc)(PCRE2_SIZE, void *),
void (*private_free)(void *, void *), void *memory_data);
These functions became obsolete at release 10.30 and are retained only
for backward compatibility. They should not be used in new code. The
first is replaced by pcre2_set_depth_limit(); the second is no longer
needed and no longer has any effect (it always returns zero).
PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit
@ -449,7 +460,7 @@ PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
when processing any particular pattern to use only functions from a when processing any particular pattern to use only functions from a
single library. For example, if you want to run a match using a pat- single library. For example, if you want to run a match using a pat-
tern that was compiled with pcre2_compile_16(), you must do so with tern that was compiled with pcre2_compile_16(), you must do so with
pcre2_match_16(), not pcre2_match_8(). pcre2_match_16(), not pcre2_match_8() or pcre2_match_32().
In the function summaries above, and in the rest of this document and In the function summaries above, and in the rest of this document and
other PCRE2 documents, functions and data types are described using other PCRE2 documents, functions and data types are described using
@ -474,19 +485,26 @@ PCRE2 API OVERVIEW
program against a non-dll PCRE2 library, you must define PCRE2_STATIC program against a non-dll PCRE2 library, you must define PCRE2_STATIC
before including pcre2.h. before including pcre2.h.
The functions pcre2_compile(), and pcre2_match() are used for compiling The functions pcre2_compile() and pcre2_match() are used for compiling
and matching regular expressions in a Perl-compatible manner. A sample and matching regular expressions in a Perl-compatible manner. A sample
program that demonstrates the simplest way of using them is provided in program that demonstrates the simplest way of using them is provided in
the file called pcre2demo.c in the PCRE2 source distribution. A listing the file called pcre2demo.c in the PCRE2 source distribution. A listing
of this program is given in the pcre2demo documentation, and the of this program is given in the pcre2demo documentation, and the
pcre2sample documentation describes how to compile and run it. pcre2sample documentation describes how to compile and run it.
Just-in-time compiler support is an optional feature of PCRE2 that can The compiling and matching functions recognize various options that are
be built in appropriate hardware environments. It greatly speeds up the passed as bits in an options argument. There are also some more compli-
matching performance of many patterns. Programs can request that it be cated parameters such as custom memory management functions and
used if available, by calling pcre2_jit_compile() after a pattern has resource limits that are passed in "contexts" (which are just memory
been successfully compiled by pcre2_compile(). This does nothing if JIT blocks, described below). Simple applications do not need to make use
support is not available. of contexts.
Just-in-time (JIT) compiler support is an optional feature of PCRE2
that can be built in appropriate hardware environments. It greatly
speeds up the matching performance of many patterns. Programs can
request that it be used if available by calling pcre2_jit_compile()
after a pattern has been successfully compiled by pcre2_compile(). This
does nothing if JIT support is not available.
More complicated programs might need to make use of the specialist More complicated programs might need to make use of the specialist
functions pcre2_jit_stack_create(), pcre2_jit_stack_free(), and functions pcre2_jit_stack_create(), pcre2_jit_stack_free(), and
@ -495,14 +513,15 @@ PCRE2 API OVERVIEW
JIT matching is automatically used by pcre2_match() if it is available, JIT matching is automatically used by pcre2_match() if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface unless the PCRE2_NO_JIT option is set. There is also a direct interface
for JIT matching, which gives improved performance. The JIT-specific for JIT matching, which gives improved performance at the expense of
functions are discussed in the pcre2jit documentation. less sanity checking. The JIT-specific functions are discussed in the
pcre2jit documentation.
A second matching function, pcre2_dfa_match(), which is not Perl-com- A second matching function, pcre2_dfa_match(), which is not Perl-com-
patible, is also provided. This uses a different algorithm for the patible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a matching. The alternative algorithm finds all possible matches (at a
given point in the subject), and scans the subject just once (unless given point in the subject), and scans the subject just once (unless
there are lookbehind assertions). However, this algorithm does not there are lookaround assertions). However, this algorithm does not
return captured substrings. A description of the two matching algo- return captured substrings. A description of the two matching algo-
rithms and their advantages and disadvantages is given in the rithms and their advantages and disadvantages is given in the
pcre2matching documentation. There is no JIT support for pcre2matching documentation. There is no JIT support for
@ -603,9 +622,9 @@ MULTITHREADING
is thread-safe, that is, the same compiled pattern can be used by more is thread-safe, that is, the same compiled pattern can be used by more
than one thread simultaneously. For example, an application can compile than one thread simultaneously. For example, an application can compile
all its patterns at the start, before forking off multiple threads that all its patterns at the start, before forking off multiple threads that
use them. However, if the just-in-time optimization feature is being use them. However, if the just-in-time (JIT) optimization feature is
used, it needs separate memory stack areas for each thread. See the being used, it needs separate memory stack areas for each thread. See
pcre2jit documentation for more details. the pcre2jit documentation for more details.
In a more complicated situation, where patterns are compiled only when In a more complicated situation, where patterns are compiled only when
they are first needed, but are still shared between threads, pointers they are first needed, but are still shared between threads, pointers
@ -650,10 +669,10 @@ MULTITHREADING
Match blocks Match blocks
The matching functions need a block of memory for working space and for The matching functions need a block of memory for storing the results
storing the results of a match. This includes details of what was of a match. This includes details of what was matched, as well as addi-
matched, as well as additional information such as the name of a tional information such as the name of a (*MARK) setting. Each thread
(*MARK) setting. Each thread must provide its own copy of this memory. must provide its own copy of this memory.
PCRE2 CONTEXTS PCRE2 CONTEXTS
@ -718,15 +737,15 @@ PCRE2 CONTEXTS
The compile context The compile context
A compile context is required if you want to change the default values A compile context is required if you want to provide an external func-
of any of the following compile-time parameters: tion for stack checking during compilation or to change the default
values of any of the following compile-time parameters:
What \R matches (Unicode newlines or CR, LF, CRLF only) What \R matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables PCRE2's character tables
The newline character sequence The newline character sequence
The compile time nested parentheses limit The compile time nested parentheses limit
The maximum length of the pattern string The maximum length of the pattern string
An external function for stack checking
A compile context is also required if you are using custom memory man- A compile context is also required if you are using custom memory man-
agement. If none of these apply, just pass NULL as the context argu- agement. If none of these apply, just pass NULL as the context argu-
@ -766,12 +785,12 @@ PCRE2 CONTEXTS
int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext, int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
PCRE2_SIZE value); PCRE2_SIZE value);
This sets a maximum length, in code units, for the pattern string that This sets a maximum length, in code units, for any pattern string that
is to be compiled. If the pattern is longer, an error is generated. is compiled with this context. If the pattern is longer, an error is
This facility is provided so that applications that accept patterns generated. This facility is provided so that applications that accept
from external sources can limit their size. The default is the largest patterns from external sources can limit their size. The default is the
number that a PCRE2_SIZE variable can hold, which is effectively unlim- largest number that a PCRE2_SIZE variable can hold, which is effec-
ited. tively unlimited.
int pcre2_set_newline(pcre2_compile_context *ccontext, int pcre2_set_newline(pcre2_compile_context *ccontext,
uint32_t value); uint32_t value);
@ -782,11 +801,14 @@ PCRE2 CONTEXTS
two-character sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any two-character sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any
of the above), or PCRE2_NEWLINE_ANY (any Unicode newline sequence). of the above), or PCRE2_NEWLINE_ANY (any Unicode newline sequence).
When a pattern is compiled with the PCRE2_EXTENDED option, the value of A pattern can override the value set in the compile context by starting
this parameter affects the recognition of white space and the end of with a sequence such as (*CRLF). See the pcre2pattern page for details.
internal comments starting with #. The value is saved with the compiled
pattern for subsequent use by the JIT compiler and by the two inter- When a pattern is compiled with the PCRE2_EXTENDED option, the newline
preted matching functions, pcre2_match() and pcre2_dfa_match(). convention affects the recognition of white space and the end of inter-
nal comments starting with #. The value is saved with the compiled pat-
tern for subsequent use by the JIT compiler and by the two interpreted
matching functions, pcre2_match() and pcre2_dfa_match().
int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext, int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
uint32_t value); uint32_t value);
@ -815,17 +837,16 @@ PCRE2 CONTEXTS
The match context The match context
A match context is required if you want to change the default values of A match context is required if you want to:
any of the following match-time parameters:
A callout function Set up a callout function
The offset limit for matching an unanchored pattern Set an offset limit for matching an unanchored pattern
The limit for calling match() (see below) Change the backtracking match limit
The limit for calling match() recursively Change the backtracking depth limit
Set custom memory management specifically for the match
A match context is also required if you are using custom memory manage- If none of these apply, just pass NULL as the context argument of
ment. If none of these apply, just pass NULL as the context argument pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
of pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
A match context is created, copied, and freed by the following func- A match context is created, copied, and freed by the following func-
tions: tions:
@ -846,9 +867,9 @@ PCRE2 CONTEXTS
int (*callout_function)(pcre2_callout_block *, void *), int (*callout_function)(pcre2_callout_block *, void *),
void *callout_data); void *callout_data);
This sets up a "callout" function, which PCRE2 will call at specified This sets up a "callout" function for PCRE2 to call at specified points
points during a matching operation. Details are given in the pcre2call- during a matching operation. Details are given in the pcre2callout doc-
out documentation. umentation.
int pcre2_set_offset_limit(pcre2_match_context *mcontext, int pcre2_set_offset_limit(pcre2_match_context *mcontext,
PCRE2_SIZE value); PCRE2_SIZE value);
@ -863,10 +884,11 @@ PCRE2 CONTEXTS
argument of pcre2_match() or pcre2_dfa_match() is greater than the off- argument of pcre2_match() or pcre2_dfa_match() is greater than the off-
set limit. set limit.
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT
calling pcre2_compile() so that when JIT is in use, different code can option when calling pcre2_compile() so that when JIT is in use, differ-
be compiled. If a match is started with a non-default offset limit when ent code can be compiled. If a match is started with a non-default
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated. offset limit when PCRE2_USE_OFFSET_LIMIT is not set, an error is gener-
ated.
The offset limit facility can be used to track progress when searching The offset limit facility can be used to track progress when searching
large subject strings. See also the PCRE2_FIRSTLINE option, which large subject strings. See also the PCRE2_FIRSTLINE option, which
@ -884,13 +906,13 @@ PCRE2 CONTEXTS
search trees. The classic example is a pattern that uses nested unlim- search trees. The classic example is a pattern that uses nested unlim-
ited repeats. ited repeats.
Internally, pcre2_match() uses a function called match(), which it There is an internal counter in pcre2_match() that is incremented each
calls repeatedly (sometimes recursively). The limit set by match_limit time round its main matching loop. If this value reaches the match
is imposed on the number of times this function is called during a limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT.
match, which has the effect of limiting the amount of backtracking that This has the effect of limiting the amount of backtracking that can
can take place. For patterns that are not anchored, the count restarts take place. For patterns that are not anchored, the count restarts from
from zero for each position in the subject string. This limit is not zero for each position in the subject string. This limit is not rele-
relevant to pcre2_dfa_match(), which ignores it. vant to pcre2_dfa_match(), which ignores it.
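
The counting behaviour described above can be illustrated with a toy backtracking matcher. This is only a sketch and is not PCRE2's code: toy_match(), match_here(), match_star() and the TOY_* values are hypothetical names, and the matcher understands just literals, '.' and 'x*'. It bumps a counter on every matching step, gives up with an error once the counter passes the limit, and restarts the count at each new starting position in the subject.

```c
#include <stdint.h>

/* Hypothetical result codes for this toy; they are not PCRE2 constants. */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_ERROR_MATCHLIMIT = -1 };

typedef struct {
    uint32_t count;   /* matching steps taken from the current start position */
    uint32_t limit;   /* maximum number of steps allowed */
} toy_counter;

static int match_here(const char *pat, const char *text, toy_counter *c);

/* Match zero or more occurrences of ch, then the rest of the pattern.
   Each failed attempt is a backtracking point. */
static int match_star(char ch, const char *pat, const char *text, toy_counter *c)
{
    do {
        int rc = match_here(pat, text, c);
        if (rc != TOY_NOMATCH) return rc;      /* a match, or the limit error */
    } while (*text != '\0' && (*text++ == ch || ch == '.'));
    return TOY_NOMATCH;
}

/* Match pat at exactly this position in text, counting every step. */
static int match_here(const char *pat, const char *text, toy_counter *c)
{
    if (++c->count > c->limit) return TOY_ERROR_MATCHLIMIT;
    if (pat[0] == '\0') return TOY_MATCH;
    if (pat[1] == '*') return match_star(pat[0], pat + 2, text, c);
    if (*text != '\0' && (pat[0] == '.' || pat[0] == *text))
        return match_here(pat + 1, text + 1, c);
    return TOY_NOMATCH;
}

/* Unanchored search: the step count restarts from zero at each position. */
int toy_match(const char *pat, const char *text, uint32_t limit)
{
    const char *start = text;
    for (;;) {
        toy_counter c = { 0, limit };
        int rc = match_here(pat, start, &c);
        if (rc != TOY_NOMATCH) return rc;
        if (*start == '\0') return TOY_NOMATCH;
        start++;
    }
}
```

With a pattern such as a*a*a*a*b and a subject of twenty 'a' characters, a small limit is reached at the first starting position, whereas a generous limit lets the search run to completion and report no match.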
When pcre2_match() is called with a pattern that was successfully pro- When pcre2_match() is called with a pattern that was successfully pro-
cessed by pcre2_jit_compile(), the way in which matching is executed is cessed by pcre2_jit_compile(), the way in which matching is executed is
@ -901,9 +923,8 @@ PCRE2 CONTEXTS
The default value for the limit can be set when PCRE2 is built; the The default value for the limit can be set when PCRE2 is built; the
default default is 10 million, which handles all but the most extreme default default is 10 million, which handles all but the most extreme
cases. If the limit is exceeded, pcre2_match() returns cases. A value for the match limit may also be supplied by an item at
PCRE2_ERROR_MATCHLIMIT. A value for the match limit may also be sup- the start of a pattern of the form
plied by an item at the start of a pattern of the form
(*LIMIT_MATCH=ddd) (*LIMIT_MATCH=ddd)
@ -911,59 +932,35 @@ PCRE2 CONTEXTS
unless ddd is less than the limit set by the caller of pcre2_match() unless ddd is less than the limit set by the caller of pcre2_match()
or, if no such limit is set, less than the default. or, if no such limit is set, less than the default.
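
The rule for an in-pattern (*LIMIT_MATCH=ddd) item can be written as a small function. This is a sketch of the rule as stated above, not PCRE2's internal code; effective_limit() and its parameter names are hypothetical. Here base_limit stands for the limit set by the caller or, if none was set, the default.

```c
#include <stdint.h>

/* Return the limit that applies to a match.
   base_limit:        the caller's limit, or the default if none was set
   has_pattern_limit: nonzero if the pattern starts with a (*LIMIT_...=ddd) item
   pattern_limit:     the ddd value from that item */
uint32_t effective_limit(uint32_t base_limit, int has_pattern_limit,
                         uint32_t pattern_limit)
{
    /* A pattern can only lower the limit; a larger value is ignored. */
    if (has_pattern_limit && pattern_limit < base_limit)
        return pattern_limit;
    return base_limit;
}
```

The design point is that a pattern from an untrusted source cannot raise a limit that the application has imposed; it can only tighten it.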
int pcre2_set_recursion_limit(pcre2_match_context *mcontext, int pcre2_set_depth_limit(pcre2_match_context *mcontext,
uint32_t value); uint32_t value);
The recursion_limit parameter is similar to match_limit, but instead of This parameter limits the depth of nested backtracking in
limiting the total number of times that match() is called, it limits pcre2_match(). Each time a nested backtracking point is passed, a new
the depth of recursion. The recursion depth is a smaller number than memory "frame" is used to remember the state of matching at that point.
the total number of calls, because not all calls to match() are recur- Thus, this parameter indirectly limits the amount of memory that is
sive. This limit is of use only if it is set smaller than match_limit. used in a match.
Limiting the recursion depth limits the amount of system stack that can This limit is not relevant, and is ignored, when matching is done using
be used, or, when PCRE2 has been compiled to use memory on the heap JIT compiled code. However, it is supported by pcre2_dfa_match(), which
instead of the stack, the amount of heap memory that can be used. This uses it to limit the depth of internal recursive function calls that
limit is not relevant, and is ignored, when matching is done using JIT implement lookaround assertions and pattern recursions. This is, there-
compiled code. However, it is supported by pcre2_dfa_match(), which fore, an indirect limit on the amount of system stack that is used. A
uses recursive function calls less frequently than pcre2_match(), but recursive pattern such as /(.)(?1)/, when matched to a very long string
which can be caused to use a lot of stack by a recursive pattern such using pcre2_dfa_match(), can use a great deal of stack.
as /(.)(?1)/ matched to a very long string.
The default value for recursion_limit can be set when PCRE2 is built; The default value for the depth limit can be set when PCRE2 is built;
the default default is the same value as the default for match_limit. the default default is the same value as the default for the match
If the limit is exceeded, pcre2_match() and pcre2_dfa_match() return limit. If the limit is exceeded, pcre2_match() or pcre2_dfa_match()
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
supplied by an item at the start of a pattern of the form supplied by an item at the start of a pattern of the form
(*LIMIT_RECURSION=ddd) (*LIMIT_DEPTH=ddd)
where ddd is a decimal number. However, such a setting is ignored where ddd is a decimal number. However, such a setting is ignored
unless ddd is less than the limit set by the caller of pcre2_match() or unless ddd is less than the limit set by the caller of pcre2_match() or
pcre2_dfa_match() or, if no such limit is set, less than the default. pcre2_dfa_match() or, if no such limit is set, less than the default.
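
The link between nested backtracking points, memory frames, and the depth limit can be shown with another toy. This is a sketch under simplifying assumptions and is not how PCRE2 is implemented: star_match() and the STAR_* values are hypothetical, and the "pattern" is fixed to zero or more copies of one character followed by one final character, anchored at the start of the subject. Each character taken by the greedy repeat pushes a frame on a heap-allocated stack; the depth limit caps the number of frames, and an allocation failure is reported separately.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result codes for this toy; they are not PCRE2 constants. */
enum { STAR_NOMATCH = 0, STAR_MATCH = 1,
       STAR_ERROR_DEPTHLIMIT = -1, STAR_ERROR_NOMEMORY = -2 };

/* Match "repeated* final" at the start of subject, remembering one frame
   (a saved subject position) per backtracking point. */
int star_match(const char *subject, char repeated, char final,
               uint32_t depth_limit)
{
    size_t *frames = NULL;       /* heap-allocated stack of saved positions */
    size_t depth = 0, capacity = 0, pos = 0;
    int rc = STAR_NOMATCH;

    /* Greedy phase: every character taken creates a backtracking point. */
    while (subject[pos] != '\0' && subject[pos] == repeated) {
        if (depth >= depth_limit) { rc = STAR_ERROR_DEPTHLIMIT; goto done; }
        if (depth == capacity) {
            size_t newcap = capacity ? capacity * 2 : 16;
            size_t *tmp = realloc(frames, newcap * sizeof *tmp);
            if (tmp == NULL) { rc = STAR_ERROR_NOMEMORY; goto done; }
            frames = tmp;
            capacity = newcap;
        }
        frames[depth++] = pos;   /* where to resume if the rest fails */
        pos++;
    }

    /* Try the final character; on failure, pop a frame and retry there. */
    for (;;) {
        if (subject[pos] != '\0' && subject[pos] == final) {
            rc = STAR_MATCH;
            break;
        }
        if (depth == 0) break;   /* no backtracking points left */
        pos = frames[--depth];
    }

done:
    free(frames);
    return rc;
}
```

Because the frame count grows with the number of pending backtracking points, lowering the depth limit in this toy bounds the memory it can request, which is the indirect effect described above.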
int pcre2_set_recursion_memory_management(
pcre2_match_context *mcontext,
void *(*private_malloc)(PCRE2_SIZE, void *),
void (*private_free)(void *, void *), void *memory_data);
This function sets up two additional custom memory management functions
for use by pcre2_match() when PCRE2 is compiled to use the heap for
remembering backtracking data, instead of recursive function calls that
use the system stack. There is a discussion about PCRE2's stack usage
in the pcre2stack documentation. See the pcre2build documentation for
details of how to build PCRE2.
Using the heap for recursion is a non-standard way of building PCRE2,
for use in environments that have limited stacks. Because of the
greater use of memory management, pcre2_match() runs more slowly. Func-
tions that are different to the general custom memory functions are
provided so that special-purpose external code can be used for this
case, because the memory blocks are all the same size. The blocks are
retained by pcre2_match() until it is about to exit so that they can be
re-used when possible during the match. In the absence of these func-
tions, the normal custom memory management functions are used, if sup-
plied, otherwise the system functions.
CHECKING BUILD-TIME OPTIONS CHECKING BUILD-TIME OPTIONS
@ -996,6 +993,13 @@ CHECKING BUILD-TIME OPTIONS
sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR, sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
LF, or CRLF. The default can be overridden when a pattern is compiled. LF, or CRLF. The default can be overridden when a pattern is compiled.
PCRE2_CONFIG_DEPTHLIMIT
The output is a uint32_t integer that gives the default limit for the
depth of nested backtracking in pcre2_match() or the depth of nested
recursions and lookarounds in pcre2_dfa_match(). Further details are
given with pcre2_set_depth_limit() above.
PCRE2_CONFIG_JIT PCRE2_CONFIG_JIT
The output is a uint32_t integer that is set to one if support for The output is a uint32_t integer that is set to one if support for
@ -1030,9 +1034,9 @@ CHECKING BUILD-TIME OPTIONS
PCRE2_CONFIG_MATCHLIMIT PCRE2_CONFIG_MATCHLIMIT
The output is a uint32_t integer that gives the default limit for the The output is a uint32_t integer that gives the default match limit for
number of internal matching function calls in a pcre2_match() execu- pcre2_match(). Further details are given with pcre2_set_match_limit()
tion. Further details are given with pcre2_match() below. above.
PCRE2_CONFIG_NEWLINE PCRE2_CONFIG_NEWLINE
@ -1059,21 +1063,10 @@ CHECKING BUILD-TIME OPTIONS
application. For finer control over compilation stack usage, see application. For finer control over compilation stack usage, see
pcre2_set_compile_recursion_guard(). pcre2_set_compile_recursion_guard().
PCRE2_CONFIG_RECURSIONLIMIT
The output is a uint32_t integer that gives the default limit for the
depth of recursion when calling the internal matching function in a
pcre2_match() execution. Further details are given with pcre2_match()
below.
PCRE2_CONFIG_STACKRECURSE PCRE2_CONFIG_STACKRECURSE
The output is a uint32_t integer that is set to one if internal recur- This parameter is obsolete and should not be used in new code. The out-
sion when running pcre2_match() is implemented by recursive function put is a uint32_t integer that is always set to zero.
calls that use the system stack to remember their state. This is the
usual way that PCRE2 is compiled. The output is zero if PCRE2 was com-
piled to use blocks of data on the heap instead of recursive function
calls.
PCRE2_CONFIG_UNICODE_VERSION PCRE2_CONFIG_UNICODE_VERSION
@ -1093,7 +1086,7 @@ CHECKING BUILD-TIME OPTIONS
PCRE2_CONFIG_VERSION PCRE2_CONFIG_VERSION
The where argument should point to a buffer that is at least 12 code The where argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling units long. (The exact length required can be found by calling
pcre2_config() with where set to NULL.) The buffer is filled with the pcre2_config() with where set to NULL.) The buffer is filled with the
PCRE2 version string, zero-terminated. The number of code units used is PCRE2 version string, zero-terminated. The number of code units used is
@ -1267,14 +1260,15 @@ COMPILING A PATTERN
parenthesis terminates the name. A closing parenthesis can be included parenthesis terminates the name. A closing parenthesis can be included
in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-com- option is set, unescaped whitespace in verb names is skipped and #-com-
ments are recognized, exactly as in the rest of the pattern. ments are recognized in this mode, exactly as in the rest of the pat-
tern.
PCRE2_AUTO_CALLOUT PCRE2_AUTO_CALLOUT
If this bit is set, pcre2_compile() automatically inserts callout If this bit is set, pcre2_compile() automatically inserts callout
items, all with number 255, before each pattern item, except immedi- items, all with number 255, before each pattern item, except immedi-
ately before or after a callout in the pattern. For discussion of the ately before or after an explicit callout in the pattern. For discus-
callout facility, see the pcre2callout documentation. sion of the callout facility, see the pcre2callout documentation.
PCRE2_CASELESS PCRE2_CASELESS
@ -1517,7 +1511,7 @@ COMPILING A PATTERN
section on generic character types in the pcre2pattern page. If you set section on generic character types in the pcre2pattern page. If you set
PCRE2_UCP, matching one of the items it affects takes much longer. The PCRE2_UCP, matching one of the items it affects takes much longer. The
option is available only if PCRE2 has been compiled with Unicode sup- option is available only if PCRE2 has been compiled with Unicode sup-
port. port (which is the default).
PCRE2_UNGREEDY PCRE2_UNGREEDY
@ -1548,13 +1542,13 @@ COMPILING A PATTERN
COMPILATION ERROR CODES COMPILATION ERROR CODES
There are over 80 positive error codes that pcre2_compile() may return There are nearly 100 positive error codes that pcre2_compile() may
(via errorcode) if it finds an error in the pattern. There are also return (via errorcode) if it finds an error in the pattern. There are
some negative error codes that are used for invalid UTF strings. These also some negative error codes that are used for invalid UTF strings.
are the same as given by pcre2_match() and pcre2_dfa_match(), and are These are the same as given by pcre2_match() and pcre2_dfa_match(), and
described in the pcre2unicode page. The pcre2_get_error_message() func- are described in the pcre2unicode page. The pcre2_get_error_message()
tion (see "Obtaining a textual error message" below) can be called to function (see "Obtaining a textual error message" below) can be called
obtain a textual error message from any error code. to obtain a textual error message from any error code.
JUST-IN-TIME (JIT) COMPILATION JUST-IN-TIME (JIT) COMPILATION
@ -1585,7 +1579,7 @@ JUST-IN-TIME (JIT) COMPILATION
JIT compilation is a heavyweight optimization. It can take some time JIT compilation is a heavyweight optimization. It can take some time
for patterns to be analyzed, and for one-off matches and simple pat- for patterns to be analyzed, and for one-off matches and simple pat-
terns the benefit of faster execution might be offset by a much slower terns the benefit of faster execution might be offset by a much slower
compilation time. Most, but not all patterns can be optimized by the compilation time. Most (but not all) patterns can be optimized by the
JIT compiler. JIT compiler.
@ -1595,8 +1589,8 @@ LOCALE SUPPORT
letters, digits, or whatever, by reference to a set of tables, indexed letters, digits, or whatever, by reference to a set of tables, indexed
by character code point. This applies only to characters whose code by character code point. This applies only to characters whose code
points are less than 256. By default, higher-valued code points never points are less than 256. By default, higher-valued code points never
match escapes such as \w or \d. However, if PCRE2 is built with UTF match escapes such as \w or \d. However, if PCRE2 is built with Uni-
support, all characters can be tested with \p and \P, or, alterna- code support, all characters can be tested with \p and \P, or, alterna-
tively, the PCRE2_UCP option can be set when a pattern is compiled; tively, the PCRE2_UCP option can be set when a pattern is compiled;
this causes \w and friends to use Unicode property support instead of this causes \w and friends to use Unicode property support instead of
the built-in tables. the built-in tables.
@ -1639,7 +1633,7 @@ LOCALE SUPPORT
The pointer that is passed (via the compile context) to pcre2_compile() The pointer that is passed (via the compile context) to pcre2_compile()
is saved with the compiled pattern, and the same tables are used by is saved with the compiled pattern, and the same tables are used by
pcre2_match() and pcre2_dfa_match(). Thus, for any single pattern, com- pcre2_match() and pcre2_dfa_match(). Thus, for any single pattern, com-
pilation, and matching all happen in the same locale, but different pilation and matching both happen in the same locale, but different
patterns can be processed in different locales. patterns can be processed in different locales.
@ -1654,7 +1648,7 @@ INFORMATION ABOUT A COMPILED PATTERN
is required, and the third argument is a pointer to a variable to is required, and the third argument is a pointer to a variable to
receive the data. If the third argument is NULL, the first argument is receive the data. If the third argument is NULL, the first argument is
ignored, and the function returns the size in bytes of the variable ignored, and the function returns the size in bytes of the variable
that is required for the information requested. Otherwise, The yield of that is required for the information requested. Otherwise, the yield of
the function is zero for success, or one of the following negative num- the function is zero for success, or one of the following negative num-
bers: bers:
@ -1710,8 +1704,8 @@ INFORMATION ABOUT A COMPILED PATTERN
.* is not in a capturing group that is the subject .* is not in a capturing group that is the subject
of a back reference of a back reference
PCRE2_DOTALL is in force for .* PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern. Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set. PCRE2_NO_DOTSTAR_ANCHOR is not set
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in
the options returned for PCRE2_INFO_ALLOPTIONS. the options returned for PCRE2_INFO_ALLOPTIONS.
@ -1740,6 +1734,14 @@ INFORMATION ABOUT A COMPILED PATTERN
terns where (?| is not used, this is also the total number of capturing terns where (?| is not used, this is also the total number of capturing
subpatterns. The third argument should point to a uint32_t variable. subpatterns. The third argument should point to a uint32_t variable.
PCRE2_INFO_DEPTHLIMIT
If the pattern set a backtracking depth limit by including an item of
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
third argument should point to an unsigned 32-bit integer. If no such
value has been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET.
PCRE2_INFO_FIRSTBITMAP PCRE2_INFO_FIRSTBITMAP
In the absence of a single first code unit for a non-anchored pattern, In the absence of a single first code unit for a non-anchored pattern,
@ -1772,6 +1774,15 @@ INFORMATION ABOUT A COMPILED PATTERN
value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32
mode. mode.
PCRE2_INFO_FRAMESIZE
Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by pcre2_match()
without the use of JIT. The third argument should point to a size_t
variable. The frame size depends on the number of capturing parentheses
in the pattern. Each additional capturing group adds two PCRE2_SIZE
variables.
PCRE2_INFO_HASBACKSLASHC PCRE2_INFO_HASBACKSLASHC
Return 1 if the pattern contains any instances of \C, otherwise 0. The Return 1 if the pattern contains any instances of \C, otherwise 0. The
@ -1782,7 +1793,8 @@ INFORMATION ABOUT A COMPILED PATTERN
Return 1 if the pattern contains any explicit matches for CR or LF Return 1 if the pattern contains any explicit matches for CR or LF
characters, otherwise 0. The third argument should point to a uint32_t characters, otherwise 0. The third argument should point to a uint32_t
variable. An explicit match is either a literal CR or LF character, or variable. An explicit match is either a literal CR or LF character, or
\r or \n. \r or \n or one of the equivalent hexadecimal or octal escape
sequences.
PCRE2_INFO_JCHANGED PCRE2_INFO_JCHANGED
@ -1918,7 +1930,7 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_INFO_NEWLINE PCRE2_INFO_NEWLINE
The output is a uint32_t with one of the following values: The output is one of the following uint32_t values:
PCRE2_NEWLINE_CR Carriage return (CR) PCRE2_NEWLINE_CR Carriage return (CR)
PCRE2_NEWLINE_LF Linefeed (LF) PCRE2_NEWLINE_LF Linefeed (LF)
@ -1926,16 +1938,8 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_NEWLINE_ANY Any Unicode line ending PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
This specifies the default character sequence that will be recognized This identifies the character sequence that will be recognized as mean-
as meaning "newline" while matching. ing "newline" while matching.
PCRE2_INFO_RECURSIONLIMIT
If the pattern set a recursion limit by including an item of the form
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value
has been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET.
PCRE2_INFO_SIZE PCRE2_INFO_SIZE
@ -1998,8 +2002,8 @@ THE MATCH DATA BLOCK
you must create a match data block by calling one of the creation func- you must create a match data block by calling one of the creation func-
tions above. For pcre2_match_data_create(), the first argument is the tions above. For pcre2_match_data_create(), the first argument is the
number of pairs of offsets in the ovector. One pair of offsets is number of pairs of offsets in the ovector. One pair of offsets is
required to identify the string that matched the whole pattern, with required to identify the string that matched the whole pattern, with an
another pair for each captured substring. For example, a value of 4 additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus creates enough space to record the matched portion of the subject plus
three captured substrings. A minimum of at least 1 pair is imposed by three captured substrings. A minimum of at least 1 pair is imposed by
pcre2_match_data_create(), so it is always possible to return the over- pcre2_match_data_create(), so it is always possible to return the over-
@ -2124,9 +2128,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
ing offset by two characters instead of one. ing offset by two characters instead of one.
If a non-zero starting offset is passed when the pattern is anchored, If a non-zero starting offset is passed when the pattern is anchored,
one attempt to match at the given offset is made. This can only succeed a single attempt to match at the given offset is made. This can only
if the pattern does not require the match to be at the start of the succeed if the pattern does not require the match to be at the start of
subject. the subject. In other words, the anchoring must be the result of set-
ting the PCRE2_ANCHORED option or the use of .* with PCRE2_DOTALL, not
by starting the pattern with ^ or \A.
Option bits for pcre2_match() Option bits for pcre2_match()
@ -2138,9 +2144,8 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
Setting PCRE2_ANCHORED at match time is not supported by the just-in- Setting PCRE2_ANCHORED at match time is not supported by the just-in-
time (JIT) compiler. If it is set, JIT matching is disabled and the time (JIT) compiler. If it is set, JIT matching is disabled and the
normal interpretive code in pcre2_match() is run. Apart from interpretive code in pcre2_match() is run. Apart from PCRE2_NO_JIT
PCRE2_NO_JIT (obviously), the remaining options are supported for JIT (obviously), the remaining options are supported for JIT matching.
matching.
PCRE2_ANCHORED PCRE2_ANCHORED
@ -2221,11 +2226,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
checks for performance reasons, you can set the PCRE2_NO_UTF_CHECK checks for performance reasons, you can set the PCRE2_NO_UTF_CHECK
option when calling pcre2_match(). You might want to do this for the option when calling pcre2_match(). You might want to do this for the
second and subsequent calls to pcre2_match() if you are making repeated second and subsequent calls to pcre2_match() if you are making repeated
calls to find all the matches in a single subject string. calls to find other matches in the same subject string.
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an
string as a subject, or an invalid value of startoffset, is undefined. invalid string as a subject, or an invalid value of startoffset, is
Your program may crash or loop indefinitely. undefined. Your program may crash or loop indefinitely.
PCRE2_PARTIAL_HARD PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT PCRE2_PARTIAL_SOFT
@ -2278,9 +2283,10 @@ NEWLINE HANDLING WHEN MATCHING
acter after the first failure. acter after the first failure.
An explicit match for CR or LF is either a literal appearance of one of An explicit match for CR or LF is either a literal appearance of one of
those characters in the pattern, or one of the \r or \n escape those characters in the pattern, or one of the \r or \n or equivalent
sequences. Implicit matches such as [^X] do not count, nor does \s, octal or hexadecimal escape sequences. Implicit matches such as [^X] do
even though it includes CR and LF in the characters that it matches. not count, nor does \s, even though it includes CR and LF in the char-
acters that it matches.
Notwithstanding the above, anomalous effects may still occur when CRLF Notwithstanding the above, anomalous effects may still occur when CRLF
is a valid newline sequence and explicit \r or \n escapes appear in the is a valid newline sequence and explicit \r or \n escapes appear in the
@ -2325,14 +2331,14 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
They identify the part of the subject that was partially matched. See They identify the part of the subject that was partially matched. See
the pcre2partial documentation for details of partial matching. the pcre2partial documentation for details of partial matching.
After a successful match, the first pair of offsets identifies the por- After a fully successful match, the first pair of offsets identifies
tion of the subject string that was matched by the entire pattern. The the portion of the subject string that was matched by the entire pat-
next pair is used for the first capturing subpattern, and so on. The tern. The next pair is used for the first captured substring, and so
value returned by pcre2_match() is one more than the highest numbered on. The value returned by pcre2_match() is one more than the highest
pair that has been set. For example, if two substrings have been cap- numbered pair that has been set. For example, if two substrings have
tured, the returned value is 3. If there are no capturing subpatterns, been captured, the returned value is 3. If there are no captured sub-
the return value from a successful match is 1, indicating that just the strings, the return value from a successful match is 1, indicating that
first pair of offsets has been set. just the first pair of offsets has been set.
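The relationship between the return value and the offset pairs can be sketched as follows. This is a minimal illustration that works on a hand-built offset array rather than calling the library, so it compiles without pcre2.h. copy_group() and SKETCH_UNSET are hypothetical names (SKETCH_UNSET stands in for PCRE2_UNSET), and the sketch assumes the 8-bit library, where one code unit is one byte.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET so the sketch needs no PCRE2 header. */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: copies captured substring n (0 = whole match)
   from subject into buf as a zero-terminated string.
   rc is the positive value returned by the match call, that is, one
   more than the highest numbered pair that has been set.
   Returns the substring length, or -1 if pair n lies beyond the pairs
   that were set, is unset, is reversed, or does not fit in buf. */
long copy_group(const char *subject, const size_t *ovector, int rc,
                int n, char *buf, size_t bufsize)
{
    if (n < 0 || n >= rc)
        return -1;                      /* pair n was not filled in */

    size_t start = ovector[2 * n];      /* offset of first code unit */
    size_t end = ovector[2 * n + 1];    /* offset just past the last */

    if (start == SKETCH_UNSET || end == SKETCH_UNSET)
        return -1;                      /* group did not take part */
    if (end < start)
        return -1;                      /* start can exceed end, see \K note */

    size_t len = end - start;
    if (len + 1 > bufsize)
        return -1;                      /* no room for the terminator */

    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (long)len;
}
```

For a subject "..abc-def.." with offsets {2, 9, 2, 5, 6, 9} and a return value of 3, pair 0 yields "abc-def", pair 1 yields "abc", pair 2 yields "def", and a request for pair 3 is rejected because it is not below the return value.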
If a pattern uses the \K escape sequence within a positive assertion, If a pattern uses the \K escape sequence within a positive assertion,
the reported start of a successful match can be greater than the end of the reported start of a successful match can be greater than the end of
@ -2347,11 +2353,7 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
as much as possible is filled in, and the function returns a value of as much as possible is filled in, and the function returns a value of
zero. If captured substrings are not of interest, pcre2_match() may be zero. If captured substrings are not of interest, pcre2_match() may be
called with a match data block whose ovector is of minimum length (that called with a match data block whose ovector is of minimum length (that
is, one pair). However, if the pattern contains back references and the is, one pair).
ovector is not big enough to remember the related substrings, PCRE2 has
to get additional memory for use during matching. Thus it is usually
advisable to set up a match data block containing an ovector of reason-
able size.
It is possible for capturing subpattern number n+1 to match some part It is possible for capturing subpattern number n+1 to match some part
of the subject when subpattern n has not been used at all. For example, of the subject when subpattern n has not been used at all. For example,
@ -2450,9 +2452,10 @@ ERROR RETURNS FROM pcre2_match()
PCRE2_ERROR_BADMODE PCRE2_ERROR_BADMODE
This error is given when a pattern that was compiled by the 8-bit This error is given when a compiled pattern is passed to a function in
library is passed to a 16-bit or 32-bit library function, or vice a library of a different code unit width, for example, a pattern com-
versa. piled by the 8-bit library is passed to a 16-bit or 32-bit library
function.
PCRE2_ERROR_BADOFFSET PCRE2_ERROR_BADOFFSET
@ -2476,19 +2479,15 @@ ERROR RETURNS FROM pcre2_match()
pcre2_callout_enumerate() to return a distinctive error code. See the pcre2_callout_enumerate() to return a distinctive error code. See the
pcre2callout documentation for details. pcre2callout documentation for details.
PCRE2_ERROR_DEPTHLIMIT
The nested backtracking depth limit was reached.
PCRE2_ERROR_INTERNAL PCRE2_ERROR_INTERNAL
An unexpected internal error has occurred. This error could be caused An unexpected internal error has occurred. This error could be caused
by a bug in PCRE2 or by overwriting of the compiled pattern. by a bug in PCRE2 or by overwriting of the compiled pattern.
PCRE2_ERROR_JIT_BADOPTION
This error is returned when a pattern that was successfully studied
using JIT is being matched, but the matching mode (partial or complete
match) does not correspond to any JIT compilation mode. When the JIT
fast path function is used, this error may be also given for invalid
options. See the pcre2jit documentation for more details.
PCRE2_ERROR_JIT_STACKLIMIT PCRE2_ERROR_JIT_STACKLIMIT
This error is returned when a pattern that was successfully studied This error is returned when a pattern that was successfully studied
@ -2498,15 +2497,13 @@ ERROR RETURNS FROM pcre2_match()
PCRE2_ERROR_MATCHLIMIT PCRE2_ERROR_MATCHLIMIT
The backtracking limit was reached. The backtracking match limit was reached.
PCRE2_ERROR_NOMEMORY PCRE2_ERROR_NOMEMORY
If a pattern contains back references, but the ovector is not big If a pattern contains many nested backtracking points, heap memory is
enough to remember the referenced substrings, PCRE2 gets a block of used to remember them. This error is given when the memory allocation
memory at the start of matching to use for this purpose. There are some function (default or custom) fails.
other special cases where extra memory is needed during matching. This
error is given when memory cannot be obtained.
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
@ -2522,10 +2519,6 @@ ERROR RETURNS FROM pcre2_match()
plicated cases, in particular mutual recursions between two different plicated cases, in particular mutual recursions between two different
subpatterns, cannot be detected until matching is attempted. subpatterns, cannot be detected until matching is attempted.
PCRE2_ERROR_RECURSIONLIMIT
The internal recursion limit was reached.
OBTAINING A TEXTUAL ERROR MESSAGE OBTAINING A TEXTUAL ERROR MESSAGE
@ -2703,8 +2696,8 @@ EXTRACTING CAPTURED SUBSTRINGS BY NAME
the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there
is no subpattern of that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if is no subpattern of that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if
there is more than one subpattern of that name. Given the number, you there is more than one subpattern of that name. Given the number, you
can extract the substring directly, or use one of the functions can extract the substring directly from the ovector, or use one of the
described above. "bynumber" functions described above.
For convenience, there are also "byname" functions that correspond to For convenience, there are also "byname" functions that correspond to
the "bynumber" functions, the only difference being that the second the "bynumber" functions, the only difference being that the second
@ -2991,13 +2984,13 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
The function pcre2_dfa_match() is called to match a subject string The function pcre2_dfa_match() is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the against a compiled pattern, using a matching algorithm that scans the
subject string just once, and does not backtrack. This has different subject string just once (not counting lookaround assertions), and does
characteristics to the normal algorithm, and is not compatible with not backtrack. This has different characteristics to the normal algo-
Perl. Some of the features of PCRE2 patterns are not supported. Never- rithm, and is not compatible with Perl. Some of the features of PCRE2
theless, there are times when this kind of matching can be useful. For patterns are not supported. Nevertheless, there are times when this
a discussion of the two matching algorithms, and a list of features kind of matching can be useful. For a discussion of the two matching
that pcre2_dfa_match() does not support, see the pcre2matching documen- algorithms, and a list of features that pcre2_dfa_match() does not sup-
tation. port, see the pcre2matching documentation.
The arguments for the pcre2_dfa_match() function are the same as for The arguments for the pcre2_dfa_match() function are the same as for
pcre2_match(), plus two extras. The ovector within the match data block pcre2_match(), plus two extras. The ovector within the match data block
@ -3181,7 +3174,7 @@ AUTHOR
REVISION REVISION
Last updated: 21 March 2017 Last updated: 27 March 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------ ------------------------------------------------------------------------------


@ -34,7 +34,7 @@ A match context is needed only if you want to:
Set a matching offset limit Set a matching offset limit
Change the backtracking match limit Change the backtracking match limit
Change the backtracking depth limit Change the backtracking depth limit
Set custom memory management in the match context Set custom memory management specifically for the match
.sp .sp
The \fIlength\fP and \fIstartoffset\fP values are code The \fIlength\fP and \fIstartoffset\fP values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a


@ -1,4 +1,4 @@
.TH PCRE2API 3 "21 March 2017" "PCRE2 10.30" .TH PCRE2API 3 "27 March 2017" "PCRE2 10.30"
.SH NAME .SH NAME
PCRE2 - Perl-compatible regular expressions (revised API) PCRE2 - Perl-compatible regular expressions (revised API)
.sp .sp
@ -120,19 +120,14 @@ document for an overview of all the PCRE2 documentation.
.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *)," .B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *),"
.B " void *\fIcallout_data\fP);" .B " void *\fIcallout_data\fP);"
.sp .sp
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
.B " PCRE2_SIZE \fIvalue\fP);" .B " PCRE2_SIZE \fIvalue\fP);"
.sp .sp
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);" .B " uint32_t \fIvalue\fP);"
.sp .sp
.B int pcre2_set_recursion_memory_management( .B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
.B " pcre2_match_context *\fImcontext\fP," .B " uint32_t \fIvalue\fP);"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi .fi
. .
. .
@ -252,6 +247,25 @@ document for an overview of all the PCRE2 documentation.
.fi .fi
. .
. .
.SH "PCRE2 NATIVE API OBSOLETE FUNCTIONS"
.rs
.sp
.nf
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi
.sp
These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
no longer has any effect (it always returns zero).
.
.
.SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES" .SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES"
.rs .rs
.sp .sp
@ -302,7 +316,7 @@ When using multiple libraries in an application, you must take care when
processing any particular pattern to use only functions from a single library. processing any particular pattern to use only functions from a single library.
For example, if you want to run a match using a pattern that was compiled with For example, if you want to run a match using a pattern that was compiled with
\fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not \fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not
\fBpcre2_match_8()\fP. \fBpcre2_match_8()\fP or \fBpcre2_match_32()\fP.
.P .P
In the function summaries above, and in the rest of this document and other In the function summaries above, and in the rest of this document and other
PCRE2 documents, functions and data types are described using their generic PCRE2 documents, functions and data types are described using their generic
@ -331,7 +345,7 @@ In a Windows environment, if you want to statically link an application program
against a non-dll PCRE2 library, you must define PCRE2_STATIC before including against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
\fBpcre2.h\fP. \fBpcre2.h\fP.
.P .P
The functions \fBpcre2_compile()\fP, and \fBpcre2_match()\fP are used for The functions \fBpcre2_compile()\fP and \fBpcre2_match()\fP are used for
compiling and matching regular expressions in a Perl-compatible manner. A compiling and matching regular expressions in a Perl-compatible manner. A
sample program that demonstrates the simplest way of using them is provided in sample program that demonstrates the simplest way of using them is provided in
the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing
@ -345,10 +359,16 @@ documentation, and the
.\" .\"
documentation describes how to compile and run it. documentation describes how to compile and run it.
.P .P
Just-in-time compiler support is an optional feature of PCRE2 that can be built The compiling and matching functions recognize various options that are passed
in appropriate hardware environments. It greatly speeds up the matching as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts.
.P
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
built in appropriate hardware environments. It greatly speeds up the matching
performance of many patterns. Programs can request that it be used if performance of many patterns. Programs can request that it be used if
available, by calling \fBpcre2_jit_compile()\fP after a pattern has been available by calling \fBpcre2_jit_compile()\fP after a pattern has been
successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT
support is not available. support is not available.
.P .P
@ -358,8 +378,8 @@ More complicated programs might need to make use of the specialist functions
.P .P
JIT matching is automatically used by \fBpcre2_match()\fP if it is available, JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance. The JIT-specific functions are matching, which gives improved performance at the expense of less sanity
discussed in the checking. The JIT-specific functions are discussed in the
.\" HREF .\" HREF
\fBpcre2jit\fP \fBpcre2jit\fP
.\" .\"
@ -369,7 +389,7 @@ A second matching function, \fBpcre2_dfa_match()\fP, which is not
Perl-compatible, is also provided. This uses a different algorithm for the Perl-compatible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a given matching. The alternative algorithm finds all possible matches (at a given
point in the subject), and scans the subject just once (unless there are point in the subject), and scans the subject just once (unless there are
lookbehind assertions). However, this algorithm does not return captured lookaround assertions). However, this algorithm does not return captured
substrings. A description of the two matching algorithms and their advantages substrings. A description of the two matching algorithms and their advantages
and disadvantages is given in the and disadvantages is given in the
.\" HREF .\" HREF
@ -484,8 +504,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
that is, the same compiled pattern can be used by more than one thread that is, the same compiled pattern can be used by more than one thread
simultaneously. For example, an application can compile all its patterns at the simultaneously. For example, an application can compile all its patterns at the
start, before forking off multiple threads that use them. However, if the start, before forking off multiple threads that use them. However, if the
just-in-time optimization feature is being used, it needs separate memory stack just-in-time (JIT) optimization feature is being used, it needs separate memory
areas for each thread. See the stack areas for each thread. See the
.\" HREF .\" HREF
\fBpcre2jit\fP \fBpcre2jit\fP
.\" .\"
@ -536,10 +556,10 @@ thread-specific copy.
.SS "Match blocks" .SS "Match blocks"
.rs .rs
.sp .sp
The matching functions need a block of memory for working space and for storing The matching functions need a block of memory for storing the results of a
the results of a match. This includes details of what was matched, as well as match. This includes details of what was matched, as well as additional
additional information such as the name of a (*MARK) setting. Each thread must information such as the name of a (*MARK) setting. Each thread must provide its
provide its own copy of this memory. own copy of this memory.
. .
. .
.SH "PCRE2 CONTEXTS" .SH "PCRE2 CONTEXTS"
@ -611,15 +631,15 @@ The memory used for a general context should be freed by calling:
.SS "The compile context" .SS "The compile context"
.rs .rs
.sp .sp
A compile context is required if you want to change the default values of any A compile context is required if you want to provide an external function for
of the following compile-time parameters: stack checking during compilation or to change the default values of any of the
following compile-time parameters:
.sp .sp
What \eR matches (Unicode newlines or CR, LF, CRLF only) What \eR matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables PCRE2's character tables
The newline character sequence The newline character sequence
The compile time nested parentheses limit The compile time nested parentheses limit
The maximum length of the pattern string The maximum length of the pattern string
An external function for stack checking
.sp .sp
A compile context is also required if you are using custom memory management. A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of If none of these apply, just pass NULL as the context argument of
@ -666,11 +686,11 @@ in the current locale.
.B " PCRE2_SIZE \fIvalue\fP);" .B " PCRE2_SIZE \fIvalue\fP);"
.fi .fi
.sp .sp
This sets a maximum length, in code units, for the pattern string that is to be This sets a maximum length, in code units, for any pattern string that is
compiled. If the pattern is longer, an error is generated. This facility is compiled with this context. If the pattern is longer, an error is generated.
provided so that applications that accept patterns from external sources can This facility is provided so that applications that accept patterns from
limit their size. The default is the largest number that a PCRE2_SIZE variable external sources can limit their size. The default is the largest number that a
can hold, which is effectively unlimited. PCRE2_SIZE variable can hold, which is effectively unlimited.
.sp .sp
.nf .nf
.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP, .B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP,
@ -683,8 +703,15 @@ PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
PCRE2_NEWLINE_ANY (any Unicode newline sequence). PCRE2_NEWLINE_ANY (any Unicode newline sequence).
.P .P
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this A pattern can override the value set in the compile context by starting with a
parameter affects the recognition of white space and the end of internal sequence such as (*CRLF). See the
.\" HREF
\fBpcre2pattern\fP
.\"
page for details.
.P
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
convention affects the recognition of white space and the end of internal
comments starting with #. The value is saved with the compiled pattern for comments starting with #. The value is saved with the compiled pattern for
subsequent use by the JIT compiler and by the two interpreted matching subsequent use by the JIT compiler and by the two interpreted matching
functions, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. functions, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP.
@ -722,15 +749,14 @@ zero if all is well, or non-zero to force an error.
.SS "The match context" .SS "The match context"
.rs .rs
.sp .sp
A match context is required if you want to change the default values of any A match context is required if you want to:
of the following match-time parameters:
.sp .sp
A callout function Set up a callout function
The offset limit for matching an unanchored pattern Set an offset limit for matching an unanchored pattern
The limit for calling \fBmatch()\fP (see below) Change the backtracking match limit
The limit for calling \fBmatch()\fP recursively Change the backtracking depth limit
Set custom memory management specifically for the match
.sp .sp
A match context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of If none of these apply, just pass NULL as the context argument of
\fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP. \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP.
.P .P
@ -756,7 +782,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
.B " void *\fIcallout_data\fP);" .B " void *\fIcallout_data\fP);"
.fi .fi
.sp .sp
This sets up a "callout" function, which PCRE2 will call at specified points This sets up a "callout" function for PCRE2 to call at specified points
during a matching operation. Details are given in the during a matching operation. Details are given in the
.\" HREF .\" HREF
\fBpcre2callout\fP \fBpcre2callout\fP
@ -778,8 +804,8 @@ A match can never be found if the \fIstartoffset\fP argument of
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is greater than the offset \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is greater than the offset
limit. limit.
.P .P
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
\fBpcre2_compile()\fP so that when JIT is in use, different code can be calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
compiled. If a match is started with a non-default offset limit when compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated. PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
.P .P
@ -799,10 +825,10 @@ up too many resources when processing patterns that are not going to match, but
which have a very large number of possibilities in their search trees. The which have a very large number of possibilities in their search trees. The
classic example is a pattern that uses nested unlimited repeats. classic example is a pattern that uses nested unlimited repeats.
.P .P
Internally, \fBpcre2_match()\fP uses a function called \fBmatch()\fP, which it There is an internal counter in \fBpcre2_match()\fP that is incremented each
calls repeatedly (sometimes recursively). The limit set by \fImatch_limit\fP is time round its main matching loop. If this value reaches the match limit,
imposed on the number of times this function is called during a match, which \fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
has the effect of limiting the amount of backtracking that can take place. For the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP, in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
which ignores it. which ignores it.
@ -815,8 +841,7 @@ is also used in this case (but in a different way) to limit how long the
matching can continue. matching can continue.
.P .P
The default value for the limit can be set when PCRE2 is built; the default The default value for the limit can be set when PCRE2 is built; the default
default is 10 million, which handles all but the most extreme cases. If the default is 10 million, which handles all but the most extreme cases. A value
limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_MATCHLIMIT. A value
for the match limit may also be supplied by an item at the start of a pattern for the match limit may also be supplied by an item at the start of a pattern
of the form of the form
.sp .sp
@ -827,65 +852,34 @@ less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
limit is set, less than the default. limit is set, less than the default.
.sp .sp
.nf .nf
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP, .B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);" .B " uint32_t \fIvalue\fP);"
.fi .fi
.sp .sp
The \fIrecursion_limit\fP parameter is similar to \fImatch_limit\fP, but This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
instead of limiting the total number of times that \fBmatch()\fP is called, it Each time a nested backtracking point is passed, a new memory "frame" is used
limits the depth of recursion. The recursion depth is a smaller number than the to remember the state of matching at that point. Thus, this parameter
total number of calls, because not all calls to \fBmatch()\fP are recursive. indirectly limits the amount of memory that is used in a match.
This limit is of use only if it is set smaller than \fImatch_limit\fP.
.P .P
Limiting the recursion depth limits the amount of system stack that can be This limit is not relevant, and is ignored, when matching is done using JIT
used, or, when PCRE2 has been compiled to use memory on the heap instead of the compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which uses
stack, the amount of heap memory that can be used. This limit is not relevant, it to limit the depth of internal recursive function calls that implement
and is ignored, when matching is done using JIT compiled code. However, it is lookaround assertions and pattern recursions. This is, therefore, an indirect
supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less limit on the amount of system stack that is used. A recursive pattern such as
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of /(.)(?1)/, when matched to a very long string using \fBpcre2_dfa_match()\fP,
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string. can use a great deal of stack.
.P .P
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the The default value for the depth limit can be set when PCRE2 is built; the
default default is the same value as the default for \fImatch_limit\fP. If the default default is the same value as the default for the match limit. If the
limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
supplied by an item at the start of a pattern of the form item at the start of a pattern of the form
.sp .sp
(*LIMIT_RECURSION=ddd) (*LIMIT_DEPTH=ddd)
.sp .sp
where ddd is a decimal number. However, such a setting is ignored unless ddd is where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or less than the limit set by the caller of \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default. \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
.sp
.nf
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi
.sp
This function sets up two additional custom memory management functions for use
by \fBpcre2_match()\fP when PCRE2 is compiled to use the heap for remembering
backtracking data, instead of recursive function calls that use the system
stack. There is a discussion about PCRE2's stack usage in the
.\" HREF
\fBpcre2stack\fP
.\"
documentation. See the
.\" HREF
\fBpcre2build\fP
.\"
documentation for details of how to build PCRE2.
.P
Using the heap for recursion is a non-standard way of building PCRE2, for use
in environments that have limited stacks. Because of the greater use of memory
management, \fBpcre2_match()\fP runs more slowly. Functions that are different
to the general custom memory functions are provided so that special-purpose
external code can be used for this case, because the memory blocks are all the
same size. The blocks are retained by \fBpcre2_match()\fP until it is about to
exit so that they can be re-used when possible during the match. In the absence
of these functions, the normal custom memory management functions are used, if
supplied, otherwise the system functions.
. .
. .
.SH "CHECKING BUILD-TIME OPTIONS" .SH "CHECKING BUILD-TIME OPTIONS"
@ -920,6 +914,13 @@ sequences the \eR escape sequence matches by default. A value of
PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a
value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The
default can be overridden when a pattern is compiled. default can be overridden when a pattern is compiled.
.sp
PCRE2_CONFIG_DEPTHLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
\fBpcre2_set_depth_limit()\fP above.
.sp .sp
PCRE2_CONFIG_JIT PCRE2_CONFIG_JIT
.sp .sp
@ -954,9 +955,9 @@ be compiled by those two libraries, but at the expense of slower matching.
.sp .sp
PCRE2_CONFIG_MATCHLIMIT PCRE2_CONFIG_MATCHLIMIT
.sp .sp
The output is a uint32_t integer that gives the default limit for the number of The output is a uint32_t integer that gives the default match limit for
internal matching function calls in a \fBpcre2_match()\fP execution. Further \fBpcre2_match()\fP. Further details are given with
details are given with \fBpcre2_match()\fP below. \fBpcre2_set_match_limit()\fP above.
.sp .sp
PCRE2_CONFIG_NEWLINE PCRE2_CONFIG_NEWLINE
.sp .sp
@ -980,20 +981,11 @@ amount of system stack used when a pattern is compiled. It is specified when
PCRE2 is built; the default is 250. This limit does not take into account the PCRE2 is built; the default is 250. This limit does not take into account the
stack that may already be used by the calling application. For finer control stack that may already be used by the calling application. For finer control
over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP. over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP.
.sp
PCRE2_CONFIG_RECURSIONLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the depth of
recursion when calling the internal matching function in a \fBpcre2_match()\fP
execution. Further details are given with \fBpcre2_match()\fP below.
.sp .sp
PCRE2_CONFIG_STACKRECURSE PCRE2_CONFIG_STACKRECURSE
.sp .sp
The output is a uint32_t integer that is set to one if internal recursion when This parameter is obsolete and should not be used in new code. The output is a
running \fBpcre2_match()\fP is implemented by recursive function calls that use uint32_t integer that is always set to zero.
the system stack to remember their state. This is the usual way that PCRE2 is
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
heap instead of recursive function calls.
.sp .sp
PCRE2_CONFIG_UNICODE_VERSION PCRE2_CONFIG_UNICODE_VERSION
.sp .sp
@ -1012,7 +1004,7 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
.sp .sp
PCRE2_CONFIG_VERSION PCRE2_CONFIG_VERSION
.sp .sp
The \fIwhere\fP argument should point to a buffer that is at least 12 code The \fIwhere\fP argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling units long. (The exact length required can be found by calling
\fBpcre2_config()\fP with \fIwhere\fP set to NULL.) The buffer is filled with \fBpcre2_config()\fP with \fIwhere\fP set to NULL.) The buffer is filled with
the PCRE2 version string, zero-terminated. The number of code units used is the PCRE2 version string, zero-terminated. The number of code units used is
@ -1208,13 +1200,14 @@ option is set, normal backslash processing is applied to verb names and only an
unescaped closing parenthesis terminates the name. A closing parenthesis can be unescaped closing parenthesis terminates the name. A closing parenthesis can be
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-comments are option is set, unescaped whitespace in verb names is skipped and #-comments are
recognized, exactly as in the rest of the pattern. recognized in this mode, exactly as in the rest of the pattern.
.sp .sp
PCRE2_AUTO_CALLOUT PCRE2_AUTO_CALLOUT
.sp .sp
If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items, If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
all with number 255, before each pattern item, except immediately before or all with number 255, before each pattern item, except immediately before or
after a callout in the pattern. For discussion of the callout facility, see the after an explicit callout in the pattern. For discussion of the callout
facility, see the
.\" HREF .\" HREF
\fBpcre2callout\fP \fBpcre2callout\fP
.\" .\"
@ -1452,9 +1445,8 @@ in the
.\" HREF .\" HREF
\fBpcre2unicode\fP \fBpcre2unicode\fP
.\" .\"
document. document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a negative negative error code.
error code.
.P .P
If you know that your pattern is valid, and you want to skip this check for If you know that your pattern is valid, and you want to skip this check for
performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set, performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set,
@ -1479,7 +1471,7 @@ in the
.\" .\"
page. If you set PCRE2_UCP, matching one of the items it affects takes much page. If you set PCRE2_UCP, matching one of the items it affects takes much
longer. The option is available only if PCRE2 has been compiled with Unicode longer. The option is available only if PCRE2 has been compiled with Unicode
support. support (which is the default).
.sp .sp
PCRE2_UNGREEDY PCRE2_UNGREEDY
.sp .sp
@ -1518,7 +1510,7 @@ page.
.SH "COMPILATION ERROR CODES" .SH "COMPILATION ERROR CODES"
.rs .rs
.sp .sp
There are over 80 positive error codes that \fBpcre2_compile()\fP may return There are nearly 100 positive error codes that \fBpcre2_compile()\fP may return
(via \fIerrorcode\fP) if it finds an error in the pattern. There are also some (via \fIerrorcode\fP) if it finds an error in the pattern. There are also some
negative error codes that are used for invalid UTF strings. These are the same negative error codes that are used for invalid UTF strings. These are the same
as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and are described as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and are described
@ -1570,7 +1562,7 @@ documentation.
JIT compilation is a heavyweight optimization. It can take some time for JIT compilation is a heavyweight optimization. It can take some time for
patterns to be analyzed, and for one-off matches and simple patterns the patterns to be analyzed, and for one-off matches and simple patterns the
benefit of faster execution might be offset by a much slower compilation time. benefit of faster execution might be offset by a much slower compilation time.
Most, but not all patterns can be optimized by the JIT compiler. Most (but not all) patterns can be optimized by the JIT compiler.
. .
. .
.\" HTML <a name="localesupport"></a> .\" HTML <a name="localesupport"></a>
@ -1581,10 +1573,10 @@ PCRE2 handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character code digits, or whatever, by reference to a set of tables, indexed by character code
point. This applies only to characters whose code points are less than 256. By point. This applies only to characters whose code points are less than 256. By
default, higher-valued code points never match escapes such as \ew or \ed. default, higher-valued code points never match escapes such as \ew or \ed.
However, if PCRE2 is built with UTF support, all characters can be tested with However, if PCRE2 is built with Unicode support, all characters can be tested
\ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a pattern with \ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a
is compiled; this causes \ew and friends to use Unicode property support pattern is compiled; this causes \ew and friends to use Unicode property
instead of the built-in tables. support instead of the built-in tables.
.P .P
The use of locales with Unicode is discouraged. If you are handling characters The use of locales with Unicode is discouraged. If you are handling characters
with code points greater than 127, you should either use Unicode support, or with code points greater than 127, you should either use Unicode support, or
@ -1623,7 +1615,7 @@ available for as long as it is needed.
The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
is saved with the compiled pattern, and the same tables are used by is saved with the compiled pattern, and the same tables are used by
\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. Thus, for any single pattern, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. Thus, for any single pattern,
compilation, and matching all happen in the same locale, but different patterns compilation and matching both happen in the same locale, but different patterns
can be processed in different locales. can be processed in different locales.
. .
. .
@ -1646,7 +1638,7 @@ pattern. The second argument specifies which piece of information is required,
and the third argument is a pointer to a variable to receive the data. If the and the third argument is a pointer to a variable to receive the data. If the
third argument is NULL, the first argument is ignored, and the function returns third argument is NULL, the first argument is ignored, and the function returns
the size in bytes of the variable that is required for the information the size in bytes of the variable that is required for the information
requested. Otherwise, The yield of the function is zero for success, or one of requested. Otherwise, the yield of the function is zero for success, or one of
the following negative numbers: the following negative numbers:
.sp .sp
PCRE2_ERROR_NULL the argument \fIcode\fP was NULL PCRE2_ERROR_NULL the argument \fIcode\fP was NULL
@ -1699,8 +1691,8 @@ following are true:
.* is not in a capturing group that is the subject .* is not in a capturing group that is the subject
of a back reference of a back reference
PCRE2_DOTALL is in force for .* PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern. Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set. PCRE2_NO_DOTSTAR_ANCHOR is not set
.sp .sp
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
options returned for PCRE2_INFO_ALLOPTIONS. options returned for PCRE2_INFO_ALLOPTIONS.
@ -1727,6 +1719,13 @@ matches only CR, LF, or CRLF.
Return the highest capturing subpattern number in the pattern. In patterns Return the highest capturing subpattern number in the pattern. In patterns
where (?| is not used, this is also the total number of capturing subpatterns. where (?| is not used, this is also the total number of capturing subpatterns.
The third argument should point to an \fBuint32_t\fP variable. The third argument should point to an \fBuint32_t\fP variable.
.sp
PCRE2_INFO_DEPTHLIMIT
.sp
If the pattern set a backtracking depth limit by including an item of the form
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
should point to an unsigned 32-bit integer. If no such value has been set, the
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
.sp .sp
PCRE2_INFO_FIRSTBITMAP PCRE2_INFO_FIRSTBITMAP
.sp .sp
@ -1758,6 +1757,14 @@ argument should point to an \fBuint32_t\fP variable. In the 8-bit library, the
value is always less than 256. In the 16-bit library the value can be up to value is always less than 256. In the 16-bit library the value can be up to
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff, 0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
and up to 0xffffffff when not using UTF-32 mode. and up to 0xffffffff when not using UTF-32 mode.
.sp
PCRE2_INFO_FRAMESIZE
.sp
Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
without the use of JIT. The third argument should point to a \fBsize_t\fP
variable. The frame size depends on the number of capturing parentheses in the
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
.sp .sp
PCRE2_INFO_HASBACKSLASHC PCRE2_INFO_HASBACKSLASHC
.sp .sp
@ -1768,7 +1775,8 @@ argument should point to an \fBuint32_t\fP variable.
.sp .sp
Return 1 if the pattern contains any explicit matches for CR or LF characters, Return 1 if the pattern contains any explicit matches for CR or LF characters,
otherwise 0. The third argument should point to an \fBuint32_t\fP variable. An otherwise 0. The third argument should point to an \fBuint32_t\fP variable. An
explicit match is either a literal CR or LF character, or \er or \en. explicit match is either a literal CR or LF character, or \er or \en or one of
the equivalent hexadecimal or octal escape sequences.
.sp .sp
PCRE2_INFO_JCHANGED PCRE2_INFO_JCHANGED
.sp .sp
@ -1907,7 +1915,7 @@ different for each compiled pattern.
.sp .sp
PCRE2_INFO_NEWLINE PCRE2_INFO_NEWLINE
.sp .sp
The output is a \fBuint32_t\fP with one of the following values: The output is one of the following \fBuint32_t\fP values:
.sp .sp
PCRE2_NEWLINE_CR Carriage return (CR) PCRE2_NEWLINE_CR Carriage return (CR)
PCRE2_NEWLINE_LF Linefeed (LF) PCRE2_NEWLINE_LF Linefeed (LF)
@ -1915,15 +1923,8 @@ The output is a \fBuint32_t\fP with one of the following values:
PCRE2_NEWLINE_ANY Any Unicode line ending PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
.sp .sp
This specifies the default character sequence that will be recognized as This identifies the character sequence that will be recognized as meaning
meaning "newline" while matching. "newline" while matching.
.sp
PCRE2_INFO_RECURSIONLIMIT
.sp
If the pattern set a recursion limit by including an item of the form
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value has been
set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
.sp .sp
PCRE2_INFO_SIZE PCRE2_INFO_SIZE
.sp .sp
@ -2000,9 +2001,9 @@ Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
the creation functions above. For \fBpcre2_match_data_create()\fP, the first the creation functions above. For \fBpcre2_match_data_create()\fP, the first
argument is the number of pairs of offsets in the \fIovector\fP. One pair of argument is the number of pairs of offsets in the \fIovector\fP. One pair of
offsets is required to identify the string that matched the whole pattern, with offsets is required to identify the string that matched the whole pattern, with
another pair for each captured substring. For example, a value of 4 creates an additional pair for each captured substring. For example, a value of 4
enough space to record the matched portion of the subject plus three captured creates enough space to record the matched portion of the subject plus three
substrings. A minimum of 1 pair is imposed by captured substrings. A minimum of 1 pair is imposed by
\fBpcre2_match_data_create()\fP, so it is always possible to return the overall \fBpcre2_match_data_create()\fP, so it is always possible to return the overall
matched string. matched string.
.P .P
@ -2145,9 +2146,11 @@ newline convention recognizes CRLF as a newline, and if so, and the current
character is CR followed by LF, advance the starting offset by two characters character is CR followed by LF, advance the starting offset by two characters
instead of one. instead of one.
.P .P
If a non-zero starting offset is passed when the pattern is anchored, one If a non-zero starting offset is passed when the pattern is anchored, a single
attempt to match at the given offset is made. This can only succeed if the attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject. pattern does not require the match to be at the start of the subject. In other
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
. .
. .
.\" HTML <a name="matchoptions"></a> .\" HTML <a name="matchoptions"></a>
@ -2161,9 +2164,9 @@ PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is
described below. described below.
.P .P
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT) Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
compiler. If it is set, JIT matching is disabled and the normal interpretive compiler. If it is set, JIT matching is disabled and the interpretive code in
code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the remaining
remaining options are supported for JIT matching. options are supported for JIT matching.
.sp .sp
PCRE2_ANCHORED PCRE2_ANCHORED
.sp .sp
@ -2257,12 +2260,12 @@ page.
If you know that your subject is valid, and you want to skip these checks for If you know that your subject is valid, and you want to skip these checks for
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
\fBpcre2_match()\fP. You might want to do this for the second and subsequent \fBpcre2_match()\fP. You might want to do this for the second and subsequent
calls to \fBpcre2_match()\fP if you are making repeated calls to find all the calls to \fBpcre2_match()\fP if you are making repeated calls to find other
matches in a single subject string. matches in the same subject string.
.P .P
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
as a subject, or an invalid value of \fIstartoffset\fP, is undefined. Your string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
program may crash or loop indefinitely. Your program may crash or loop indefinitely.
.sp .sp
PCRE2_PARTIAL_HARD PCRE2_PARTIAL_HARD
PCRE2_PARTIAL_SOFT PCRE2_PARTIAL_SOFT
@ -2329,9 +2332,9 @@ start, it skips both the CR and the LF before retrying. However, the pattern
reference, and so advances only by one character after the first failure. reference, and so advances only by one character after the first failure.
.P .P
An explicit match for CR or LF is either a literal appearance of one of those An explicit match for CR or LF is either a literal appearance of one of those
characters in the pattern, or one of the \er or \en escape sequences. Implicit characters in the pattern, or one of the \er or \en or equivalent octal or
matches such as [^X] do not count, nor does \es, even though it includes CR and hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
LF in the characters that it matches. does \es, even though it includes CR and LF in the characters that it matches.
.P .P
Notwithstanding the above, anomalous effects may still occur when CRLF is a Notwithstanding the above, anomalous effects may still occur when CRLF is a
valid newline sequence and explicit \er or \en escapes appear in the pattern. valid newline sequence and explicit \er or \en escapes appear in the pattern.
@ -2395,12 +2398,12 @@ identify the part of the subject that was partially matched. See the
.\" .\"
documentation for details of partial matching. documentation for details of partial matching.
.P .P
After a successful match, the first pair of offsets identifies the portion of After a fully successful match, the first pair of offsets identifies the
the subject string that was matched by the entire pattern. The next pair is portion of the subject string that was matched by the entire pattern. The next
used for the first capturing subpattern, and so on. The value returned by pair is used for the first captured substring, and so on. The value returned by
\fBpcre2_match()\fP is one more than the highest numbered pair that has been \fBpcre2_match()\fP is one more than the highest numbered pair that has been
set. For example, if two substrings have been captured, the returned value is set. For example, if two substrings have been captured, the returned value is
3. If there are no capturing subpatterns, the return value from a successful 3. If there are no captured substrings, the return value from a successful
match is 1, indicating that just the first pair of offsets has been set. match is 1, indicating that just the first pair of offsets has been set.
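The return value convention can be illustrated with a small C sketch. The enum and both function names are hypothetical and exist only for this example. The sketch relies on two facts beyond this paragraph: a return of zero means the ovector was too small to hold every captured substring, and PCRE2 error codes (including the no-match code) are negative.

```c
/* Hypothetical classification of the value returned by pcre2_match(). */
enum match_outcome {
    MATCH_ERROR_OR_NOMATCH,     /* negative return: no match or an error */
    MATCH_OVECTOR_TOO_SMALL,    /* zero: matched, but the ovector was full */
    MATCH_OK                    /* positive: one more than the highest pair set */
};

static enum match_outcome classify_rc(int rc)
{
    if (rc < 0) return MATCH_ERROR_OR_NOMATCH;
    if (rc == 0) return MATCH_OVECTOR_TOO_SMALL;
    return MATCH_OK;
}

/* Highest numbered pair that has been set. Pair 0 is the whole match,
   so a return value of 3 means pairs 0, 1 and 2 may be inspected. */
static int highest_pair_set(int rc)
{
    return rc > 0 ? rc - 1 : 0;
}
```

With two captured substrings the return value is 3 and highest_pair_set(3) is 2. With no captured substrings the return value is 1 and only pair 0 is set.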
.P .P
If a pattern uses the \eK escape sequence within a positive assertion, the If a pattern uses the \eK escape sequence within a positive assertion, the
@ -2415,11 +2418,7 @@ returned.
If the ovector is too small to hold all the captured substring offsets, as much If the ovector is too small to hold all the captured substring offsets, as much
as possible is filled in, and the function returns a value of zero. If captured as possible is filled in, and the function returns a value of zero. If captured
substrings are not of interest, \fBpcre2_match()\fP may be called with a match substrings are not of interest, \fBpcre2_match()\fP may be called with a match
data block whose ovector is of minimum length (that is, one pair). However, if data block whose ovector is of minimum length (that is, one pair).
the pattern contains back references and the \fIovector\fP is not big enough to
remember the related substrings, PCRE2 has to get additional memory for use
during matching. Thus it is usually advisable to set up a match data block
containing an ovector of reasonable size.
.P .P
It is possible for capturing subpattern number \fIn+1\fP to match some part of It is possible for capturing subpattern number \fIn+1\fP to match some part of
the subject when subpattern \fIn\fP has not been used at all. For example, if the subject when subpattern \fIn\fP has not been used at all. For example, if
@ -2535,8 +2534,9 @@ returned when the magic number is not present.
.sp .sp
PCRE2_ERROR_BADMODE PCRE2_ERROR_BADMODE
.sp .sp
This error is given when a pattern that was compiled by the 8-bit library is This error is given when a compiled pattern is passed to a function in a
passed to a 16-bit or 32-bit library function, or vice versa. library of a different code unit width, for example, a pattern compiled by
the 8-bit library is passed to a 16-bit or 32-bit library function.
.sp .sp
PCRE2_ERROR_BADOFFSET PCRE2_ERROR_BADOFFSET
.sp .sp
@ -2562,22 +2562,15 @@ use by callout functions that want to cause \fBpcre2_match()\fP or
\fBpcre2callout\fP \fBpcre2callout\fP
.\" .\"
documentation for details. documentation for details.
.sp
PCRE2_ERROR_DEPTHLIMIT
.sp
The nested backtracking depth limit was reached.
.sp .sp
PCRE2_ERROR_INTERNAL PCRE2_ERROR_INTERNAL
.sp .sp
An unexpected internal error has occurred. This error could be caused by a bug An unexpected internal error has occurred. This error could be caused by a bug
in PCRE2 or by overwriting of the compiled pattern. in PCRE2 or by overwriting of the compiled pattern.
.sp
PCRE2_ERROR_JIT_BADOPTION
.sp
This error is returned when a pattern that was successfully studied using JIT
is being matched, but the matching mode (partial or complete match) does not
correspond to any JIT compilation mode. When the JIT fast path function is
used, this error may be also given for invalid options. See the
.\" HREF
\fBpcre2jit\fP
.\"
documentation for more details.
.sp .sp
PCRE2_ERROR_JIT_STACKLIMIT PCRE2_ERROR_JIT_STACKLIMIT
.sp .sp
@ -2591,15 +2584,13 @@ documentation for more details.
.sp .sp
PCRE2_ERROR_MATCHLIMIT PCRE2_ERROR_MATCHLIMIT
.sp .sp
The backtracking limit was reached. The backtracking match limit was reached.
.sp .sp
PCRE2_ERROR_NOMEMORY PCRE2_ERROR_NOMEMORY
.sp .sp
If a pattern contains back references, but the ovector is not big enough to If a pattern contains many nested backtracking points, heap memory is used to
remember the referenced substrings, PCRE2 gets a block of memory at the start remember them. This error is given when the memory allocation function (default
of matching to use for this purpose. There are some other special cases where or custom) fails.
extra memory is needed during matching. This error is given when memory cannot
be obtained.
.sp .sp
PCRE2_ERROR_NULL PCRE2_ERROR_NULL
.sp .sp
@ -2615,10 +2606,6 @@ in the subject string. Some simple patterns that might do this are detected and
faulted at compile time, but more complicated cases, in particular mutual faulted at compile time, but more complicated cases, in particular mutual
recursions between two different subpatterns, cannot be detected until matching recursions between two different subpatterns, cannot be detected until matching
is attempted. is attempted.
.sp
PCRE2_ERROR_RECURSIONLIMIT
.sp
The internal recursion limit was reached.
. .
. .
.\" HTML <a name="geterrormessage"></a> .\" HTML <a name="geterrormessage"></a>
@ -2808,8 +2795,8 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
compiled pattern, and the second is the name. The yield of the function is the compiled pattern, and the second is the name. The yield of the function is the
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
that name. Given the number, you can extract the substring directly, or use one that name. Given the number, you can extract the substring directly from the
of the functions described above. ovector, or use one of the "bynumber" functions described above.
.P .P
For convenience, there are also "byname" functions that correspond to the For convenience, there are also "byname" functions that correspond to the
"bynumber" functions, the only difference being that the second argument is a "bynumber" functions, the only difference being that the second argument is a
@ -3113,11 +3100,12 @@ other alternatives. Ultimately, when it runs out of matches,
.P .P
The function \fBpcre2_dfa_match()\fP is called to match a subject string The function \fBpcre2_dfa_match()\fP is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the subject against a compiled pattern, using a matching algorithm that scans the subject
string just once, and does not backtrack. This has different characteristics to string just once (not counting lookaround assertions), and does not backtrack.
the normal algorithm, and is not compatible with Perl. Some of the features of This has different characteristics to the normal algorithm, and is not
PCRE2 patterns are not supported. Nevertheless, there are times when this kind compatible with Perl. Some of the features of PCRE2 patterns are not supported.
of matching can be useful. For a discussion of the two matching algorithms, and Nevertheless, there are times when this kind of matching can be useful. For a
a list of features that \fBpcre2_dfa_match()\fP does not support, see the discussion of the two matching algorithms, and a list of features that
\fBpcre2_dfa_match()\fP does not support, see the
.\" HREF .\" HREF
\fBpcre2matching\fP \fBpcre2matching\fP
.\" .\"
@ -3321,6 +3309,6 @@ Cambridge, England.
.rs .rs
.sp .sp
.nf .nf
Last updated: 21 March 2017 Last updated: 27 March 2017
Copyright (c) 1997-2017 University of Cambridge. Copyright (c) 1997-2017 University of Cambridge.
.fi .fi