Documentation update.
parent
447d1b3083
commit
6c7fa44939
|
@ -46,7 +46,7 @@ A match context is needed only if you want to:
|
||||||
Set a matching offset limit
|
Set a matching offset limit
|
||||||
Change the backtracking match limit
|
Change the backtracking match limit
|
||||||
Change the backtracking depth limit
|
Change the backtracking depth limit
|
||||||
Set custom memory management in the match context
|
Set custom memory management specifically for the match
|
||||||
</pre>
|
</pre>
|
||||||
The <i>length</i> and <i>startoffset</i> values are code
|
The <i>length</i> and <i>startoffset</i> values are code
|
||||||
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
||||||
|
|
|
@ -23,37 +23,38 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
|
<li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
|
||||||
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
|
<li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
|
||||||
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
<li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
|
||||||
<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
<li><a name="TOC11" href="#SEC11">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a>
|
||||||
<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
|
<li><a name="TOC12" href="#SEC12">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
|
||||||
<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
|
<li><a name="TOC13" href="#SEC13">PCRE2 API OVERVIEW</a>
|
||||||
<li><a name="TOC14" href="#SEC14">NEWLINES</a>
|
<li><a name="TOC14" href="#SEC14">STRING LENGTHS AND OFFSETS</a>
|
||||||
<li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
|
<li><a name="TOC15" href="#SEC15">NEWLINES</a>
|
||||||
<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
|
<li><a name="TOC16" href="#SEC16">MULTITHREADING</a>
|
||||||
<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
|
<li><a name="TOC17" href="#SEC17">PCRE2 CONTEXTS</a>
|
||||||
<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
|
<li><a name="TOC18" href="#SEC18">CHECKING BUILD-TIME OPTIONS</a>
|
||||||
<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
|
<li><a name="TOC19" href="#SEC19">COMPILING A PATTERN</a>
|
||||||
<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
|
<li><a name="TOC20" href="#SEC20">COMPILATION ERROR CODES</a>
|
||||||
<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
|
<li><a name="TOC21" href="#SEC21">JUST-IN-TIME (JIT) COMPILATION</a>
|
||||||
<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
|
<li><a name="TOC22" href="#SEC22">LOCALE SUPPORT</a>
|
||||||
<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
|
<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A COMPILED PATTERN</a>
|
||||||
<li><a name="TOC24" href="#SEC24">SERIALIZATION AND PRECOMPILING</a>
|
<li><a name="TOC24" href="#SEC24">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
|
||||||
<li><a name="TOC25" href="#SEC25">THE MATCH DATA BLOCK</a>
|
<li><a name="TOC25" href="#SEC25">SERIALIZATION AND PRECOMPILING</a>
|
||||||
<li><a name="TOC26" href="#SEC26">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
<li><a name="TOC26" href="#SEC26">THE MATCH DATA BLOCK</a>
|
||||||
<li><a name="TOC27" href="#SEC27">NEWLINE HANDLING WHEN MATCHING</a>
|
<li><a name="TOC27" href="#SEC27">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
|
||||||
<li><a name="TOC28" href="#SEC28">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
<li><a name="TOC28" href="#SEC28">NEWLINE HANDLING WHEN MATCHING</a>
|
||||||
<li><a name="TOC29" href="#SEC29">OTHER INFORMATION ABOUT A MATCH</a>
|
<li><a name="TOC29" href="#SEC29">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
|
||||||
<li><a name="TOC30" href="#SEC30">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
<li><a name="TOC30" href="#SEC30">OTHER INFORMATION ABOUT A MATCH</a>
|
||||||
<li><a name="TOC31" href="#SEC31">OBTAINING A TEXTUAL ERROR MESSAGE</a>
|
<li><a name="TOC31" href="#SEC31">ERROR RETURNS FROM <b>pcre2_match()</b></a>
|
||||||
<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
<li><a name="TOC32" href="#SEC32">OBTAINING A TEXTUAL ERROR MESSAGE</a>
|
||||||
<li><a name="TOC33" href="#SEC33">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
<li><a name="TOC33" href="#SEC33">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
|
||||||
<li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
<li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
|
||||||
<li><a name="TOC35" href="#SEC35">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
<li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
|
||||||
<li><a name="TOC36" href="#SEC36">DUPLICATE SUBPATTERN NAMES</a>
|
<li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
|
||||||
<li><a name="TOC37" href="#SEC37">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
<li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a>
|
||||||
<li><a name="TOC38" href="#SEC38">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
<li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
|
||||||
<li><a name="TOC39" href="#SEC39">SEE ALSO</a>
|
<li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
|
||||||
<li><a name="TOC40" href="#SEC40">AUTHOR</a>
|
<li><a name="TOC40" href="#SEC40">SEE ALSO</a>
|
||||||
<li><a name="TOC41" href="#SEC41">REVISION</a>
|
<li><a name="TOC41" href="#SEC41">AUTHOR</a>
|
||||||
|
<li><a name="TOC42" href="#SEC42">REVISION</a>
|
||||||
</ul>
|
</ul>
|
||||||
<P>
|
<P>
|
||||||
<b>#include <pcre2.h></b>
|
<b>#include <pcre2.h></b>
|
||||||
|
@ -177,22 +178,16 @@ document for an overview of all the PCRE2 documentation.
|
||||||
<b> void *<i>callout_data</i>);</b>
|
<b> void *<i>callout_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
|
||||||
<b> uint32_t <i>value</i>);</b>
|
|
||||||
<br>
|
|
||||||
<br>
|
|
||||||
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>value</i>);</b>
|
<b> PCRE2_SIZE <i>value</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||||
<b> uint32_t <i>value</i>);</b>
|
<b> uint32_t <i>value</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
<b>int pcre2_set_recursion_memory_management(</b>
|
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||||
<b> pcre2_match_context *<i>mcontext</i>,</b>
|
<b> uint32_t <i>value</i>);</b>
|
||||||
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
|
|
||||||
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
|
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
|
<br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -314,7 +309,24 @@ document for an overview of all the PCRE2 documentation.
|
||||||
<br>
|
<br>
|
||||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
<br><a name="SEC11" href="#TOC1">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a><br>
|
||||||
|
<P>
|
||||||
|
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||||
|
<b> uint32_t <i>value</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>int pcre2_set_recursion_memory_management(</b>
|
||||||
|
<b> pcre2_match_context *<i>mcontext</i>,</b>
|
||||||
|
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
|
||||||
|
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
These functions became obsolete at release 10.30 and are retained only for
|
||||||
|
backward compatibility. They should not be used in new code. The first is
|
||||||
|
replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
|
||||||
|
no longer has any effect (it always returns zero).
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC12" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
|
||||||
<P>
|
<P>
|
||||||
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
|
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
|
||||||
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
|
units, respectively. However, there is just one header file, <b>pcre2.h</b>.
|
||||||
|
@ -368,14 +380,14 @@ When using multiple libraries in an application, you must take care when
|
||||||
processing any particular pattern to use only functions from a single library.
|
processing any particular pattern to use only functions from a single library.
|
||||||
For example, if you want to run a match using a pattern that was compiled with
|
For example, if you want to run a match using a pattern that was compiled with
|
||||||
<b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
|
<b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
|
||||||
<b>pcre2_match_8()</b>.
|
<b>pcre2_match_8()</b> or <b>pcre2_match_32()</b>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
In the function summaries above, and in the rest of this document and other
|
In the function summaries above, and in the rest of this document and other
|
||||||
PCRE2 documents, functions and data types are described using their generic
|
PCRE2 documents, functions and data types are described using their generic
|
||||||
names, without the 8, 16, or 32 suffix.
|
names, without the 8, 16, or 32 suffix.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
<br><a name="SEC13" href="#TOC1">PCRE2 API OVERVIEW</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 has its own native API, which is described in this document. There are
|
PCRE2 has its own native API, which is described in this document. There are
|
||||||
also some wrapper functions for the 8-bit library that correspond to the
|
also some wrapper functions for the 8-bit library that correspond to the
|
||||||
|
@ -397,7 +409,7 @@ against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
|
||||||
<b>pcre2.h</b>.
|
<b>pcre2.h</b>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The functions <b>pcre2_compile()</b>, and <b>pcre2_match()</b> are used for
|
The functions <b>pcre2_compile()</b> and <b>pcre2_match()</b> are used for
|
||||||
compiling and matching regular expressions in a Perl-compatible manner. A
|
compiling and matching regular expressions in a Perl-compatible manner. A
|
||||||
sample program that demonstrates the simplest way of using them is provided in
|
sample program that demonstrates the simplest way of using them is provided in
|
||||||
the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
|
the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
|
||||||
|
@ -408,10 +420,17 @@ documentation, and the
|
||||||
documentation describes how to compile and run it.
|
documentation describes how to compile and run it.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Just-in-time compiler support is an optional feature of PCRE2 that can be built
|
The compiling and matching functions recognize various options that are passed
|
||||||
in appropriate hardware environments. It greatly speeds up the matching
|
as bits in an options argument. There are also some more complicated parameters
|
||||||
|
such as custom memory management functions and resource limits that are passed
|
||||||
|
in "contexts" (which are just memory blocks, described below). Simple
|
||||||
|
applications do not need to make use of contexts.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
|
||||||
|
built in appropriate hardware environments. It greatly speeds up the matching
|
||||||
performance of many patterns. Programs can request that it be used if
|
performance of many patterns. Programs can request that it be used if
|
||||||
available, by calling <b>pcre2_jit_compile()</b> after a pattern has been
|
available by calling <b>pcre2_jit_compile()</b> after a pattern has been
|
||||||
successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
|
successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
|
||||||
support is not available.
|
support is not available.
|
||||||
</P>
|
</P>
|
||||||
|
@ -423,8 +442,8 @@ More complicated programs might need to make use of the specialist functions
|
||||||
<P>
|
<P>
|
||||||
JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
|
JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
|
||||||
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
|
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
|
||||||
matching, which gives improved performance. The JIT-specific functions are
|
matching, which gives improved performance at the expense of less sanity
|
||||||
discussed in the
|
checking. The JIT-specific functions are discussed in the
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
|
@ -433,7 +452,7 @@ A second matching function, <b>pcre2_dfa_match()</b>, which is not
|
||||||
Perl-compatible, is also provided. This uses a different algorithm for the
|
Perl-compatible, is also provided. This uses a different algorithm for the
|
||||||
matching. The alternative algorithm finds all possible matches (at a given
|
matching. The alternative algorithm finds all possible matches (at a given
|
||||||
point in the subject), and scans the subject just once (unless there are
|
point in the subject), and scans the subject just once (unless there are
|
||||||
lookbehind assertions). However, this algorithm does not return captured
|
lookaround assertions). However, this algorithm does not return captured
|
||||||
substrings. A description of the two matching algorithms and their advantages
|
substrings. A description of the two matching algorithms and their advantages
|
||||||
and disadvantages is given in the
|
and disadvantages is given in the
|
||||||
<a href="pcre2matching.html"><b>pcre2matching</b></a>
|
<a href="pcre2matching.html"><b>pcre2matching</b></a>
|
||||||
|
@ -476,7 +495,7 @@ Functions with names ending with <b>_free()</b> are used for freeing memory
|
||||||
blocks of various sorts. In all cases, if one of these functions is called with
|
blocks of various sorts. In all cases, if one of these functions is called with
|
||||||
a NULL argument, it does nothing.
|
a NULL argument, it does nothing.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
<br><a name="SEC14" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
||||||
<P>
|
<P>
|
||||||
The PCRE2 API uses string lengths and offsets into strings of code units in
|
The PCRE2 API uses string lengths and offsets into strings of code units in
|
||||||
several places. These values are always of type PCRE2_SIZE, which is an
|
several places. These values are always of type PCRE2_SIZE, which is an
|
||||||
|
@ -486,7 +505,7 @@ as a special indicator for zero-terminated strings and unset offsets.
|
||||||
Therefore, the longest string that can be handled is one less than this
|
Therefore, the longest string that can be handled is one less than this
|
||||||
maximum.
|
maximum.
|
||||||
<a name="newlines"></a></P>
|
<a name="newlines"></a></P>
|
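To make the distinction between code units and characters concrete, here is a small sketch in plain C that does not call PCRE2 at all. Both helper names are invented for this illustration, and the character count assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of 8-bit code units (bytes) in a
   zero-terminated string. Lengths and offsets passed to the 8-bit
   library are expressed in this unit. */
size_t code_unit_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: number of UTF-8 characters, counted by skipping
   continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_character_count(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the word "café" encoded in UTF-8), code_unit_length() gives 5 while utf8_character_count() gives 4, so an offset of 3 refers to the first byte of the final character, not to a fourth character boundary counted separately.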
||||||
<br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
|
<br><a name="SEC15" href="#TOC1">NEWLINES</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 supports five different conventions for indicating line breaks in
|
PCRE2 supports five different conventions for indicating line breaks in
|
||||||
strings: a single CR (carriage return) character, a single LF (linefeed)
|
strings: a single CR (carriage return) character, a single LF (linefeed)
|
||||||
|
@ -521,7 +540,7 @@ The choice of newline convention does not affect the interpretation of
|
||||||
the \n or \r escape sequences, nor does it affect what \R matches; this has
|
the \n or \r escape sequences, nor does it affect what \R matches; this has
|
||||||
its own separate convention.
|
its own separate convention.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
|
<br><a name="SEC16" href="#TOC1">MULTITHREADING</a><br>
|
||||||
<P>
|
<P>
|
||||||
In a multithreaded application it is important to keep thread-specific data
|
In a multithreaded application it is important to keep thread-specific data
|
||||||
separate from data that can be shared between threads. The PCRE2 library code
|
separate from data that can be shared between threads. The PCRE2 library code
|
||||||
|
@ -543,8 +562,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
|
||||||
that is, the same compiled pattern can be used by more than one thread
|
that is, the same compiled pattern can be used by more than one thread
|
||||||
simultaneously. For example, an application can compile all its patterns at the
|
simultaneously. For example, an application can compile all its patterns at the
|
||||||
start, before forking off multiple threads that use them. However, if the
|
start, before forking off multiple threads that use them. However, if the
|
||||||
just-in-time optimization feature is being used, it needs separate memory stack
|
just-in-time (JIT) optimization feature is being used, it needs separate memory
|
||||||
areas for each thread. See the
|
stack areas for each thread. See the
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||||
documentation for more details.
|
documentation for more details.
|
||||||
</P>
|
</P>
|
||||||
|
@ -596,12 +615,12 @@ thread-specific copy.
|
||||||
Match blocks
|
Match blocks
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
The matching functions need a block of memory for working space and for storing
|
The matching functions need a block of memory for storing the results of a
|
||||||
the results of a match. This includes details of what was matched, as well as
|
match. This includes details of what was matched, as well as additional
|
||||||
additional information such as the name of a (*MARK) setting. Each thread must
|
information such as the name of a (*MARK) setting. Each thread must provide its
|
||||||
provide its own copy of this memory.
|
own copy of this memory.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
<br><a name="SEC17" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
Some PCRE2 functions have a lot of parameters, many of which are used only by
|
Some PCRE2 functions have a lot of parameters, many of which are used only by
|
||||||
specialist applications, for example, those that use custom memory management
|
specialist applications, for example, those that use custom memory management
|
||||||
|
@ -663,15 +682,15 @@ The memory used for a general context should be freed by calling:
|
||||||
The compile context
|
The compile context
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
A compile context is required if you want to change the default values of any
|
A compile context is required if you want to provide an external function for
|
||||||
of the following compile-time parameters:
|
stack checking during compilation or to change the default values of any of the
|
||||||
|
following compile-time parameters:
|
||||||
<pre>
|
<pre>
|
||||||
What \R matches (Unicode newlines or CR, LF, CRLF only)
|
What \R matches (Unicode newlines or CR, LF, CRLF only)
|
||||||
PCRE2's character tables
|
PCRE2's character tables
|
||||||
The newline character sequence
|
The newline character sequence
|
||||||
The compile time nested parentheses limit
|
The compile time nested parentheses limit
|
||||||
The maximum length of the pattern string
|
The maximum length of the pattern string
|
||||||
An external function for stack checking
|
|
||||||
</pre>
|
</pre>
|
||||||
A compile context is also required if you are using custom memory management.
|
A compile context is also required if you are using custom memory management.
|
||||||
If none of these apply, just pass NULL as the context argument of
|
If none of these apply, just pass NULL as the context argument of
|
||||||
|
@ -713,11 +732,11 @@ in the current locale.
|
||||||
<b> PCRE2_SIZE <i>value</i>);</b>
|
<b> PCRE2_SIZE <i>value</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
This sets a maximum length, in code units, for the pattern string that is to be
|
This sets a maximum length, in code units, for any pattern string that is
|
||||||
compiled. If the pattern is longer, an error is generated. This facility is
|
compiled with this context. If the pattern is longer, an error is generated.
|
||||||
provided so that applications that accept patterns from external sources can
|
This facility is provided so that applications that accept patterns from
|
||||||
limit their size. The default is the largest number that a PCRE2_SIZE variable
|
external sources can limit their size. The default is the largest number that a
|
||||||
can hold, which is effectively unlimited.
|
PCRE2_SIZE variable can hold, which is effectively unlimited.
|
||||||
<b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
|
<b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
|
||||||
<b> uint32_t <i>value</i>);</b>
|
<b> uint32_t <i>value</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -729,8 +748,14 @@ sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
|
||||||
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
|
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
|
A pattern can override the value set in the compile context by starting with a
|
||||||
parameter affects the recognition of white space and the end of internal
|
sequence such as (*CRLF). See the
|
||||||
|
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||||
|
page for details.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
|
||||||
|
convention affects the recognition of white space and the end of internal
|
||||||
comments starting with #. The value is saved with the compiled pattern for
|
comments starting with #. The value is saved with the compiled pattern for
|
||||||
subsequent use by the JIT compiler and by the two interpreted matching
|
subsequent use by the JIT compiler and by the two interpreted matching
|
||||||
functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
|
functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
|
||||||
|
@ -764,15 +789,14 @@ zero if all is well, or non-zero to force an error.
|
||||||
The match context
|
The match context
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
A match context is required if you want to change the default values of any
|
A match context is required if you want to:
|
||||||
of the following match-time parameters:
|
|
||||||
<pre>
|
<pre>
|
||||||
A callout function
|
Set up a callout function
|
||||||
The offset limit for matching an unanchored pattern
|
Set an offset limit for matching an unanchored pattern
|
||||||
The limit for calling <b>match()</b> (see below)
|
Change the backtracking match limit
|
||||||
The limit for calling <b>match()</b> recursively
|
Change the backtracking depth limit
|
||||||
|
Set custom memory management specifically for the match
|
||||||
</pre>
|
</pre>
|
||||||
A match context is also required if you are using custom memory management.
|
|
||||||
If none of these apply, just pass NULL as the context argument of
|
If none of these apply, just pass NULL as the context argument of
|
||||||
<b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
|
<b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
|
||||||
</P>
|
</P>
|
||||||
|
@ -797,7 +821,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
|
||||||
<b> void *<i>callout_data</i>);</b>
|
<b> void *<i>callout_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
This sets up a "callout" function, which PCRE2 will call at specified points
|
This sets up a "callout" function for PCRE2 to call at specified points
|
||||||
during a matching operation. Details are given in the
|
during a matching operation. Details are given in the
|
||||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
|
@ -816,8 +840,8 @@ A match can never be found if the <i>startoffset</i> argument of
|
||||||
limit.
|
limit.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
|
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
|
||||||
<b>pcre2_compile()</b> so that when JIT is in use, different code can be
|
calling <b>pcre2_compile()</b> so that when JIT is in use, different code can be
|
||||||
compiled. If a match is started with a non-default offset limit when
|
compiled. If a match is started with a non-default offset limit when
|
||||||
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
|
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
|
||||||
</P>
|
</P>
|
||||||
|
@ -837,10 +861,10 @@ which have a very large number of possibilities in their search trees. The
|
||||||
classic example is a pattern that uses nested unlimited repeats.
|
classic example is a pattern that uses nested unlimited repeats.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Internally, <b>pcre2_match()</b> uses a function called <b>match()</b>, which it
|
There is an internal counter in <b>pcre2_match()</b> that is incremented each
|
||||||
calls repeatedly (sometimes recursively). The limit set by <i>match_limit</i> is
|
time round its main matching loop. If this value reaches the match limit,
|
||||||
imposed on the number of times this function is called during a match, which
|
<b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
|
||||||
has the effect of limiting the amount of backtracking that can take place. For
|
the effect of limiting the amount of backtracking that can take place. For
|
||||||
patterns that are not anchored, the count restarts from zero for each position
|
patterns that are not anchored, the count restarts from zero for each position
|
||||||
in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
|
in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
|
||||||
which ignores it.
|
which ignores it.
|
||||||
|
@ -855,8 +879,7 @@ matching can continue.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The default value for the limit can be set when PCRE2 is built; the default
|
The default value for the limit can be set when PCRE2 is built; the default
|
||||||
default is 10 million, which handles all but the most extreme cases. If the
|
default is 10 million, which handles all but the most extreme cases. A value
|
||||||
limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_MATCHLIMIT. A value
|
|
||||||
for the match limit may also be supplied by an item at the start of a pattern
|
for the match limit may also be supplied by an item at the start of a pattern
|
||||||
of the form
|
of the form
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -865,64 +888,38 @@ of the form
|
||||||
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||||
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
|
less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
|
||||||
limit is set, less than the default.
|
limit is set, less than the default.
|
||||||
<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
|
||||||
<b> uint32_t <i>value</i>);</b>
|
<b> uint32_t <i>value</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
The <i>recursion_limit</i> parameter is similar to <i>match_limit</i>, but
|
This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
|
||||||
instead of limiting the total number of times that <b>match()</b> is called, it
|
Each time a nested backtracking point is passed, a new memory "frame" is used
|
||||||
limits the depth of recursion. The recursion depth is a smaller number than the
|
to remember the state of matching at that point. Thus, this parameter
|
||||||
total number of calls, because not all calls to <b>match()</b> are recursive.
|
indirectly limits the amount of memory that is used in a match.
|
||||||
This limit is of use only if it is set smaller than <i>match_limit</i>.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Limiting the recursion depth limits the amount of system stack that can be
|
This limit is not relevant, and is ignored, when matching is done using JIT
|
||||||
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
|
compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which uses
|
||||||
stack, the amount of heap memory that can be used. This limit is not relevant,
|
it to limit the depth of internal recursive function calls that implement
|
||||||
and is ignored, when matching is done using JIT compiled code. However, it is
|
lookaround assertions and pattern recursions. This is, therefore, an indirect
|
||||||
supported by <b>pcre2_dfa_match()</b>, which uses recursive function calls less
|
limit on the amount of system stack that is used. A recursive pattern such as
|
||||||
frequently than <b>pcre2_match()</b>, but which can be caused to use a lot of
|
/(.)(?1)/, when matched to a very long string using <b>pcre2_dfa_match()</b>,
|
||||||
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
|
can use a great deal of stack.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The default value for <i>recursion_limit</i> can be set when PCRE2 is built; the
|
The default value for the depth limit can be set when PCRE2 is built; the
|
||||||
default default is the same value as the default for <i>match_limit</i>. If the
|
default default is the same value as the default for the match limit. If the
|
||||||
limit is exceeded, <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> return
|
limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
|
||||||
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
|
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
|
||||||
supplied by an item at the start of a pattern of the form
|
item at the start of a pattern of the form
|
||||||
<pre>
|
<pre>
|
||||||
(*LIMIT_RECURSION=ddd)
|
(*LIMIT_DEPTH=ddd)
|
||||||
</pre>
|
</pre>
|
||||||
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||||
less than the limit set by the caller of <b>pcre2_match()</b> or
|
less than the limit set by the caller of <b>pcre2_match()</b> or
|
||||||
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
|
<b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
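The precedence rule above can be modelled in a few lines of C. This is only a sketch of the rule as described here, not PCRE2's own code: the function name effective_depth_limit and its parameters are hypothetical, and no PCRE2 calls are made.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the PCRE2 API): models which depth limit
   applies to a match. The base limit is the caller's value if one was set
   in the match context, otherwise the build-time default. A (*LIMIT_DEPTH=ddd)
   item in the pattern can only lower that base limit, never raise it. */
uint32_t effective_depth_limit(uint32_t build_default,
                               int caller_set, uint32_t caller_limit,
                               int pattern_set, uint32_t pattern_limit)
{
    uint32_t limit = caller_set ? caller_limit : build_default;

    if (pattern_set && pattern_limit < limit)
        limit = pattern_limit;   /* pattern value is honoured only if smaller */

    return limit;
}
```

For instance, with a caller limit of 500, a pattern item of 100 takes effect, whereas a pattern item of 900 is ignored and 500 still applies.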
|
||||||
<b>int pcre2_set_recursion_memory_management(</b>
|
|
||||||
<b> pcre2_match_context *<i>mcontext</i>,</b>
|
|
||||||
<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
|
|
||||||
<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
|
|
||||||
<br>
|
|
||||||
<br>
|
|
||||||
This function sets up two additional custom memory management functions for use
|
|
||||||
by <b>pcre2_match()</b> when PCRE2 is compiled to use the heap for remembering
|
|
||||||
backtracking data, instead of recursive function calls that use the system
|
|
||||||
stack. There is a discussion about PCRE2's stack usage in the
|
|
||||||
<a href="pcre2stack.html"><b>pcre2stack</b></a>
|
|
||||||
documentation. See the
|
|
||||||
<a href="pcre2build.html"><b>pcre2build</b></a>
|
|
||||||
documentation for details of how to build PCRE2.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<br><a name="SEC18" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
||||||
Using the heap for recursion is a non-standard way of building PCRE2, for use
|
|
||||||
in environments that have limited stacks. Because of the greater use of memory
|
|
||||||
management, <b>pcre2_match()</b> runs more slowly. Functions that are different
|
|
||||||
to the general custom memory functions are provided so that special-purpose
|
|
||||||
external code can be used for this case, because the memory blocks are all the
|
|
||||||
same size. The blocks are retained by <b>pcre2_match()</b> until it is about to
|
|
||||||
exit so that they can be re-used when possible during the match. In the absence
|
|
||||||
of these functions, the normal custom memory management functions are used, if
|
|
||||||
supplied, otherwise the system functions.
|
|
||||||
</P>
|
|
||||||
<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
|
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
|
@ -954,6 +951,13 @@ sequences the \R escape sequence matches by default. A value of
|
||||||
PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
|
PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
|
||||||
value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
|
value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
|
||||||
default can be overridden when a pattern is compiled.
|
default can be overridden when a pattern is compiled.
|
||||||
|
<pre>
|
||||||
|
PCRE2_CONFIG_DEPTHLIMIT
|
||||||
|
</pre>
|
||||||
|
The output is a uint32_t integer that gives the default limit for the depth of
|
||||||
|
nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
|
||||||
|
and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
|
||||||
|
<b>pcre2_set_depth_limit()</b> above.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_JIT
|
PCRE2_CONFIG_JIT
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -989,9 +993,9 @@ be compiled by those two libraries, but at the expense of slower matching.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
</pre>
|
</pre>
|
||||||
The output is a uint32_t integer that gives the default limit for the number of
|
The output is a uint32_t integer that gives the default match limit for
|
||||||
internal matching function calls in a <b>pcre2_match()</b> execution. Further
|
<b>pcre2_match()</b>. Further details are given with
|
||||||
details are given with <b>pcre2_match()</b> below.
|
<b>pcre2_set_match_limit()</b> above.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_NEWLINE
|
PCRE2_CONFIG_NEWLINE
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1015,20 +1019,11 @@ amount of system stack used when a pattern is compiled. It is specified when
|
||||||
PCRE2 is built; the default is 250. This limit does not take into account the
|
PCRE2 is built; the default is 250. This limit does not take into account the
|
||||||
stack that may already be used by the calling application. For finer control
|
stack that may already be used by the calling application. For finer control
|
||||||
over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
|
over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
|
||||||
<pre>
|
|
||||||
PCRE2_CONFIG_RECURSIONLIMIT
|
|
||||||
</pre>
|
|
||||||
The output is a uint32_t integer that gives the default limit for the depth of
|
|
||||||
recursion when calling the internal matching function in a <b>pcre2_match()</b>
|
|
||||||
execution. Further details are given with <b>pcre2_match()</b> below.
|
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_STACKRECURSE
|
PCRE2_CONFIG_STACKRECURSE
|
||||||
</pre>
|
</pre>
|
||||||
The output is a uint32_t integer that is set to one if internal recursion when
|
This parameter is obsolete and should not be used in new code. The output is a
|
||||||
running <b>pcre2_match()</b> is implemented by recursive function calls that use
|
uint32_t integer that is always set to zero.
|
||||||
the system stack to remember their state. This is the usual way that PCRE2 is
|
|
||||||
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
|
|
||||||
heap instead of recursive function calls.
|
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_UNICODE_VERSION
|
PCRE2_CONFIG_UNICODE_VERSION
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1047,14 +1042,14 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_CONFIG_VERSION
|
PCRE2_CONFIG_VERSION
|
||||||
</pre>
|
</pre>
|
||||||
The <i>where</i> argument should point to a buffer that is at least 12 code
|
The <i>where</i> argument should point to a buffer that is at least 24 code
|
||||||
units long. (The exact length required can be found by calling
|
units long. (The exact length required can be found by calling
|
||||||
<b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with
|
<b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with
|
||||||
the PCRE2 version string, zero-terminated. The number of code units used is
|
the PCRE2 version string, zero-terminated. The number of code units used is
|
||||||
returned. This is the length of the string plus one unit for the terminating
|
returned. This is the length of the string plus one unit for the terminating
|
||||||
zero.
|
zero.
|
||||||
<a name="compiling"></a></P>
|
<a name="compiling"></a></P>
|
||||||
<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
|
<br><a name="SEC19" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
|
<b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
|
||||||
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
|
<b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
|
||||||
|
@ -1240,13 +1235,14 @@ option is set, normal backslash processing is applied to verb names and only an
|
||||||
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
||||||
included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
|
included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
|
||||||
option is set, unescaped whitespace in verb names is skipped and #-comments are
|
option is set, unescaped whitespace in verb names is skipped and #-comments are
|
||||||
recognized, exactly as in the rest of the pattern.
|
recognized in this mode, exactly as in the rest of the pattern.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_AUTO_CALLOUT
|
PCRE2_AUTO_CALLOUT
|
||||||
</pre>
|
</pre>
|
||||||
If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
|
If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
|
||||||
all with number 255, before each pattern item, except immediately before or
|
all with number 255, before each pattern item, except immediately before or
|
||||||
after a callout in the pattern. For discussion of the callout facility, see the
|
after an explicit callout in the pattern. For discussion of the callout
|
||||||
|
facility, see the
|
||||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -1472,9 +1468,8 @@ and
|
||||||
<a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
|
<a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
|
||||||
in the
|
in the
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
document.
|
document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
|
||||||
If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a negative
|
negative error code.
|
||||||
error code.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If you know that your pattern is valid, and you want to skip this check for
|
If you know that your pattern is valid, and you want to skip this check for
|
||||||
|
@ -1495,7 +1490,7 @@ in the
|
||||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||||
page. If you set PCRE2_UCP, matching one of the items it affects takes much
|
page. If you set PCRE2_UCP, matching one of the items it affects takes much
|
||||||
longer. The option is available only if PCRE2 has been compiled with Unicode
|
longer. The option is available only if PCRE2 has been compiled with Unicode
|
||||||
support.
|
support (which is the default).
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_UNGREEDY
|
PCRE2_UNGREEDY
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1525,9 +1520,9 @@ the behaviour of PCRE2 are given in the
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
page.
|
page.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||||
<P>
|
<P>
|
||||||
There are over 80 positive error codes that <b>pcre2_compile()</b> may return
|
There are nearly 100 positive error codes that <b>pcre2_compile()</b> may return
|
||||||
(via <i>errorcode</i>) if it finds an error in the pattern. There are also some
|
(via <i>errorcode</i>) if it finds an error in the pattern. There are also some
|
||||||
negative error codes that are used for invalid UTF strings. These are the same
|
negative error codes that are used for invalid UTF strings. These are the same
|
||||||
as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
|
as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
|
||||||
|
@ -1538,7 +1533,7 @@ error message"
|
||||||
<a href="#geterrormessage">below)</a>
|
<a href="#geterrormessage">below)</a>
|
||||||
can be called to obtain a textual error message from any error code.
|
can be called to obtain a textual error message from any error code.
|
||||||
<a name="jitcompiling"></a></P>
|
<a name="jitcompiling"></a></P>
|
||||||
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
<br><a name="SEC21" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -1574,18 +1569,18 @@ documentation.
|
||||||
JIT compilation is a heavyweight optimization. It can take some time for
|
JIT compilation is a heavyweight optimization. It can take some time for
|
||||||
patterns to be analyzed, and for one-off matches and simple patterns the
|
patterns to be analyzed, and for one-off matches and simple patterns the
|
||||||
benefit of faster execution might be offset by a much slower compilation time.
|
benefit of faster execution might be offset by a much slower compilation time.
|
||||||
Most, but not all patterns can be optimized by the JIT compiler.
|
Most (but not all) patterns can be optimized by the JIT compiler.
|
||||||
<a name="localesupport"></a></P>
|
<a name="localesupport"></a></P>
|
||||||
<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
|
<br><a name="SEC22" href="#TOC1">LOCALE SUPPORT</a><br>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 handles caseless matching, and determines whether characters are letters,
|
PCRE2 handles caseless matching, and determines whether characters are letters,
|
||||||
digits, or whatever, by reference to a set of tables, indexed by character code
|
digits, or whatever, by reference to a set of tables, indexed by character code
|
||||||
point. This applies only to characters whose code points are less than 256. By
|
point. This applies only to characters whose code points are less than 256. By
|
||||||
default, higher-valued code points never match escapes such as \w or \d.
|
default, higher-valued code points never match escapes such as \w or \d.
|
||||||
However, if PCRE2 is built with UTF support, all characters can be tested with
|
However, if PCRE2 is built with Unicode support, all characters can be tested
|
||||||
\p and \P, or, alternatively, the PCRE2_UCP option can be set when a pattern
|
with \p and \P, or, alternatively, the PCRE2_UCP option can be set when a
|
||||||
is compiled; this causes \w and friends to use Unicode property support
|
pattern is compiled; this causes \w and friends to use Unicode property
|
||||||
instead of the built-in tables.
|
support instead of the built-in tables.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The use of locales with Unicode is discouraged. If you are handling characters
|
The use of locales with Unicode is discouraged. If you are handling characters
|
||||||
|
@ -1629,10 +1624,10 @@ available for as long as it is needed.
|
||||||
The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
|
The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
|
||||||
is saved with the compiled pattern, and the same tables are used by
|
is saved with the compiled pattern, and the same tables are used by
|
||||||
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>. Thus, for any single pattern,
|
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>. Thus, for any single pattern,
|
||||||
compilation, and matching all happen in the same locale, but different patterns
|
compilation and matching both happen in the same locale, but different patterns
|
||||||
can be processed in different locales.
|
can be processed in different locales.
|
||||||
<a name="infoaboutpattern"></a></P>
|
<a name="infoaboutpattern"></a></P>
|
||||||
<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
|
@ -1645,7 +1640,7 @@ pattern. The second argument specifies which piece of information is required,
|
||||||
and the third argument is a pointer to a variable to receive the data. If the
|
and the third argument is a pointer to a variable to receive the data. If the
|
||||||
third argument is NULL, the first argument is ignored, and the function returns
|
third argument is NULL, the first argument is ignored, and the function returns
|
||||||
the size in bytes of the variable that is required for the information
|
the size in bytes of the variable that is required for the information
|
||||||
requested. Otherwise, The yield of the function is zero for success, or one of
|
requested. Otherwise, the yield of the function is zero for success, or one of
|
||||||
the following negative numbers:
|
the following negative numbers:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_NULL the argument <i>code</i> was NULL
|
PCRE2_ERROR_NULL the argument <i>code</i> was NULL
|
||||||
|
@ -1698,8 +1693,8 @@ following are true:
|
||||||
.* is not in an atomic group
|
.* is not in an atomic group
|
||||||
.* is not in a capturing group that is the subject of a back reference
|
.* is not in a capturing group that is the subject of a back reference
|
||||||
PCRE2_DOTALL is in force for .*
|
PCRE2_DOTALL is in force for .*
|
||||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
|
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||||
PCRE2_NO_DOTSTAR_ANCHOR is not set.
|
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
||||||
</pre>
|
</pre>
|
||||||
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
|
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
|
||||||
options returned for PCRE2_INFO_ALLOPTIONS.
|
options returned for PCRE2_INFO_ALLOPTIONS.
|
||||||
|
@ -1726,6 +1721,13 @@ matches only CR, LF, or CRLF.
|
||||||
Return the highest capturing subpattern number in the pattern. In patterns
|
Return the highest capturing subpattern number in the pattern. In patterns
|
||||||
where (?| is not used, this is also the total number of capturing subpatterns.
|
where (?| is not used, this is also the total number of capturing subpatterns.
|
||||||
The third argument should point to a <b>uint32_t</b> variable.
|
The third argument should point to a <b>uint32_t</b> variable.
|
||||||
|
<pre>
|
||||||
|
PCRE2_INFO_DEPTHLIMIT
|
||||||
|
</pre>
|
||||||
|
If the pattern set a backtracking depth limit by including an item of the form
|
||||||
|
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||||
|
should point to an unsigned 32-bit integer. If no such value has been set, the
|
||||||
|
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_FIRSTBITMAP
|
PCRE2_INFO_FIRSTBITMAP
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1757,6 +1759,14 @@ argument should point to an <b>uint32_t</b> variable. In the 8-bit library, the
|
||||||
value is always less than 256. In the 16-bit library the value can be up to
|
value is always less than 256. In the 16-bit library the value can be up to
|
||||||
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
|
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
|
||||||
and up to 0xffffffff when not using UTF-32 mode.
|
and up to 0xffffffff when not using UTF-32 mode.
|
||||||
|
<pre>
|
||||||
|
PCRE2_INFO_FRAMESIZE
|
||||||
|
</pre>
|
||||||
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
|
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
|
||||||
|
without the use of JIT. The third argument should point to a <b>size_t</b>
|
||||||
|
variable. The frame size depends on the number of capturing parentheses in the
|
||||||
|
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
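As a rough illustration of the last sentence, the growth in frame size can be worked out as below. This is a sketch only: frame_size_growth is a hypothetical name, size_t stands in for PCRE2_SIZE, and the base frame size (which this arithmetic does not give) would have to be read with PCRE2_INFO_FRAMESIZE.

```c
#include <stddef.h>

/* Hypothetical helper: number of extra bytes a backtracking frame needs when
   a pattern gains `extra_groups` capturing groups, following the rule that
   each group adds two PCRE2_SIZE variables (size_t is used as a stand-in). */
size_t frame_size_growth(unsigned extra_groups)
{
    return (size_t)extra_groups * 2 * sizeof(size_t);
}
```

On a platform where size_t is 8 bytes, three additional capturing groups would therefore add 48 bytes to each frame.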
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_HASBACKSLASHC
|
PCRE2_INFO_HASBACKSLASHC
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1767,7 +1777,8 @@ argument should point to an <b>uint32_t</b> variable.
|
||||||
</pre>
|
</pre>
|
||||||
Return 1 if the pattern contains any explicit matches for CR or LF characters,
|
Return 1 if the pattern contains any explicit matches for CR or LF characters,
|
||||||
otherwise 0. The third argument should point to a <b>uint32_t</b> variable. An
|
otherwise 0. The third argument should point to a <b>uint32_t</b> variable. An
|
||||||
explicit match is either a literal CR or LF character, or \r or \n.
|
explicit match is either a literal CR or LF character, or \r or \n or one of
|
||||||
|
the equivalent hexadecimal or octal escape sequences.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1904,7 +1915,7 @@ different for each compiled pattern.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_NEWLINE
|
PCRE2_INFO_NEWLINE
|
||||||
</pre>
|
</pre>
|
||||||
The output is a <b>uint32_t</b> with one of the following values:
|
The output is one of the following <b>uint32_t</b> values:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_NEWLINE_CR Carriage return (CR)
|
PCRE2_NEWLINE_CR Carriage return (CR)
|
||||||
PCRE2_NEWLINE_LF Linefeed (LF)
|
PCRE2_NEWLINE_LF Linefeed (LF)
|
||||||
|
@ -1912,15 +1923,8 @@ The output is a <b>uint32_t</b> with one of the following values:
|
||||||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||||
</pre>
|
</pre>
|
||||||
This specifies the default character sequence that will be recognized as
|
This identifies the character sequence that will be recognized as meaning
|
||||||
meaning "newline" while matching.
|
"newline" while matching.
|
||||||
<pre>
|
|
||||||
PCRE2_INFO_RECURSIONLIMIT
|
|
||||||
</pre>
|
|
||||||
If the pattern set a recursion limit by including an item of the form
|
|
||||||
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
|
|
||||||
argument should point to an unsigned 32-bit integer. If no such value has been
|
|
||||||
set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
|
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_INFO_SIZE
|
PCRE2_INFO_SIZE
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1933,7 +1937,7 @@ value returned by this option, because there are cases where the code that
|
||||||
calculates the size has to over-estimate. Processing a pattern with the JIT
|
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||||
compiler does not alter the value returned by this option.
|
compiler does not alter the value returned by this option.
|
||||||
<a name="infoaboutcallouts"></a></P>
|
<a name="infoaboutcallouts"></a></P>
|
||||||
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
|
<br><a name="SEC24" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
|
<b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
|
||||||
<b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
|
<b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
|
||||||
|
@ -1952,7 +1956,7 @@ contents of the callout enumeration block are described in the
|
||||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||||
documentation, which also gives further details about callouts.
|
documentation, which also gives further details about callouts.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC24" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
|
<br><a name="SEC25" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
|
||||||
<P>
|
<P>
|
||||||
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||||
later, subject to a number of restrictions. The functions whose names begin
|
later, subject to a number of restrictions. The functions whose names begin
|
||||||
|
@ -1961,7 +1965,7 @@ the
|
||||||
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
<a name="matchdatablock"></a></P>
|
<a name="matchdatablock"></a></P>
|
||||||
<br><a name="SEC25" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
<br><a name="SEC26" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
||||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||||
|
@ -1986,9 +1990,9 @@ Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
|
||||||
the creation functions above. For <b>pcre2_match_data_create()</b>, the first
|
the creation functions above. For <b>pcre2_match_data_create()</b>, the first
|
||||||
argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
|
argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
|
||||||
offsets is required to identify the string that matched the whole pattern, with
|
offsets is required to identify the string that matched the whole pattern, with
|
||||||
another pair for each captured substring. For example, a value of 4 creates
|
an additional pair for each captured substring. For example, a value of 4
|
||||||
enough space to record the matched portion of the subject plus three captured
|
creates enough space to record the matched portion of the subject plus three
|
||||||
substrings. A minimum of at least 1 pair is imposed by
|
captured substrings. A minimum of 1 pair is imposed by
|
||||||
<b>pcre2_match_data_create()</b>, so it is always possible to return the overall
|
<b>pcre2_match_data_create()</b>, so it is always possible to return the overall
|
||||||
matched string.
|
matched string.
|
||||||
</P>
|
</P>
|
||||||
|
@ -2032,7 +2036,7 @@ match data block (for that match) have taken place.
|
||||||
When a match data block itself is no longer needed, it should be freed by
|
When a match data block itself is no longer needed, it should be freed by
|
||||||
calling <b>pcre2_match_data_free()</b>.
|
calling <b>pcre2_match_data_free()</b>.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC26" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
<br><a name="SEC27" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -2126,9 +2130,11 @@ character is CR followed by LF, advance the starting offset by two characters
|
||||||
instead of one.
|
instead of one.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If a non-zero starting offset is passed when the pattern is anchored, one
|
If a non-zero starting offset is passed when the pattern is anchored, a single
|
||||||
attempt to match at the given offset is made. This can only succeed if the
|
attempt to match at the given offset is made. This can only succeed if the
|
||||||
pattern does not require the match to be at the start of the subject.
|
pattern does not require the match to be at the start of the subject. In other
|
||||||
|
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
|
||||||
|
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
|
||||||
<a name="matchoptions"></a></P>
|
<a name="matchoptions"></a></P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Option bits for <b>pcre2_match()</b>
|
Option bits for <b>pcre2_match()</b>
|
||||||
|
@ -2142,9 +2148,9 @@ described below.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
|
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
|
||||||
compiler. If it is set, JIT matching is disabled and the normal interpretive
|
compiler. If it is set, JIT matching is disabled and the interpretive code in
|
||||||
code in <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the
|
<b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the remaining
|
||||||
remaining options are supported for JIT matching.
|
options are supported for JIT matching.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ANCHORED
|
PCRE2_ANCHORED
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2229,13 +2235,13 @@ page.
|
||||||
If you know that your subject is valid, and you want to skip these checks for
|
If you know that your subject is valid, and you want to skip these checks for
|
||||||
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
|
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
|
||||||
<b>pcre2_match()</b>. You might want to do this for the second and subsequent
|
<b>pcre2_match()</b>. You might want to do this for the second and subsequent
|
||||||
calls to <b>pcre2_match()</b> if you are making repeated calls to find all the
|
calls to <b>pcre2_match()</b> if you are making repeated calls to find other
|
||||||
matches in a single subject string.
|
matches in the same subject string.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
|
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
|
||||||
as a subject, or an invalid value of <i>startoffset</i>, is undefined. Your
|
string as a subject, or an invalid value of <i>startoffset</i>, is undefined.
|
||||||
program may crash or loop indefinitely.
|
Your program may crash or loop indefinitely.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
@ -2262,7 +2268,7 @@ examples, in the
|
||||||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC27" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
<br><a name="SEC28" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
||||||
<P>
|
<P>
|
||||||
When PCRE2 is built, a default newline convention is set; this is usually the
|
When PCRE2 is built, a default newline convention is set; this is usually the
|
||||||
standard convention for the operating system. The default can be overridden in
|
standard convention for the operating system. The default can be overridden in
|
||||||
|
@ -2294,15 +2300,15 @@ reference, and so advances only by one character after the first failure.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
An explicit match for CR or LF is either a literal appearance of one of those
|
An explicit match for CR or LF is either a literal appearance of one of those
|
||||||
characters in the pattern, or one of the \r or \n escape sequences. Implicit
|
characters in the pattern, or one of the \r or \n or equivalent octal or
|
||||||
matches such as [^X] do not count, nor does \s, even though it includes CR and
|
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
|
||||||
LF in the characters that it matches.
|
does \s, even though it includes CR and LF in the characters that it matches.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
||||||
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
||||||
<a name="matchedstrings"></a></P>
|
<a name="matchedstrings"></a></P>
|
||||||
<br><a name="SEC28" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
<br><a name="SEC29" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -2352,12 +2358,12 @@ identify the part of the subject that was partially matched. See the
|
||||||
documentation for details of partial matching.
|
documentation for details of partial matching.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
After a successful match, the first pair of offsets identifies the portion of
|
After a fully successful match, the first pair of offsets identifies the
|
||||||
the subject string that was matched by the entire pattern. The next pair is
|
portion of the subject string that was matched by the entire pattern. The next
|
||||||
used for the first capturing subpattern, and so on. The value returned by
|
pair is used for the first captured substring, and so on. The value returned by
|
||||||
<b>pcre2_match()</b> is one more than the highest numbered pair that has been
|
<b>pcre2_match()</b> is one more than the highest numbered pair that has been
|
||||||
set. For example, if two substrings have been captured, the returned value is
|
set. For example, if two substrings have been captured, the returned value is
|
||||||
3. If there are no capturing subpatterns, the return value from a successful
|
3. If there are no captured substrings, the return value from a successful
|
||||||
match is 1, indicating that just the first pair of offsets has been set.
|
match is 1, indicating that just the first pair of offsets has been set.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
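The paragraph above on the return value and the offset pairs can be illustrated with a small sketch. It does not call PCRE2: the `ovector` argument stands in for the array that <b>pcre2_get_ovector_pointer()</b> would return, `rc` stands in for a positive return value from <b>pcre2_match()</b>, and the names `SKETCH_UNSET` and `sketch_copy_group()` are hypothetical helpers invented for this illustration. It also assumes an 8-bit subject, so that code units are plain `char` values.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET, which pcre2.h defines as (~(PCRE2_SIZE)0).
   Hypothetical name, used only in this sketch. */
#define SKETCH_UNSET (~(size_t)0)

/*
 * Copy the text identified by offset pair `n` into `out`, which has room
 * for `outlen` bytes and is always left NUL-terminated.
 *
 *   rc      - positive match result: one more than the highest pair set
 *   ovector - offset pairs: ovector[2n] is the start, ovector[2n+1] the end
 *
 * Returns the substring length, or -1 if pair n lies beyond the highest
 * pair that was set, is unset, or does not fit in `out`.
 */
static long sketch_copy_group(const char *subject, int rc,
                              const size_t *ovector, int n,
                              char *out, size_t outlen)
{
    if (outlen == 0) return -1;
    out[0] = '\0';

    /* Pairs numbered rc and above were not written by this match. */
    if (n < 0 || n >= rc) return -1;

    size_t start = ovector[2 * n];
    size_t end = ovector[2 * n + 1];

    /* A group that did not participate in the match has both offsets unset. */
    if (start == SKETCH_UNSET || end == SKETCH_UNSET) return -1;

    /* Defensive check: this sketch does not handle an end before the start. */
    if (end < start) return -1;

    size_t len = end - start;
    if (len >= outlen) return -1;   /* no room for the text plus the NUL */

    memcpy(out, subject + start, len);
    out[len] = '\0';
    return (long)len;
}
```

For the pattern (abc)|(def) and the subject "def", the return value is 3 and the pairs are {0,3}, {unset,unset}, {0,3}. Calling the helper with n = 0 or n = 2 copies "def", while n = 1 reports -1 because that group did not participate in the match. Real code would normally use the <b>pcre2_substring_copy_bynumber()</b> family instead of a hand-written helper.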
|
@ -2375,11 +2381,7 @@ returned.
|
||||||
If the ovector is too small to hold all the captured substring offsets, as much
|
If the ovector is too small to hold all the captured substring offsets, as much
|
||||||
as possible is filled in, and the function returns a value of zero. If captured
|
as possible is filled in, and the function returns a value of zero. If captured
|
||||||
substrings are not of interest, <b>pcre2_match()</b> may be called with a match
|
substrings are not of interest, <b>pcre2_match()</b> may be called with a match
|
||||||
data block whose ovector is of minimum length (that is, one pair). However, if
|
data block whose ovector is of minimum length (that is, one pair).
|
||||||
the pattern contains back references and the <i>ovector</i> is not big enough to
|
|
||||||
remember the related substrings, PCRE2 has to get additional memory for use
|
|
||||||
during matching. Thus it is usually advisable to set up a match data block
|
|
||||||
containing an ovector of reasonable size.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
It is possible for capturing subpattern number <i>n+1</i> to match some part of
|
It is possible for capturing subpattern number <i>n+1</i> to match some part of
|
||||||
|
@ -2405,7 +2407,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
|
||||||
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
||||||
had.
|
had.
|
||||||
<a name="matchotherdata"></a></P>
|
<a name="matchotherdata"></a></P>
|
||||||
<br><a name="SEC29" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
<br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@ -2455,7 +2457,7 @@ the code unit offset of the invalid UTF character. Details are given in the
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
page.
|
page.
|
||||||
<a name="errorlist"></a></P>
|
<a name="errorlist"></a></P>
|
||||||
<br><a name="SEC30" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
<br><a name="SEC31" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
||||||
<P>
|
<P>
|
||||||
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
||||||
converted to a text string by calling the <b>pcre2_get_error_message()</b>
|
converted to a text string by calling the <b>pcre2_get_error_message()</b>
|
||||||
|
@ -2487,8 +2489,9 @@ returned when the magic number is not present.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_BADMODE
|
PCRE2_ERROR_BADMODE
|
||||||
</pre>
|
</pre>
|
||||||
This error is given when a pattern that was compiled by the 8-bit library is
|
This error is given when a compiled pattern is passed to a function in a
|
||||||
passed to a 16-bit or 32-bit library function, or vice versa.
|
library of a different code unit width, for example, a pattern compiled by
|
||||||
|
the 8-bit library is passed to a 16-bit or 32-bit library function.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_BADOFFSET
|
PCRE2_ERROR_BADOFFSET
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2512,20 +2515,15 @@ use by callout functions that want to cause <b>pcre2_match()</b> or
|
||||||
<b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
|
<b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
|
||||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||||
documentation for details.
|
documentation for details.
|
||||||
|
<pre>
|
||||||
|
PCRE2_ERROR_DEPTHLIMIT
|
||||||
|
</pre>
|
||||||
|
The nested backtracking depth limit was reached.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_INTERNAL
|
PCRE2_ERROR_INTERNAL
|
||||||
</pre>
|
</pre>
|
||||||
An unexpected internal error has occurred. This error could be caused by a bug
|
An unexpected internal error has occurred. This error could be caused by a bug
|
||||||
in PCRE2 or by overwriting of the compiled pattern.
|
in PCRE2 or by overwriting of the compiled pattern.
|
||||||
<pre>
|
|
||||||
PCRE2_ERROR_JIT_BADOPTION
|
|
||||||
</pre>
|
|
||||||
This error is returned when a pattern that was successfully studied using JIT
|
|
||||||
is being matched, but the matching mode (partial or complete match) does not
|
|
||||||
correspond to any JIT compilation mode. When the JIT fast path function is
|
|
||||||
used, this error may be also given for invalid options. See the
|
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
|
||||||
documentation for more details.
|
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_JIT_STACKLIMIT
|
PCRE2_ERROR_JIT_STACKLIMIT
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2537,15 +2535,13 @@ documentation for more details.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_MATCHLIMIT
|
PCRE2_ERROR_MATCHLIMIT
|
||||||
</pre>
|
</pre>
|
||||||
The backtracking limit was reached.
|
The backtracking match limit was reached.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_NOMEMORY
|
PCRE2_ERROR_NOMEMORY
|
||||||
</pre>
|
</pre>
|
||||||
If a pattern contains back references, but the ovector is not big enough to
|
If a pattern contains many nested backtracking points, heap memory is used to
|
||||||
remember the referenced substrings, PCRE2 gets a block of memory at the start
|
remember them. This error is given when the memory allocation function (default
|
||||||
of matching to use for this purpose. There are some other special cases where
|
or custom) fails.
|
||||||
extra memory is needed during matching. This error is given when memory cannot
|
|
||||||
be obtained.
|
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ERROR_NULL
|
PCRE2_ERROR_NULL
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2561,12 +2557,8 @@ in the subject string. Some simple patterns that might do this are detected and
|
||||||
faulted at compile time, but more complicated cases, in particular mutual
|
faulted at compile time, but more complicated cases, in particular mutual
|
||||||
recursions between two different subpatterns, cannot be detected until matching
|
recursions between two different subpatterns, cannot be detected until matching
|
||||||
is attempted.
|
is attempted.
|
||||||
<pre>
|
|
||||||
PCRE2_ERROR_RECURSIONLIMIT
|
|
||||||
</pre>
|
|
||||||
The internal recursion limit was reached.
|
|
||||||
<a name="geterrormessage"></a></P>
|
<a name="geterrormessage"></a></P>
|
||||||
<br><a name="SEC31" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
|
<br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
||||||
|
@ -2587,7 +2579,7 @@ returned. If the buffer is too small, the message is truncated (but still with
|
||||||
a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
|
a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
|
||||||
None of the messages are very long; a buffer size of 120 code units is ample.
|
None of the messages are very long; a buffer size of 120 code units is ample.
|
||||||
<a name="extractbynumber"></a></P>
|
<a name="extractbynumber"></a></P>
|
||||||
<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
<br><a name="SEC33" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
||||||
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
||||||
|
@ -2684,7 +2676,7 @@ The substring did not participate in the match. For example, if the pattern is
|
||||||
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
||||||
capturing slots, substring number 1 is unset.
|
capturing slots, substring number 1 is unset.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC33" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
<br><a name="SEC34" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
||||||
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
||||||
|
@ -2723,7 +2715,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
|
||||||
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
||||||
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
||||||
<a name="extractbyname"></a></P>
|
<a name="extractbyname"></a></P>
|
||||||
<br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
<br><a name="SEC35" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
||||||
<b> PCRE2_SPTR <i>name</i>);</b>
|
<b> PCRE2_SPTR <i>name</i>);</b>
|
||||||
|
@ -2755,8 +2747,8 @@ calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
|
||||||
compiled pattern, and the second is the name. The yield of the function is the
|
compiled pattern, and the second is the name. The yield of the function is the
|
||||||
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
||||||
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
||||||
that name. Given the number, you can extract the substring directly, or use one
|
that name. Given the number, you can extract the substring directly from the
|
||||||
of the functions described above.
|
ovector, or use one of the "bynumber" functions described above.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For convenience, there are also "byname" functions that correspond to the
|
For convenience, there are also "byname" functions that correspond to the
|
||||||
|
@ -2783,7 +2775,7 @@ names are not included in the compiled code. The matching process uses only
|
||||||
numbers. For this reason, the use of different names for subpatterns of the
|
numbers. For this reason, the use of different names for subpatterns of the
|
||||||
same number causes an error at compile time.
|
same number causes an error at compile time.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC35" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -2990,7 +2982,7 @@ obtained by calling the <b>pcre2_get_error_message()</b> function (see
|
||||||
"Obtaining a textual error message"
|
"Obtaining a textual error message"
|
||||||
<a href="#geterrormessage">above).</a>
|
<a href="#geterrormessage">above).</a>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC36" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
||||||
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
||||||
|
@ -3035,7 +3027,7 @@ in the section entitled <i>Information about a pattern</i>. Given all the
|
||||||
relevant entries for the name, you can extract each of their numbers, and hence
|
relevant entries for the name, you can extract each of their numbers, and hence
|
||||||
the captured data.
|
the captured data.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC37" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
<br><a name="SEC38" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
||||||
<P>
|
<P>
|
||||||
The traditional matching function uses a similar algorithm to Perl, which stops
|
The traditional matching function uses a similar algorithm to Perl, which stops
|
||||||
when it finds the first match at a given point in the subject. If you want to
|
when it finds the first match at a given point in the subject. If you want to
|
||||||
|
@ -3053,7 +3045,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
|
||||||
other alternatives. Ultimately, when it runs out of matches,
|
other alternatives. Ultimately, when it runs out of matches,
|
||||||
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
||||||
<a name="dfamatch"></a></P>
|
<a name="dfamatch"></a></P>
|
||||||
<br><a name="SEC38" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
<br><a name="SEC39" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||||
|
@ -3064,11 +3056,12 @@ other alternatives. Ultimately, when it runs out of matches,
|
||||||
<P>
|
<P>
|
||||||
The function <b>pcre2_dfa_match()</b> is called to match a subject string
|
The function <b>pcre2_dfa_match()</b> is called to match a subject string
|
||||||
against a compiled pattern, using a matching algorithm that scans the subject
|
against a compiled pattern, using a matching algorithm that scans the subject
|
||||||
string just once, and does not backtrack. This has different characteristics to
|
string just once (not counting lookaround assertions), and does not backtrack.
|
||||||
the normal algorithm, and is not compatible with Perl. Some of the features of
|
This has different characteristics to the normal algorithm, and is not
|
||||||
PCRE2 patterns are not supported. Nevertheless, there are times when this kind
|
compatible with Perl. Some of the features of PCRE2 patterns are not supported.
|
||||||
of matching can be useful. For a discussion of the two matching algorithms, and
|
Nevertheless, there are times when this kind of matching can be useful. For a
|
||||||
a list of features that <b>pcre2_dfa_match()</b> does not support, see the
|
discussion of the two matching algorithms, and a list of features that
|
||||||
|
<b>pcre2_dfa_match()</b> does not support, see the
|
||||||
<a href="pcre2matching.html"><b>pcre2matching</b></a>
|
<a href="pcre2matching.html"><b>pcre2matching</b></a>
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
|
@ -3248,13 +3241,13 @@ some plausibility checks are made on the contents of the workspace, which
|
||||||
should contain data about the previous partial match. If any of these checks
|
should contain data about the previous partial match. If any of these checks
|
||||||
fail, this error is given.
|
fail, this error is given.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC39" href="#TOC1">SEE ALSO</a><br>
|
<br><a name="SEC40" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
|
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
|
||||||
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
||||||
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC40" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC41" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
|
@ -3263,9 +3256,9 @@ University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC41" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 21 March 2017
|
Last updated: 27 March 2017
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2017 University of Cambridge.
|
Copyright © 1997-2017 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
421
doc/pcre2.txt
|
@ -281,19 +281,14 @@ PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
|
||||||
int (*callout_function)(pcre2_callout_block *, void *),
|
int (*callout_function)(pcre2_callout_block *, void *),
|
||||||
void *callout_data);
|
void *callout_data);
|
||||||
|
|
||||||
int pcre2_set_match_limit(pcre2_match_context *mcontext,
|
|
||||||
uint32_t value);
|
|
||||||
|
|
||||||
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
|
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
|
||||||
PCRE2_SIZE value);
|
PCRE2_SIZE value);
|
||||||
|
|
||||||
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
|
int pcre2_set_match_limit(pcre2_match_context *mcontext,
|
||||||
uint32_t value);
|
uint32_t value);
|
||||||
|
|
||||||
int pcre2_set_recursion_memory_management(
|
int pcre2_set_depth_limit(pcre2_match_context *mcontext,
|
||||||
pcre2_match_context *mcontext,
|
uint32_t value);
|
||||||
void *(*private_malloc)(PCRE2_SIZE, void *),
|
|
||||||
void (*private_free)(void *, void *), void *memory_data);
|
|
||||||
|
|
||||||
|
|
||||||
PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS
|
PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS
|
||||||
|
@ -397,6 +392,22 @@ PCRE2 NATIVE API AUXILIARY FUNCTIONS
|
||||||
int pcre2_config(uint32_t what, void *where);
|
int pcre2_config(uint32_t what, void *where);
|
||||||
|
|
||||||
|
|
||||||
|
PCRE2 NATIVE API OBSOLETE FUNCTIONS
|
||||||
|
|
||||||
|
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
|
||||||
|
uint32_t value);
|
||||||
|
|
||||||
|
int pcre2_set_recursion_memory_management(
|
||||||
|
pcre2_match_context *mcontext,
|
||||||
|
void *(*private_malloc)(PCRE2_SIZE, void *),
|
||||||
|
void (*private_free)(void *, void *), void *memory_data);
|
||||||
|
|
||||||
|
These functions became obsolete at release 10.30 and are retained only
|
||||||
|
for backward compatibility. They should not be used in new code. The
|
||||||
|
first is replaced by pcre2_set_depth_limit(); the second is no longer
|
||||||
|
needed and no longer has any effect (it always returns zero).
|
||||||
|
|
||||||
|
|
||||||
PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
|
PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
|
||||||
|
|
||||||
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit
|
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit
|
||||||
|
@ -449,7 +460,7 @@ PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
|
||||||
when processing any particular pattern to use only functions from a
|
when processing any particular pattern to use only functions from a
|
||||||
single library. For example, if you want to run a match using a pat-
|
single library. For example, if you want to run a match using a pat-
|
||||||
tern that was compiled with pcre2_compile_16(), you must do so with
|
tern that was compiled with pcre2_compile_16(), you must do so with
|
||||||
pcre2_match_16(), not pcre2_match_8().
|
pcre2_match_16(), not pcre2_match_8() or pcre2_match_32().
|
||||||
|
|
||||||
In the function summaries above, and in the rest of this document and
|
In the function summaries above, and in the rest of this document and
|
||||||
other PCRE2 documents, functions and data types are described using
|
other PCRE2 documents, functions and data types are described using
|
||||||
|
@ -474,19 +485,26 @@ PCRE2 API OVERVIEW
|
||||||
program against a non-dll PCRE2 library, you must define PCRE2_STATIC
|
program against a non-dll PCRE2 library, you must define PCRE2_STATIC
|
||||||
before including pcre2.h.
|
before including pcre2.h.
|
||||||
|
|
||||||
The functions pcre2_compile(), and pcre2_match() are used for compiling
|
The functions pcre2_compile() and pcre2_match() are used for compiling
|
||||||
and matching regular expressions in a Perl-compatible manner. A sample
|
and matching regular expressions in a Perl-compatible manner. A sample
|
||||||
program that demonstrates the simplest way of using them is provided in
|
program that demonstrates the simplest way of using them is provided in
|
||||||
the file called pcre2demo.c in the PCRE2 source distribution. A listing
|
the file called pcre2demo.c in the PCRE2 source distribution. A listing
|
||||||
of this program is given in the pcre2demo documentation, and the
|
of this program is given in the pcre2demo documentation, and the
|
||||||
pcre2sample documentation describes how to compile and run it.
|
pcre2sample documentation describes how to compile and run it.
|
||||||
|
|
||||||
Just-in-time compiler support is an optional feature of PCRE2 that can
|
The compiling and matching functions recognize various options that are
|
||||||
be built in appropriate hardware environments. It greatly speeds up the
|
passed as bits in an options argument. There are also some more compli-
|
||||||
matching performance of many patterns. Programs can request that it be
|
cated parameters such as custom memory management functions and
|
||||||
used if available, by calling pcre2_jit_compile() after a pattern has
|
resource limits that are passed in "contexts" (which are just memory
|
||||||
been successfully compiled by pcre2_compile(). This does nothing if JIT
|
blocks, described below). Simple applications do not need to make use
|
||||||
support is not available.
|
of contexts.
|
||||||
|
|
||||||
|
Just-in-time (JIT) compiler support is an optional feature of PCRE2
|
||||||
|
that can be built in appropriate hardware environments. It greatly
|
||||||
|
speeds up the matching performance of many patterns. Programs can
|
||||||
|
request that it be used if available by calling pcre2_jit_compile()
|
||||||
|
after a pattern has been successfully compiled by pcre2_compile(). This
|
||||||
|
does nothing if JIT support is not available.
|
||||||
|
|
||||||
More complicated programs might need to make use of the specialist
|
More complicated programs might need to make use of the specialist
|
||||||
functions pcre2_jit_stack_create(), pcre2_jit_stack_free(), and
|
functions pcre2_jit_stack_create(), pcre2_jit_stack_free(), and
|
||||||
|
@ -495,14 +513,15 @@ PCRE2 API OVERVIEW
|
||||||
|
|
||||||
JIT matching is automatically used by pcre2_match() if it is available,
|
JIT matching is automatically used by pcre2_match() if it is available,
|
||||||
unless the PCRE2_NO_JIT option is set. There is also a direct interface
|
unless the PCRE2_NO_JIT option is set. There is also a direct interface
|
||||||
for JIT matching, which gives improved performance. The JIT-specific
|
for JIT matching, which gives improved performance at the expense of
|
||||||
functions are discussed in the pcre2jit documentation.
|
less sanity checking. The JIT-specific functions are discussed in the
|
||||||
|
pcre2jit documentation.
|
||||||
|
|
||||||
A second matching function, pcre2_dfa_match(), which is not Perl-com-
|
A second matching function, pcre2_dfa_match(), which is not Perl-com-
|
||||||
patible, is also provided. This uses a different algorithm for the
|
patible, is also provided. This uses a different algorithm for the
|
||||||
matching. The alternative algorithm finds all possible matches (at a
|
matching. The alternative algorithm finds all possible matches (at a
|
||||||
given point in the subject), and scans the subject just once (unless
|
given point in the subject), and scans the subject just once (unless
|
||||||
there are lookbehind assertions). However, this algorithm does not
|
there are lookaround assertions). However, this algorithm does not
|
||||||
return captured substrings. A description of the two matching algo-
|
return captured substrings. A description of the two matching algo-
|
||||||
rithms and their advantages and disadvantages is given in the
|
rithms and their advantages and disadvantages is given in the
|
||||||
pcre2matching documentation. There is no JIT support for
|
pcre2matching documentation. There is no JIT support for
|
||||||
|
@ -603,9 +622,9 @@ MULTITHREADING
|
||||||
is thread-safe, that is, the same compiled pattern can be used by more
|
is thread-safe, that is, the same compiled pattern can be used by more
|
||||||
than one thread simultaneously. For example, an application can compile
|
than one thread simultaneously. For example, an application can compile
|
||||||
all its patterns at the start, before forking off multiple threads that
|
all its patterns at the start, before forking off multiple threads that
|
||||||
use them. However, if the just-in-time optimization feature is being
|
use them. However, if the just-in-time (JIT) optimization feature is
|
||||||
used, it needs separate memory stack areas for each thread. See the
|
being used, it needs separate memory stack areas for each thread. See
|
||||||
pcre2jit documentation for more details.
|
the pcre2jit documentation for more details.
|
||||||
|
|
||||||
In a more complicated situation, where patterns are compiled only when
|
In a more complicated situation, where patterns are compiled only when
|
||||||
they are first needed, but are still shared between threads, pointers
|
they are first needed, but are still shared between threads, pointers
|
||||||
|
@ -650,10 +669,10 @@ MULTITHREADING
|
||||||
|
|
||||||
Match blocks
|
Match blocks
|
||||||
|
|
||||||
The matching functions need a block of memory for working space and for
|
The matching functions need a block of memory for storing the results
|
||||||
storing the results of a match. This includes details of what was
|
of a match. This includes details of what was matched, as well as addi-
|
||||||
matched, as well as additional information such as the name of a
|
tional information such as the name of a (*MARK) setting. Each thread
|
||||||
(*MARK) setting. Each thread must provide its own copy of this memory.
|
must provide its own copy of this memory.
|
||||||
|
|
||||||
|
|
||||||
PCRE2 CONTEXTS
|
PCRE2 CONTEXTS
|
||||||
|
@ -718,15 +737,15 @@ PCRE2 CONTEXTS
|
||||||
|
|
||||||
The compile context
|
The compile context
|
||||||
|
|
||||||
A compile context is required if you want to change the default values
|
A compile context is required if you want to provide an external func-
|
||||||
of any of the following compile-time parameters:
|
tion for stack checking during compilation or to change the default
|
||||||
|
values of any of the following compile-time parameters:
|
||||||
|
|
||||||
What \R matches (Unicode newlines or CR, LF, CRLF only)
|
What \R matches (Unicode newlines or CR, LF, CRLF only)
|
||||||
PCRE2's character tables
|
PCRE2's character tables
|
||||||
The newline character sequence
|
The newline character sequence
|
||||||
The compile time nested parentheses limit
|
The compile time nested parentheses limit
|
||||||
The maximum length of the pattern string
|
The maximum length of the pattern string
|
||||||
An external function for stack checking
|
|
||||||
|
|
||||||
A compile context is also required if you are using custom memory man-
|
A compile context is also required if you are using custom memory man-
|
||||||
agement. If none of these apply, just pass NULL as the context argu-
|
agement. If none of these apply, just pass NULL as the context argu-
|
||||||
|
@ -766,12 +785,12 @@ PCRE2 CONTEXTS
|
||||||
int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
|
int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
|
||||||
PCRE2_SIZE value);
|
PCRE2_SIZE value);
|
||||||
|
|
||||||
This sets a maximum length, in code units, for the pattern string that
|
This sets a maximum length, in code units, for any pattern string that
|
||||||
is to be compiled. If the pattern is longer, an error is generated.
|
is compiled with this context. If the pattern is longer, an error is
|
||||||
This facility is provided so that applications that accept patterns
|
generated. This facility is provided so that applications that accept
|
||||||
from external sources can limit their size. The default is the largest
|
patterns from external sources can limit their size. The default is the
|
||||||
number that a PCRE2_SIZE variable can hold, which is effectively unlim-
|
largest number that a PCRE2_SIZE variable can hold, which is effec-
|
||||||
ited.
|
tively unlimited.
|
||||||
|
|
||||||
int pcre2_set_newline(pcre2_compile_context *ccontext,
|
int pcre2_set_newline(pcre2_compile_context *ccontext,
|
||||||
uint32_t value);
|
uint32_t value);
|
||||||
|
@ -782,11 +801,14 @@ PCRE2 CONTEXTS
|
||||||
two-character sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any
|
two-character sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any
|
||||||
of the above), or PCRE2_NEWLINE_ANY (any Unicode newline sequence).
|
of the above), or PCRE2_NEWLINE_ANY (any Unicode newline sequence).
|
||||||
|
|
||||||
When a pattern is compiled with the PCRE2_EXTENDED option, the value of
|
A pattern can override the value set in the compile context by starting
|
||||||
this parameter affects the recognition of white space and the end of
|
with a sequence such as (*CRLF). See the pcre2pattern page for details.
|
||||||
internal comments starting with #. The value is saved with the compiled
|
|
||||||
pattern for subsequent use by the JIT compiler and by the two inter-
|
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
|
||||||
preted matching functions, pcre2_match() and pcre2_dfa_match().
|
convention affects the recognition of white space and the end of inter-
|
||||||
|
nal comments starting with #. The value is saved with the compiled pat-
|
||||||
|
tern for subsequent use by the JIT compiler and by the two interpreted
|
||||||
|
matching functions, pcre2_match() and pcre2_dfa_match().
|
||||||
|
|
||||||
int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
|
int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
|
||||||
uint32_t value);
|
uint32_t value);
|
||||||
|
@ -815,17 +837,16 @@ PCRE2 CONTEXTS
|
||||||
|
|
||||||
The match context
|
The match context
|
||||||
|
|
||||||
A match context is required if you want to change the default values of
|
A match context is required if you want to:
|
||||||
any of the following match-time parameters:
|
|
||||||
|
|
||||||
A callout function
|
Set up a callout function
|
||||||
The offset limit for matching an unanchored pattern
|
Set an offset limit for matching an unanchored pattern
|
||||||
The limit for calling match() (see below)
|
Change the backtracking match limit
|
||||||
The limit for calling match() recursively
|
Change the backtracking depth limit
|
||||||
|
Set custom memory management specifically for the match
|
||||||
|
|
||||||
A match context is also required if you are using custom memory manage-
|
If none of these apply, just pass NULL as the context argument of
|
||||||
ment. If none of these apply, just pass NULL as the context argument
|
pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
|
||||||
of pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
|
|
||||||
|
|
||||||
A match context is created, copied, and freed by the following func-
|
A match context is created, copied, and freed by the following func-
|
||||||
tions:
|
tions:
|
||||||
|
@ -846,9 +867,9 @@ PCRE2 CONTEXTS
|
||||||
int (*callout_function)(pcre2_callout_block *, void *),
|
int (*callout_function)(pcre2_callout_block *, void *),
|
||||||
void *callout_data);
|
void *callout_data);
|
||||||
|
|
||||||
This sets up a "callout" function, which PCRE2 will call at specified
|
This sets up a "callout" function for PCRE2 to call at specified points
|
||||||
points during a matching operation. Details are given in the pcre2call-
|
during a matching operation. Details are given in the pcre2callout doc-
|
||||||
out documentation.
|
umentation.
|
||||||
|
|
||||||
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
|
int pcre2_set_offset_limit(pcre2_match_context *mcontext,
|
||||||
PCRE2_SIZE value);
|
PCRE2_SIZE value);
|
||||||
|
@ -863,10 +884,11 @@ PCRE2 CONTEXTS
|
||||||
argument of pcre2_match() or pcre2_dfa_match() is greater than the off-
|
argument of pcre2_match() or pcre2_dfa_match() is greater than the off-
|
||||||
set limit.
|
set limit.
|
||||||
|
|
||||||
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when
|
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT
|
||||||
calling pcre2_compile() so that when JIT is in use, different code can
|
option when calling pcre2_compile() so that when JIT is in use, differ-
|
||||||
be compiled. If a match is started with a non-default offset limit when
|
ent code can be compiled. If a match is started with a non-default
|
||||||
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
|
offset limit when PCRE2_USE_OFFSET_LIMIT is not set, an error is gener-
|
||||||
|
ated.
|
||||||
|
|
||||||
The offset limit facility can be used to track progress when searching
|
The offset limit facility can be used to track progress when searching
|
||||||
large subject strings. See also the PCRE2_FIRSTLINE option, which
|
large subject strings. See also the PCRE2_FIRSTLINE option, which
|
||||||
|
@ -884,13 +906,13 @@ PCRE2 CONTEXTS
|
||||||
search trees. The classic example is a pattern that uses nested unlim-
|
search trees. The classic example is a pattern that uses nested unlim-
|
||||||
ited repeats.
|
ited repeats.
|
||||||
|
|
||||||
Internally, pcre2_match() uses a function called match(), which it
|
There is an internal counter in pcre2_match() that is incremented each
|
||||||
calls repeatedly (sometimes recursively). The limit set by match_limit
|
time round its main matching loop. If this value reaches the match
|
||||||
is imposed on the number of times this function is called during a
|
limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT.
|
||||||
match, which has the effect of limiting the amount of backtracking that
|
This has the effect of limiting the amount of backtracking that can
|
||||||
can take place. For patterns that are not anchored, the count restarts
|
take place. For patterns that are not anchored, the count restarts from
|
||||||
from zero for each position in the subject string. This limit is not
|
zero for each position in the subject string. This limit is not rele-
|
||||||
relevant to pcre2_dfa_match(), which ignores it.
|
vant to pcre2_dfa_match(), which ignores it.
|
||||||
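The counting behaviour described above can be sketched with a toy backtracking matcher. This is not PCRE2 code: the names toy_match(), toy_here(), toy_star(), and the two TOY_ codes are hypothetical, and the matcher understands only literals, "." and a single-character "*". It shows a counter that is incremented on every matching step, an error return once the counter passes the limit, and a restart of the count at each starting position.

```c
#include <stdint.h>

/* Hypothetical return codes for this sketch; not PCRE2's values. */
#define TOY_NOMATCH          (-1)
#define TOY_ERROR_MATCHLIMIT (-2)

typedef struct {
    uint32_t count;   /* matching steps taken at this start position */
    uint32_t limit;   /* maximum number of steps allowed */
} toy_counter;

static int toy_here(const char *re, const char *s, toy_counter *c);

/* Match ch* followed by the rest of the pattern, greedily, giving
   back one character at a time when the rest fails (backtracking). */
static int toy_star(char ch, const char *re, const char *s, toy_counter *c)
{
    const char *t = s;
    while (*t != '\0' && (*t == ch || ch == '.')) t++;
    for (;;) {
        int rc = toy_here(re, t, c);
        if (rc != TOY_NOMATCH) return rc;   /* a match, or the limit error */
        if (t == s) return TOY_NOMATCH;
        t--;
    }
}

/* Match the pattern at exactly this position; every call is one step. */
static int toy_here(const char *re, const char *s, toy_counter *c)
{
    if (++c->count > c->limit) return TOY_ERROR_MATCHLIMIT;
    if (re[0] == '\0') return 1;
    if (re[1] == '*') return toy_star(re[0], re + 2, s, c);
    if (*s != '\0' && (re[0] == '.' || re[0] == *s))
        return toy_here(re + 1, s + 1, c);
    return TOY_NOMATCH;
}

/* Unanchored search: the step count restarts from zero for each
   starting position in the subject. Returns 1 on a match. */
int toy_match(const char *re, const char *s, uint32_t limit)
{
    toy_counter c;
    c.limit = limit;
    do {
        int rc;
        c.count = 0;
        rc = toy_here(re, s, &c);
        if (rc != TOY_NOMATCH) return rc;
    } while (*s++ != '\0');
    return TOY_NOMATCH;
}
```

With the pattern "a*a*a*b" and a subject of twenty "a" characters, the toy matcher backtracks heavily before failing, so a small limit stops it early with TOY_ERROR_MATCHLIMIT, whereas a generous limit lets it run to completion and report TOY_NOMATCH.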
|
|
||||||
When pcre2_match() is called with a pattern that was successfully pro-
|
When pcre2_match() is called with a pattern that was successfully pro-
|
||||||
cessed by pcre2_jit_compile(), the way in which matching is executed is
|
cessed by pcre2_jit_compile(), the way in which matching is executed is
|
||||||
|
@ -901,9 +923,8 @@ PCRE2 CONTEXTS
|
||||||
|
|
||||||
The default value for the limit can be set when PCRE2 is built; the
|
The default value for the limit can be set when PCRE2 is built; the
|
||||||
default default is 10 million, which handles all but the most extreme
|
default default is 10 million, which handles all but the most extreme
|
||||||
cases. If the limit is exceeded, pcre2_match() returns
|
cases. A value for the match limit may also be supplied by an item at
|
||||||
PCRE2_ERROR_MATCHLIMIT. A value for the match limit may also be sup-
|
the start of a pattern of the form
|
||||||
plied by an item at the start of a pattern of the form
|
|
||||||
|
|
||||||
(*LIMIT_MATCH=ddd)
|
(*LIMIT_MATCH=ddd)
|
||||||
|
|
||||||
|
@ -911,59 +932,35 @@ PCRE2 CONTEXTS
|
||||||
unless ddd is less than the limit set by the caller of pcre2_match()
|
unless ddd is less than the limit set by the caller of pcre2_match()
|
||||||
or, if no such limit is set, less than the default.
|
or, if no such limit is set, less than the default.
|
||||||
|
|
||||||
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
|
int pcre2_set_depth_limit(pcre2_match_context *mcontext,
|
||||||
uint32_t value);
|
uint32_t value);
|
||||||
|
|
||||||
The recursion_limit parameter is similar to match_limit, but instead of
|
This parameter limits the depth of nested backtracking in
|
||||||
limiting the total number of times that match() is called, it limits
|
pcre2_match(). Each time a nested backtracking point is passed, a new
|
||||||
the depth of recursion. The recursion depth is a smaller number than
|
memory "frame" is used to remember the state of matching at that point.
|
||||||
the total number of calls, because not all calls to match() are recur-
|
Thus, this parameter indirectly limits the amount of memory that is
|
||||||
sive. This limit is of use only if it is set smaller than match_limit.
|
used in a match.
|
||||||
|
|
||||||
Limiting the recursion depth limits the amount of system stack that can
|
This limit is not relevant, and is ignored, when matching is done using
|
||||||
be used, or, when PCRE2 has been compiled to use memory on the heap
|
JIT compiled code. However, it is supported by pcre2_dfa_match(), which
|
||||||
instead of the stack, the amount of heap memory that can be used. This
|
uses it to limit the depth of internal recursive function calls that
|
||||||
limit is not relevant, and is ignored, when matching is done using JIT
|
implement lookaround assertions and pattern recursions. This is, there-
|
||||||
compiled code. However, it is supported by pcre2_dfa_match(), which
|
fore, an indirect limit on the amount of system stack that is used. A
|
||||||
uses recursive function calls less frequently than pcre2_match(), but
|
recursive pattern such as /(.)(?1)/, when matched to a very long string
|
||||||
which can be caused to use a lot of stack by a recursive pattern such
|
using pcre2_dfa_match(), can use a great deal of stack.
|
||||||
as /(.)(?1)/ matched to a very long string.
|
|
||||||
|
|
||||||
The default value for recursion_limit can be set when PCRE2 is built;
|
The default value for the depth limit can be set when PCRE2 is built;
|
||||||
the default default is the same value as the default for match_limit.
|
the default default is the same value as the default for the match
|
||||||
If the limit is exceeded, pcre2_match() and pcre2_dfa_match() return
|
limit. If the limit is exceeded, pcre2_match() or pcre2_dfa_match()
|
||||||
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
|
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
|
||||||
supplied by an item at the start of a pattern of the form
|
supplied by an item at the start of a pattern of the form
|
||||||
|
|
||||||
(*LIMIT_RECURSION=ddd)
|
(*LIMIT_DEPTH=ddd)
|
||||||
|
|
||||||
where ddd is a decimal number. However, such a setting is ignored
|
where ddd is a decimal number. However, such a setting is ignored
|
||||||
unless ddd is less than the limit set by the caller of pcre2_match() or
|
unless ddd is less than the limit set by the caller of pcre2_match() or
|
||||||
pcre2_dfa_match() or, if no such limit is set, less than the default.
|
pcre2_dfa_match() or, if no such limit is set, less than the default.
|
||||||
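The rule for a limit supplied at the start of a pattern can be sketched as a small function. This is an illustration of the rule as stated above, not PCRE2's implementation; the function name and its parameters are hypothetical.

```c
#include <stdint.h>

/* Sketch of the stated rule: a limit given in the pattern is honoured
   only if it is less than the limit set by the caller or, when the
   caller set none, less than the build-time default. */
uint32_t toy_effective_limit(int caller_set, uint32_t caller_limit,
                             uint32_t build_default,
                             int pattern_set, uint32_t pattern_limit)
{
    /* The ceiling is the caller's limit if there is one, else the default. */
    uint32_t ceiling = caller_set ? caller_limit : build_default;

    /* A pattern can lower the ceiling but can never raise it. */
    if (pattern_set && pattern_limit < ceiling) return pattern_limit;
    return ceiling;
}
```

The design point is that a pattern from an untrusted source can make matching more restrictive, but cannot loosen a limit chosen by the application.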
|
|
||||||
int pcre2_set_recursion_memory_management(
|
|
||||||
pcre2_match_context *mcontext,
|
|
||||||
void *(*private_malloc)(PCRE2_SIZE, void *),
|
|
||||||
void (*private_free)(void *, void *), void *memory_data);
|
|
||||||
|
|
||||||
This function sets up two additional custom memory management functions
|
|
||||||
for use by pcre2_match() when PCRE2 is compiled to use the heap for
|
|
||||||
remembering backtracking data, instead of recursive function calls that
|
|
||||||
use the system stack. There is a discussion about PCRE2's stack usage
|
|
||||||
in the pcre2stack documentation. See the pcre2build documentation for
|
|
||||||
details of how to build PCRE2.
|
|
||||||
|
|
||||||
Using the heap for recursion is a non-standard way of building PCRE2,
|
|
||||||
for use in environments that have limited stacks. Because of the
|
|
||||||
greater use of memory management, pcre2_match() runs more slowly. Func-
|
|
||||||
tions that are different to the general custom memory functions are
|
|
||||||
provided so that special-purpose external code can be used for this
|
|
||||||
case, because the memory blocks are all the same size. The blocks are
|
|
||||||
retained by pcre2_match() until it is about to exit so that they can be
|
|
||||||
re-used when possible during the match. In the absence of these func-
|
|
||||||
tions, the normal custom memory management functions are used, if sup-
|
|
||||||
plied, otherwise the system functions.
|
|
||||||
|
|
||||||
|
|
||||||
CHECKING BUILD-TIME OPTIONS
|
CHECKING BUILD-TIME OPTIONS
|
||||||
|
|
||||||
|
@ -996,6 +993,13 @@ CHECKING BUILD-TIME OPTIONS
|
||||||
sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
|
sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
|
||||||
LF, or CRLF. The default can be overridden when a pattern is compiled.
|
LF, or CRLF. The default can be overridden when a pattern is compiled.
|
||||||
|
|
||||||
|
PCRE2_CONFIG_DEPTHLIMIT
|
||||||
|
|
||||||
|
The output is a uint32_t integer that gives the default limit for the
|
||||||
|
depth of nested backtracking in pcre2_match() or the depth of nested
|
||||||
|
recursions and lookarounds in pcre2_dfa_match(). Further details are
|
||||||
|
given with pcre2_set_depth_limit() above.
|
||||||
|
|
||||||
PCRE2_CONFIG_JIT
|
PCRE2_CONFIG_JIT
|
||||||
|
|
||||||
The output is a uint32_t integer that is set to one if support for
|
The output is a uint32_t integer that is set to one if support for
|
||||||
|
@ -1030,9 +1034,9 @@ CHECKING BUILD-TIME OPTIONS
|
||||||
|
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
|
|
||||||
The output is a uint32_t integer that gives the default limit for the
|
The output is a uint32_t integer that gives the default match limit for
|
||||||
number of internal matching function calls in a pcre2_match() execu-
|
pcre2_match(). Further details are given with pcre2_set_match_limit()
|
||||||
tion. Further details are given with pcre2_match() below.
|
above.
|
||||||
|
|
||||||
PCRE2_CONFIG_NEWLINE
|
PCRE2_CONFIG_NEWLINE
|
||||||
|
|
||||||
|
@ -1059,21 +1063,10 @@ CHECKING BUILD-TIME OPTIONS
|
||||||
application. For finer control over compilation stack usage, see
|
application. For finer control over compilation stack usage, see
|
||||||
pcre2_set_compile_recursion_guard().
|
pcre2_set_compile_recursion_guard().
|
||||||
|
|
||||||
PCRE2_CONFIG_RECURSIONLIMIT
|
|
||||||
|
|
||||||
The output is a uint32_t integer that gives the default limit for the
|
|
||||||
depth of recursion when calling the internal matching function in a
|
|
||||||
pcre2_match() execution. Further details are given with pcre2_match()
|
|
||||||
below.
|
|
||||||
|
|
||||||
PCRE2_CONFIG_STACKRECURSE
|
PCRE2_CONFIG_STACKRECURSE
|
||||||
|
|
||||||
The output is a uint32_t integer that is set to one if internal recur-
|
This parameter is obsolete and should not be used in new code. The out-
|
||||||
sion when running pcre2_match() is implemented by recursive function
|
put is a uint32_t integer that is always set to zero.
|
||||||
calls that use the system stack to remember their state. This is the
|
|
||||||
usual way that PCRE2 is compiled. The output is zero if PCRE2 was com-
|
|
||||||
piled to use blocks of data on the heap instead of recursive function
|
|
||||||
calls.
|
|
||||||
|
|
||||||
PCRE2_CONFIG_UNICODE_VERSION
|
PCRE2_CONFIG_UNICODE_VERSION
|
||||||
|
|
||||||
|
@ -1093,7 +1086,7 @@ CHECKING BUILD-TIME OPTIONS
|
||||||
|
|
||||||
PCRE2_CONFIG_VERSION
|
PCRE2_CONFIG_VERSION
|
||||||
|
|
||||||
The where argument should point to a buffer that is at least 12 code
|
The where argument should point to a buffer that is at least 24 code
|
||||||
units long. (The exact length required can be found by calling
|
units long. (The exact length required can be found by calling
|
||||||
pcre2_config() with where set to NULL.) The buffer is filled with the
|
pcre2_config() with where set to NULL.) The buffer is filled with the
|
||||||
PCRE2 version string, zero-terminated. The number of code units used is
|
PCRE2 version string, zero-terminated. The number of code units used is
|
||||||
|
@ -1267,14 +1260,15 @@ COMPILING A PATTERN
|
||||||
parenthesis terminates the name. A closing parenthesis can be included
|
parenthesis terminates the name. A closing parenthesis can be included
|
||||||
in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
|
in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
|
||||||
option is set, unescaped whitespace in verb names is skipped and #-com-
|
option is set, unescaped whitespace in verb names is skipped and #-com-
|
||||||
ments are recognized, exactly as in the rest of the pattern.
|
ments are recognized in this mode, exactly as in the rest of the pat-
|
||||||
|
tern.
|
||||||
|
|
||||||
PCRE2_AUTO_CALLOUT
|
PCRE2_AUTO_CALLOUT
|
||||||
|
|
||||||
If this bit is set, pcre2_compile() automatically inserts callout
|
If this bit is set, pcre2_compile() automatically inserts callout
|
||||||
items, all with number 255, before each pattern item, except immedi-
|
items, all with number 255, before each pattern item, except immedi-
|
||||||
ately before or after a callout in the pattern. For discussion of the
|
ately before or after an explicit callout in the pattern. For discus-
|
||||||
callout facility, see the pcre2callout documentation.
|
sion of the callout facility, see the pcre2callout documentation.
|
||||||
|
|
||||||
PCRE2_CASELESS
|
PCRE2_CASELESS
|
||||||
|
|
||||||
|
@ -1517,7 +1511,7 @@ COMPILING A PATTERN
|
||||||
section on generic character types in the pcre2pattern page. If you set
|
section on generic character types in the pcre2pattern page. If you set
|
||||||
PCRE2_UCP, matching one of the items it affects takes much longer. The
|
PCRE2_UCP, matching one of the items it affects takes much longer. The
|
||||||
option is available only if PCRE2 has been compiled with Unicode sup-
|
option is available only if PCRE2 has been compiled with Unicode sup-
|
||||||
port.
|
port (which is the default).
|
||||||
|
|
||||||
PCRE2_UNGREEDY
|
PCRE2_UNGREEDY
|
||||||
|
|
||||||
|
@ -1548,13 +1542,13 @@ COMPILING A PATTERN
|
||||||
|
|
||||||
COMPILATION ERROR CODES
|
COMPILATION ERROR CODES
|
||||||
|
|
||||||
There are over 80 positive error codes that pcre2_compile() may return
|
There are nearly 100 positive error codes that pcre2_compile() may
|
||||||
(via errorcode) if it finds an error in the pattern. There are also
|
return (via errorcode) if it finds an error in the pattern. There are
|
||||||
some negative error codes that are used for invalid UTF strings. These
|
also some negative error codes that are used for invalid UTF strings.
|
||||||
are the same as given by pcre2_match() and pcre2_dfa_match(), and are
|
These are the same as given by pcre2_match() and pcre2_dfa_match(), and
|
||||||
described in the pcre2unicode page. The pcre2_get_error_message() func-
|
are described in the pcre2unicode page. The pcre2_get_error_message()
|
||||||
tion (see "Obtaining a textual error message" below) can be called to
|
function (see "Obtaining a textual error message" below) can be called
|
||||||
obtain a textual error message from any error code.
|
to obtain a textual error message from any error code.
|
||||||
|
|
||||||
|
|
||||||
JUST-IN-TIME (JIT) COMPILATION
|
JUST-IN-TIME (JIT) COMPILATION
|
||||||
|
@ -1585,7 +1579,7 @@ JUST-IN-TIME (JIT) COMPILATION
|
||||||
JIT compilation is a heavyweight optimization. It can take some time
|
JIT compilation is a heavyweight optimization. It can take some time
|
||||||
for patterns to be analyzed, and for one-off matches and simple pat-
|
for patterns to be analyzed, and for one-off matches and simple pat-
|
||||||
terns the benefit of faster execution might be offset by a much slower
|
terns the benefit of faster execution might be offset by a much slower
|
||||||
compilation time. Most, but not all patterns can be optimized by the
|
compilation time. Most (but not all) patterns can be optimized by the
|
||||||
JIT compiler.
|
JIT compiler.
|
||||||
|
|
||||||
|
|
||||||
|
@ -1595,8 +1589,8 @@ LOCALE SUPPORT
|
||||||
letters, digits, or whatever, by reference to a set of tables, indexed
|
letters, digits, or whatever, by reference to a set of tables, indexed
|
||||||
by character code point. This applies only to characters whose code
|
by character code point. This applies only to characters whose code
|
||||||
points are less than 256. By default, higher-valued code points never
|
points are less than 256. By default, higher-valued code points never
|
||||||
match escapes such as \w or \d. However, if PCRE2 is built with UTF
|
match escapes such as \w or \d. However, if PCRE2 is built with Uni-
|
||||||
support, all characters can be tested with \p and \P, or, alterna-
|
code support, all characters can be tested with \p and \P, or, alterna-
|
||||||
tively, the PCRE2_UCP option can be set when a pattern is compiled;
|
tively, the PCRE2_UCP option can be set when a pattern is compiled;
|
||||||
this causes \w and friends to use Unicode property support instead of
|
this causes \w and friends to use Unicode property support instead of
|
||||||
the built-in tables.
|
the built-in tables.
|
||||||
|
@ -1639,7 +1633,7 @@ LOCALE SUPPORT
|
||||||
The pointer that is passed (via the compile context) to pcre2_compile()
|
The pointer that is passed (via the compile context) to pcre2_compile()
|
||||||
is saved with the compiled pattern, and the same tables are used by
|
is saved with the compiled pattern, and the same tables are used by
|
||||||
pcre2_match() and pcre2_dfa_match(). Thus, for any single pattern, com-
|
pcre2_match() and pcre2_dfa_match(). Thus, for any single pattern, com-
|
||||||
pilation, and matching all happen in the same locale, but different
|
pilation and matching both happen in the same locale, but different
|
||||||
patterns can be processed in different locales.
|
patterns can be processed in different locales.
|
||||||
|
|
||||||
|
|
||||||
|
@ -1654,7 +1648,7 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
is required, and the third argument is a pointer to a variable to
|
is required, and the third argument is a pointer to a variable to
|
||||||
receive the data. If the third argument is NULL, the first argument is
|
receive the data. If the third argument is NULL, the first argument is
|
||||||
ignored, and the function returns the size in bytes of the variable
|
ignored, and the function returns the size in bytes of the variable
|
||||||
that is required for the information requested. Otherwise, The yield of
|
that is required for the information requested. Otherwise, the yield of
|
||||||
the function is zero for success, or one of the following negative num-
|
the function is zero for success, or one of the following negative num-
|
||||||
bers:
|
bers:
|
||||||
|
|
||||||
|
@ -1710,8 +1704,8 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
.* is not in a capturing group that is the subject
|
.* is not in a capturing group that is the subject
|
||||||
of a back reference
|
of a back reference
|
||||||
PCRE2_DOTALL is in force for .*
|
PCRE2_DOTALL is in force for .*
|
||||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
|
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||||
PCRE2_NO_DOTSTAR_ANCHOR is not set.
|
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
||||||
|
|
||||||
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in
|
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in
|
||||||
the options returned for PCRE2_INFO_ALLOPTIONS.
|
the options returned for PCRE2_INFO_ALLOPTIONS.
|
||||||
|
@ -1740,6 +1734,14 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
terns where (?| is not used, this is also the total number of capturing
|
terns where (?| is not used, this is also the total number of capturing
|
||||||
subpatterns. The third argument should point to a uint32_t variable.
|
subpatterns. The third argument should point to a uint32_t variable.
|
||||||
|
|
||||||
|
PCRE2_INFO_DEPTHLIMIT
|
||||||
|
|
||||||
|
If the pattern set a backtracking depth limit by including an item of
|
||||||
|
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
|
||||||
|
third argument should point to an unsigned 32-bit integer. If no such
|
||||||
|
value has been set, the call to pcre2_pattern_info() returns the error
|
||||||
|
PCRE2_ERROR_UNSET.
|
||||||
|
|
||||||
PCRE2_INFO_FIRSTBITMAP
|
PCRE2_INFO_FIRSTBITMAP
|
||||||
|
|
||||||
In the absence of a single first code unit for a non-anchored pattern,
|
In the absence of a single first code unit for a non-anchored pattern,
|
||||||
|
@ -1772,6 +1774,15 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32
|
value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32
|
||||||
mode.
|
mode.
|
||||||
|
|
||||||
|
PCRE2_INFO_FRAMESIZE
|
||||||
|
|
||||||
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
|
backtracking positions when the pattern is processed by pcre2_match()
|
||||||
|
without the use of JIT. The third argument should point to a size_t
|
||||||
|
variable. The frame size depends on the number of capturing parentheses
|
||||||
|
in the pattern. Each additional capturing group adds two PCRE2_SIZE
|
||||||
|
variables.
|
||||||
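The growth of the frame size with the number of capturing groups can be illustrated with a little arithmetic. This sketch assumes that PCRE2_SIZE is the same type as size_t; the function name is hypothetical, and the base frame size is deliberately left out because it is not stated here.

```c
#include <stddef.h>

/* Extra bytes added to each backtracking frame by a given number of
   additional capturing groups: two size_t values per group (assuming
   PCRE2_SIZE is size_t). */
size_t toy_frame_growth(size_t extra_groups)
{
    return extra_groups * 2 * sizeof(size_t);
}
```

On a platform where size_t is 8 bytes, for example, three additional capturing groups would add 48 bytes to every frame under this assumption.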
|
|
||||||
PCRE2_INFO_HASBACKSLASHC
|
PCRE2_INFO_HASBACKSLASHC
|
||||||
|
|
||||||
Return 1 if the pattern contains any instances of \C, otherwise 0. The
|
Return 1 if the pattern contains any instances of \C, otherwise 0. The
|
||||||
|
@ -1782,7 +1793,8 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
Return 1 if the pattern contains any explicit matches for CR or LF
|
Return 1 if the pattern contains any explicit matches for CR or LF
|
||||||
characters, otherwise 0. The third argument should point to a uint32_t
|
characters, otherwise 0. The third argument should point to a uint32_t
|
||||||
variable. An explicit match is either a literal CR or LF character, or
|
variable. An explicit match is either a literal CR or LF character, or
|
||||||
\r or \n.
|
\r or \n or one of the equivalent hexadecimal or octal escape
|
||||||
|
sequences.
|
||||||
|
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
|
|
||||||
|
@ -1918,7 +1930,7 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
|
|
||||||
PCRE2_INFO_NEWLINE
|
PCRE2_INFO_NEWLINE
|
||||||
|
|
||||||
The output is a uint32_t with one of the following values:
|
The output is one of the following uint32_t values:
|
||||||
|
|
||||||
PCRE2_NEWLINE_CR Carriage return (CR)
|
PCRE2_NEWLINE_CR Carriage return (CR)
|
||||||
PCRE2_NEWLINE_LF Linefeed (LF)
|
PCRE2_NEWLINE_LF Linefeed (LF)
|
||||||
|
@ -1926,16 +1938,8 @@ INFORMATION ABOUT A COMPILED PATTERN
|
||||||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||||
|
|
||||||
This specifies the default character sequence that will be recognized
|
This identifies the character sequence that will be recognized as mean-
|
||||||
as meaning "newline" while matching.
|
ing "newline" while matching.
|
||||||
|
|
||||||
PCRE2_INFO_RECURSIONLIMIT
|
|
||||||
|
|
||||||
If the pattern set a recursion limit by including an item of the form
|
|
||||||
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
|
|
||||||
argument should point to an unsigned 32-bit integer. If no such value
|
|
||||||
has been set, the call to pcre2_pattern_info() returns the error
|
|
||||||
PCRE2_ERROR_UNSET.
|
|
||||||
|
|
||||||
PCRE2_INFO_SIZE
|
PCRE2_INFO_SIZE
|
||||||
|
|
||||||
|
@ -1998,8 +2002,8 @@ THE MATCH DATA BLOCK
|
||||||
you must create a match data block by calling one of the creation func-
|
you must create a match data block by calling one of the creation func-
|
||||||
tions above. For pcre2_match_data_create(), the first argument is the
|
tions above. For pcre2_match_data_create(), the first argument is the
|
||||||
number of pairs of offsets in the ovector. One pair of offsets is
|
number of pairs of offsets in the ovector. One pair of offsets is
|
||||||
required to identify the string that matched the whole pattern, with
|
required to identify the string that matched the whole pattern, with an
|
||||||
another pair for each captured substring. For example, a value of 4
|
additional pair for each captured substring. For example, a value of 4
|
||||||
creates enough space to record the matched portion of the subject plus
|
creates enough space to record the matched portion of the subject plus
|
||||||
three captured substrings. A minimum of 1 pair is imposed by
|
three captured substrings. A minimum of 1 pair is imposed by
|
||||||
pcre2_match_data_create(), so it is always possible to return the over-
|
pcre2_match_data_create(), so it is always possible to return the over-
|
||||||
|
@ -2124,9 +2128,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
|
||||||
ing offset by two characters instead of one.
|
ing offset by two characters instead of one.
|
||||||
|
|
||||||
If a non-zero starting offset is passed when the pattern is anchored,
|
If a non-zero starting offset is passed when the pattern is anchored,
|
||||||
one attempt to match at the given offset is made. This can only succeed
|
a single attempt to match at the given offset is made. This can only
|
||||||
if the pattern does not require the match to be at the start of the
|
succeed if the pattern does not require the match to be at the start of
|
||||||
subject.
|
the subject. In other words, the anchoring must be the result of set-
|
||||||
|
ting the PCRE2_ANCHORED option or the use of .* with PCRE2_DOTALL, not
|
||||||
|
by starting the pattern with ^ or \A.
|
||||||
|
|
||||||
Option bits for pcre2_match()
|
Option bits for pcre2_match()
|
||||||
|
|
||||||
|
@ -2138,9 +2144,8 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
|
||||||
|
|
||||||
Setting PCRE2_ANCHORED at match time is not supported by the just-in-
|
Setting PCRE2_ANCHORED at match time is not supported by the just-in-
|
||||||
time (JIT) compiler. If it is set, JIT matching is disabled and the
|
time (JIT) compiler. If it is set, JIT matching is disabled and the
|
||||||
normal interpretive code in pcre2_match() is run. Apart from
|
interpretive code in pcre2_match() is run. Apart from PCRE2_NO_JIT
|
||||||
PCRE2_NO_JIT (obviously), the remaining options are supported for JIT
|
(obviously), the remaining options are supported for JIT matching.
|
||||||
matching.
|
|
||||||
|
|
||||||
PCRE2_ANCHORED
|
PCRE2_ANCHORED
|
||||||
|
|
||||||
|
@ -2221,11 +2226,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
|
||||||
checks for performance reasons, you can set the PCRE2_NO_UTF_CHECK
|
checks for performance reasons, you can set the PCRE2_NO_UTF_CHECK
|
||||||
option when calling pcre2_match(). You might want to do this for the
|
option when calling pcre2_match(). You might want to do this for the
|
||||||
second and subsequent calls to pcre2_match() if you are making repeated
|
second and subsequent calls to pcre2_match() if you are making repeated
|
||||||
calls to find all the matches in a single subject string.
|
calls to find other matches in the same subject string.
|
||||||
|
|
||||||
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
|
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an
|
||||||
string as a subject, or an invalid value of startoffset, is undefined.
|
invalid string as a subject, or an invalid value of startoffset, is
|
||||||
Your program may crash or loop indefinitely.
|
undefined. Your program may crash or loop indefinitely.
|
||||||
|
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
@ -2278,9 +2283,10 @@ NEWLINE HANDLING WHEN MATCHING
|
||||||
acter after the first failure.
|
acter after the first failure.
|
||||||
|
|
||||||
An explicit match for CR or LF is either a literal appearance of one of
|
An explicit match for CR or LF is either a literal appearance of one of
|
||||||
those characters in the pattern, or one of the \r or \n escape
|
those characters in the pattern, or one of the \r or \n or equivalent
|
||||||
sequences. Implicit matches such as [^X] do not count, nor does \s,
|
octal or hexadecimal escape sequences. Implicit matches such as [^X] do
|
||||||
even though it includes CR and LF in the characters that it matches.
|
not count, nor does \s, even though it includes CR and LF in the char-
|
||||||
|
acters that it matches.
|
||||||
|
|
||||||
Notwithstanding the above, anomalous effects may still occur when CRLF
|
Notwithstanding the above, anomalous effects may still occur when CRLF
|
||||||
is a valid newline sequence and explicit \r or \n escapes appear in the
|
is a valid newline sequence and explicit \r or \n escapes appear in the
|
||||||
|
@ -2325,14 +2331,14 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
|
||||||
They identify the part of the subject that was partially matched. See
|
They identify the part of the subject that was partially matched. See
|
||||||
the pcre2partial documentation for details of partial matching.
|
the pcre2partial documentation for details of partial matching.
|
||||||
|
|
||||||
After a successful match, the first pair of offsets identifies the por-
|
After a fully successful match, the first pair of offsets identifies
|
||||||
tion of the subject string that was matched by the entire pattern. The
|
the portion of the subject string that was matched by the entire pat-
|
||||||
next pair is used for the first capturing subpattern, and so on. The
|
tern. The next pair is used for the first captured substring, and so
|
||||||
value returned by pcre2_match() is one more than the highest numbered
|
on. The value returned by pcre2_match() is one more than the highest
|
||||||
pair that has been set. For example, if two substrings have been cap-
|
numbered pair that has been set. For example, if two substrings have
|
||||||
tured, the returned value is 3. If there are no capturing subpatterns,
|
been captured, the returned value is 3. If there are no captured sub-
|
||||||
the return value from a successful match is 1, indicating that just the
|
strings, the return value from a successful match is 1, indicating that
|
||||||
first pair of offsets has been set.
|
just the first pair of offsets has been set.
|
||||||
|
|
||||||
If a pattern uses the \K escape sequence within a positive assertion,
|
If a pattern uses the \K escape sequence within a positive assertion,
|
||||||
the reported start of a successful match can be greater than the end of
|
the reported start of a successful match can be greater than the end of
|
||||||
|
@ -2347,11 +2353,7 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
|
||||||
as much as possible is filled in, and the function returns a value of
|
as much as possible is filled in, and the function returns a value of
|
||||||
zero. If captured substrings are not of interest, pcre2_match() may be
|
zero. If captured substrings are not of interest, pcre2_match() may be
|
||||||
called with a match data block whose ovector is of minimum length (that
|
called with a match data block whose ovector is of minimum length (that
|
||||||
is, one pair). However, if the pattern contains back references and the
|
is, one pair).
|
||||||
ovector is not big enough to remember the related substrings, PCRE2 has
|
|
||||||
to get additional memory for use during matching. Thus it is usually
|
|
||||||
advisable to set up a match data block containing an ovector of reason-
|
|
||||||
able size.
|
|
||||||
|
|
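The return-value rules in this hunk can be sketched in a few lines of C. This is only an illustration under stated assumptions: pairs_set() and pair_length() are hypothetical helper names, not PCRE2 functions, and a plain size_t array stands in for the ovector of PCRE2_SIZE values. Partial matching is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many offset pairs hold data after a match attempt.
   rc            value returned by the matching function
   pair_capacity number of pairs the ovector can hold (at least one) */
size_t pairs_set(int rc, size_t pair_capacity)
{
    assert(pair_capacity >= 1);
    if (rc < 0)
        return 0;               /* no match or an error (partial matching not covered here) */
    if (rc == 0)
        return pair_capacity;   /* ovector too small: filled in as far as possible */
    return (size_t)rc;          /* one more than the highest numbered pair that was set */
}

/* Hypothetical helper: length in code units of pair n, where pair n occupies
   ovector[2*n] (start) and ovector[2*n + 1] (end). */
size_t pair_length(const size_t *ovector, size_t n)
{
    size_t start = ovector[2 * n];
    size_t end = ovector[2 * n + 1];
    /* \K inside an assertion can report start > end; treat that as empty. */
    return end > start ? end - start : 0;
}
```

With two captured substrings the matching function returns 3, so pairs_set(3, 10) is 3; with a one-pair ovector and a pattern that captures, the return value is 0 and only pair 0 is available.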
||||||
It is possible for capturing subpattern number n+1 to match some part
|
It is possible for capturing subpattern number n+1 to match some part
|
||||||
of the subject when subpattern n has not been used at all. For example,
|
of the subject when subpattern n has not been used at all. For example,
|
||||||
|
@ -2450,9 +2452,10 @@ ERROR RETURNS FROM pcre2_match()
|
||||||
|
|
||||||
PCRE2_ERROR_BADMODE
|
PCRE2_ERROR_BADMODE
|
||||||
|
|
||||||
This error is given when a pattern that was compiled by the 8-bit
|
This error is given when a compiled pattern is passed to a function in
|
||||||
library is passed to a 16-bit or 32-bit library function, or vice
|
a library of a different code unit width, for example, a pattern com-
|
||||||
versa.
|
piled by the 8-bit library is passed to a 16-bit or 32-bit library
|
||||||
|
function.
|
||||||
|
|
||||||
PCRE2_ERROR_BADOFFSET
|
PCRE2_ERROR_BADOFFSET
|
||||||
|
|
||||||
|
@ -2476,19 +2479,15 @@ ERROR RETURNS FROM pcre2_match()
|
||||||
pcre2_callout_enumerate() to return a distinctive error code. See the
|
pcre2_callout_enumerate() to return a distinctive error code. See the
|
||||||
pcre2callout documentation for details.
|
pcre2callout documentation for details.
|
||||||
|
|
||||||
|
PCRE2_ERROR_DEPTHLIMIT
|
||||||
|
|
||||||
|
The nested backtracking depth limit was reached.
|
||||||
|
|
||||||
PCRE2_ERROR_INTERNAL
|
PCRE2_ERROR_INTERNAL
|
||||||
|
|
||||||
An unexpected internal error has occurred. This error could be caused
|
An unexpected internal error has occurred. This error could be caused
|
||||||
by a bug in PCRE2 or by overwriting of the compiled pattern.
|
by a bug in PCRE2 or by overwriting of the compiled pattern.
|
||||||
|
|
||||||
PCRE2_ERROR_JIT_BADOPTION
|
|
||||||
|
|
||||||
This error is returned when a pattern that was successfully studied
|
|
||||||
using JIT is being matched, but the matching mode (partial or complete
|
|
||||||
match) does not correspond to any JIT compilation mode. When the JIT
|
|
||||||
fast path function is used, this error may be also given for invalid
|
|
||||||
options. See the pcre2jit documentation for more details.
|
|
||||||
|
|
||||||
PCRE2_ERROR_JIT_STACKLIMIT
|
PCRE2_ERROR_JIT_STACKLIMIT
|
||||||
|
|
||||||
This error is returned when a pattern that was successfully studied
|
This error is returned when a pattern that was successfully studied
|
||||||
|
@ -2498,15 +2497,13 @@ ERROR RETURNS FROM pcre2_match()
|
||||||
|
|
||||||
PCRE2_ERROR_MATCHLIMIT
|
PCRE2_ERROR_MATCHLIMIT
|
||||||
|
|
||||||
The backtracking limit was reached.
|
The backtracking match limit was reached.
|
||||||
|
|
||||||
PCRE2_ERROR_NOMEMORY
|
PCRE2_ERROR_NOMEMORY
|
||||||
|
|
||||||
If a pattern contains back references, but the ovector is not big
|
If a pattern contains many nested backtracking points, heap memory is
|
||||||
enough to remember the referenced substrings, PCRE2 gets a block of
|
used to remember them. This error is given when the memory allocation
|
||||||
memory at the start of matching to use for this purpose. There are some
|
function (default or custom) fails.
|
||||||
other special cases where extra memory is needed during matching. This
|
|
||||||
error is given when memory cannot be obtained.
|
|
||||||
|
|
||||||
PCRE2_ERROR_NULL
|
PCRE2_ERROR_NULL
|
||||||
|
|
||||||
|
@ -2522,10 +2519,6 @@ ERROR RETURNS FROM pcre2_match()
|
||||||
plicated cases, in particular mutual recursions between two different
|
plicated cases, in particular mutual recursions between two different
|
||||||
subpatterns, cannot be detected until matching is attempted.
|
subpatterns, cannot be detected until matching is attempted.
|
||||||
|
|
||||||
PCRE2_ERROR_RECURSIONLIMIT
|
|
||||||
|
|
||||||
The internal recursion limit was reached.
|
|
||||||
|
|
||||||
|
|
||||||
OBTAINING A TEXTUAL ERROR MESSAGE
|
OBTAINING A TEXTUAL ERROR MESSAGE
|
||||||
|
|
||||||
|
@ -2703,8 +2696,8 @@ EXTRACTING CAPTURED SUBSTRINGS BY NAME
|
||||||
the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there
|
the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there
|
||||||
is no subpattern of that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if
|
is no subpattern of that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if
|
||||||
there is more than one subpattern of that name. Given the number, you
|
there is more than one subpattern of that name. Given the number, you
|
||||||
can extract the substring directly, or use one of the functions
|
can extract the substring directly from the ovector, or use one of the
|
||||||
described above.
|
"bynumber" functions described above.
|
||||||
|
|
||||||
For convenience, there are also "byname" functions that correspond to
|
For convenience, there are also "byname" functions that correspond to
|
||||||
the "bynumber" functions, the only difference being that the second
|
the "bynumber" functions, the only difference being that the second
|
||||||
|
@ -2991,13 +2984,13 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
|
|
||||||
The function pcre2_dfa_match() is called to match a subject string
|
The function pcre2_dfa_match() is called to match a subject string
|
||||||
against a compiled pattern, using a matching algorithm that scans the
|
against a compiled pattern, using a matching algorithm that scans the
|
||||||
subject string just once, and does not backtrack. This has different
|
subject string just once (not counting lookaround assertions), and does
|
||||||
characteristics to the normal algorithm, and is not compatible with
|
not backtrack. This has different characteristics to the normal algo-
|
||||||
Perl. Some of the features of PCRE2 patterns are not supported. Never-
|
rithm, and is not compatible with Perl. Some of the features of PCRE2
|
||||||
theless, there are times when this kind of matching can be useful. For
|
patterns are not supported. Nevertheless, there are times when this
|
||||||
a discussion of the two matching algorithms, and a list of features
|
kind of matching can be useful. For a discussion of the two matching
|
||||||
that pcre2_dfa_match() does not support, see the pcre2matching documen-
|
algorithms, and a list of features that pcre2_dfa_match() does not sup-
|
||||||
tation.
|
port, see the pcre2matching documentation.
|
||||||
|
|
||||||
The arguments for the pcre2_dfa_match() function are the same as for
|
The arguments for the pcre2_dfa_match() function are the same as for
|
||||||
pcre2_match(), plus two extras. The ovector within the match data block
|
pcre2_match(), plus two extras. The ovector within the match data block
|
||||||
|
@ -3181,7 +3174,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 21 March 2017
|
Last updated: 27 March 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -34,7 +34,7 @@ A match context is needed only if you want to:
|
||||||
Set a matching offset limit
|
Set a matching offset limit
|
||||||
Change the backtracking match limit
|
Change the backtracking match limit
|
||||||
Change the backtracking depth limit
|
Change the backtracking depth limit
|
||||||
Set custom memory management in the match context
|
Set custom memory management specifically for the match
|
||||||
.sp
|
.sp
|
||||||
The \fIlength\fP and \fIstartoffset\fP values are code
|
The \fIlength\fP and \fIstartoffset\fP values are code
|
||||||
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
||||||
|
|
380
doc/pcre2api.3
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2API 3 "21 March 2017" "PCRE2 10.30"
|
.TH PCRE2API 3 "27 March 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
|
@ -120,19 +120,14 @@ document for an overview of all the PCRE2 documentation.
|
||||||
.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *),"
|
.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *),"
|
||||||
.B " void *\fIcallout_data\fP);"
|
.B " void *\fIcallout_data\fP);"
|
||||||
.sp
|
.sp
|
||||||
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
|
||||||
.B " uint32_t \fIvalue\fP);"
|
|
||||||
.sp
|
|
||||||
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
|
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
.B " PCRE2_SIZE \fIvalue\fP);"
|
.B " PCRE2_SIZE \fIvalue\fP);"
|
||||||
.sp
|
.sp
|
||||||
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
|
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
.B " uint32_t \fIvalue\fP);"
|
.B " uint32_t \fIvalue\fP);"
|
||||||
.sp
|
.sp
|
||||||
.B int pcre2_set_recursion_memory_management(
|
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
.B " pcre2_match_context *\fImcontext\fP,"
|
.B " uint32_t \fIvalue\fP);"
|
||||||
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
|
|
||||||
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -252,6 +247,25 @@ document for an overview of all the PCRE2 documentation.
|
||||||
.fi
|
.fi
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SH "PCRE2 NATIVE API OBSOLETE FUNCTIONS"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
.nf
|
||||||
|
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
|
.B " uint32_t \fIvalue\fP);"
|
||||||
|
.sp
|
||||||
|
.B int pcre2_set_recursion_memory_management(
|
||||||
|
.B " pcre2_match_context *\fImcontext\fP,"
|
||||||
|
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
|
||||||
|
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
||||||
|
.fi
|
||||||
|
.sp
|
||||||
|
These functions became obsolete at release 10.30 and are retained only for
|
||||||
|
backward compatibility. They should not be used in new code. The first is
|
||||||
|
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
|
||||||
|
no longer has any effect (it always returns zero).
|
||||||
|
.
|
||||||
|
.
|
||||||
.SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES"
|
.SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -302,7 +316,7 @@ When using multiple libraries in an application, you must take care when
|
||||||
processing any particular pattern to use only functions from a single library.
|
processing any particular pattern to use only functions from a single library.
|
||||||
For example, if you want to run a match using a pattern that was compiled with
|
For example, if you want to run a match using a pattern that was compiled with
|
||||||
\fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not
|
\fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not
|
||||||
\fBpcre2_match_8()\fP.
|
\fBpcre2_match_8()\fP or \fBpcre2_match_32()\fP.
|
||||||
.P
|
.P
|
||||||
In the function summaries above, and in the rest of this document and other
|
In the function summaries above, and in the rest of this document and other
|
||||||
PCRE2 documents, functions and data types are described using their generic
|
PCRE2 documents, functions and data types are described using their generic
|
||||||
|
@ -331,7 +345,7 @@ In a Windows environment, if you want to statically link an application program
|
||||||
against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
|
against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
|
||||||
\fBpcre2.h\fP.
|
\fBpcre2.h\fP.
|
||||||
.P
|
.P
|
||||||
The functions \fBpcre2_compile()\fP, and \fBpcre2_match()\fP are used for
|
The functions \fBpcre2_compile()\fP and \fBpcre2_match()\fP are used for
|
||||||
compiling and matching regular expressions in a Perl-compatible manner. A
|
compiling and matching regular expressions in a Perl-compatible manner. A
|
||||||
sample program that demonstrates the simplest way of using them is provided in
|
sample program that demonstrates the simplest way of using them is provided in
|
||||||
the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing
|
the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing
|
||||||
|
@ -345,10 +359,16 @@ documentation, and the
|
||||||
.\"
|
.\"
|
||||||
documentation describes how to compile and run it.
|
documentation describes how to compile and run it.
|
||||||
.P
|
.P
|
||||||
Just-in-time compiler support is an optional feature of PCRE2 that can be built
|
The compiling and matching functions recognize various options that are passed
|
||||||
in appropriate hardware environments. It greatly speeds up the matching
|
as bits in an options argument. There are also some more complicated parameters
|
||||||
|
such as custom memory management functions and resource limits that are passed
|
||||||
|
in "contexts" (which are just memory blocks, described below). Simple
|
||||||
|
applications do not need to make use of contexts.
|
||||||
|
.P
|
||||||
|
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
|
||||||
|
built in appropriate hardware environments. It greatly speeds up the matching
|
||||||
performance of many patterns. Programs can request that it be used if
|
performance of many patterns. Programs can request that it be used if
|
||||||
available, by calling \fBpcre2_jit_compile()\fP after a pattern has been
|
available by calling \fBpcre2_jit_compile()\fP after a pattern has been
|
||||||
successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT
|
successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT
|
||||||
support is not available.
|
support is not available.
|
||||||
.P
|
.P
|
||||||
|
@ -358,8 +378,8 @@ More complicated programs might need to make use of the specialist functions
|
||||||
.P
|
.P
|
||||||
JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
|
JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
|
||||||
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
|
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
|
||||||
matching, which gives improved performance. The JIT-specific functions are
|
matching, which gives improved performance at the expense of less sanity
|
||||||
discussed in the
|
checking. The JIT-specific functions are discussed in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -369,7 +389,7 @@ A second matching function, \fBpcre2_dfa_match()\fP, which is not
|
||||||
Perl-compatible, is also provided. This uses a different algorithm for the
|
Perl-compatible, is also provided. This uses a different algorithm for the
|
||||||
matching. The alternative algorithm finds all possible matches (at a given
|
matching. The alternative algorithm finds all possible matches (at a given
|
||||||
point in the subject), and scans the subject just once (unless there are
|
point in the subject), and scans the subject just once (unless there are
|
||||||
lookbehind assertions). However, this algorithm does not return captured
|
lookaround assertions). However, this algorithm does not return captured
|
||||||
substrings. A description of the two matching algorithms and their advantages
|
substrings. A description of the two matching algorithms and their advantages
|
||||||
and disadvantages is given in the
|
and disadvantages is given in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
|
@ -484,8 +504,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
|
||||||
that is, the same compiled pattern can be used by more than one thread
|
that is, the same compiled pattern can be used by more than one thread
|
||||||
simultaneously. For example, an application can compile all its patterns at the
|
simultaneously. For example, an application can compile all its patterns at the
|
||||||
start, before forking off multiple threads that use them. However, if the
|
start, before forking off multiple threads that use them. However, if the
|
||||||
just-in-time optimization feature is being used, it needs separate memory stack
|
just-in-time (JIT) optimization feature is being used, it needs separate memory
|
||||||
areas for each thread. See the
|
stack areas for each thread. See the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2jit\fP
|
\fBpcre2jit\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -536,10 +556,10 @@ thread-specific copy.
|
||||||
.SS "Match blocks"
|
.SS "Match blocks"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The matching functions need a block of memory for working space and for storing
|
The matching functions need a block of memory for storing the results of a
|
||||||
the results of a match. This includes details of what was matched, as well as
|
match. This includes details of what was matched, as well as additional
|
||||||
additional information such as the name of a (*MARK) setting. Each thread must
|
information such as the name of a (*MARK) setting. Each thread must provide its
|
||||||
provide its own copy of this memory.
|
own copy of this memory.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "PCRE2 CONTEXTS"
|
.SH "PCRE2 CONTEXTS"
|
||||||
|
@ -611,15 +631,15 @@ The memory used for a general context should be freed by calling:
|
||||||
.SS "The compile context"
|
.SS "The compile context"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
A compile context is required if you want to change the default values of any
|
A compile context is required if you want to provide an external function for
|
||||||
of the following compile-time parameters:
|
stack checking during compilation or to change the default values of any of the
|
||||||
|
following compile-time parameters:
|
||||||
.sp
|
.sp
|
||||||
What \eR matches (Unicode newlines or CR, LF, CRLF only)
|
What \eR matches (Unicode newlines or CR, LF, CRLF only)
|
||||||
PCRE2's character tables
|
PCRE2's character tables
|
||||||
The newline character sequence
|
The newline character sequence
|
||||||
The compile time nested parentheses limit
|
The compile time nested parentheses limit
|
||||||
The maximum length of the pattern string
|
The maximum length of the pattern string
|
||||||
An external function for stack checking
|
|
||||||
.sp
|
.sp
|
||||||
A compile context is also required if you are using custom memory management.
|
A compile context is also required if you are using custom memory management.
|
||||||
If none of these apply, just pass NULL as the context argument of
|
If none of these apply, just pass NULL as the context argument of
|
||||||
|
@ -666,11 +686,11 @@ in the current locale.
|
||||||
.B " PCRE2_SIZE \fIvalue\fP);"
|
.B " PCRE2_SIZE \fIvalue\fP);"
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
This sets a maximum length, in code units, for the pattern string that is to be
|
This sets a maximum length, in code units, for any pattern string that is
|
||||||
compiled. If the pattern is longer, an error is generated. This facility is
|
compiled with this context. If the pattern is longer, an error is generated.
|
||||||
provided so that applications that accept patterns from external sources can
|
This facility is provided so that applications that accept patterns from
|
||||||
limit their size. The default is the largest number that a PCRE2_SIZE variable
|
external sources can limit their size. The default is the largest number that a
|
||||||
can hold, which is effectively unlimited.
|
PCRE2_SIZE variable can hold, which is effectively unlimited.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP,
|
.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP,
|
||||||
|
@ -683,8 +703,15 @@ PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
|
||||||
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
|
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
|
||||||
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
|
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
|
||||||
.P
|
.P
|
||||||
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
|
A pattern can override the value set in the compile context by starting with a
|
||||||
parameter affects the recognition of white space and the end of internal
|
sequence such as (*CRLF). See the
|
||||||
|
.\" HREF
|
||||||
|
\fBpcre2pattern\fP
|
||||||
|
.\"
|
||||||
|
page for details.
|
||||||
|
.P
|
||||||
|
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
|
||||||
|
convention affects the recognition of white space and the end of internal
|
||||||
comments starting with #. The value is saved with the compiled pattern for
|
comments starting with #. The value is saved with the compiled pattern for
|
||||||
subsequent use by the JIT compiler and by the two interpreted matching
|
subsequent use by the JIT compiler and by the two interpreted matching
|
||||||
functions, \fIpcre2_match()\fP and \fIpcre2_dfa_match()\fP.
|
functions, \fIpcre2_match()\fP and \fIpcre2_dfa_match()\fP.
|
||||||
|
@ -722,15 +749,14 @@ zero if all is well, or non-zero to force an error.
|
||||||
.SS "The match context"
|
.SS "The match context"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
A match context is required if you want to change the default values of any
|
A match context is required if you want to:
|
||||||
of the following match-time parameters:
|
|
||||||
.sp
|
.sp
|
||||||
A callout function
|
Set up a callout function
|
||||||
The offset limit for matching an unanchored pattern
|
Set an offset limit for matching an unanchored pattern
|
||||||
The limit for calling \fBmatch()\fP (see below)
|
Change the backtracking match limit
|
||||||
The limit for calling \fBmatch()\fP recursively
|
Change the backtracking depth limit
|
||||||
|
Set custom memory management specifically for the match
|
||||||
.sp
|
.sp
|
||||||
A match context is also required if you are using custom memory management.
|
|
||||||
If none of these apply, just pass NULL as the context argument of
|
If none of these apply, just pass NULL as the context argument of
|
||||||
\fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP.
|
\fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP.
|
||||||
.P
|
.P
|
||||||
|
@ -756,7 +782,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
|
||||||
.B " void *\fIcallout_data\fP);"
|
.B " void *\fIcallout_data\fP);"
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
This sets up a "callout" function, which PCRE2 will call at specified points
|
This sets up a "callout" function for PCRE2 to call at specified points
|
||||||
during a matching operation. Details are given in the
|
during a matching operation. Details are given in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2callout\fP
|
\fBpcre2callout\fP
|
||||||
|
@ -778,8 +804,8 @@ A match can never be found if the \fIstartoffset\fP argument of
|
||||||
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is greater than the offset
|
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is greater than the offset
|
||||||
limit.
|
limit.
|
||||||
.P
|
.P
|
||||||
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
|
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
|
||||||
\fBpcre2_compile()\fP so that when JIT is in use, different code can be
|
calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
|
||||||
compiled. If a match is started with a non-default offset limit when
|
compiled. If a match is started with a non-default offset limit when
|
||||||
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
|
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
|
||||||
.P
|
.P
|
||||||
|
@ -799,10 +825,10 @@ up too many resources when processing patterns that are not going to match, but
|
||||||
which have a very large number of possibilities in their search trees. The
|
which have a very large number of possibilities in their search trees. The
|
||||||
classic example is a pattern that uses nested unlimited repeats.
|
classic example is a pattern that uses nested unlimited repeats.
|
||||||
.P
|
.P
|
||||||
Internally, \fBpcre2_match()\fP uses a function called \fBmatch()\fP, which it
|
There is an internal counter in \fBpcre2_match()\fP that is incremented each
|
||||||
calls repeatedly (sometimes recursively). The limit set by \fImatch_limit\fP is
|
time round its main matching loop. If this value reaches the match limit,
|
||||||
imposed on the number of times this function is called during a match, which
|
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
|
||||||
has the effect of limiting the amount of backtracking that can take place. For
|
the effect of limiting the amount of backtracking that can take place. For
|
||||||
patterns that are not anchored, the count restarts from zero for each position
|
patterns that are not anchored, the count restarts from zero for each position
|
||||||
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
|
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
|
||||||
which ignores it.
|
which ignores it.
|
||||||
|
@ -815,8 +841,7 @@ is also used in this case (but in a different way) to limit how long the
|
||||||
matching can continue.
|
matching can continue.
|
||||||
.P
|
.P
|
||||||
The default value for the limit can be set when PCRE2 is built; the default
|
The default value for the limit can be set when PCRE2 is built; the default
|
||||||
default is 10 million, which handles all but the most extreme cases. If the
|
default is 10 million, which handles all but the most extreme cases. A value
|
||||||
limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_MATCHLIMIT. A value
|
|
||||||
for the match limit may also be supplied by an item at the start of a pattern
|
for the match limit may also be supplied by an item at the start of a pattern
|
||||||
of the form
|
of the form
|
||||||
.sp
|
.sp
|
||||||
|
@ -827,65 +852,34 @@ less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
||||||
limit is set, less than the default.
|
limit is set, less than the default.
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
|
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
|
||||||
.B " uint32_t \fIvalue\fP);"
|
.B " uint32_t \fIvalue\fP);"
|
||||||
.fi
|
.fi
|
||||||
.sp
|
.sp
|
||||||
The \fIrecursion_limit\fP parameter is similar to \fImatch_limit\fP, but
|
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
|
||||||
instead of limiting the total number of times that \fBmatch()\fP is called, it
|
Each time a nested backtracking point is passed, a new memory "frame" is used
|
||||||
limits the depth of recursion. The recursion depth is a smaller number than the
|
to remember the state of matching at that point. Thus, this parameter
|
||||||
total number of calls, because not all calls to \fBmatch()\fP are recursive.
|
indirectly limits the amount of memory that is used in a match.
|
||||||
This limit is of use only if it is set smaller than \fImatch_limit\fP.
|
|
||||||
.P
|
.P
|
||||||
Limiting the recursion depth limits the amount of system stack that can be
|
This limit is not relevant, and is ignored, when matching is done using JIT
|
||||||
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
|
compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which uses
|
||||||
stack, the amount of heap memory that can be used. This limit is not relevant,
|
it to limit the depth of internal recursive function calls that implement
|
||||||
and is ignored, when matching is done using JIT compiled code. However, it is
|
lookaround assertions and pattern recursions. This is, therefore, an indirect
|
||||||
supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
|
limit on the amount of system stack that is used. A recursive pattern such as
|
||||||
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
|
/(.)(?1)/, when matched to a very long string using \fBpcre2_dfa_match()\fP,
|
||||||
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
|
can use a great deal of stack.
|
||||||
.P
|
.P
|
||||||
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
|
The default value for the depth limit can be set when PCRE2 is built; the
|
||||||
default default is the same value as the default for \fImatch_limit\fP. If the
|
default default is the same value as the default for the match limit. If the
|
||||||
limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
|
limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
|
||||||
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
|
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
|
||||||
supplied by an item at the start of a pattern of the form
|
item at the start of a pattern of the form
|
||||||
.sp
|
.sp
|
||||||
(*LIMIT_RECURSION=ddd)
|
(*LIMIT_DEPTH=ddd)
|
||||||
.sp
|
.sp
|
||||||
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||||
less than the limit set by the caller of \fBpcre2_match()\fP or
|
less than the limit set by the caller of \fBpcre2_match()\fP or
|
||||||
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
|
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
|
||||||
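The precedence rule for limits described above can be sketched in C. This is an illustration only: `effective_limit` and its parameters are hypothetical names, not part of the PCRE2 API, and the sketch assumes the rule applies exactly as the text states it (a pattern item can lower a limit but never raise it).

```c
#include <stdint.h>

/* Hypothetical helper, NOT a PCRE2 function: computes the limit that would
   be in force given the build-time default, an optional limit set by the
   caller in a match context, and an optional (*LIMIT_...=ddd) pattern item. */
uint32_t effective_limit(uint32_t build_default,
                         int caller_set, uint32_t caller_limit,
                         int pattern_set, uint32_t pattern_limit)
{
  /* A limit set by the caller replaces the build-time default. */
  uint32_t limit = caller_set ? caller_limit : build_default;

  /* The pattern item is ignored unless it is less than that value. */
  if (pattern_set && pattern_limit < limit)
    limit = pattern_limit;

  return limit;
}
```

For example, with a caller limit of 500, a pattern item of 1000 would be ignored, while a pattern item of 100 would take effect.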
.sp
|
|
||||||
.nf
|
|
||||||
.B int pcre2_set_recursion_memory_management(
|
|
||||||
.B " pcre2_match_context *\fImcontext\fP,"
|
|
||||||
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
|
|
||||||
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
|
||||||
.fi
|
|
||||||
.sp
|
|
||||||
This function sets up two additional custom memory management functions for use
|
|
||||||
by \fBpcre2_match()\fP when PCRE2 is compiled to use the heap for remembering
|
|
||||||
backtracking data, instead of recursive function calls that use the system
|
|
||||||
stack. There is a discussion about PCRE2's stack usage in the
|
|
||||||
.\" HREF
|
|
||||||
\fBpcre2stack\fP
|
|
||||||
.\"
|
|
||||||
documentation. See the
|
|
||||||
.\" HREF
|
|
||||||
\fBpcre2build\fP
|
|
||||||
.\"
|
|
||||||
documentation for details of how to build PCRE2.
|
|
||||||
.P
|
|
||||||
Using the heap for recursion is a non-standard way of building PCRE2, for use
|
|
||||||
in environments that have limited stacks. Because of the greater use of memory
|
|
||||||
management, \fBpcre2_match()\fP runs more slowly. Functions that are different
|
|
||||||
to the general custom memory functions are provided so that special-purpose
|
|
||||||
external code can be used for this case, because the memory blocks are all the
|
|
||||||
same size. The blocks are retained by \fBpcre2_match()\fP until it is about to
|
|
||||||
exit so that they can be re-used when possible during the match. In the absence
|
|
||||||
of these functions, the normal custom memory management functions are used, if
|
|
||||||
supplied, otherwise the system functions.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "CHECKING BUILD-TIME OPTIONS"
|
.SH "CHECKING BUILD-TIME OPTIONS"
|
||||||
|
@ -920,6 +914,13 @@ sequences the \eR escape sequence matches by default. A value of
|
||||||
PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a
|
PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a
|
||||||
value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The
|
value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The
|
||||||
default can be overridden when a pattern is compiled.
|
default can be overridden when a pattern is compiled.
|
||||||
|
.sp
|
||||||
|
PCRE2_CONFIG_DEPTHLIMIT
|
||||||
|
.sp
|
||||||
|
The output is a uint32_t integer that gives the default limit for the depth of
|
||||||
|
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
|
||||||
|
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
|
||||||
|
\fBpcre2_set_depth_limit()\fP above.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_JIT
|
PCRE2_CONFIG_JIT
|
||||||
.sp
|
.sp
|
||||||
|
@ -954,9 +955,9 @@ be compiled by those two libraries, but at the expense of slower matching.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t integer that gives the default limit for the number of
|
The output is a uint32_t integer that gives the default match limit for
|
||||||
internal matching function calls in a \fBpcre2_match()\fP execution. Further
|
\fBpcre2_match()\fP. Further details are given with
|
||||||
details are given with \fBpcre2_match()\fP below.
|
\fBpcre2_set_match_limit()\fP above.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_NEWLINE
|
PCRE2_CONFIG_NEWLINE
|
||||||
.sp
|
.sp
|
||||||
|
@ -980,20 +981,11 @@ amount of system stack used when a pattern is compiled. It is specified when
|
||||||
PCRE2 is built; the default is 250. This limit does not take into account the
|
PCRE2 is built; the default is 250. This limit does not take into account the
|
||||||
stack that may already be used by the calling application. For finer control
|
stack that may already be used by the calling application. For finer control
|
||||||
over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP.
|
over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP.
|
||||||
.sp
|
|
||||||
PCRE2_CONFIG_RECURSIONLIMIT
|
|
||||||
.sp
|
|
||||||
The output is a uint32_t integer that gives the default limit for the depth of
|
|
||||||
recursion when calling the internal matching function in a \fBpcre2_match()\fP
|
|
||||||
execution. Further details are given with \fBpcre2_match()\fP below.
|
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_STACKRECURSE
|
PCRE2_CONFIG_STACKRECURSE
|
||||||
.sp
|
.sp
|
||||||
The output is a uint32_t integer that is set to one if internal recursion when
|
This parameter is obsolete and should not be used in new code. The output is a
|
||||||
running \fBpcre2_match()\fP is implemented by recursive function calls that use
|
uint32_t integer that is always set to zero.
|
||||||
the system stack to remember their state. This is the usual way that PCRE2 is
|
|
||||||
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
|
|
||||||
heap instead of recursive function calls.
|
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_UNICODE_VERSION
|
PCRE2_CONFIG_UNICODE_VERSION
|
||||||
.sp
|
.sp
|
||||||
|
@ -1012,7 +1004,7 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_CONFIG_VERSION
|
PCRE2_CONFIG_VERSION
|
||||||
.sp
|
.sp
|
||||||
The \fIwhere\fP argument should point to a buffer that is at least 12 code
|
The \fIwhere\fP argument should point to a buffer that is at least 24 code
|
||||||
units long. (The exact length required can be found by calling
|
units long. (The exact length required can be found by calling
|
||||||
\fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with
|
\fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with
|
||||||
the PCRE2 version string, zero-terminated. The number of code units used is
|
the PCRE2 version string, zero-terminated. The number of code units used is
|
||||||
|
@ -1208,13 +1200,14 @@ option is set, normal backslash processing is applied to verb names and only an
|
||||||
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
||||||
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
|
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
|
||||||
option is set, unescaped whitespace in verb names is skipped and #-comments are
|
option is set, unescaped whitespace in verb names is skipped and #-comments are
|
||||||
recognized, exactly as in the rest of the pattern.
|
recognized in this mode, exactly as in the rest of the pattern.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_AUTO_CALLOUT
|
PCRE2_AUTO_CALLOUT
|
||||||
.sp
|
.sp
|
||||||
If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
|
If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
|
||||||
all with number 255, before each pattern item, except immediately before or
|
all with number 255, before each pattern item, except immediately before or
|
||||||
after a callout in the pattern. For discussion of the callout facility, see the
|
after an explicit callout in the pattern. For discussion of the callout
|
||||||
|
facility, see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2callout\fP
|
\fBpcre2callout\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -1452,9 +1445,8 @@ in the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2unicode\fP
|
\fBpcre2unicode\fP
|
||||||
.\"
|
.\"
|
||||||
document.
|
document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
|
||||||
If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a negative
|
negative error code.
|
||||||
error code.
|
|
||||||
.P
|
.P
|
||||||
If you know that your pattern is valid, and you want to skip this check for
|
If you know that your pattern is valid, and you want to skip this check for
|
||||||
performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set,
|
performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set,
|
||||||
|
@ -1479,7 +1471,7 @@ in the
|
||||||
.\"
|
.\"
|
||||||
page. If you set PCRE2_UCP, matching one of the items it affects takes much
|
page. If you set PCRE2_UCP, matching one of the items it affects takes much
|
||||||
longer. The option is available only if PCRE2 has been compiled with Unicode
|
longer. The option is available only if PCRE2 has been compiled with Unicode
|
||||||
support.
|
support (which is the default).
|
||||||
.sp
|
.sp
|
||||||
PCRE2_UNGREEDY
|
PCRE2_UNGREEDY
|
||||||
.sp
|
.sp
|
||||||
|
@ -1518,7 +1510,7 @@ page.
|
||||||
.SH "COMPILATION ERROR CODES"
|
.SH "COMPILATION ERROR CODES"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
There are over 80 positive error codes that \fBpcre2_compile()\fP may return
|
There are nearly 100 positive error codes that \fBpcre2_compile()\fP may return
|
||||||
(via \fIerrorcode\fP) if it finds an error in the pattern. There are also some
|
(via \fIerrorcode\fP) if it finds an error in the pattern. There are also some
|
||||||
negative error codes that are used for invalid UTF strings. These are the same
|
negative error codes that are used for invalid UTF strings. These are the same
|
||||||
as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and are described
|
as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and are described
|
||||||
|
@ -1570,7 +1562,7 @@ documentation.
|
||||||
JIT compilation is a heavyweight optimization. It can take some time for
|
JIT compilation is a heavyweight optimization. It can take some time for
|
||||||
patterns to be analyzed, and for one-off matches and simple patterns the
|
patterns to be analyzed, and for one-off matches and simple patterns the
|
||||||
benefit of faster execution might be offset by a much slower compilation time.
|
benefit of faster execution might be offset by a much slower compilation time.
|
||||||
Most, but not all patterns can be optimized by the JIT compiler.
|
Most (but not all) patterns can be optimized by the JIT compiler.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="localesupport"></a>
|
.\" HTML <a name="localesupport"></a>
|
||||||
|
@ -1581,10 +1573,10 @@ PCRE2 handles caseless matching, and determines whether characters are letters,
|
||||||
digits, or whatever, by reference to a set of tables, indexed by character code
|
digits, or whatever, by reference to a set of tables, indexed by character code
|
||||||
point. This applies only to characters whose code points are less than 256. By
|
point. This applies only to characters whose code points are less than 256. By
|
||||||
default, higher-valued code points never match escapes such as \ew or \ed.
|
default, higher-valued code points never match escapes such as \ew or \ed.
|
||||||
However, if PCRE2 is built with UTF support, all characters can be tested with
|
However, if PCRE2 is built with Unicode support, all characters can be tested
|
||||||
\ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a pattern
|
with \ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a
|
||||||
is compiled; this causes \ew and friends to use Unicode property support
|
pattern is compiled; this causes \ew and friends to use Unicode property
|
||||||
instead of the built-in tables.
|
support instead of the built-in tables.
|
||||||
.P
|
.P
|
||||||
The use of locales with Unicode is discouraged. If you are handling characters
|
The use of locales with Unicode is discouraged. If you are handling characters
|
||||||
with code points greater than 128, you should either use Unicode support, or
|
with code points greater than 128, you should either use Unicode support, or
|
||||||
|
@ -1623,7 +1615,7 @@ available for as long as it is needed.
|
||||||
The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
|
The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
|
||||||
is saved with the compiled pattern, and the same tables are used by
|
is saved with the compiled pattern, and the same tables are used by
|
||||||
\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. Thus, for any single pattern,
|
\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. Thus, for any single pattern,
|
||||||
compilation, and matching all happen in the same locale, but different patterns
|
compilation and matching both happen in the same locale, but different patterns
|
||||||
can be processed in different locales.
|
can be processed in different locales.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -1646,7 +1638,7 @@ pattern. The second argument specifies which piece of information is required,
|
||||||
and the third argument is a pointer to a variable to receive the data. If the
|
and the third argument is a pointer to a variable to receive the data. If the
|
||||||
third argument is NULL, the first argument is ignored, and the function returns
|
third argument is NULL, the first argument is ignored, and the function returns
|
||||||
the size in bytes of the variable that is required for the information
|
the size in bytes of the variable that is required for the information
|
||||||
requested. Otherwise, The yield of the function is zero for success, or one of
|
requested. Otherwise, the yield of the function is zero for success, or one of
|
||||||
the following negative numbers:
|
the following negative numbers:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_NULL the argument \fIcode\fP was NULL
|
PCRE2_ERROR_NULL the argument \fIcode\fP was NULL
|
||||||
|
@ -1699,8 +1691,8 @@ following are true:
|
||||||
.* is not in a capturing group that is the subject
|
.* is not in a capturing group that is the subject
|
||||||
of a back reference
|
of a back reference
|
||||||
PCRE2_DOTALL is in force for .*
|
PCRE2_DOTALL is in force for .*
|
||||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
|
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||||
PCRE2_NO_DOTSTAR_ANCHOR is not set.
|
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
||||||
.sp
|
.sp
|
||||||
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
|
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
|
||||||
options returned for PCRE2_INFO_ALLOPTIONS.
|
options returned for PCRE2_INFO_ALLOPTIONS.
|
||||||
|
@ -1727,6 +1719,13 @@ matches only CR, LF, or CRLF.
|
||||||
Return the highest capturing subpattern number in the pattern. In patterns
|
Return the highest capturing subpattern number in the pattern. In patterns
|
||||||
where (?| is not used, this is also the total number of capturing subpatterns.
|
where (?| is not used, this is also the total number of capturing subpatterns.
|
||||||
The third argument should point to an \fBuint32_t\fP variable.
|
The third argument should point to an \fBuint32_t\fP variable.
|
||||||
|
.sp
|
||||||
|
PCRE2_INFO_DEPTHLIMIT
|
||||||
|
.sp
|
||||||
|
If the pattern set a backtracking depth limit by including an item of the form
|
||||||
|
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||||
|
should point to an unsigned 32-bit integer. If no such value has been set, the
|
||||||
|
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_FIRSTBITMAP
|
PCRE2_INFO_FIRSTBITMAP
|
||||||
.sp
|
.sp
|
||||||
|
@ -1758,6 +1757,14 @@ argument should point to an \fBuint32_t\fP variable. In the 8-bit library, the
|
||||||
value is always less than 256. In the 16-bit library the value can be up to
|
value is always less than 256. In the 16-bit library the value can be up to
|
||||||
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
|
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
|
||||||
and up to 0xffffffff when not using UTF-32 mode.
|
and up to 0xffffffff when not using UTF-32 mode.
|
||||||
|
.sp
|
||||||
|
PCRE2_INFO_FRAMESIZE
|
||||||
|
.sp
|
||||||
|
Return the size (in bytes) of the data frames that are used to remember
|
||||||
|
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
|
||||||
|
without the use of JIT. The third argument should point to a \fBsize_t\fP
|
||||||
|
variable. The frame size depends on the number of capturing parentheses in the
|
||||||
|
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_HASBACKSLASHC
|
PCRE2_INFO_HASBACKSLASHC
|
||||||
.sp
|
.sp
|
||||||
|
@ -1768,7 +1775,8 @@ argument should point to an \fBuint32_t\fP variable.
|
||||||
.sp
|
.sp
|
||||||
Return 1 if the pattern contains any explicit matches for CR or LF characters,
|
Return 1 if the pattern contains any explicit matches for CR or LF characters,
|
||||||
otherwise 0. The third argument should point to an \fBuint32_t\fP variable. An
|
otherwise 0. The third argument should point to an \fBuint32_t\fP variable. An
|
||||||
explicit match is either a literal CR or LF character, or \er or \en.
|
explicit match is either a literal CR or LF character, or \er or \en or one of
|
||||||
|
the equivalent hexadecimal or octal escape sequences.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_JCHANGED
|
PCRE2_INFO_JCHANGED
|
||||||
.sp
|
.sp
|
||||||
|
@ -1907,7 +1915,7 @@ different for each compiled pattern.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_NEWLINE
|
PCRE2_INFO_NEWLINE
|
||||||
.sp
|
.sp
|
||||||
The output is a \fBuint32_t\fP with one of the following values:
|
The output is one of the following \fBuint32_t\fP values:
|
||||||
.sp
|
.sp
|
||||||
PCRE2_NEWLINE_CR Carriage return (CR)
|
PCRE2_NEWLINE_CR Carriage return (CR)
|
||||||
PCRE2_NEWLINE_LF Linefeed (LF)
|
PCRE2_NEWLINE_LF Linefeed (LF)
|
||||||
|
@ -1915,15 +1923,8 @@ The output is a \fBuint32_t\fP with one of the following values:
|
||||||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||||
.sp
|
.sp
|
||||||
This specifies the default character sequence that will be recognized as
|
This identifies the character sequence that will be recognized as meaning
|
||||||
meaning "newline" while matching.
|
"newline" while matching.
|
||||||
.sp
|
|
||||||
PCRE2_INFO_RECURSIONLIMIT
|
|
||||||
.sp
|
|
||||||
If the pattern set a recursion limit by including an item of the form
|
|
||||||
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
|
|
||||||
argument should point to an unsigned 32-bit integer. If no such value has been
|
|
||||||
set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
|
|
||||||
.sp
|
.sp
|
||||||
PCRE2_INFO_SIZE
|
PCRE2_INFO_SIZE
|
||||||
.sp
|
.sp
|
||||||
|
@ -2000,9 +2001,9 @@ Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
|
||||||
the creation functions above. For \fBpcre2_match_data_create()\fP, the first
|
the creation functions above. For \fBpcre2_match_data_create()\fP, the first
|
||||||
argument is the number of pairs of offsets in the \fIovector\fP. One pair of
|
argument is the number of pairs of offsets in the \fIovector\fP. One pair of
|
||||||
offsets is required to identify the string that matched the whole pattern, with
|
offsets is required to identify the string that matched the whole pattern, with
|
||||||
another pair for each captured substring. For example, a value of 4 creates
|
an additional pair for each captured substring. For example, a value of 4
|
||||||
enough space to record the matched portion of the subject plus three captured
|
creates enough space to record the matched portion of the subject plus three
|
||||||
substrings. A minimum of 1 pair is imposed by
|
captured substrings. A minimum of 1 pair is imposed by
|
||||||
\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
|
\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
|
||||||
matched string.
|
matched string.
|
||||||
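The sizing rule in this paragraph can be sketched as follows. The helper names are hypothetical and are not PCRE2 functions; the sketch only restates the arithmetic (one pair for the whole match, one more per captured substring, two offsets per pair, and a minimum of one pair).

```c
#include <stdint.h>

/* Hypothetical helper, NOT a PCRE2 function: the number of offset pairs
   needed to record the overall match plus every captured substring. */
uint32_t ovector_pairs_needed(uint32_t capture_count)
{
  return capture_count + 1;
}

/* Hypothetical helper: the number of individual offsets held by an ovector
   with the given number of pairs, applying the minimum of one pair. */
uint32_t ovector_elements(uint32_t pairs)
{
  if (pairs < 1)
    pairs = 1;          /* at least one pair is always available */
  return 2 * pairs;     /* each pair is a start offset and an end offset */
}
```

For a pattern with three capturing groups, `ovector_pairs_needed(3)` gives 4, which is the value that would be passed as the first argument of \fBpcre2_match_data_create()\fP.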
.P
|
.P
|
||||||
|
@ -2145,9 +2146,11 @@ newline convention recognizes CRLF as a newline, and if so, and the current
|
||||||
character is CR followed by LF, advance the starting offset by two characters
|
character is CR followed by LF, advance the starting offset by two characters
|
||||||
instead of one.
|
instead of one.
|
||||||
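A minimal sketch of the offset adjustment just described, assuming an 8-bit subject that is not in UTF mode (so one character is one code unit). The function name and the `crlf_is_newline` flag are hypothetical; a real program would derive that flag from the newline convention in force.

```c
#include <stddef.h>

/* Hypothetical helper, NOT a PCRE2 function: returns the offset at which to
   retry after an empty-string match failed at `offset`. The caller is
   assumed to have checked that offset < length. */
size_t next_start_offset(const unsigned char *subject, size_t length,
                         size_t offset, int crlf_is_newline)
{
  /* Skip CR and LF together when CRLF counts as a newline. */
  if (crlf_is_newline &&
      offset + 1 < length &&
      subject[offset] == '\r' &&
      subject[offset + 1] == '\n')
    return offset + 2;

  /* Otherwise advance by a single character. */
  return offset + 1;
}
```

In UTF mode the single-character advance would instead have to step over every code unit of the current character, which this sketch does not attempt.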
.P
|
.P
|
||||||
If a non-zero starting offset is passed when the pattern is anchored, one
|
If a non-zero starting offset is passed when the pattern is anchored, a single
|
||||||
attempt to match at the given offset is made. This can only succeed if the
|
attempt to match at the given offset is made. This can only succeed if the
|
||||||
pattern does not require the match to be at the start of the subject.
|
pattern does not require the match to be at the start of the subject. In other
|
||||||
|
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
|
||||||
|
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="matchoptions"></a>
|
.\" HTML <a name="matchoptions"></a>
|
||||||
|
@ -2161,9 +2164,9 @@ PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is
|
||||||
described below.
|
described below.
|
||||||
.P
|
.P
|
||||||
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
|
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
|
||||||
compiler. If it is set, JIT matching is disabled and the normal interpretive
|
compiler. If it is set, JIT matching is disabled and the interpretive code in
|
||||||
code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the
|
\fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the remaining
|
||||||
remaining options are supported for JIT matching.
|
options are supported for JIT matching.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ANCHORED
|
PCRE2_ANCHORED
|
||||||
.sp
|
.sp
|
||||||
|
@ -2257,12 +2260,12 @@ page.
|
||||||
If you know that your subject is valid, and you want to skip these checks for
|
If you know that your subject is valid, and you want to skip these checks for
|
||||||
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
|
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
|
||||||
\fBpcre2_match()\fP. You might want to do this for the second and subsequent
|
\fBpcre2_match()\fP. You might want to do this for the second and subsequent
|
||||||
calls to \fBpcre2_match()\fP if you are making repeated calls to find all the
|
calls to \fBpcre2_match()\fP if you are making repeated calls to find other
|
||||||
matches in a single subject string.
|
matches in the same subject string.
|
||||||
.P
|
.P
|
||||||
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
|
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
|
||||||
as a subject, or an invalid value of \fIstartoffset\fP, is undefined. Your
|
string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
|
||||||
program may crash or loop indefinitely.
|
Your program may crash or loop indefinitely.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
@ -2329,9 +2332,9 @@ start, it skips both the CR and the LF before retrying. However, the pattern
|
||||||
reference, and so advances only by one character after the first failure.
|
reference, and so advances only by one character after the first failure.
|
||||||
.P
|
.P
|
||||||
An explicit match for CR or LF is either a literal appearance of one of those
|
An explicit match for CR or LF is either a literal appearance of one of those
|
||||||
characters in the pattern, or one of the \er or \en escape sequences. Implicit
|
characters in the pattern, or one of the \er or \en or equivalent octal or
|
||||||
matches such as [^X] do not count, nor does \es, even though it includes CR and
|
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
|
||||||
LF in the characters that it matches.
|
does \es, even though it includes CR and LF in the characters that it matches.
|
||||||
.P
|
.P
|
||||||
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
||||||
valid newline sequence and explicit \er or \en escapes appear in the pattern.
|
valid newline sequence and explicit \er or \en escapes appear in the pattern.
|
||||||
|
@ -2395,12 +2398,12 @@ identify the part of the subject that was partially matched. See the
|
||||||
.\"
|
.\"
|
||||||
documentation for details of partial matching.
|
documentation for details of partial matching.
|
||||||
.P
|
.P
|
||||||
After a successful match, the first pair of offsets identifies the portion of
|
After a fully successful match, the first pair of offsets identifies the
|
||||||
the subject string that was matched by the entire pattern. The next pair is
|
portion of the subject string that was matched by the entire pattern. The next
|
||||||
used for the first capturing subpattern, and so on. The value returned by
|
pair is used for the first captured substring, and so on. The value returned by
|
||||||
\fBpcre2_match()\fP is one more than the highest numbered pair that has been
|
\fBpcre2_match()\fP is one more than the highest numbered pair that has been
|
||||||
set. For example, if two substrings have been captured, the returned value is
|
set. For example, if two substrings have been captured, the returned value is
|
||||||
3. If there are no capturing subpatterns, the return value from a successful
|
3. If there are no captured substrings, the return value from a successful
|
||||||
match is 1, indicating that just the first pair of offsets has been set.
|
match is 1, indicating that just the first pair of offsets has been set.
|
||||||
.P
|
.P
|
||||||
If a pattern uses the \eK escape sequence within a positive assertion, the
|
If a pattern uses the \eK escape sequence within a positive assertion, the
|
||||||
|
@ -2415,11 +2418,7 @@ returned.
|
||||||
If the ovector is too small to hold all the captured substring offsets, as much
|
If the ovector is too small to hold all the captured substring offsets, as much
|
||||||
as possible is filled in, and the function returns a value of zero. If captured
|
as possible is filled in, and the function returns a value of zero. If captured
|
||||||
substrings are not of interest, \fBpcre2_match()\fP may be called with a match
|
substrings are not of interest, \fBpcre2_match()\fP may be called with a match
|
||||||
data block whose ovector is of minimum length (that is, one pair). However, if
|
data block whose ovector is of minimum length (that is, one pair).
|
||||||
the pattern contains back references and the \fIovector\fP is not big enough to
|
|
||||||
remember the related substrings, PCRE2 has to get additional memory for use
|
|
||||||
during matching. Thus it is usually advisable to set up a match data block
|
|
||||||
containing an ovector of reasonable size.
|
|
||||||
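The meaning of the value returned by \fBpcre2_match()\fP, as described in the preceding paragraphs, can be summarized in a short sketch. The enum and function names are hypothetical and are not part of the PCRE2 API; the sketch treats every negative value as an error code without distinguishing between them.

```c
/* Hypothetical classification of a matching function's return value. */
enum match_outcome {
  MATCH_ERROR,              /* negative value: an error code (including no match) */
  MATCH_OVECTOR_TOO_SMALL,  /* zero: matched, but not all offsets fitted */
  MATCH_OK                  /* positive: one more than the highest pair set */
};

/* Hypothetical helper, NOT a PCRE2 function. On MATCH_OK, *highest_pair
   receives the number of the highest offset pair that was set (0 means
   only the overall match was recorded). */
enum match_outcome classify_match_result(int rc, int *highest_pair)
{
  if (rc < 0)
    return MATCH_ERROR;
  if (rc == 0)
    return MATCH_OVECTOR_TOO_SMALL;

  *highest_pair = rc - 1;
  return MATCH_OK;
}
```

With two captured substrings the return value is 3, so the highest pair set is number 2; with no captured substrings the return value is 1 and only pair 0 is set.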
.P
|
.P
|
||||||
It is possible for capturing subpattern number \fIn+1\fP to match some part of
|
It is possible for capturing subpattern number \fIn+1\fP to match some part of
|
||||||
the subject when subpattern \fIn\fP has not been used at all. For example, if
|
the subject when subpattern \fIn\fP has not been used at all. For example, if
|
||||||
|
@ -2535,8 +2534,9 @@ returned when the magic number is not present.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_BADMODE
|
PCRE2_ERROR_BADMODE
|
||||||
.sp
|
.sp
|
||||||
This error is given when a pattern that was compiled by the 8-bit library is
|
This error is given when a compiled pattern is passed to a function in a
|
||||||
passed to a 16-bit or 32-bit library function, or vice versa.
|
library of a different code unit width, for example, a pattern compiled by
|
||||||
|
the 8-bit library is passed to a 16-bit or 32-bit library function.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_BADOFFSET
|
PCRE2_ERROR_BADOFFSET
|
||||||
.sp
|
.sp
|
||||||
|
@ -2562,22 +2562,15 @@ use by callout functions that want to cause \fBpcre2_match()\fP or
|
||||||
\fBpcre2callout\fP
|
\fBpcre2callout\fP
|
||||||
.\"
|
.\"
|
||||||
documentation for details.
|
documentation for details.
|
||||||
|
.sp
|
||||||
|
PCRE2_ERROR_DEPTHLIMIT
|
||||||
|
.sp
|
||||||
|
The nested backtracking depth limit was reached.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_INTERNAL
|
PCRE2_ERROR_INTERNAL
|
||||||
.sp
|
.sp
|
||||||
An unexpected internal error has occurred. This error could be caused by a bug
|
An unexpected internal error has occurred. This error could be caused by a bug
|
||||||
in PCRE2 or by overwriting of the compiled pattern.
|
in PCRE2 or by overwriting of the compiled pattern.
|
||||||
.sp
|
|
||||||
PCRE2_ERROR_JIT_BADOPTION
|
|
||||||
.sp
|
|
||||||
This error is returned when a pattern that was successfully studied using JIT
|
|
||||||
is being matched, but the matching mode (partial or complete match) does not
|
|
||||||
correspond to any JIT compilation mode. When the JIT fast path function is
|
|
||||||
used, this error may be also given for invalid options. See the
|
|
||||||
.\" HREF
|
|
||||||
\fBpcre2jit\fP
|
|
||||||
.\"
|
|
||||||
documentation for more details.
|
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_JIT_STACKLIMIT
|
PCRE2_ERROR_JIT_STACKLIMIT
|
||||||
.sp
|
.sp
|
||||||
|
@ -2591,15 +2584,13 @@ documentation for more details.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_MATCHLIMIT
|
PCRE2_ERROR_MATCHLIMIT
|
||||||
.sp
|
.sp
|
||||||
The backtracking limit was reached.
|
The backtracking match limit was reached.
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_NOMEMORY
|
PCRE2_ERROR_NOMEMORY
|
||||||
.sp
|
.sp
|
||||||
If a pattern contains back references, but the ovector is not big enough to
|
If a pattern contains many nested backtracking points, heap memory is used to
|
||||||
remember the referenced substrings, PCRE2 gets a block of memory at the start
|
remember them. This error is given when the memory allocation function (default
|
||||||
of matching to use for this purpose. There are some other special cases where
|
or custom) fails.
|
||||||
extra memory is needed during matching. This error is given when memory cannot
|
|
||||||
be obtained.
|
|
||||||
.sp
|
.sp
|
||||||
PCRE2_ERROR_NULL
|
PCRE2_ERROR_NULL
|
||||||
.sp
|
.sp
|
||||||
|
@ -2615,10 +2606,6 @@ in the subject string. Some simple patterns that might do this are detected and
|
||||||
faulted at compile time, but more complicated cases, in particular mutual
|
faulted at compile time, but more complicated cases, in particular mutual
|
||||||
recursions between two different subpatterns, cannot be detected until matching
|
recursions between two different subpatterns, cannot be detected until matching
|
||||||
is attempted.
|
is attempted.
|
||||||
.sp
|
|
||||||
PCRE2_ERROR_RECURSIONLIMIT
|
|
||||||
.sp
|
|
||||||
The internal recursion limit was reached.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="geterrormessage"></a>
|
.\" HTML <a name="geterrormessage"></a>
|
||||||
|
@ -2808,8 +2795,8 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
|
||||||
compiled pattern, and the second is the name. The yield of the function is the
|
compiled pattern, and the second is the name. The yield of the function is the
|
||||||
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
||||||
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
||||||
that name. Given the number, you can extract the substring directly, or use one
|
that name. Given the number, you can extract the substring directly from the
|
||||||
of the functions described above.
|
ovector, or use one of the "bynumber" functions described above.
|
||||||
.P
|
.P
|
||||||
For convenience, there are also "byname" functions that correspond to the
|
For convenience, there are also "byname" functions that correspond to the
|
||||||
"bynumber" functions, the only difference being that the second argument is a
|
"bynumber" functions, the only difference being that the second argument is a
|
||||||
|
@ -3113,11 +3100,12 @@ other alternatives. Ultimately, when it runs out of matches,
|
||||||
.P
|
.P
|
||||||
The function \fBpcre2_dfa_match()\fP is called to match a subject string
|
The function \fBpcre2_dfa_match()\fP is called to match a subject string
|
||||||
against a compiled pattern, using a matching algorithm that scans the subject
|
against a compiled pattern, using a matching algorithm that scans the subject
|
||||||
string just once, and does not backtrack. This has different characteristics to
|
string just once (not counting lookaround assertions), and does not backtrack.
|
||||||
the normal algorithm, and is not compatible with Perl. Some of the features of
|
This has different characteristics to the normal algorithm, and is not
|
||||||
PCRE2 patterns are not supported. Nevertheless, there are times when this kind
|
compatible with Perl. Some of the features of PCRE2 patterns are not supported.
|
||||||
of matching can be useful. For a discussion of the two matching algorithms, and
|
Nevertheless, there are times when this kind of matching can be useful. For a
|
||||||
a list of features that \fBpcre2_dfa_match()\fP does not support, see the
|
discussion of the two matching algorithms, and a list of features that
|
||||||
|
\fBpcre2_dfa_match()\fP does not support, see the
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2matching\fP
|
\fBpcre2matching\fP
|
||||||
.\"
|
.\"
|
||||||
|
@ -3321,6 +3309,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 21 March 2017
|
Last updated: 27 March 2017
|
||||||
Copyright (c) 1997-2017 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|