Documentation update.
parent 447d1b3083
commit 6c7fa44939
@@ -46,7 +46,7 @@ A match context is needed only if you want to:
 Set a matching offset limit
 Change the backtracking match limit
 Change the backtracking depth limit
-Set custom memory management in the match context
+Set custom memory management specifically for the match
 </pre>
 The <i>length</i> and <i>startoffset</i> values are code
 units, not characters. The length may be given as PCRE2_ZERO_TERMINATED for a
@@ -23,37 +23,38 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
 <li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
 <li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
-<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
-<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
-<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
-<li><a name="TOC14" href="#SEC14">NEWLINES</a>
-<li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
-<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
-<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
-<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
-<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
-<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
-<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
-<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
-<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
-<li><a name="TOC24" href="#SEC24">SERIALIZATION AND PRECOMPILING</a>
-<li><a name="TOC25" href="#SEC25">THE MATCH DATA BLOCK</a>
-<li><a name="TOC26" href="#SEC26">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
-<li><a name="TOC27" href="#SEC27">NEWLINE HANDLING WHEN MATCHING</a>
-<li><a name="TOC28" href="#SEC28">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
-<li><a name="TOC29" href="#SEC29">OTHER INFORMATION ABOUT A MATCH</a>
-<li><a name="TOC30" href="#SEC30">ERROR RETURNS FROM <b>pcre2_match()</b></a>
-<li><a name="TOC31" href="#SEC31">OBTAINING A TEXTUAL ERROR MESSAGE</a>
-<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
-<li><a name="TOC33" href="#SEC33">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
-<li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
-<li><a name="TOC35" href="#SEC35">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
-<li><a name="TOC36" href="#SEC36">DUPLICATE SUBPATTERN NAMES</a>
-<li><a name="TOC37" href="#SEC37">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
-<li><a name="TOC38" href="#SEC38">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
-<li><a name="TOC39" href="#SEC39">SEE ALSO</a>
-<li><a name="TOC40" href="#SEC40">AUTHOR</a>
-<li><a name="TOC41" href="#SEC41">REVISION</a>
+<li><a name="TOC11" href="#SEC11">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a>
+<li><a name="TOC12" href="#SEC12">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
+<li><a name="TOC13" href="#SEC13">PCRE2 API OVERVIEW</a>
+<li><a name="TOC14" href="#SEC14">STRING LENGTHS AND OFFSETS</a>
+<li><a name="TOC15" href="#SEC15">NEWLINES</a>
+<li><a name="TOC16" href="#SEC16">MULTITHREADING</a>
+<li><a name="TOC17" href="#SEC17">PCRE2 CONTEXTS</a>
+<li><a name="TOC18" href="#SEC18">CHECKING BUILD-TIME OPTIONS</a>
+<li><a name="TOC19" href="#SEC19">COMPILING A PATTERN</a>
+<li><a name="TOC20" href="#SEC20">COMPILATION ERROR CODES</a>
+<li><a name="TOC21" href="#SEC21">JUST-IN-TIME (JIT) COMPILATION</a>
+<li><a name="TOC22" href="#SEC22">LOCALE SUPPORT</a>
+<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A COMPILED PATTERN</a>
+<li><a name="TOC24" href="#SEC24">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
+<li><a name="TOC25" href="#SEC25">SERIALIZATION AND PRECOMPILING</a>
+<li><a name="TOC26" href="#SEC26">THE MATCH DATA BLOCK</a>
+<li><a name="TOC27" href="#SEC27">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
+<li><a name="TOC28" href="#SEC28">NEWLINE HANDLING WHEN MATCHING</a>
+<li><a name="TOC29" href="#SEC29">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
+<li><a name="TOC30" href="#SEC30">OTHER INFORMATION ABOUT A MATCH</a>
+<li><a name="TOC31" href="#SEC31">ERROR RETURNS FROM <b>pcre2_match()</b></a>
+<li><a name="TOC32" href="#SEC32">OBTAINING A TEXTUAL ERROR MESSAGE</a>
+<li><a name="TOC33" href="#SEC33">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
+<li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
+<li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
+<li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
+<li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a>
+<li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
+<li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
+<li><a name="TOC40" href="#SEC40">SEE ALSO</a>
+<li><a name="TOC41" href="#SEC41">AUTHOR</a>
+<li><a name="TOC42" href="#SEC42">REVISION</a>
 </ul>
 <P>
 <b>#include <pcre2.h></b>
@@ -177,22 +178,16 @@ document for an overview of all the PCRE2 documentation.
 <b> void *<i>callout_data</i>);</b>
 <br>
 <br>
-<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
-<b> uint32_t <i>value</i>);</b>
-<br>
-<br>
 <b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
 <b> PCRE2_SIZE <i>value</i>);</b>
 <br>
 <br>
-<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
 <b> uint32_t <i>value</i>);</b>
 <br>
 <br>
-<b>int pcre2_set_recursion_memory_management(</b>
-<b> pcre2_match_context *<i>mcontext</i>,</b>
-<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
-<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
+<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b> uint32_t <i>value</i>);</b>
 </P>
 <br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
 <P>
@@ -314,7 +309,24 @@ document for an overview of all the PCRE2 documentation.
 <br>
 <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
 </P>
-<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
+<br><a name="SEC11" href="#TOC1">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a><br>
+<P>
+<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b> uint32_t <i>value</i>);</b>
+<br>
+<br>
+<b>int pcre2_set_recursion_memory_management(</b>
+<b> pcre2_match_context *<i>mcontext</i>,</b>
+<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
+<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
+<br>
+<br>
+These functions became obsolete at release 10.30 and are retained only for
+backward compatibility. They should not be used in new code. The first is
+replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
+no longer has any effect (it always returns zero).
+</P>
+<br><a name="SEC12" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
 <P>
 There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
 units, respectively. However, there is just one header file, <b>pcre2.h</b>.
@@ -368,14 +380,14 @@ When using multiple libraries in an application, you must take care when
 processing any particular pattern to use only functions from a single library.
 For example, if you want to run a match using a pattern that was compiled with
 <b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
-<b>pcre2_match_8()</b>.
+<b>pcre2_match_8()</b> or <b>pcre2_match_32()</b>.
 </P>
 <P>
 In the function summaries above, and in the rest of this document and other
 PCRE2 documents, functions and data types are described using their generic
 names, without the 8, 16, or 32 suffix.
 </P>
-<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
+<br><a name="SEC13" href="#TOC1">PCRE2 API OVERVIEW</a><br>
 <P>
 PCRE2 has its own native API, which is described in this document. There are
 also some wrapper functions for the 8-bit library that correspond to the
@@ -397,7 +409,7 @@ against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
 <b>pcre2.h</b>.
 </P>
 <P>
-The functions <b>pcre2_compile()</b>, and <b>pcre2_match()</b> are used for
+The functions <b>pcre2_compile()</b> and <b>pcre2_match()</b> are used for
 compiling and matching regular expressions in a Perl-compatible manner. A
 sample program that demonstrates the simplest way of using them is provided in
 the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
@@ -408,10 +420,17 @@ documentation, and the
 documentation describes how to compile and run it.
 </P>
 <P>
-Just-in-time compiler support is an optional feature of PCRE2 that can be built
-in appropriate hardware environments. It greatly speeds up the matching
+The compiling and matching functions recognize various options that are passed
+as bits in an options argument. There are also some more complicated parameters
+such as custom memory management functions and resource limits that are passed
+in "contexts" (which are just memory blocks, described below). Simple
+applications do not need to make use of contexts.
+</P>
+<P>
+Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
+built in appropriate hardware environments. It greatly speeds up the matching
 performance of many patterns. Programs can request that it be used if
-available, by calling <b>pcre2_jit_compile()</b> after a pattern has been
+available by calling <b>pcre2_jit_compile()</b> after a pattern has been
 successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
 support is not available.
 </P>
@@ -423,8 +442,8 @@ More complicated programs might need to make use of the specialist functions
 <P>
 JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
 unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
-matching, which gives improved performance. The JIT-specific functions are
-discussed in the
+matching, which gives improved performance at the expense of less sanity
+checking. The JIT-specific functions are discussed in the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation.
 </P>
@@ -433,7 +452,7 @@ A second matching function, <b>pcre2_dfa_match()</b>, which is not
 Perl-compatible, is also provided. This uses a different algorithm for the
 matching. The alternative algorithm finds all possible matches (at a given
 point in the subject), and scans the subject just once (unless there are
-lookbehind assertions). However, this algorithm does not return captured
+lookaround assertions). However, this algorithm does not return captured
 substrings. A description of the two matching algorithms and their advantages
 and disadvantages is given in the
 <a href="pcre2matching.html"><b>pcre2matching</b></a>
@@ -476,7 +495,7 @@ Functions with names ending with <b>_free()</b> are used for freeing memory
 blocks of various sorts. In all cases, if one of these functions is called with
 a NULL argument, it does nothing.
 </P>
-<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
+<br><a name="SEC14" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
 <P>
 The PCRE2 API uses string lengths and offsets into strings of code units in
 several places. These values are always of type PCRE2_SIZE, which is an
@@ -486,7 +505,7 @@ as a special indicator for zero-terminated strings and unset offsets.
 Therefore, the longest string that can be handled is one less than this
 maximum.
 <a name="newlines"></a></P>
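Because lengths and offsets count code units rather than characters, the two numbers differ as soon as a subject contains multi-unit characters. The sketch below illustrates this for the 8-bit library, where one code unit is one byte. It assumes valid UTF-8 input, and both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Length in 8-bit code units (bytes): the unit in which the 8-bit
   library expresses subject lengths and match offsets. */
size_t utf8_code_units(const char *s)
{
    return strlen(s);
}

/* Number of characters (code points), assuming `s` is valid UTF-8:
   count every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_characters(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a subject such as "caf\xc3\xa9" (the word "café" in UTF-8), the length to pass to a matching function is the code-unit count, not the character count.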
-<br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
+<br><a name="SEC15" href="#TOC1">NEWLINES</a><br>
 <P>
 PCRE2 supports five different conventions for indicating line breaks in
 strings: a single CR (carriage return) character, a single LF (linefeed)
@@ -521,7 +540,7 @@ The choice of newline convention does not affect the interpretation of
 the \n or \r escape sequences, nor does it affect what \R matches; this has
 its own separate convention.
 </P>
-<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
+<br><a name="SEC16" href="#TOC1">MULTITHREADING</a><br>
 <P>
 In a multithreaded application it is important to keep thread-specific data
 separate from data that can be shared between threads. The PCRE2 library code
@@ -543,8 +562,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
 that is, the same compiled pattern can be used by more than one thread
 simultaneously. For example, an application can compile all its patterns at the
 start, before forking off multiple threads that use them. However, if the
-just-in-time optimization feature is being used, it needs separate memory stack
-areas for each thread. See the
+just-in-time (JIT) optimization feature is being used, it needs separate memory
+stack areas for each thread. See the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for more details.
 </P>
@@ -596,12 +615,12 @@ thread-specific copy.
 Match blocks
 </b><br>
 <P>
-The matching functions need a block of memory for working space and for storing
-the results of a match. This includes details of what was matched, as well as
-additional information such as the name of a (*MARK) setting. Each thread must
-provide its own copy of this memory.
+The matching functions need a block of memory for storing the results of a
+match. This includes details of what was matched, as well as additional
+information such as the name of a (*MARK) setting. Each thread must provide its
+own copy of this memory.
 </P>
-<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
+<br><a name="SEC17" href="#TOC1">PCRE2 CONTEXTS</a><br>
 <P>
 Some PCRE2 functions have a lot of parameters, many of which are used only by
 specialist applications, for example, those that use custom memory management
@@ -663,15 +682,15 @@ The memory used for a general context should be freed by calling:
 The compile context
 </b><br>
 <P>
-A compile context is required if you want to change the default values of any
-of the following compile-time parameters:
+A compile context is required if you want to provide an external function for
+stack checking during compilation or to change the default values of any of the
+following compile-time parameters:
 <pre>
 What \R matches (Unicode newlines or CR, LF, CRLF only)
 PCRE2's character tables
 The newline character sequence
 The compile time nested parentheses limit
 The maximum length of the pattern string
-An external function for stack checking
 </pre>
 A compile context is also required if you are using custom memory management.
 If none of these apply, just pass NULL as the context argument of
@@ -713,11 +732,11 @@ in the current locale.
 <b> PCRE2_SIZE <i>value</i>);</b>
 <br>
 <br>
-This sets a maximum length, in code units, for the pattern string that is to be
-compiled. If the pattern is longer, an error is generated. This facility is
-provided so that applications that accept patterns from external sources can
-limit their size. The default is the largest number that a PCRE2_SIZE variable
-can hold, which is effectively unlimited.
+This sets a maximum length, in code units, for any pattern string that is
+compiled with this context. If the pattern is longer, an error is generated.
+This facility is provided so that applications that accept patterns from
+external sources can limit their size. The default is the largest number that a
+PCRE2_SIZE variable can hold, which is effectively unlimited.
 <b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
 <b> uint32_t <i>value</i>);</b>
 <br>
@@ -729,8 +748,14 @@ sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
 PCRE2_NEWLINE_ANY (any Unicode newline sequence).
 </P>
 <P>
-When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
-parameter affects the recognition of white space and the end of internal
+A pattern can override the value set in the compile context by starting with a
+sequence such as (*CRLF). See the
+<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
+page for details.
+</P>
+<P>
+When a pattern is compiled with the PCRE2_EXTENDED option, the newline
+convention affects the recognition of white space and the end of internal
 comments starting with #. The value is saved with the compiled pattern for
 subsequent use by the JIT compiler and by the two interpreted matching
 functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
@@ -764,15 +789,14 @@ zero if all is well, or non-zero to force an error.
 The match context
 </b><br>
 <P>
-A match context is required if you want to change the default values of any
-of the following match-time parameters:
+A match context is required if you want to:
 <pre>
-A callout function
-The offset limit for matching an unanchored pattern
-The limit for calling <b>match()</b> (see below)
-The limit for calling <b>match()</b> recursively
+Set up a callout function
+Set an offset limit for matching an unanchored pattern
+Change the backtracking match limit
+Change the backtracking depth limit
+Set custom memory management specifically for the match
 </pre>
-A match context is also required if you are using custom memory management.
 If none of these apply, just pass NULL as the context argument of
 <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
 </P>
@@ -797,7 +821,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
 <b> void *<i>callout_data</i>);</b>
 <br>
 <br>
-This sets up a "callout" function, which PCRE2 will call at specified points
+This sets up a "callout" function for PCRE2 to call at specified points
 during a matching operation. Details are given in the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation.
@@ -816,8 +840,8 @@ A match can never be found if the <i>startoffset</i> argument of
 limit.
 </P>
 <P>
-When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
-<b>pcre2_compile()</b> so that when JIT is in use, different code can be
+When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
+calling <b>pcre2_compile()</b> so that when JIT is in use, different code can be
 compiled. If a match is started with a non-default offset limit when
 PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
 </P>
@@ -837,10 +861,10 @@ which have a very large number of possibilities in their search trees. The
 classic example is a pattern that uses nested unlimited repeats.
 </P>
 <P>
-Internally, <b>pcre2_match()</b> uses a function called <b>match()</b>, which it
-calls repeatedly (sometimes recursively). The limit set by <i>match_limit</i> is
-imposed on the number of times this function is called during a match, which
-has the effect of limiting the amount of backtracking that can take place. For
+There is an internal counter in <b>pcre2_match()</b> that is incremented each
+time round its main matching loop. If this value reaches the match limit,
+<b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
+the effect of limiting the amount of backtracking that can take place. For
 patterns that are not anchored, the count restarts from zero for each position
 in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
 which ignores it.
@@ -855,8 +879,7 @@ matching can continue.
 </P>
 <P>
 The default value for the limit can be set when PCRE2 is built; the default
-default is 10 million, which handles all but the most extreme cases. If the
-limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_MATCHLIMIT. A value
+default is 10 million, which handles all but the most extreme cases. A value
 for the match limit may also be supplied by an item at the start of a pattern
 of the form
 <pre>
@@ -865,64 +888,38 @@ of the form
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
 less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
 limit is set, less than the default.
-<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
 <b> uint32_t <i>value</i>);</b>
 <br>
 <br>
-The <i>recursion_limit</i> parameter is similar to <i>match_limit</i>, but
-instead of limiting the total number of times that <b>match()</b> is called, it
-limits the depth of recursion. The recursion depth is a smaller number than the
-total number of calls, because not all calls to <b>match()</b> are recursive.
-This limit is of use only if it is set smaller than <i>match_limit</i>.
+This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
+Each time a nested backtracking point is passed, a new memory "frame" is used
+to remember the state of matching at that point. Thus, this parameter
+indirectly limits the amount of memory that is used in a match.
 </P>
 <P>
-Limiting the recursion depth limits the amount of system stack that can be
-used, or, when PCRE2 has been compiled to use memory on the heap instead of the
-stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code. However, it is
-supported by <b>pcre2_dfa_match()</b>, which uses recursive function calls less
-frequently than <b>pcre2_match()</b>, but which can be caused to use a lot of
-stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
+This limit is not relevant, and is ignored, when matching is done using JIT
+compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which uses
+it to limit the depth of internal recursive function calls that implement
+lookaround assertions and pattern recursions. This is, therefore, an indirect
+limit on the amount of system stack that is used. A recursive pattern such as
+/(.)(?1)/, when matched to a very long string using <b>pcre2_dfa_match()</b>,
+can use a great deal of stack.
 </P>
 <P>
-The default value for <i>recursion_limit</i> can be set when PCRE2 is built; the
-default default is the same value as the default for <i>match_limit</i>. If the
-limit is exceeded, <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> return
-PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
-supplied by an item at the start of a pattern of the form
+The default value for the depth limit can be set when PCRE2 is built; the
+default default is the same value as the default for the match limit. If the
+limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
+PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
+item at the start of a pattern of the form
 <pre>
-(*LIMIT_RECURSION=ddd)
+(*LIMIT_DEPTH=ddd)
 </pre>
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
 less than the limit set by the caller of <b>pcre2_match()</b> or
 <b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
-<b>int pcre2_set_recursion_memory_management(</b>
-<b> pcre2_match_context *<i>mcontext</i>,</b>
-<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
-<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
-<br>
-<br>
-This function sets up two additional custom memory management functions for use
-by <b>pcre2_match()</b> when PCRE2 is compiled to use the heap for remembering
-backtracking data, instead of recursive function calls that use the system
-stack. There is a discussion about PCRE2's stack usage in the
-<a href="pcre2stack.html"><b>pcre2stack</b></a>
-documentation. See the
-<a href="pcre2build.html"><b>pcre2build</b></a>
-documentation for details of how to build PCRE2.
-</P>
-<P>
-Using the heap for recursion is a non-standard way of building PCRE2, for use
-in environments that have limited stacks. Because of the greater use of memory
-management, <b>pcre2_match()</b> runs more slowly. Functions that are different
-to the general custom memory functions are provided so that special-purpose
-external code can be used for this case, because the memory blocks are all the
-same size. The blocks are retained by <b>pcre2_match()</b> until it is about to
-exit so that they can be re-used when possible during the match. In the absence
-of these functions, the normal custom memory management functions are used, if
-supplied, otherwise the system functions.
 </P>
-<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
+<br><a name="SEC18" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
 <P>
 <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
 </P>
@@ -954,6 +951,13 @@ sequences the \R escape sequence matches by default. A value of
 PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
 value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
 default can be overridden when a pattern is compiled.
+<pre>
+PCRE2_CONFIG_DEPTHLIMIT
+</pre>
+The output is a uint32_t integer that gives the default limit for the depth of
+nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
+and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
+<b>pcre2_set_depth_limit()</b> above.
 <pre>
 PCRE2_CONFIG_JIT
 </pre>
@@ -989,9 +993,9 @@ be compiled by those two libraries, but at the expense of slower matching.
 <pre>
 PCRE2_CONFIG_MATCHLIMIT
 </pre>
-The output is a uint32_t integer that gives the default limit for the number of
-internal matching function calls in a <b>pcre2_match()</b> execution. Further
-details are given with <b>pcre2_match()</b> below.
+The output is a uint32_t integer that gives the default match limit for
+<b>pcre2_match()</b>. Further details are given with
+<b>pcre2_set_match_limit()</b> above.
 <pre>
 PCRE2_CONFIG_NEWLINE
 </pre>
@@ -1015,20 +1019,11 @@ amount of system stack used when a pattern is compiled. It is specified when
 PCRE2 is built; the default is 250. This limit does not take into account the
 stack that may already be used by the calling application. For finer control
 over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
-<pre>
-PCRE2_CONFIG_RECURSIONLIMIT
-</pre>
-The output is a uint32_t integer that gives the default limit for the depth of
-recursion when calling the internal matching function in a <b>pcre2_match()</b>
-execution. Further details are given with <b>pcre2_match()</b> below.
 <pre>
 PCRE2_CONFIG_STACKRECURSE
 </pre>
-The output is a uint32_t integer that is set to one if internal recursion when
-running <b>pcre2_match()</b> is implemented by recursive function calls that use
-the system stack to remember their state. This is the usual way that PCRE2 is
-compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
-heap instead of recursive function calls.
+This parameter is obsolete and should not be used in new code. The output is a
+uint32_t integer that is always set to zero.
 <pre>
 PCRE2_CONFIG_UNICODE_VERSION
 </pre>
@@ -1047,14 +1042,14 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
 <pre>
 PCRE2_CONFIG_VERSION
 </pre>
-The <i>where</i> argument should point to a buffer that is at least 12 code
+The <i>where</i> argument should point to a buffer that is at least 24 code
 units long. (The exact length required can be found by calling
 <b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with
 the PCRE2 version string, zero-terminated. The number of code units used is
 returned. This is the length of the string plus one unit for the terminating
 zero.
 <a name="compiling"></a></P>
-<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
+<br><a name="SEC19" href="#TOC1">COMPILING A PATTERN</a><br>
 <P>
 <b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
 <b> uint32_t <i>options</i>, int *<i>errorcode</i>, PCRE2_SIZE *<i>erroroffset,</i></b>
@@ -1240,13 +1235,14 @@ option is set, normal backslash processing is applied to verb names and only an
 unescaped closing parenthesis terminates the name. A closing parenthesis can be
 included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
 option is set, unescaped whitespace in verb names is skipped and #-comments are
-recognized, exactly as in the rest of the pattern.
+recognized in this mode, exactly as in the rest of the pattern.
 <pre>
 PCRE2_AUTO_CALLOUT
 </pre>
 If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
 all with number 255, before each pattern item, except immediately before or
-after a callout in the pattern. For discussion of the callout facility, see the
+after an explicit callout in the pattern. For discussion of the callout
+facility, see the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation.
 <pre>
@ -1472,9 +1468,8 @@ and
|
|||
<a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
|
||||
in the
|
||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||
document.
|
||||
If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a negative
|
||||
error code.
|
||||
document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
|
||||
negative error code.
|
||||
</P>
|
||||
<P>
|
||||
If you know that your pattern is valid, and you want to skip this check for
|
||||
|
@ -1495,7 +1490,7 @@ in the
|
|||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||
page. If you set PCRE2_UCP, matching one of the items it affects takes much
|
||||
longer. The option is available only if PCRE2 has been compiled with Unicode
|
||||
support.
|
||||
support (which is the default).
|
||||
<pre>
|
||||
PCRE2_UNGREEDY
|
||||
</pre>
|
||||
|
@ -1525,9 +1520,9 @@ the behaviour of PCRE2 are given in the
|
|||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||
page.
|
||||
</P>
|
||||
<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||
<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
|
||||
<P>
|
||||
There are over 80 positive error codes that <b>pcre2_compile()</b> may return
|
||||
There are nearly 100 positive error codes that <b>pcre2_compile()</b> may return
|
||||
(via <i>errorcode</i>) if it finds an error in the pattern. There are also some
|
||||
negative error codes that are used for invalid UTF strings. These are the same
|
||||
as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
|
||||
|
@ -1538,7 +1533,7 @@ error message"
|
|||
<a href="#geterrormessage">below)</a>
|
||||
can be called to obtain a textual error message from any error code.
|
||||
<a name="jitcompiling"></a></P>
|
||||
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||
<br><a name="SEC21" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||
<P>
|
||||
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
||||
<br>
|
||||
|
@ -1574,18 +1569,18 @@ documentation.
|
|||
JIT compilation is a heavyweight optimization. It can take some time for
|
||||
patterns to be analyzed, and for one-off matches and simple patterns the
|
||||
benefit of faster execution might be offset by a much slower compilation time.
|
||||
Most, but not all patterns can be optimized by the JIT compiler.
|
||||
Most (but not all) patterns can be optimized by the JIT compiler.
|
||||
<a name="localesupport"></a></P>
|
||||
<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
|
||||
<br><a name="SEC22" href="#TOC1">LOCALE SUPPORT</a><br>
|
||||
<P>
|
||||
PCRE2 handles caseless matching, and determines whether characters are letters,
|
||||
digits, or whatever, by reference to a set of tables, indexed by character code
|
||||
point. This applies only to characters whose code points are less than 256. By
|
||||
default, higher-valued code points never match escapes such as \w or \d.
|
||||
However, if PCRE2 is built with UTF support, all characters can be tested with
|
||||
\p and \P, or, alternatively, the PCRE2_UCP option can be set when a pattern
|
||||
is compiled; this causes \w and friends to use Unicode property support
|
||||
instead of the built-in tables.
|
||||
However, if PCRE2 is built with Unicode support, all characters can be tested
|
||||
with \p and \P, or, alternatively, the PCRE2_UCP option can be set when a
|
||||
pattern is compiled; this causes \w and friends to use Unicode property
|
||||
support instead of the built-in tables.
|
||||
</P>
|
||||
<P>
|
||||
The use of locales with Unicode is discouraged. If you are handling characters
|
||||
|
@ -1629,10 +1624,10 @@ available for as long as it is needed.
|
|||
The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
|
||||
is saved with the compiled pattern, and the same tables are used by
|
||||
<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>. Thus, for any single pattern,
|
||||
compilation, and matching all happen in the same locale, but different patterns
|
||||
compilation and matching both happen in the same locale, but different patterns
|
||||
can be processed in different locales.
|
||||
<a name="infoaboutpattern"></a></P>
|
||||
<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
||||
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
|
||||
<P>
|
||||
<b>int pcre2_pattern_info(const pcre2_code *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
|
@ -1645,7 +1640,7 @@ pattern. The second argument specifies which piece of information is required,
|
|||
and the third argument is a pointer to a variable to receive the data. If the
|
||||
third argument is NULL, the first argument is ignored, and the function returns
|
||||
the size in bytes of the variable that is required for the information
|
||||
requested. Otherwise, The yield of the function is zero for success, or one of
|
||||
requested. Otherwise, the yield of the function is zero for success, or one of
|
||||
the following negative numbers:
|
||||
<pre>
|
||||
PCRE2_ERROR_NULL the argument <i>code</i> was NULL
|
||||
|
@ -1698,8 +1693,8 @@ following are true:
|
|||
.* is not in an atomic group
|
||||
.* is not in a capturing group that is the subject of a back reference
|
||||
PCRE2_DOTALL is in force for .*
|
||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
|
||||
PCRE2_NO_DOTSTAR_ANCHOR is not set.
|
||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
||||
</pre>
|
||||
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
|
||||
options returned for PCRE2_INFO_ALLOPTIONS.
|
||||
|
@ -1726,6 +1721,13 @@ matches only CR, LF, or CRLF.
|
|||
Return the highest capturing subpattern number in the pattern. In patterns
|
||||
where (?| is not used, this is also the total number of capturing subpatterns.
|
||||
The third argument should point to a <b>uint32_t</b> variable.
|
||||
<pre>
|
||||
PCRE2_INFO_DEPTHLIMIT
|
||||
</pre>
|
||||
If the pattern set a backtracking depth limit by including an item of the form
|
||||
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
||||
call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
|
||||
<pre>
|
||||
PCRE2_INFO_FIRSTBITMAP
|
||||
</pre>
|
||||
|
@ -1757,6 +1759,14 @@ argument should point to an <b>uint32_t</b> variable. In the 8-bit library, the
|
|||
value is always less than 256. In the 16-bit library the value can be up to
|
||||
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
|
||||
and up to 0xffffffff when not using UTF-32 mode.
|
||||
<pre>
|
||||
PCRE2_INFO_FRAMESIZE
|
||||
</pre>
|
||||
Return the size (in bytes) of the data frames that are used to remember
|
||||
backtracking positions when the pattern is processed by <b>pcre2_match()</b>
|
||||
without the use of JIT. The third argument should point to a <b>size_t</b>
|
||||
variable. The frame size depends on the number of capturing parentheses in the
|
||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
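The growth rule stated above can be written out as arithmetic. This is only a sketch: the name sketch_frame_growth is hypothetical, PCRE2_SIZE is assumed to be size_t, and the base frame size (which only PCRE2_INFO_FRAMESIZE reports) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PCRE2_SIZE; assumed here to be size_t. */
typedef size_t sketch_size;

/* Extra bytes a frame needs for `extra_groups` additional capturing groups,
   following the rule that each group adds two PCRE2_SIZE variables.
   This does not compute the whole frame size, only the increment. */
size_t sketch_frame_growth(uint32_t extra_groups)
{
    /* Guard against overflow on platforms where size_t is 32 bits wide. */
    assert(extra_groups <= SIZE_MAX / (2u * sizeof(sketch_size)));
    return (size_t)extra_groups * 2u * sizeof(sketch_size);
}
```

For an actual pattern, the value returned for PCRE2_INFO_FRAMESIZE is the figure to rely on; the helper only shows how that figure is expected to change as groups are added.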
|
||||
<pre>
|
||||
PCRE2_INFO_HASBACKSLASHC
|
||||
</pre>
|
||||
|
@ -1767,7 +1777,8 @@ argument should point to an <b>uint32_t</b> variable.
|
|||
</pre>
|
||||
Return 1 if the pattern contains any explicit matches for CR or LF characters,
|
||||
otherwise 0. The third argument should point to a <b>uint32_t</b> variable. An
|
||||
explicit match is either a literal CR or LF character, or \r or \n.
|
||||
explicit match is either a literal CR or LF character, or \r or \n or one of
|
||||
the equivalent hexadecimal or octal escape sequences.
|
||||
<pre>
|
||||
PCRE2_INFO_JCHANGED
|
||||
</pre>
|
||||
|
@ -1904,7 +1915,7 @@ different for each compiled pattern.
|
|||
<pre>
|
||||
PCRE2_INFO_NEWLINE
|
||||
</pre>
|
||||
The output is a <b>uint32_t</b> with one of the following values:
|
||||
The output is one of the following <b>uint32_t</b> values:
|
||||
<pre>
|
||||
PCRE2_NEWLINE_CR Carriage return (CR)
|
||||
PCRE2_NEWLINE_LF Linefeed (LF)
|
||||
|
@ -1912,15 +1923,8 @@ The output is a <b>uint32_t</b> with one of the following values:
|
|||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||
</pre>
|
||||
This specifies the default character sequence that will be recognized as
|
||||
meaning "newline" while matching.
|
||||
<pre>
|
||||
PCRE2_INFO_RECURSIONLIMIT
|
||||
</pre>
|
||||
If the pattern set a recursion limit by including an item of the form
|
||||
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
|
||||
argument should point to an unsigned 32-bit integer. If no such value has been
|
||||
set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
|
||||
This identifies the character sequence that will be recognized as meaning
|
||||
"newline" while matching.
|
||||
<pre>
|
||||
PCRE2_INFO_SIZE
|
||||
</pre>
|
||||
|
@ -1933,7 +1937,7 @@ value returned by this option, because there are cases where the code that
|
|||
calculates the size has to over-estimate. Processing a pattern with the JIT
|
||||
compiler does not alter the value returned by this option.
|
||||
<a name="infoaboutcallouts"></a></P>
|
||||
<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
|
||||
<br><a name="SEC24" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
|
||||
<P>
|
||||
<b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
|
||||
<b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
|
||||
|
@ -1952,7 +1956,7 @@ contents of the callout enumeration block are described in the
|
|||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||
documentation, which also gives further details about callouts.
|
||||
</P>
|
||||
<br><a name="SEC24" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
|
||||
<br><a name="SEC25" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
|
||||
<P>
|
||||
It is possible to save compiled patterns on disc or elsewhere, and reload them
|
||||
later, subject to a number of restrictions. The functions whose names begin
|
||||
|
@ -1961,7 +1965,7 @@ the
|
|||
<a href="pcre2serialize.html"><b>pcre2serialize</b></a>
|
||||
documentation.
|
||||
<a name="matchdatablock"></a></P>
|
||||
<br><a name="SEC25" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
||||
<br><a name="SEC26" href="#TOC1">THE MATCH DATA BLOCK</a><br>
|
||||
<P>
|
||||
<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
|
||||
<b> pcre2_general_context *<i>gcontext</i>);</b>
|
||||
|
@ -1986,9 +1990,9 @@ Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
|
|||
the creation functions above. For <b>pcre2_match_data_create()</b>, the first
|
||||
argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
|
||||
offsets is required to identify the string that matched the whole pattern, with
|
||||
another pair for each captured substring. For example, a value of 4 creates
|
||||
enough space to record the matched portion of the subject plus three captured
|
||||
substrings. A minimum of at least 1 pair is imposed by
|
||||
an additional pair for each captured substring. For example, a value of 4
|
||||
creates enough space to record the matched portion of the subject plus three
|
||||
captured substrings. A minimum of 1 pair is imposed by
|
||||
<b>pcre2_match_data_create()</b>, so it is always possible to return the overall
|
||||
matched string.
|
||||
</P>
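The sizing rule in the paragraph above (one pair for the overall match, one more per captured substring, never fewer than one pair) might be sketched as follows. The helper names are hypothetical and are not part of the PCRE2 API; the first result is the kind of value that would be passed as <i>ovecsize</i>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pairs to request for a pattern with `capture_groups` capturing groups:
   one pair for the whole match plus one pair per group. */
uint32_t sketch_ovector_pairs(uint32_t capture_groups)
{
    assert(capture_groups < UINT32_MAX);   /* avoid wrap-around on +1 */
    return capture_groups + 1u;
}

/* Number of offset values in an ovector of `pairs` pairs. A request for
   zero pairs is treated as one, mirroring the minimum described above. */
size_t sketch_ovector_elements(uint32_t pairs)
{
    if (pairs == 0) pairs = 1;
    return (size_t)pairs * 2u;
}
```

With three capturing groups this gives 4 pairs and 8 offset values, which agrees with the "value of 4" example in the text.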
|
||||
|
@ -2032,7 +2036,7 @@ match data block (for that match) have taken place.
|
|||
When a match data block itself is no longer needed, it should be freed by
|
||||
calling <b>pcre2_match_data_free()</b>.
|
||||
</P>
|
||||
<br><a name="SEC26" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
||||
<br><a name="SEC27" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
|
||||
<P>
|
||||
<b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||
|
@ -2126,9 +2130,11 @@ character is CR followed by LF, advance the starting offset by two characters
|
|||
instead of one.
|
||||
</P>
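The CRLF rule in the preceding paragraph can be sketched as a small helper. It assumes the 8-bit library in non-UTF mode (so one character is one code unit), and the function name and the <i>crlf_is_newline</i> flag are hypothetical; the surrounding retry loop is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset at which to make the next match attempt after a
   retry at `offset` has failed. Normally this is one character further
   on; if CRLF counts as a newline and the current character is CR
   followed by LF, both are stepped over together. */
size_t sketch_advance_offset(const unsigned char *subject, size_t length,
                             size_t offset, int crlf_is_newline)
{
    assert(subject != NULL);
    assert(offset < length);
    if (crlf_is_newline &&
        offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;   /* step over CR LF as a single newline */
    return offset + 1;       /* ordinary case: one character */
}
```

In UTF mode the ordinary case would also have to step over the remaining code units of a multi-unit character, which this sketch does not attempt.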
|
||||
<P>
|
||||
If a non-zero starting offset is passed when the pattern is anchored, one
|
||||
If a non-zero starting offset is passed when the pattern is anchored, a single
|
||||
attempt to match at the given offset is made. This can only succeed if the
|
||||
pattern does not require the match to be at the start of the subject.
|
||||
pattern does not require the match to be at the start of the subject. In other
|
||||
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
|
||||
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
|
||||
<a name="matchoptions"></a></P>
|
||||
<br><b>
|
||||
Option bits for <b>pcre2_match()</b>
|
||||
|
@ -2142,9 +2148,9 @@ described below.
|
|||
</P>
|
||||
<P>
|
||||
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
|
||||
compiler. If it is set, JIT matching is disabled and the normal interpretive
|
||||
code in <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the
|
||||
remaining options are supported for JIT matching.
|
||||
compiler. If it is set, JIT matching is disabled and the interpretive code in
|
||||
<b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the remaining
|
||||
options are supported for JIT matching.
|
||||
<pre>
|
||||
PCRE2_ANCHORED
|
||||
</pre>
|
||||
|
@ -2229,13 +2235,13 @@ page.
|
|||
If you know that your subject is valid, and you want to skip these checks for
|
||||
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
|
||||
<b>pcre2_match()</b>. You might want to do this for the second and subsequent
|
||||
calls to <b>pcre2_match()</b> if you are making repeated calls to find all the
|
||||
matches in a single subject string.
|
||||
calls to <b>pcre2_match()</b> if you are making repeated calls to find other
|
||||
matches in the same subject string.
|
||||
</P>
|
||||
<P>
|
||||
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
|
||||
as a subject, or an invalid value of <i>startoffset</i>, is undefined. Your
|
||||
program may crash or loop indefinitely.
|
||||
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
|
||||
string as a subject, or an invalid value of <i>startoffset</i>, is undefined.
|
||||
Your program may crash or loop indefinitely.
|
||||
<pre>
|
||||
PCRE2_PARTIAL_HARD
|
||||
PCRE2_PARTIAL_SOFT
|
||||
|
@ -2262,7 +2268,7 @@ examples, in the
|
|||
<a href="pcre2partial.html"><b>pcre2partial</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC27" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
||||
<br><a name="SEC28" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
|
||||
<P>
|
||||
When PCRE2 is built, a default newline convention is set; this is usually the
|
||||
standard convention for the operating system. The default can be overridden in
|
||||
|
@ -2294,15 +2300,15 @@ reference, and so advances only by one character after the first failure.
|
|||
</P>
|
||||
<P>
|
||||
An explicit match for CR or LF is either a literal appearance of one of those
|
||||
characters in the pattern, or one of the \r or \n escape sequences. Implicit
|
||||
matches such as [^X] do not count, nor does \s, even though it includes CR and
|
||||
LF in the characters that it matches.
|
||||
characters in the pattern, or one of the \r or \n or equivalent octal or
|
||||
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
|
||||
does \s, even though it includes CR and LF in the characters that it matches.
|
||||
</P>
|
||||
<P>
|
||||
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
||||
valid newline sequence and explicit \r or \n escapes appear in the pattern.
|
||||
<a name="matchedstrings"></a></P>
|
||||
<br><a name="SEC28" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
||||
<br><a name="SEC29" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
|
||||
<P>
|
||||
<b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
|
||||
<br>
|
||||
|
@ -2352,12 +2358,12 @@ identify the part of the subject that was partially matched. See the
|
|||
documentation for details of partial matching.
|
||||
</P>
|
||||
<P>
|
||||
After a successful match, the first pair of offsets identifies the portion of
|
||||
the subject string that was matched by the entire pattern. The next pair is
|
||||
used for the first capturing subpattern, and so on. The value returned by
|
||||
After a fully successful match, the first pair of offsets identifies the
|
||||
portion of the subject string that was matched by the entire pattern. The next
|
||||
pair is used for the first captured substring, and so on. The value returned by
|
||||
<b>pcre2_match()</b> is one more than the highest numbered pair that has been
|
||||
set. For example, if two substrings have been captured, the returned value is
|
||||
3. If there are no capturing subpatterns, the return value from a successful
|
||||
3. If there are no captured substrings, the return value from a successful
|
||||
match is 1, indicating that just the first pair of offsets has been set.
|
||||
</P>
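The relationship between the return value and the offset pairs described above can be illustrated without calling the library. In this sketch, plain size_t values stand in for PCRE2_SIZE, the function names are hypothetical, and the ovector is assumed to have been filled in by a successful match.

```c
#include <assert.h>
#include <stddef.h>

/* Number of captured-substring pairs that have been set, given a
   positive return value rc from a successful match. */
int sketch_captures_set(int rc)
{
    assert(rc >= 1);
    return rc - 1;
}

/* Length in code units of pair n, where pair 0 is the whole match and
   pair 1 is the first captured substring. Unset pairs and patterns
   that use \K are not handled here. */
size_t sketch_pair_length(const size_t *ovector, int rc, int n)
{
    assert(ovector != NULL);
    assert(n >= 0 && n < rc);
    return ovector[2 * n + 1] - ovector[2 * n];
}
```

For instance, a return value of 3 with offsets {2, 9, 2, 5, 6, 9} means the overall match covers code units 2 to 8 and two substrings were captured, of lengths 3 and 3.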
|
||||
<P>
|
||||
|
@ -2375,11 +2381,7 @@ returned.
|
|||
If the ovector is too small to hold all the captured substring offsets, as much
|
||||
as possible is filled in, and the function returns a value of zero. If captured
|
||||
substrings are not of interest, <b>pcre2_match()</b> may be called with a match
|
||||
data block whose ovector is of minimum length (that is, one pair). However, if
|
||||
the pattern contains back references and the <i>ovector</i> is not big enough to
|
||||
remember the related substrings, PCRE2 has to get additional memory for use
|
||||
during matching. Thus it is usually advisable to set up a match data block
|
||||
containing an ovector of reasonable size.
|
||||
data block whose ovector is of minimum length (that is, one pair).
|
||||
</P>
|
||||
<P>
|
||||
It is possible for capturing subpattern number <i>n+1</i> to match some part of
|
||||
|
@ -2405,7 +2407,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
|
|||
<b>pcre2_match()</b>. The other elements retain whatever values they previously
|
||||
had.
|
||||
<a name="matchotherdata"></a></P>
|
||||
<br><a name="SEC29" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
||||
<br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
|
||||
<P>
|
||||
<b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
|
||||
<br>
|
||||
|
@ -2455,7 +2457,7 @@ the code unit offset of the invalid UTF character. Details are given in the
|
|||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||
page.
|
||||
<a name="errorlist"></a></P>
|
||||
<br><a name="SEC30" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
||||
<br><a name="SEC31" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
|
||||
<P>
|
||||
If <b>pcre2_match()</b> fails, it returns a negative number. This can be
|
||||
converted to a text string by calling the <b>pcre2_get_error_message()</b>
|
||||
|
@ -2487,8 +2489,9 @@ returned when the magic number is not present.
|
|||
<pre>
|
||||
PCRE2_ERROR_BADMODE
|
||||
</pre>
|
||||
This error is given when a pattern that was compiled by the 8-bit library is
|
||||
passed to a 16-bit or 32-bit library function, or vice versa.
|
||||
This error is given when a compiled pattern is passed to a function in a
|
||||
library of a different code unit width, for example, a pattern compiled by
|
||||
the 8-bit library is passed to a 16-bit or 32-bit library function.
|
||||
<pre>
|
||||
PCRE2_ERROR_BADOFFSET
|
||||
</pre>
|
||||
|
@ -2512,20 +2515,15 @@ use by callout functions that want to cause <b>pcre2_match()</b> or
|
|||
<b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
|
||||
<a href="pcre2callout.html"><b>pcre2callout</b></a>
|
||||
documentation for details.
|
||||
<pre>
|
||||
PCRE2_ERROR_DEPTHLIMIT
|
||||
</pre>
|
||||
The nested backtracking depth limit was reached.
|
||||
<pre>
|
||||
PCRE2_ERROR_INTERNAL
|
||||
</pre>
|
||||
An unexpected internal error has occurred. This error could be caused by a bug
|
||||
in PCRE2 or by overwriting of the compiled pattern.
|
||||
<pre>
|
||||
PCRE2_ERROR_JIT_BADOPTION
|
||||
</pre>
|
||||
This error is returned when a pattern that was successfully studied using JIT
|
||||
is being matched, but the matching mode (partial or complete match) does not
|
||||
correspond to any JIT compilation mode. When the JIT fast path function is
|
||||
used, this error may be also given for invalid options. See the
|
||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||
documentation for more details.
|
||||
<pre>
|
||||
PCRE2_ERROR_JIT_STACKLIMIT
|
||||
</pre>
|
||||
|
@ -2537,15 +2535,13 @@ documentation for more details.
|
|||
<pre>
|
||||
PCRE2_ERROR_MATCHLIMIT
|
||||
</pre>
|
||||
The backtracking limit was reached.
|
||||
The backtracking match limit was reached.
|
||||
<pre>
|
||||
PCRE2_ERROR_NOMEMORY
|
||||
</pre>
|
||||
If a pattern contains back references, but the ovector is not big enough to
|
||||
remember the referenced substrings, PCRE2 gets a block of memory at the start
|
||||
of matching to use for this purpose. There are some other special cases where
|
||||
extra memory is needed during matching. This error is given when memory cannot
|
||||
be obtained.
|
||||
If a pattern contains many nested backtracking points, heap memory is used to
|
||||
remember them. This error is given when the memory allocation function (default
|
||||
or custom) fails.
|
||||
<pre>
|
||||
PCRE2_ERROR_NULL
|
||||
</pre>
|
||||
|
@ -2561,12 +2557,8 @@ in the subject string. Some simple patterns that might do this are detected and
|
|||
faulted at compile time, but more complicated cases, in particular mutual
|
||||
recursions between two different subpatterns, cannot be detected until matching
|
||||
is attempted.
|
||||
<pre>
|
||||
PCRE2_ERROR_RECURSIONLIMIT
|
||||
</pre>
|
||||
The internal recursion limit was reached.
|
||||
<a name="geterrormessage"></a></P>
|
||||
<br><a name="SEC31" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
|
||||
<br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
|
||||
<P>
|
||||
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
||||
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
||||
|
@ -2587,7 +2579,7 @@ returned. If the buffer is too small, the message is truncated (but still with
|
|||
a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
|
||||
None of the messages are very long; a buffer size of 120 code units is ample.
|
||||
<a name="extractbynumber"></a></P>
|
||||
<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
||||
<br><a name="SEC33" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
|
||||
<b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
|
||||
|
@ -2684,7 +2676,7 @@ The substring did not participate in the match. For example, if the pattern is
|
|||
(abc)|(def) and the subject is "def", and the ovector contains at least two
|
||||
capturing slots, substring number 1 is unset.
|
||||
</P>
|
||||
<br><a name="SEC33" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
||||
<br><a name="SEC34" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
|
||||
<b> PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
|
||||
|
@ -2723,7 +2715,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
|
|||
appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
||||
substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
|
||||
<a name="extractbyname"></a></P>
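The ovector check mentioned above, which tells an unset substring apart from a genuinely empty one, might look like this. SKETCH_UNSET is a local stand-in that assumes PCRE2_UNSET is the all-ones PCRE2_SIZE value, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for PCRE2_UNSET, assumed to be the all-ones value. */
#define SKETCH_UNSET (~(size_t)0)

/* Returns 1 if pair n of the ovector is unset (the group did not take
   part in the match), or 0 if it holds real offsets, which includes
   the case of a zero-length substring. */
int sketch_is_unset(const size_t *ovector, size_t pairs, size_t n)
{
    assert(ovector != NULL);
    assert(n < pairs);
    return ovector[2 * n] == SKETCH_UNSET;
}
```

Taking the earlier example of (abc)|(def) matched against "def", the ovector would hold {0, 3, unset, unset, 0, 3}: the check reports group 1 as unset and group 2 as set.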
|
||||
<br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
||||
<br><a name="SEC35" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
|
||||
<b> PCRE2_SPTR <i>name</i>);</b>
|
||||
|
@ -2755,8 +2747,8 @@ calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
|
|||
compiled pattern, and the second is the name. The yield of the function is the
|
||||
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
||||
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
||||
that name. Given the number, you can extract the substring directly, or use one
|
||||
of the functions described above.
|
||||
that name. Given the number, you can extract the substring directly from the
|
||||
ovector, or use one of the "bynumber" functions described above.
|
||||
</P>
|
||||
<P>
|
||||
For convenience, there are also "byname" functions that correspond to the
|
||||
|
@ -2783,7 +2775,7 @@ names are not included in the compiled code. The matching process uses only
|
|||
numbers. For this reason, the use of different names for subpatterns of the
|
||||
same number causes an error at compile time.
|
||||
</P>
|
||||
<br><a name="SEC35" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||
<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||
|
@ -2990,7 +2982,7 @@ obtained by calling the <b>pcre2_get_error_message()</b> function (see
|
|||
"Obtaining a textual error message"
|
||||
<a href="#geterrormessage">above).</a>
|
||||
</P>
|
||||
<br><a name="SEC36" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||
<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
|
||||
<P>
|
||||
<b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
|
||||
<b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
|
||||
|
@ -3035,7 +3027,7 @@ in the section entitled <i>Information about a pattern</i>. Given all the
|
|||
relevant entries for the name, you can extract each of their numbers, and hence
|
||||
the captured data.
|
||||
</P>
|
||||
<br><a name="SEC37" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
||||
<br><a name="SEC38" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
|
||||
<P>
|
||||
The traditional matching function uses a similar algorithm to Perl, which stops
|
||||
when it finds the first match at a given point in the subject. If you want to
|
||||
|
@ -3053,7 +3045,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
|
|||
other alternatives. Ultimately, when it runs out of matches,
|
||||
<b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
|
||||
<a name="dfamatch"></a></P>
|
||||
<br><a name="SEC38" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
||||
<br><a name="SEC39" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
|
||||
<P>
|
||||
<b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
|
||||
<b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
|
||||
|
@ -3064,11 +3056,12 @@ other alternatives. Ultimately, when it runs out of matches,
|
|||
<P>
|
||||
The function <b>pcre2_dfa_match()</b> is called to match a subject string
|
||||
against a compiled pattern, using a matching algorithm that scans the subject
|
||||
string just once, and does not backtrack. This has different characteristics to
|
||||
the normal algorithm, and is not compatible with Perl. Some of the features of
|
||||
PCRE2 patterns are not supported. Nevertheless, there are times when this kind
|
||||
of matching can be useful. For a discussion of the two matching algorithms, and
|
||||
a list of features that <b>pcre2_dfa_match()</b> does not support, see the
|
||||
string just once (not counting lookaround assertions), and does not backtrack.
|
||||
This has different characteristics to the normal algorithm, and is not
|
||||
compatible with Perl. Some of the features of PCRE2 patterns are not supported.
|
||||
Nevertheless, there are times when this kind of matching can be useful. For a
|
||||
discussion of the two matching algorithms, and a list of features that
|
||||
<b>pcre2_dfa_match()</b> does not support, see the
|
||||
<a href="pcre2matching.html"><b>pcre2matching</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
|
@ -3248,13 +3241,13 @@ some plausibility checks are made on the contents of the workspace, which
|
|||
should contain data about the previous partial match. If any of these checks
|
||||
fail, this error is given.
|
||||
</P>
|
||||
<br><a name="SEC39" href="#TOC1">SEE ALSO</a><br>
|
||||
<br><a name="SEC40" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo</b>(3),
|
||||
<b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
|
||||
<b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC40" href="#TOC1">AUTHOR</a><br>
|
||||
<br><a name="SEC41" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
|
@ -3263,9 +3256,9 @@ University Computing Service
|
|||
Cambridge, England.
|
||||
<br>
|
||||
</P>
|
||||
<br><a name="SEC41" href="#TOC1">REVISION</a><br>
|
||||
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 21 March 2017
|
||||
Last updated: 27 March 2017
|
||||
<br>
|
||||
Copyright © 1997-2017 University of Cambridge.
|
||||
<br>
|
||||
|
|
1559
doc/pcre2.txt
File diff suppressed because it is too large
|
@ -34,7 +34,7 @@ A match context is needed only if you want to:
|
|||
Set a matching offset limit
|
||||
Change the backtracking match limit
|
||||
Change the backtracking depth limit
|
||||
Set custom memory management in the match context
|
||||
Set custom memory management specifically for the match
|
||||
.sp
|
||||
The \fIlength\fP and \fIstartoffset\fP values are code
|
||||
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
|
||||
|
|
380
doc/pcre2api.3
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2API 3 "21 March 2017" "PCRE2 10.30"
|
||||
.TH PCRE2API 3 "27 March 2017" "PCRE2 10.30"
|
||||
.SH NAME
|
||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||
.sp
|
||||
|
@ -120,19 +120,14 @@ document for an overview of all the PCRE2 documentation.
|
|||
.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *),"
|
||||
.B " void *\fIcallout_data\fP);"
|
||||
.sp
|
||||
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B " uint32_t \fIvalue\fP);"
|
||||
.sp
|
||||
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B " PCRE2_SIZE \fIvalue\fP);"
|
||||
.sp
|
||||
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B " uint32_t \fIvalue\fP);"
|
||||
.sp
|
||||
.B int pcre2_set_recursion_memory_management(
|
||||
.B " pcre2_match_context *\fImcontext\fP,"
|
||||
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
|
||||
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
||||
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B " uint32_t \fIvalue\fP);"
|
||||
.fi
|
||||
.
|
||||
.
|
||||
|
@ -252,6 +247,25 @@ document for an overview of all the PCRE2 documentation.
|
|||
.fi
|
||||
.
|
||||
.
|
||||
.SH "PCRE2 NATIVE API OBSOLETE FUNCTIONS"
|
||||
.rs
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B " uint32_t \fIvalue\fP);"
|
||||
.sp
|
||||
.B int pcre2_set_recursion_memory_management(
|
||||
.B " pcre2_match_context *\fImcontext\fP,"
|
||||
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
|
||||
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
||||
.fi
|
||||
.sp
|
||||
These functions became obsolete at release 10.30 and are retained only for
|
||||
backward compatibility. They should not be used in new code. The first is
|
||||
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
|
||||
no longer has any effect (it always returns zero).
|
||||
.
|
||||
.
|
||||
.SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES"
|
||||
.rs
|
||||
.sp
|
||||
|
@ -302,7 +316,7 @@ When using multiple libraries in an application, you must take care when
|
|||
processing any particular pattern to use only functions from a single library.
|
||||
For example, if you want to run a match using a pattern that was compiled with
|
||||
\fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not
|
||||
\fBpcre2_match_8()\fP.
|
||||
\fBpcre2_match_8()\fP or \fBpcre2_match_32()\fP.
|
||||
.P
|
||||
In the function summaries above, and in the rest of this document and other
|
||||
PCRE2 documents, functions and data types are described using their generic
|
||||
|
@ -331,7 +345,7 @@ In a Windows environment, if you want to statically link an application program
|
|||
against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
|
||||
\fBpcre2.h\fP.
|
||||
.P
|
||||
The functions \fBpcre2_compile()\fP, and \fBpcre2_match()\fP are used for
|
||||
The functions \fBpcre2_compile()\fP and \fBpcre2_match()\fP are used for
|
||||
compiling and matching regular expressions in a Perl-compatible manner. A
|
||||
sample program that demonstrates the simplest way of using them is provided in
|
||||
the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing
|
||||
|
@ -345,10 +359,16 @@ documentation, and the
|
|||
.\"
|
||||
documentation describes how to compile and run it.
|
||||
.P
|
||||
Just-in-time compiler support is an optional feature of PCRE2 that can be built
|
||||
in appropriate hardware environments. It greatly speeds up the matching
|
||||
The compiling and matching functions recognize various options that are passed
|
||||
as bits in an options argument. There are also some more complicated parameters
|
||||
such as custom memory management functions and resource limits that are passed
|
||||
in "contexts" (which are just memory blocks, described below). Simple
|
||||
applications do not need to make use of contexts.
|
||||
.P
|
||||
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
|
||||
built in appropriate hardware environments. It greatly speeds up the matching
|
||||
performance of many patterns. Programs can request that it be used if
|
||||
available, by calling \fBpcre2_jit_compile()\fP after a pattern has been
|
||||
available by calling \fBpcre2_jit_compile()\fP after a pattern has been
|
||||
successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT
|
||||
support is not available.
|
||||
.P
|
||||
|
@ -358,8 +378,8 @@ More complicated programs might need to make use of the specialist functions
|
|||
.P
|
||||
JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
|
||||
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
|
||||
matching, which gives improved performance. The JIT-specific functions are
|
||||
discussed in the
|
||||
matching, which gives improved performance at the expense of less sanity
|
||||
checking. The JIT-specific functions are discussed in the
|
||||
.\" HREF
|
||||
\fBpcre2jit\fP
|
||||
.\"
|
||||
|
@ -369,7 +389,7 @@ A second matching function, \fBpcre2_dfa_match()\fP, which is not
|
|||
Perl-compatible, is also provided. This uses a different algorithm for the
|
||||
matching. The alternative algorithm finds all possible matches (at a given
|
||||
point in the subject), and scans the subject just once (unless there are
|
||||
lookbehind assertions). However, this algorithm does not return captured
|
||||
lookaround assertions). However, this algorithm does not return captured
|
||||
substrings. A description of the two matching algorithms and their advantages
|
||||
and disadvantages is given in the
|
||||
.\" HREF
|
||||
|
@ -484,8 +504,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
|
|||
that is, the same compiled pattern can be used by more than one thread
|
||||
simultaneously. For example, an application can compile all its patterns at the
|
||||
start, before forking off multiple threads that use them. However, if the
|
||||
just-in-time optimization feature is being used, it needs separate memory stack
|
||||
areas for each thread. See the
|
||||
just-in-time (JIT) optimization feature is being used, it needs separate memory
|
||||
stack areas for each thread. See the
|
||||
.\" HREF
|
||||
\fBpcre2jit\fP
|
||||
.\"
|
||||
|
@ -536,10 +556,10 @@ thread-specific copy.
|
|||
.SS "Match blocks"
|
||||
.rs
|
||||
.sp
|
||||
The matching functions need a block of memory for working space and for storing
|
||||
the results of a match. This includes details of what was matched, as well as
|
||||
additional information such as the name of a (*MARK) setting. Each thread must
|
||||
provide its own copy of this memory.
|
||||
The matching functions need a block of memory for storing the results of a
|
||||
match. This includes details of what was matched, as well as additional
|
||||
information such as the name of a (*MARK) setting. Each thread must provide its
|
||||
own copy of this memory.
|
||||
.
|
||||
.
|
||||
.SH "PCRE2 CONTEXTS"
|
||||
|
@ -611,15 +631,15 @@ The memory used for a general context should be freed by calling:
|
|||
.SS "The compile context"
|
||||
.rs
|
||||
.sp
|
||||
A compile context is required if you want to change the default values of any
|
||||
of the following compile-time parameters:
|
||||
A compile context is required if you want to provide an external function for
|
||||
stack checking during compilation or to change the default values of any of the
|
||||
following compile-time parameters:
|
||||
.sp
|
||||
What \eR matches (Unicode newlines or CR, LF, CRLF only)
|
||||
PCRE2's character tables
|
||||
The newline character sequence
|
||||
The compile time nested parentheses limit
|
||||
The maximum length of the pattern string
|
||||
An external function for stack checking
|
||||
.sp
|
||||
A compile context is also required if you are using custom memory management.
|
||||
If none of these apply, just pass NULL as the context argument of
|
||||
|
@ -666,11 +686,11 @@ in the current locale.
|
|||
.B " PCRE2_SIZE \fIvalue\fP);"
|
||||
.fi
|
||||
.sp
|
||||
This sets a maximum length, in code units, for the pattern string that is to be
|
||||
compiled. If the pattern is longer, an error is generated. This facility is
|
||||
provided so that applications that accept patterns from external sources can
|
||||
limit their size. The default is the largest number that a PCRE2_SIZE variable
|
||||
can hold, which is effectively unlimited.
|
||||
This sets a maximum length, in code units, for any pattern string that is
|
||||
compiled with this context. If the pattern is longer, an error is generated.
|
||||
This facility is provided so that applications that accept patterns from
|
||||
external sources can limit their size. The default is the largest number that a
|
||||
PCRE2_SIZE variable can hold, which is effectively unlimited.
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP,
|
||||
|
@ -683,8 +703,15 @@ PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
|
|||
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
|
||||
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
|
||||
.P
|
||||
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
|
||||
parameter affects the recognition of white space and the end of internal
|
||||
A pattern can override the value set in the compile context by starting with a
|
||||
sequence such as (*CRLF). See the
|
||||
.\" HREF
|
||||
\fBpcre2pattern\fP
|
||||
.\"
|
||||
page for details.
|
||||
.P
|
||||
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
|
||||
convention affects the recognition of white space and the end of internal
|
||||
comments starting with #. The value is saved with the compiled pattern for
|
||||
subsequent use by the JIT compiler and by the two interpreted matching
|
||||
functions, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP.
|
||||
|
@ -722,15 +749,14 @@ zero if all is well, or non-zero to force an error.
|
|||
.SS "The match context"
|
||||
.rs
|
||||
.sp
|
||||
A match context is required if you want to change the default values of any
|
||||
of the following match-time parameters:
|
||||
A match context is required if you want to:
|
||||
.sp
|
||||
A callout function
|
||||
The offset limit for matching an unanchored pattern
|
||||
The limit for calling \fBmatch()\fP (see below)
|
||||
The limit for calling \fBmatch()\fP recursively
|
||||
Set up a callout function
|
||||
Set an offset limit for matching an unanchored pattern
|
||||
Change the backtracking match limit
|
||||
Change the backtracking depth limit
|
||||
Set custom memory management specifically for the match
|
||||
.sp
|
||||
A match context is also required if you are using custom memory management.
|
||||
If none of these apply, just pass NULL as the context argument of
|
||||
\fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP.
|
||||
.P
|
||||
|
@ -756,7 +782,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
|
|||
.B " void *\fIcallout_data\fP);"
|
||||
.fi
|
||||
.sp
|
||||
This sets up a "callout" function, which PCRE2 will call at specified points
|
||||
This sets up a "callout" function for PCRE2 to call at specified points
|
||||
during a matching operation. Details are given in the
|
||||
.\" HREF
|
||||
\fBpcre2callout\fP
|
||||
|
@ -778,8 +804,8 @@ A match can never be found if the \fIstartoffset\fP argument of
|
|||
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is greater than the offset
|
||||
limit.
|
||||
.P
|
||||
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
|
||||
\fBpcre2_compile()\fP so that when JIT is in use, different code can be
|
||||
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
|
||||
calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
|
||||
compiled. If a match is started with a non-default offset limit when
|
||||
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
|
||||
.P
|
||||
|
@ -799,10 +825,10 @@ up too many resources when processing patterns that are not going to match, but
|
|||
which have a very large number of possibilities in their search trees. The
|
||||
classic example is a pattern that uses nested unlimited repeats.
|
||||
.P
|
||||
Internally, \fBpcre2_match()\fP uses a function called \fBmatch()\fP, which it
|
||||
calls repeatedly (sometimes recursively). The limit set by \fImatch_limit\fP is
|
||||
imposed on the number of times this function is called during a match, which
|
||||
has the effect of limiting the amount of backtracking that can take place. For
|
||||
There is an internal counter in \fBpcre2_match()\fP that is incremented each
|
||||
time round its main matching loop. If this value reaches the match limit,
|
||||
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
|
||||
the effect of limiting the amount of backtracking that can take place. For
|
||||
patterns that are not anchored, the count restarts from zero for each position
|
||||
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
|
||||
which ignores it.
|
||||
|
@ -815,8 +841,7 @@ is also used in this case (but in a different way) to limit how long the
|
|||
matching can continue.
|
||||
.P
|
||||
The default value for the limit can be set when PCRE2 is built; the default
|
||||
default is 10 million, which handles all but the most extreme cases. If the
|
||||
limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_MATCHLIMIT. A value
|
||||
default is 10 million, which handles all but the most extreme cases. A value
|
||||
for the match limit may also be supplied by an item at the start of a pattern
|
||||
of the form
|
||||
.sp
|
||||
|
@ -827,65 +852,34 @@ less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
|
|||
limit is set, less than the default.
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
|
||||
.B " uint32_t \fIvalue\fP);"
|
||||
.fi
|
||||
.sp
|
||||
The \fIrecursion_limit\fP parameter is similar to \fImatch_limit\fP, but
|
||||
instead of limiting the total number of times that \fBmatch()\fP is called, it
|
||||
limits the depth of recursion. The recursion depth is a smaller number than the
|
||||
total number of calls, because not all calls to \fBmatch()\fP are recursive.
|
||||
This limit is of use only if it is set smaller than \fImatch_limit\fP.
|
||||
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
|
||||
Each time a nested backtracking point is passed, a new memory "frame" is used
|
||||
to remember the state of matching at that point. Thus, this parameter
|
||||
indirectly limits the amount of memory that is used in a match.
|
||||
.P
|
||||
Limiting the recursion depth limits the amount of system stack that can be
|
||||
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
|
||||
stack, the amount of heap memory that can be used. This limit is not relevant,
|
||||
and is ignored, when matching is done using JIT compiled code. However, it is
|
||||
supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
|
||||
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
|
||||
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
|
||||
This limit is not relevant, and is ignored, when matching is done using JIT
|
||||
compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which uses
|
||||
it to limit the depth of internal recursive function calls that implement
|
||||
lookaround assertions and pattern recursions. This is, therefore, an indirect
|
||||
limit on the amount of system stack that is used. A recursive pattern such as
|
||||
/(.)(?1)/, when matched to a very long string using \fBpcre2_dfa_match()\fP,
|
||||
can use a great deal of stack.
|
||||
.P
|
||||
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
|
||||
default default is the same value as the default for \fImatch_limit\fP. If the
|
||||
limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
|
||||
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
|
||||
supplied by an item at the start of a pattern of the form
|
||||
The default value for the depth limit can be set when PCRE2 is built; the
|
||||
default default is the same value as the default for the match limit. If the
|
||||
limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
|
||||
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
|
||||
item at the start of a pattern of the form
|
||||
.sp
|
||||
(*LIMIT_RECURSION=ddd)
|
||||
(*LIMIT_DEPTH=ddd)
|
||||
.sp
|
||||
where ddd is a decimal number. However, such a setting is ignored unless ddd is
|
||||
less than the limit set by the caller of \fBpcre2_match()\fP or
|
||||
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
|
||||
.sp
|
||||
.nf
|
||||
.B int pcre2_set_recursion_memory_management(
|
||||
.B " pcre2_match_context *\fImcontext\fP,"
|
||||
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
|
||||
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
|
||||
.fi
|
||||
.sp
|
||||
This function sets up two additional custom memory management functions for use
|
||||
by \fBpcre2_match()\fP when PCRE2 is compiled to use the heap for remembering
|
||||
backtracking data, instead of recursive function calls that use the system
|
||||
stack. There is a discussion about PCRE2's stack usage in the
|
||||
.\" HREF
|
||||
\fBpcre2stack\fP
|
||||
.\"
|
||||
documentation. See the
|
||||
.\" HREF
|
||||
\fBpcre2build\fP
|
||||
.\"
|
||||
documentation for details of how to build PCRE2.
|
||||
.P
|
||||
Using the heap for recursion is a non-standard way of building PCRE2, for use
|
||||
in environments that have limited stacks. Because of the greater use of memory
|
||||
management, \fBpcre2_match()\fP runs more slowly. Functions that are different
|
||||
to the general custom memory functions are provided so that special-purpose
|
||||
external code can be used for this case, because the memory blocks are all the
|
||||
same size. The blocks are retained by \fBpcre2_match()\fP until it is about to
|
||||
exit so that they can be re-used when possible during the match. In the absence
|
||||
of these functions, the normal custom memory management functions are used, if
|
||||
supplied, otherwise the system functions.
|
||||
.
|
||||
.
|
||||
.SH "CHECKING BUILD-TIME OPTIONS"
|
||||
|
@ -920,6 +914,13 @@ sequences the \eR escape sequence matches by default. A value of
|
|||
PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a
|
||||
value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The
|
||||
default can be overridden when a pattern is compiled.
|
||||
.sp
|
||||
PCRE2_CONFIG_DEPTHLIMIT
|
||||
.sp
|
||||
The output is a uint32_t integer that gives the default limit for the depth of
|
||||
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
|
||||
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
|
||||
\fBpcre2_set_depth_limit()\fP above.
|
||||
.sp
|
||||
PCRE2_CONFIG_JIT
|
||||
.sp
|
||||
|
@ -954,9 +955,9 @@ be compiled by those two libraries, but at the expense of slower matching.
|
|||
.sp
|
||||
PCRE2_CONFIG_MATCHLIMIT
|
||||
.sp
|
||||
The output is a uint32_t integer that gives the default limit for the number of
|
||||
internal matching function calls in a \fBpcre2_match()\fP execution. Further
|
||||
details are given with \fBpcre2_match()\fP below.
|
||||
The output is a uint32_t integer that gives the default match limit for
|
||||
\fBpcre2_match()\fP. Further details are given with
|
||||
\fBpcre2_set_match_limit()\fP above.
|
||||
.sp
|
||||
PCRE2_CONFIG_NEWLINE
|
||||
.sp
|
||||
|
@ -980,20 +981,11 @@ amount of system stack used when a pattern is compiled. It is specified when
|
|||
PCRE2 is built; the default is 250. This limit does not take into account the
|
||||
stack that may already be used by the calling application. For finer control
|
||||
over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP.
|
||||
.sp
|
||||
PCRE2_CONFIG_RECURSIONLIMIT
|
||||
.sp
|
||||
The output is a uint32_t integer that gives the default limit for the depth of
|
||||
recursion when calling the internal matching function in a \fBpcre2_match()\fP
|
||||
execution. Further details are given with \fBpcre2_match()\fP below.
|
||||
.sp
|
||||
PCRE2_CONFIG_STACKRECURSE
|
||||
.sp
|
||||
The output is a uint32_t integer that is set to one if internal recursion when
|
||||
running \fBpcre2_match()\fP is implemented by recursive function calls that use
|
||||
the system stack to remember their state. This is the usual way that PCRE2 is
|
||||
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
|
||||
heap instead of recursive function calls.
|
||||
This parameter is obsolete and should not be used in new code. The output is a
|
||||
uint32_t integer that is always set to zero.
|
||||
.sp
|
||||
PCRE2_CONFIG_UNICODE_VERSION
|
||||
.sp
|
||||
|
@ -1012,7 +1004,7 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
|
|||
.sp
|
||||
PCRE2_CONFIG_VERSION
|
||||
.sp
|
||||
The \fIwhere\fP argument should point to a buffer that is at least 12 code
|
||||
The \fIwhere\fP argument should point to a buffer that is at least 24 code
|
||||
units long. (The exact length required can be found by calling
|
||||
\fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with
|
||||
the PCRE2 version string, zero-terminated. The number of code units used is
|
||||
|
@ -1208,13 +1200,14 @@ option is set, normal backslash processing is applied to verb names and only an
|
|||
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
||||
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
|
||||
option is set, unescaped whitespace in verb names is skipped and #-comments are
|
||||
recognized, exactly as in the rest of the pattern.
|
||||
recognized in this mode, exactly as in the rest of the pattern.
|
||||
.sp
|
||||
PCRE2_AUTO_CALLOUT
|
||||
.sp
|
||||
If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
|
||||
all with number 255, before each pattern item, except immediately before or
|
||||
after a callout in the pattern. For discussion of the callout facility, see the
|
||||
after an explicit callout in the pattern. For discussion of the callout
|
||||
facility, see the
|
||||
.\" HREF
|
||||
\fBpcre2callout\fP
|
||||
.\"
|
||||
|
@ -1452,9 +1445,8 @@ in the
|
|||
.\" HREF
|
||||
\fBpcre2unicode\fP
|
||||
.\"
|
||||
document.
|
||||
If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a negative
|
||||
error code.
|
||||
document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
|
||||
negative error code.
|
||||
.P
|
||||
If you know that your pattern is valid, and you want to skip this check for
|
||||
performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set,
|
||||
|
@ -1479,7 +1471,7 @@ in the
|
|||
.\"
|
||||
page. If you set PCRE2_UCP, matching one of the items it affects takes much
|
||||
longer. The option is available only if PCRE2 has been compiled with Unicode
|
||||
support.
|
||||
support (which is the default).
|
||||
.sp
|
||||
PCRE2_UNGREEDY
|
||||
.sp
|
||||
|
@ -1518,7 +1510,7 @@ page.
|
|||
.SH "COMPILATION ERROR CODES"
|
||||
.rs
|
||||
.sp
|
||||
There are over 80 positive error codes that \fBpcre2_compile()\fP may return
|
||||
There are nearly 100 positive error codes that \fBpcre2_compile()\fP may return
|
||||
(via \fIerrorcode\fP) if it finds an error in the pattern. There are also some
|
||||
negative error codes that are used for invalid UTF strings. These are the same
|
||||
as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and are described
|
||||
|
@ -1570,7 +1562,7 @@ documentation.
|
|||
JIT compilation is a heavyweight optimization. It can take some time for
|
||||
patterns to be analyzed, and for one-off matches and simple patterns the
|
||||
benefit of faster execution might be offset by a much slower compilation time.
|
||||
Most, but not all patterns can be optimized by the JIT compiler.
|
||||
Most (but not all) patterns can be optimized by the JIT compiler.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="localesupport"></a>
|
||||
|
@ -1581,10 +1573,10 @@ PCRE2 handles caseless matching, and determines whether characters are letters,
|
|||
digits, or whatever, by reference to a set of tables, indexed by character code
|
||||
point. This applies only to characters whose code points are less than 256. By
|
||||
default, higher-valued code points never match escapes such as \ew or \ed.
|
||||
However, if PCRE2 is built with UTF support, all characters can be tested with
|
||||
\ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a pattern
|
||||
is compiled; this causes \ew and friends to use Unicode property support
|
||||
instead of the built-in tables.
|
||||
However, if PCRE2 is built with Unicode support, all characters can be tested
|
||||
with \ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a
|
||||
pattern is compiled; this causes \ew and friends to use Unicode property
|
||||
support instead of the built-in tables.
|
||||
.P
|
||||
The use of locales with Unicode is discouraged. If you are handling characters
|
||||
with code points greater than 128, you should either use Unicode support, or
|
||||
|
@ -1623,7 +1615,7 @@ available for as long as it is needed.
|
|||
The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
|
||||
is saved with the compiled pattern, and the same tables are used by
|
||||
\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. Thus, for any single pattern,
|
||||
compilation, and matching all happen in the same locale, but different patterns
|
||||
compilation and matching both happen in the same locale, but different patterns
|
||||
can be processed in different locales.
|
||||
.
|
||||
.
|
||||
|
@ -1646,7 +1638,7 @@ pattern. The second argument specifies which piece of information is required,
|
|||
and the third argument is a pointer to a variable to receive the data. If the
|
||||
third argument is NULL, the first argument is ignored, and the function returns
|
||||
the size in bytes of the variable that is required for the information
|
||||
requested. Otherwise, The yield of the function is zero for success, or one of
|
||||
requested. Otherwise, the yield of the function is zero for success, or one of
|
||||
the following negative numbers:
|
||||
.sp
|
||||
PCRE2_ERROR_NULL the argument \fIcode\fP was NULL
|
||||
|
@ -1699,8 +1691,8 @@ following are true:
|
|||
.* is not in a capturing group that is the subject
|
||||
of a back reference
|
||||
PCRE2_DOTALL is in force for .*
|
||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
|
||||
PCRE2_NO_DOTSTAR_ANCHOR is not set.
|
||||
Neither (*PRUNE) nor (*SKIP) appears in the pattern
|
||||
PCRE2_NO_DOTSTAR_ANCHOR is not set
|
||||
.sp
|
||||
For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
|
||||
options returned for PCRE2_INFO_ALLOPTIONS.
|
||||
|
@ -1727,6 +1719,13 @@ matches only CR, LF, or CRLF.
|
|||
Return the highest capturing subpattern number in the pattern. In patterns
|
||||
where (?| is not used, this is also the total number of capturing subpatterns.
|
||||
The third argument should point to an \fBuint32_t\fP variable.
|
||||
.sp
|
||||
PCRE2_INFO_DEPTHLIMIT
|
||||
.sp
|
||||
If the pattern set a backtracking depth limit by including an item of the form
|
||||
(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
|
||||
should point to an unsigned 32-bit integer. If no such value has been set, the
|
||||
call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
|
||||
.sp
|
||||
PCRE2_INFO_FIRSTBITMAP
|
||||
.sp
|
||||
|
@ -1758,6 +1757,14 @@ argument should point to an \fBuint32_t\fP variable. In the 8-bit library, the
|
|||
value is always less than 256. In the 16-bit library the value can be up to
|
||||
0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
|
||||
and up to 0xffffffff when not using UTF-32 mode.
|
||||
.sp
|
||||
PCRE2_INFO_FRAMESIZE
|
||||
.sp
|
||||
Return the size (in bytes) of the data frames that are used to remember
|
||||
backtracking positions when the pattern is processed by \fBpcre2_match()\fP
|
||||
without the use of JIT. The third argument should point to a \fBsize_t\fP
|
||||
variable. The frame size depends on the number of capturing parentheses in the
|
||||
pattern. Each additional capturing group adds two PCRE2_SIZE variables.
|
||||
.sp
|
||||
PCRE2_INFO_HASBACKSLASHC
|
||||
.sp
|
||||
|
@ -1768,7 +1775,8 @@ argument should point to an \fBuint32_t\fP variable.
|
|||
.sp
|
||||
Return 1 if the pattern contains any explicit matches for CR or LF characters,
|
||||
otherwise 0. The third argument should point to an \fBuint32_t\fP variable. An
|
||||
explicit match is either a literal CR or LF character, or \er or \en.
|
||||
explicit match is either a literal CR or LF character, or \er or \en or one of
|
||||
the equivalent hexadecimal or octal escape sequences.
|
||||
.sp
|
||||
PCRE2_INFO_JCHANGED
|
||||
.sp
|
||||
|
@ -1907,7 +1915,7 @@ different for each compiled pattern.
|
|||
.sp
|
||||
PCRE2_INFO_NEWLINE
|
||||
.sp
|
||||
The output is a \fBuint32_t\fP with one of the following values:
|
||||
The output is one of the following \fBuint32_t\fP values:
|
||||
.sp
|
||||
PCRE2_NEWLINE_CR Carriage return (CR)
|
||||
PCRE2_NEWLINE_LF Linefeed (LF)
|
||||
|
@ -1915,15 +1923,8 @@ The output is a \fBuint32_t\fP with one of the following values:
|
|||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||
.sp
|
||||
This specifies the default character sequence that will be recognized as
|
||||
meaning "newline" while matching.
|
||||
.sp
|
||||
PCRE2_INFO_RECURSIONLIMIT
|
||||
.sp
|
||||
If the pattern set a recursion limit by including an item of the form
|
||||
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
|
||||
argument should point to an unsigned 32-bit integer. If no such value has been
|
||||
set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
|
||||
This identifies the character sequence that will be recognized as meaning
|
||||
"newline" while matching.
|
||||
.sp
|
||||
PCRE2_INFO_SIZE
|
||||
.sp
|
||||
|
@ -2000,9 +2001,9 @@ Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
|
|||
the creation functions above. For \fBpcre2_match_data_create()\fP, the first
|
||||
argument is the number of pairs of offsets in the \fIovector\fP. One pair of
|
||||
offsets is required to identify the string that matched the whole pattern, with
|
||||
another pair for each captured substring. For example, a value of 4 creates
|
||||
enough space to record the matched portion of the subject plus three captured
|
||||
substrings. A minimum of at least 1 pair is imposed by
|
||||
an additional pair for each captured substring. For example, a value of 4
|
||||
creates enough space to record the matched portion of the subject plus three
|
||||
captured substrings. A minimum of 1 pair is imposed by
|
||||
\fBpcre2_match_data_create()\fP, so it is always possible to return the overall
|
||||
matched string.
|
||||
.P
|
||||
|
@ -2145,9 +2146,11 @@ newline convention recognizes CRLF as a newline, and if so, and the current
|
|||
character is CR followed by LF, advance the starting offset by two characters
|
||||
instead of one.
|
||||
.P
|
||||
If a non-zero starting offset is passed when the pattern is anchored, one
|
||||
If a non-zero starting offset is passed when the pattern is anchored, a single
|
||||
attempt to match at the given offset is made. This can only succeed if the
|
||||
pattern does not require the match to be at the start of the subject.
|
||||
pattern does not require the match to be at the start of the subject. In other
|
||||
words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
|
||||
the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="matchoptions"></a>
|
||||
|
@ -2161,9 +2164,9 @@ PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is
|
|||
described below.
|
||||
.P
|
||||
Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
|
||||
compiler. If it is set, JIT matching is disabled and the normal interpretive
|
||||
code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the
|
||||
remaining options are supported for JIT matching.
|
||||
compiler. If it is set, JIT matching is disabled and the interpretive code in
|
||||
\fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the remaining
|
||||
options are supported for JIT matching.
|
||||
.sp
|
||||
PCRE2_ANCHORED
|
||||
.sp
|
||||
|
@ -2257,12 +2260,12 @@ page.
|
|||
If you know that your subject is valid, and you want to skip these checks for
|
||||
performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
|
||||
\fBpcre2_match()\fP. You might want to do this for the second and subsequent
|
||||
calls to \fBpcre2_match()\fP if you are making repeated calls to find all the
|
||||
matches in a single subject string.
|
||||
calls to \fBpcre2_match()\fP if you are making repeated calls to find other
|
||||
matches in the same subject string.
|
||||
.P
|
||||
NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
|
||||
as a subject, or an invalid value of \fIstartoffset\fP, is undefined. Your
|
||||
program may crash or loop indefinitely.
|
||||
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
|
||||
string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
|
||||
Your program may crash or loop indefinitely.
|
||||
.sp
|
||||
PCRE2_PARTIAL_HARD
|
||||
PCRE2_PARTIAL_SOFT
|
||||
|
@ -2329,9 +2332,9 @@ start, it skips both the CR and the LF before retrying. However, the pattern
|
|||
reference, and so advances only by one character after the first failure.
|
||||
.P
|
||||
An explicit match for CR or LF is either a literal appearance of one of those
|
||||
characters in the pattern, or one of the \er or \en escape sequences. Implicit
|
||||
matches such as [^X] do not count, nor does \es, even though it includes CR and
|
||||
LF in the characters that it matches.
|
||||
characters in the pattern, or one of the \er or \en or equivalent octal or
|
||||
hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
|
||||
does \es, even though it includes CR and LF in the characters that it matches.
|
||||
.P
|
||||
Notwithstanding the above, anomalous effects may still occur when CRLF is a
|
||||
valid newline sequence and explicit \er or \en escapes appear in the pattern.
|
||||
|
@ -2395,12 +2398,12 @@ identify the part of the subject that was partially matched. See the
|
|||
.\"
|
||||
documentation for details of partial matching.
|
||||
.P
|
||||
After a successful match, the first pair of offsets identifies the portion of
|
||||
the subject string that was matched by the entire pattern. The next pair is
|
||||
used for the first capturing subpattern, and so on. The value returned by
|
||||
After a fully successful match, the first pair of offsets identifies the
|
||||
portion of the subject string that was matched by the entire pattern. The next
|
||||
pair is used for the first captured substring, and so on. The value returned by
|
||||
\fBpcre2_match()\fP is one more than the highest numbered pair that has been
|
||||
set. For example, if two substrings have been captured, the returned value is
|
||||
3. If there are no capturing subpatterns, the return value from a successful
|
||||
3. If there are no captured substrings, the return value from a successful
|
||||
match is 1, indicating that just the first pair of offsets has been set.
|
||||
.P
|
||||
If a pattern uses the \eK escape sequence within a positive assertion, the
|
||||
|
@ -2415,11 +2418,7 @@ returned.
|
|||
If the ovector is too small to hold all the captured substring offsets, as much
|
||||
as possible is filled in, and the function returns a value of zero. If captured
|
||||
substrings are not of interest, \fBpcre2_match()\fP may be called with a match
|
||||
data block whose ovector is of minimum length (that is, one pair). However, if
|
||||
the pattern contains back references and the \fIovector\fP is not big enough to
|
||||
remember the related substrings, PCRE2 has to get additional memory for use
|
||||
during matching. Thus it is usually advisable to set up a match data block
|
||||
containing an ovector of reasonable size.
|
||||
data block whose ovector is of minimum length (that is, one pair).
|
||||
.P
|
||||
It is possible for capturing subpattern number \fIn+1\fP to match some part of
|
||||
the subject when subpattern \fIn\fP has not been used at all. For example, if
|
||||
|
@ -2535,8 +2534,9 @@ returned when the magic number is not present.
|
|||
.sp
|
||||
PCRE2_ERROR_BADMODE
|
||||
.sp
|
||||
This error is given when a pattern that was compiled by the 8-bit library is
|
||||
passed to a 16-bit or 32-bit library function, or vice versa.
|
||||
This error is given when a compiled pattern is passed to a function in a
|
||||
library of a different code unit width, for example, a pattern compiled by
|
||||
the 8-bit library is passed to a 16-bit or 32-bit library function.
|
||||
.sp
|
||||
PCRE2_ERROR_BADOFFSET
|
||||
.sp
|
||||
|
@ -2562,22 +2562,15 @@ use by callout functions that want to cause \fBpcre2_match()\fP or
|
|||
\fBpcre2callout\fP
|
||||
.\"
|
||||
documentation for details.
|
||||
.sp
|
||||
PCRE2_ERROR_DEPTHLIMIT
|
||||
.sp
|
||||
The nested backtracking depth limit was reached.
|
||||
.sp
|
||||
PCRE2_ERROR_INTERNAL
|
||||
.sp
|
||||
An unexpected internal error has occurred. This error could be caused by a bug
|
||||
in PCRE2 or by overwriting of the compiled pattern.
|
||||
.sp
|
||||
PCRE2_ERROR_JIT_BADOPTION
|
||||
.sp
|
||||
This error is returned when a pattern that was successfully studied using JIT
|
||||
is being matched, but the matching mode (partial or complete match) does not
|
||||
correspond to any JIT compilation mode. When the JIT fast path function is
|
||||
used, this error may be also given for invalid options. See the
|
||||
.\" HREF
|
||||
\fBpcre2jit\fP
|
||||
.\"
|
||||
documentation for more details.
|
||||
.sp
|
||||
PCRE2_ERROR_JIT_STACKLIMIT
|
||||
.sp
|
||||
|
@ -2591,15 +2584,13 @@ documentation for more details.
|
|||
.sp
|
||||
PCRE2_ERROR_MATCHLIMIT
|
||||
.sp
|
||||
The backtracking limit was reached.
|
||||
The backtracking match limit was reached.
|
||||
.sp
|
||||
PCRE2_ERROR_NOMEMORY
|
||||
.sp
|
||||
If a pattern contains back references, but the ovector is not big enough to
|
||||
remember the referenced substrings, PCRE2 gets a block of memory at the start
|
||||
of matching to use for this purpose. There are some other special cases where
|
||||
extra memory is needed during matching. This error is given when memory cannot
|
||||
be obtained.
|
||||
If a pattern contains many nested backtracking points, heap memory is used to
|
||||
remember them. This error is given when the memory allocation function (default
|
||||
or custom) fails.
|
||||
.sp
|
||||
PCRE2_ERROR_NULL
|
||||
.sp
|
||||
|
@ -2615,10 +2606,6 @@ in the subject string. Some simple patterns that might do this are detected and
|
|||
faulted at compile time, but more complicated cases, in particular mutual
|
||||
recursions between two different subpatterns, cannot be detected until matching
|
||||
is attempted.
|
||||
.sp
|
||||
PCRE2_ERROR_RECURSIONLIMIT
|
||||
.sp
|
||||
The internal recursion limit was reached.
|
||||
.
|
||||
.
|
||||
.\" HTML <a name="geterrormessage"></a>
|
||||
|
@ -2808,8 +2795,8 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
|
|||
compiled pattern, and the second is the name. The yield of the function is the
|
||||
subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
|
||||
name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
|
||||
that name. Given the number, you can extract the substring directly, or use one
|
||||
of the functions described above.
|
||||
that name. Given the number, you can extract the substring directly from the
|
||||
ovector, or use one of the "bynumber" functions described above.
|
||||
.P
|
||||
For convenience, there are also "byname" functions that correspond to the
|
||||
"bynumber" functions, the only difference being that the second argument is a
|
||||
|
@ -3113,11 +3100,12 @@ other alternatives. Ultimately, when it runs out of matches,
|
|||
.P
|
||||
The function \fBpcre2_dfa_match()\fP is called to match a subject string
|
||||
against a compiled pattern, using a matching algorithm that scans the subject
|
||||
string just once, and does not backtrack. This has different characteristics to
|
||||
the normal algorithm, and is not compatible with Perl. Some of the features of
|
||||
PCRE2 patterns are not supported. Nevertheless, there are times when this kind
|
||||
of matching can be useful. For a discussion of the two matching algorithms, and
|
||||
a list of features that \fBpcre2_dfa_match()\fP does not support, see the
|
||||
string just once (not counting lookaround assertions), and does not backtrack.
|
||||
This has different characteristics to the normal algorithm, and is not
|
||||
compatible with Perl. Some of the features of PCRE2 patterns are not supported.
|
||||
Nevertheless, there are times when this kind of matching can be useful. For a
|
||||
discussion of the two matching algorithms, and a list of features that
|
||||
\fBpcre2_dfa_match()\fP does not support, see the
|
||||
.\" HREF
|
||||
\fBpcre2matching\fP
|
||||
.\"
|
||||
|
@ -3321,6 +3309,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 21 March 2017
|
||||
Last updated: 27 March 2017
|
||||
Copyright (c) 1997-2017 University of Cambridge.
|
||||
.fi
|
||||
|
|