Documentation update.
parent 447d1b3083
commit 6c7fa44939
@@ -46,7 +46,7 @@ A match context is needed only if you want to:
 Set a matching offset limit
 Change the backtracking match limit
 Change the backtracking depth limit
-Set custom memory management in the match context
+Set custom memory management specifically for the match
 </pre>
 The <i>length</i> and <i>startoffset</i> values are code
 units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
@@ -23,37 +23,38 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC8" href="#SEC8">PCRE2 NATIVE API JIT FUNCTIONS</a>
 <li><a name="TOC9" href="#SEC9">PCRE2 NATIVE API SERIALIZATION FUNCTIONS</a>
 <li><a name="TOC10" href="#SEC10">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a>
-<li><a name="TOC11" href="#SEC11">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
-<li><a name="TOC12" href="#SEC12">PCRE2 API OVERVIEW</a>
-<li><a name="TOC13" href="#SEC13">STRING LENGTHS AND OFFSETS</a>
-<li><a name="TOC14" href="#SEC14">NEWLINES</a>
-<li><a name="TOC15" href="#SEC15">MULTITHREADING</a>
-<li><a name="TOC16" href="#SEC16">PCRE2 CONTEXTS</a>
-<li><a name="TOC17" href="#SEC17">CHECKING BUILD-TIME OPTIONS</a>
-<li><a name="TOC18" href="#SEC18">COMPILING A PATTERN</a>
-<li><a name="TOC19" href="#SEC19">COMPILATION ERROR CODES</a>
-<li><a name="TOC20" href="#SEC20">JUST-IN-TIME (JIT) COMPILATION</a>
-<li><a name="TOC21" href="#SEC21">LOCALE SUPPORT</a>
-<li><a name="TOC22" href="#SEC22">INFORMATION ABOUT A COMPILED PATTERN</a>
-<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
-<li><a name="TOC24" href="#SEC24">SERIALIZATION AND PRECOMPILING</a>
-<li><a name="TOC25" href="#SEC25">THE MATCH DATA BLOCK</a>
-<li><a name="TOC26" href="#SEC26">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
-<li><a name="TOC27" href="#SEC27">NEWLINE HANDLING WHEN MATCHING</a>
-<li><a name="TOC28" href="#SEC28">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
-<li><a name="TOC29" href="#SEC29">OTHER INFORMATION ABOUT A MATCH</a>
-<li><a name="TOC30" href="#SEC30">ERROR RETURNS FROM <b>pcre2_match()</b></a>
-<li><a name="TOC31" href="#SEC31">OBTAINING A TEXTUAL ERROR MESSAGE</a>
-<li><a name="TOC32" href="#SEC32">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
-<li><a name="TOC33" href="#SEC33">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
-<li><a name="TOC34" href="#SEC34">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
-<li><a name="TOC35" href="#SEC35">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
-<li><a name="TOC36" href="#SEC36">DUPLICATE SUBPATTERN NAMES</a>
-<li><a name="TOC37" href="#SEC37">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
-<li><a name="TOC38" href="#SEC38">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
-<li><a name="TOC39" href="#SEC39">SEE ALSO</a>
-<li><a name="TOC40" href="#SEC40">AUTHOR</a>
-<li><a name="TOC41" href="#SEC41">REVISION</a>
+<li><a name="TOC11" href="#SEC11">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a>
+<li><a name="TOC12" href="#SEC12">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
+<li><a name="TOC13" href="#SEC13">PCRE2 API OVERVIEW</a>
+<li><a name="TOC14" href="#SEC14">STRING LENGTHS AND OFFSETS</a>
+<li><a name="TOC15" href="#SEC15">NEWLINES</a>
+<li><a name="TOC16" href="#SEC16">MULTITHREADING</a>
+<li><a name="TOC17" href="#SEC17">PCRE2 CONTEXTS</a>
+<li><a name="TOC18" href="#SEC18">CHECKING BUILD-TIME OPTIONS</a>
+<li><a name="TOC19" href="#SEC19">COMPILING A PATTERN</a>
+<li><a name="TOC20" href="#SEC20">COMPILATION ERROR CODES</a>
+<li><a name="TOC21" href="#SEC21">JUST-IN-TIME (JIT) COMPILATION</a>
+<li><a name="TOC22" href="#SEC22">LOCALE SUPPORT</a>
+<li><a name="TOC23" href="#SEC23">INFORMATION ABOUT A COMPILED PATTERN</a>
+<li><a name="TOC24" href="#SEC24">INFORMATION ABOUT A PATTERN'S CALLOUTS</a>
+<li><a name="TOC25" href="#SEC25">SERIALIZATION AND PRECOMPILING</a>
+<li><a name="TOC26" href="#SEC26">THE MATCH DATA BLOCK</a>
+<li><a name="TOC27" href="#SEC27">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
+<li><a name="TOC28" href="#SEC28">NEWLINE HANDLING WHEN MATCHING</a>
+<li><a name="TOC29" href="#SEC29">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a>
+<li><a name="TOC30" href="#SEC30">OTHER INFORMATION ABOUT A MATCH</a>
+<li><a name="TOC31" href="#SEC31">ERROR RETURNS FROM <b>pcre2_match()</b></a>
+<li><a name="TOC32" href="#SEC32">OBTAINING A TEXTUAL ERROR MESSAGE</a>
+<li><a name="TOC33" href="#SEC33">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
+<li><a name="TOC34" href="#SEC34">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a>
+<li><a name="TOC35" href="#SEC35">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
+<li><a name="TOC36" href="#SEC36">CREATING A NEW STRING WITH SUBSTITUTIONS</a>
+<li><a name="TOC37" href="#SEC37">DUPLICATE SUBPATTERN NAMES</a>
+<li><a name="TOC38" href="#SEC38">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a>
+<li><a name="TOC39" href="#SEC39">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
+<li><a name="TOC40" href="#SEC40">SEE ALSO</a>
+<li><a name="TOC41" href="#SEC41">AUTHOR</a>
+<li><a name="TOC42" href="#SEC42">REVISION</a>
 </ul>
 <P>
 <b>#include <pcre2.h></b>
@@ -177,22 +178,16 @@ document for an overview of all the PCRE2 documentation.
 <b> void *<i>callout_data</i>);</b>
 <br>
 <br>
-<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
-<b> uint32_t <i>value</i>);</b>
-<br>
-<br>
 <b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
 <b> PCRE2_SIZE <i>value</i>);</b>
 <br>
 <br>
-<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b>int pcre2_set_match_limit(pcre2_match_context *<i>mcontext</i>,</b>
 <b> uint32_t <i>value</i>);</b>
 <br>
 <br>
-<b>int pcre2_set_recursion_memory_management(</b>
-<b> pcre2_match_context *<i>mcontext</i>,</b>
-<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
-<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
+<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b> uint32_t <i>value</i>);</b>
 </P>
 <br><a name="SEC6" href="#TOC1">PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
 <P>
@@ -314,7 +309,24 @@ document for an overview of all the PCRE2 documentation.
 <br>
 <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
 </P>
-<br><a name="SEC11" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
+<br><a name="SEC11" href="#TOC1">PCRE2 NATIVE API OBSOLETE FUNCTIONS</a><br>
+<P>
+<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b> uint32_t <i>value</i>);</b>
+<br>
+<br>
+<b>int pcre2_set_recursion_memory_management(</b>
+<b> pcre2_match_context *<i>mcontext</i>,</b>
+<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
+<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
+<br>
+<br>
+These functions became obsolete at release 10.30 and are retained only for
+backward compatibility. They should not be used in new code. The first is
+replaced by <b>pcre2_set_depth_limit()</b>; the second is no longer needed and
+no longer has any effect (it always returns zero).
+</P>
+<br><a name="SEC12" href="#TOC1">PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
 <P>
 There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code
 units, respectively. However, there is just one header file, <b>pcre2.h</b>.
@@ -368,14 +380,14 @@ When using multiple libraries in an application, you must take care when
 processing any particular pattern to use only functions from a single library.
 For example, if you want to run a match using a pattern that was compiled with
 <b>pcre2_compile_16()</b>, you must do so with <b>pcre2_match_16()</b>, not
-<b>pcre2_match_8()</b>.
+<b>pcre2_match_8()</b> or <b>pcre2_match_32</b>.
 </P>
 <P>
 In the function summaries above, and in the rest of this document and other
 PCRE2 documents, functions and data types are described using their generic
 names, without the 8, 16, or 32 suffix.
 </P>
-<br><a name="SEC12" href="#TOC1">PCRE2 API OVERVIEW</a><br>
+<br><a name="SEC13" href="#TOC1">PCRE2 API OVERVIEW</a><br>
 <P>
 PCRE2 has its own native API, which is described in this document. There are
 also some wrapper functions for the 8-bit library that correspond to the
@@ -397,7 +409,7 @@ against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
 <b>pcre2.h</b>.
 </P>
 <P>
-The functions <b>pcre2_compile()</b>, and <b>pcre2_match()</b> are used for
+The functions <b>pcre2_compile()</b> and <b>pcre2_match()</b> are used for
 compiling and matching regular expressions in a Perl-compatible manner. A
 sample program that demonstrates the simplest way of using them is provided in
 the file called <i>pcre2demo.c</i> in the PCRE2 source distribution. A listing
@@ -408,10 +420,17 @@ documentation, and the
 documentation describes how to compile and run it.
 </P>
 <P>
-Just-in-time compiler support is an optional feature of PCRE2 that can be built
-in appropriate hardware environments. It greatly speeds up the matching
+The compiling and matching functions recognize various options that are passed
+as bits in an options argument. There are also some more complicated parameters
+such as custom memory management functions and resource limits that are passed
+in "contexts" (which are just memory blocks, described below). Simple
+applications do not need to make use of contexts.
+</P>
+<P>
+Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
+built in appropriate hardware environments. It greatly speeds up the matching
 performance of many patterns. Programs can request that it be used if
-available, by calling <b>pcre2_jit_compile()</b> after a pattern has been
+available by calling <b>pcre2_jit_compile()</b> after a pattern has been
 successfully compiled by <b>pcre2_compile()</b>. This does nothing if JIT
 support is not available.
 </P>
@@ -423,8 +442,8 @@ More complicated programs might need to make use of the specialist functions
 <P>
 JIT matching is automatically used by <b>pcre2_match()</b> if it is available,
 unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
-matching, which gives improved performance. The JIT-specific functions are
-discussed in the
+matching, which gives improved performance at the expense of less sanity
+checking. The JIT-specific functions are discussed in the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation.
 </P>
@@ -433,7 +452,7 @@ A second matching function, <b>pcre2_dfa_match()</b>, which is not
 Perl-compatible, is also provided. This uses a different algorithm for the
 matching. The alternative algorithm finds all possible matches (at a given
 point in the subject), and scans the subject just once (unless there are
-lookbehind assertions). However, this algorithm does not return captured
+lookaround assertions). However, this algorithm does not return captured
 substrings. A description of the two matching algorithms and their advantages
 and disadvantages is given in the
 <a href="pcre2matching.html"><b>pcre2matching</b></a>
@@ -476,7 +495,7 @@ Functions with names ending with <b>_free()</b> are used for freeing memory
 blocks of various sorts. In all cases, if one of these functions is called with
 a NULL argument, it does nothing.
 </P>
-<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
+<br><a name="SEC14" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
 <P>
 The PCRE2 API uses string lengths and offsets into strings of code units in
 several places. These values are always of type PCRE2_SIZE, which is an
@@ -486,7 +505,7 @@ as a special indicator for zero-terminated strings and unset offsets.
 Therefore, the longest string that can be handled is one less than this
 maximum.
 <a name="newlines"></a></P>
-<br><a name="SEC14" href="#TOC1">NEWLINES</a><br>
+<br><a name="SEC15" href="#TOC1">NEWLINES</a><br>
 <P>
 PCRE2 supports five different conventions for indicating line breaks in
 strings: a single CR (carriage return) character, a single LF (linefeed)
@@ -521,7 +540,7 @@ The choice of newline convention does not affect the interpretation of
 the \n or \r escape sequences, nor does it affect what \R matches; this has
 its own separate convention.
 </P>
-<br><a name="SEC15" href="#TOC1">MULTITHREADING</a><br>
+<br><a name="SEC16" href="#TOC1">MULTITHREADING</a><br>
 <P>
 In a multithreaded application it is important to keep thread-specific data
 separate from data that can be shared between threads. The PCRE2 library code
@@ -543,8 +562,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
 that is, the same compiled pattern can be used by more than one thread
 simultaneously. For example, an application can compile all its patterns at the
 start, before forking off multiple threads that use them. However, if the
-just-in-time optimization feature is being used, it needs separate memory stack
-areas for each thread. See the
+just-in-time (JIT) optimization feature is being used, it needs separate memory
+stack areas for each thread. See the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation for more details.
 </P>
@@ -596,12 +615,12 @@ thread-specific copy.
 Match blocks
 </b><br>
 <P>
-The matching functions need a block of memory for working space and for storing
-the results of a match. This includes details of what was matched, as well as
-additional information such as the name of a (*MARK) setting. Each thread must
-provide its own copy of this memory.
+The matching functions need a block of memory for storing the results of a
+match. This includes details of what was matched, as well as additional
+information such as the name of a (*MARK) setting. Each thread must provide its
+own copy of this memory.
 </P>
-<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
+<br><a name="SEC17" href="#TOC1">PCRE2 CONTEXTS</a><br>
 <P>
 Some PCRE2 functions have a lot of parameters, many of which are used only by
 specialist applications, for example, those that use custom memory management
@@ -663,15 +682,15 @@ The memory used for a general context should be freed by calling:
 The compile context
 </b><br>
 <P>
-A compile context is required if you want to change the default values of any
-of the following compile-time parameters:
+A compile context is required if you want to provide an external function for
+stack checking during compilation or to change the default values of any of the
+following compile-time parameters:
 <pre>
 What \R matches (Unicode newlines or CR, LF, CRLF only)
 PCRE2's character tables
 The newline character sequence
 The compile time nested parentheses limit
 The maximum length of the pattern string
-An external function for stack checking
 </pre>
 A compile context is also required if you are using custom memory management.
 If none of these apply, just pass NULL as the context argument of
@@ -713,11 +732,11 @@ in the current locale.
 <b> PCRE2_SIZE <i>value</i>);</b>
 <br>
 <br>
-This sets a maximum length, in code units, for the pattern string that is to be
-compiled. If the pattern is longer, an error is generated. This facility is
-provided so that applications that accept patterns from external sources can
-limit their size. The default is the largest number that a PCRE2_SIZE variable
-can hold, which is effectively unlimited.
+This sets a maximum length, in code units, for any pattern string that is
+compiled with this context. If the pattern is longer, an error is generated.
+This facility is provided so that applications that accept patterns from
+external sources can limit their size. The default is the largest number that a
+PCRE2_SIZE variable can hold, which is effectively unlimited.
 <b>int pcre2_set_newline(pcre2_compile_context *<i>ccontext</i>,</b>
 <b> uint32_t <i>value</i>);</b>
 <br>
@@ -729,8 +748,14 @@ sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
 PCRE2_NEWLINE_ANY (any Unicode newline sequence).
 </P>
 <P>
-When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
-parameter affects the recognition of white space and the end of internal
+A pattern can override the value set in the compile context by starting with a
+sequence such as (*CRLF). See the
+<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
+page for details.
+</P>
+<P>
+When a pattern is compiled with the PCRE2_EXTENDED option, the newline
+convention affects the recognition of white space and the end of internal
 comments starting with #. The value is saved with the compiled pattern for
 subsequent use by the JIT compiler and by the two interpreted matching
 functions, <i>pcre2_match()</i> and <i>pcre2_dfa_match()</i>.
@@ -764,15 +789,14 @@ zero if all is well, or non-zero to force an error.
 The match context
 </b><br>
 <P>
-A match context is required if you want to change the default values of any
-of the following match-time parameters:
+A match context is required if you want to:
 <pre>
-A callout function
-The offset limit for matching an unanchored pattern
-The limit for calling <b>match()</b> (see below)
-The limit for calling <b>match()</b> recursively
+Set up a callout function
+Set an offset limit for matching an unanchored pattern
+Change the backtracking match limit
+Change the backtracking depth limit
+Set custom memory management specifically for the match
 </pre>
-A match context is also required if you are using custom memory management.
 If none of these apply, just pass NULL as the context argument of
 <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or <b>pcre2_jit_match()</b>.
 </P>
@@ -797,7 +821,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
 <b> void *<i>callout_data</i>);</b>
 <br>
 <br>
-This sets up a "callout" function, which PCRE2 will call at specified points
+This sets up a "callout" function for PCRE2 to call at specified points
 during a matching operation. Details are given in the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation.
@@ -816,8 +840,8 @@ A match can never be found if the <i>startoffset</i> argument of
 limit.
 </P>
 <P>
-When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
-<b>pcre2_compile()</b> so that when JIT is in use, different code can be
+When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
+calling <b>pcre2_compile()</b> so that when JIT is in use, different code can be
 compiled. If a match is started with a non-default match limit when
 PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
 </P>
@@ -837,10 +861,10 @@ which have a very large number of possibilities in their search trees. The
 classic example is a pattern that uses nested unlimited repeats.
 </P>
 <P>
-Internally, <b>pcre2_match()</b> uses a function called <b>match()</b>, which it
-calls repeatedly (sometimes recursively). The limit set by <i>match_limit</i> is
-imposed on the number of times this function is called during a match, which
-has the effect of limiting the amount of backtracking that can take place. For
+There is an internal counter in <b>pcre2_match()</b> that is incremented each
+time round its main matching loop. If this value reaches the match limit,
+<b>pcre2_match()</b> returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
+the effect of limiting the amount of backtracking that can take place. For
 patterns that are not anchored, the count restarts from zero for each position
 in the subject string. This limit is not relevant to <b>pcre2_dfa_match()</b>,
 which ignores it.
@@ -855,8 +879,7 @@ matching can continue.
 </P>
 <P>
 The default value for the limit can be set when PCRE2 is built; the default
-default is 10 million, which handles all but the most extreme cases. If the
-limit is exceeded, <b>pcre2_match()</b> returns PCRE2_ERROR_MATCHLIMIT. A value
+default is 10 million, which handles all but the most extreme cases. A value
 for the match limit may also be supplied by an item at the start of a pattern
 of the form
 <pre>
@@ -865,64 +888,38 @@ of the form
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
 less than the limit set by the caller of <b>pcre2_match()</b> or, if no such
 limit is set, less than the default.
-<b>int pcre2_set_recursion_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b>int pcre2_set_depth_limit(pcre2_match_context *<i>mcontext</i>,</b>
 <b> uint32_t <i>value</i>);</b>
 <br>
 <br>
-The <i>recursion_limit</i> parameter is similar to <i>match_limit</i>, but
-instead of limiting the total number of times that <b>match()</b> is called, it
-limits the depth of recursion. The recursion depth is a smaller number than the
-total number of calls, because not all calls to <b>match()</b> are recursive.
-This limit is of use only if it is set smaller than <i>match_limit</i>.
+This parameter limits the depth of nested backtracking in <b>pcre2_match()</b>.
+Each time a nested backtracking point is passed, a new memory "frame" is used
+to remember the state of matching at that point. Thus, this parameter
+indirectly limits the amount of memory that is used in a match.
 </P>
 <P>
-Limiting the recursion depth limits the amount of system stack that can be
-used, or, when PCRE2 has been compiled to use memory on the heap instead of the
-stack, the amount of heap memory that can be used. This limit is not relevant,
-and is ignored, when matching is done using JIT compiled code. However, it is
-supported by <b>pcre2_dfa_match()</b>, which uses recursive function calls less
-frequently than <b>pcre2_match()</b>, but which can be caused to use a lot of
-stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
+This limit is not relevant, and is ignored, when matching is done using JIT
+compiled code. However, it is supported by <b>pcre2_dfa_match()</b>, which uses
+it to limit the depth of internal recursive function calls that implement
+lookaround assertions and pattern recursions. This is, therefore, an indirect
+limit on the amount of system stack that is used. A recursive pattern such as
+/(.)(?1)/, when matched to a very long string using <b>pcre2_dfa_match()</b>,
+can use a great deal of stack.
 </P>
 <P>
-The default value for <i>recursion_limit</i> can be set when PCRE2 is built; the
-default default is the same value as the default for <i>match_limit</i>. If the
-limit is exceeded, <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> return
-PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
-supplied by an item at the start of a pattern of the form
+The default value for the depth limit can be set when PCRE2 is built; the
+default default is the same value as the default for the match limit. If the
+limit is exceeded, <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b> returns
+PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
+item at the start of a pattern of the form
 <pre>
-(*LIMIT_RECURSION=ddd)
+(*LIMIT_DEPTH=ddd)
 </pre>
 where ddd is a decimal number. However, such a setting is ignored unless ddd is
 less than the limit set by the caller of <b>pcre2_match()</b> or
 <b>pcre2_dfa_match()</b> or, if no such limit is set, less than the default.
-<b>int pcre2_set_recursion_memory_management(</b>
-<b> pcre2_match_context *<i>mcontext</i>,</b>
-<b> void *(*<i>private_malloc</i>)(PCRE2_SIZE, void *),</b>
-<b> void (*<i>private_free</i>)(void *, void *), void *<i>memory_data</i>);</b>
-<br>
-<br>
-This function sets up two additional custom memory management functions for use
-by <b>pcre2_match()</b> when PCRE2 is compiled to use the heap for remembering
-backtracking data, instead of recursive function calls that use the system
-stack. There is a discussion about PCRE2's stack usage in the
-<a href="pcre2stack.html"><b>pcre2stack</b></a>
-documentation. See the
-<a href="pcre2build.html"><b>pcre2build</b></a>
-documentation for details of how to build PCRE2.
-</P>
-<P>
-Using the heap for recursion is a non-standard way of building PCRE2, for use
-in environments that have limited stacks. Because of the greater use of memory
-management, <b>pcre2_match()</b> runs more slowly. Functions that are different
-to the general custom memory functions are provided so that special-purpose
-external code can be used for this case, because the memory blocks are all the
-same size. The blocks are retained by <b>pcre2_match()</b> until it is about to
-exit so that they can be re-used when possible during the match. In the absence
-of these functions, the normal custom memory management functions are used, if
-supplied, otherwise the system functions.
 </P>
-<br><a name="SEC17" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
+<br><a name="SEC18" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
 <P>
 <b>int pcre2_config(uint32_t <i>what</i>, void *<i>where</i>);</b>
 </P>
@@ -954,6 +951,13 @@ sequences the \R escape sequence matches by default. A value of
 PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a
 value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The
 default can be overridden when a pattern is compiled.
+<pre>
+PCRE2_CONFIG_DEPTHLIMIT
+</pre>
+The output is a uint32_t integer that gives the default limit for the depth of
+nested backtracking in <b>pcre2_match()</b> or the depth of nested recursions
+and lookarounds in <b>pcre2_dfa_match()</b>. Further details are given with
+<b>pcre2_set_depth_limit()</b> above.
 <pre>
 PCRE2_CONFIG_JIT
 </pre>
@@ -989,9 +993,9 @@ be compiled by those two libraries, but at the expense of slower matching.
 <pre>
 PCRE2_CONFIG_MATCHLIMIT
 </pre>
-The output is a uint32_t integer that gives the default limit for the number of
-internal matching function calls in a <b>pcre2_match()</b> execution. Further
-details are given with <b>pcre2_match()</b> below.
+The output is a uint32_t integer that gives the default match limit for
+<b>pcre2_match()</b>. Further details are given with
+<b>pcre2_set_match_limit()</b> above.
 <pre>
 PCRE2_CONFIG_NEWLINE
 </pre>
@@ -1015,20 +1019,11 @@ amount of system stack used when a pattern is compiled. It is specified when
 PCRE2 is built; the default is 250. This limit does not take into account the
 stack that may already be used by the calling application. For finer control
 over compilation stack usage, see <b>pcre2_set_compile_recursion_guard()</b>.
-<pre>
-PCRE2_CONFIG_RECURSIONLIMIT
-</pre>
-The output is a uint32_t integer that gives the default limit for the depth of
-recursion when calling the internal matching function in a <b>pcre2_match()</b>
-execution. Further details are given with <b>pcre2_match()</b> below.
 <pre>
 PCRE2_CONFIG_STACKRECURSE
 </pre>
-The output is a uint32_t integer that is set to one if internal recursion when
-running <b>pcre2_match()</b> is implemented by recursive function calls that use
-the system stack to remember their state. This is the usual way that PCRE2 is
-compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
-heap instead of recursive function calls.
+This parameter is obsolete and should not be used in new code. The output is a
+uint32_t integer that is always set to zero.
 <pre>
 PCRE2_CONFIG_UNICODE_VERSION
 </pre>
@@ -1047,14 +1042,14 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
 <pre>
 PCRE2_CONFIG_VERSION
 </pre>
-The <i>where</i> argument should point to a buffer that is at least 12 code
+The <i>where</i> argument should point to a buffer that is at least 24 code
 units long. (The exact length required can be found by calling
 <b>pcre2_config()</b> with <b>where</b> set to NULL.) The buffer is filled with
 the PCRE2 version string, zero-terminated. The number of code units used is
 returned. This is the length of the string plus one unit for the terminating
 zero.
 <a name="compiling"></a></P>
-<br><a name="SEC18" href="#TOC1">COMPILING A PATTERN</a><br>
+<br><a name="SEC19" href="#TOC1">COMPILING A PATTERN</a><br>
 <P>
 <b>pcre2_code *pcre2_compile(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
 <b> uint32_t <i>options</i>, int *<i>erroroffset,</i></b>
@@ -1240,13 +1235,14 @@ option is set, normal backslash processing is applied to verb names and only an
 unescaped closing parenthesis terminates the name. A closing parenthesis can be
 included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
 option is set, unescaped whitespace in verb names is skipped and #-comments are
-recognized, exactly as in the rest of the pattern.
+recognized in this mode, exactly as in the rest of the pattern.
 <pre>
 PCRE2_AUTO_CALLOUT
 </pre>
 If this bit is set, <b>pcre2_compile()</b> automatically inserts callout items,
 all with number 255, before each pattern item, except immediately before or
-after a callout in the pattern. For discussion of the callout facility, see the
+after an explicit callout in the pattern. For discussion of the callout
+facility, see the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation.
 <pre>
@@ -1472,9 +1468,8 @@ and
 <a href="pcre2unicode.html#utf32strings">UTF-32 strings</a>
 in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
-document.
-If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a negative
-error code.
+document. If an invalid UTF sequence is found, <b>pcre2_compile()</b> returns a
+negative error code.
 </P>
 <P>
 If you know that your pattern is valid, and you want to skip this check for
@@ -1495,7 +1490,7 @@ in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 page. If you set PCRE2_UCP, matching one of the items it affects takes much
 longer. The option is available only if PCRE2 has been compiled with Unicode
-support.
+support (which is the default).
 <pre>
 PCRE2_UNGREEDY
 </pre>
@@ -1525,9 +1520,9 @@ the behaviour of PCRE2 are given in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
 page.
 </P>
-<br><a name="SEC19" href="#TOC1">COMPILATION ERROR CODES</a><br>
+<br><a name="SEC20" href="#TOC1">COMPILATION ERROR CODES</a><br>
 <P>
-There are over 80 positive error codes that <b>pcre2_compile()</b> may return
+There are nearly 100 positive error codes that <b>pcre2_compile()</b> may return
 (via <i>errorcode</i>) if it finds an error in the pattern. There are also some
 negative error codes that are used for invalid UTF strings. These are the same
 as given by <b>pcre2_match()</b> and <b>pcre2_dfa_match()</b>, and are described
@@ -1538,7 +1533,7 @@ error message"
 <a href="#geterrormessage">below)</a>
 can be called to obtain a textual error message from any error code.
 <a name="jitcompiling"></a></P>
-<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
+<br><a name="SEC21" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
 <P>
 <b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
 <br>
@@ -1574,18 +1569,18 @@ documentation.
 JIT compilation is a heavyweight optimization. It can take some time for
 patterns to be analyzed, and for one-off matches and simple patterns the
 benefit of faster execution might be offset by a much slower compilation time.
-Most, but not all patterns can be optimized by the JIT compiler.
+Most (but not all) patterns can be optimized by the JIT compiler.
 <a name="localesupport"></a></P>
-<br><a name="SEC21" href="#TOC1">LOCALE SUPPORT</a><br>
+<br><a name="SEC22" href="#TOC1">LOCALE SUPPORT</a><br>
 <P>
 PCRE2 handles caseless matching, and determines whether characters are letters,
 digits, or whatever, by reference to a set of tables, indexed by character code
 point. This applies only to characters whose code points are less than 256. By
 default, higher-valued code points never match escapes such as \w or \d.
-However, if PCRE2 is built with UTF support, all characters can be tested with
-\p and \P, or, alternatively, the PCRE2_UCP option can be set when a pattern
-is compiled; this causes \w and friends to use Unicode property support
-instead of the built-in tables.
+However, if PCRE2 is built with Unicode support, all characters can be tested
+with \p and \P, or, alternatively, the PCRE2_UCP option can be set when a
+pattern is compiled; this causes \w and friends to use Unicode property
+support instead of the built-in tables.
 </P>
 <P>
 The use of locales with Unicode is discouraged. If you are handling characters
@@ -1629,10 +1624,10 @@ available for as long as it is needed.
 The pointer that is passed (via the compile context) to <b>pcre2_compile()</b>
 is saved with the compiled pattern, and the same tables are used by
 <b>pcre2_match()</b> and <b>pcre_dfa_match()</b>. Thus, for any single pattern,
-compilation, and matching all happen in the same locale, but different patterns
+compilation and matching both happen in the same locale, but different patterns
 can be processed in different locales.
 <a name="infoaboutpattern"></a></P>
-<br><a name="SEC22" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
+<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A COMPILED PATTERN</a><br>
 <P>
 <b>int pcre2_pattern_info(const pcre2 *<i>code</i>, uint32_t <i>what</i>, void *<i>where</i>);</b>
 </P>
@@ -1645,7 +1640,7 @@ pattern. The second argument specifies which piece of information is required,
 and the third argument is a pointer to a variable to receive the data. If the
 third argument is NULL, the first argument is ignored, and the function returns
 the size in bytes of the variable that is required for the information
-requested. Otherwise, The yield of the function is zero for success, or one of
+requested. Otherwise, the yield of the function is zero for success, or one of
 the following negative numbers:
 <pre>
 PCRE2_ERROR_NULL the argument <i>code</i> was NULL
@@ -1698,8 +1693,8 @@ following are true:
 .* is not in an atomic group
 .* is not in a capturing group that is the subject of a back reference
 PCRE2_DOTALL is in force for .*
-Neither (*PRUNE) nor (*SKIP) appears in the pattern.
-PCRE2_NO_DOTSTAR_ANCHOR is not set.
+Neither (*PRUNE) nor (*SKIP) appears in the pattern
+PCRE2_NO_DOTSTAR_ANCHOR is not set
 </pre>
 For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
 options returned for PCRE2_INFO_ALLOPTIONS.
@@ -1726,6 +1721,13 @@ matches only CR, LF, or CRLF.
 Return the highest capturing subpattern number in the pattern. In patterns
 where (?| is not used, this is also the total number of capturing subpatterns.
 The third argument should point to an <b>uint32_t</b> variable.
+<pre>
+PCRE2_INFO_DEPTHLIMIT
+</pre>
+If the pattern set a backtracking depth limit by including an item of the form
+(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
+should point to an unsigned 32-bit integer. If no such value has been set, the
+call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
 <pre>
 PCRE2_INFO_FIRSTBITMAP
 </pre>
@@ -1757,6 +1759,14 @@ argument should point to an <b>uint32_t</b> variable. In the 8-bit library, the
 value is always less than 256. In the 16-bit library the value can be up to
 0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
 and up to 0xffffffff when not using UTF-32 mode.
+<pre>
+PCRE2_INFO_FRAMESIZE
+</pre>
+Return the size (in bytes) of the data frames that are used to remember
+backtracking positions when the pattern is processed by <b>pcre2_match()</b>
+without the use of JIT. The third argument should point to an <b>size_t</b>
+variable. The frame size depends on the number of capturing parentheses in the
+pattern. Each additional capturing group adds two PCRE2_SIZE variables.
 <pre>
 PCRE2_INFO_HASBACKSLASHC
 </pre>
@@ -1767,7 +1777,8 @@ argument should point to an <b>uint32_t</b> variable.
 </pre>
 Return 1 if the pattern contains any explicit matches for CR or LF characters,
 otherwise 0. The third argument should point to an <b>uint32_t</b> variable. An
-explicit match is either a literal CR or LF character, or \r or \n.
+explicit match is either a literal CR or LF character, or \r or \n or one of
+the equivalent hexadecimal or octal escape sequences.
 <pre>
 PCRE2_INFO_JCHANGED
 </pre>
@@ -1904,7 +1915,7 @@ different for each compiled pattern.
 <pre>
 PCRE2_INFO_NEWLINE
 </pre>
-The output is a <b>uint32_t</b> with one of the following values:
+The output is one of the following <b>uint32_t</b> values:
 <pre>
 PCRE2_NEWLINE_CR Carriage return (CR)
 PCRE2_NEWLINE_LF Linefeed (LF)
@@ -1912,15 +1923,8 @@ The output is a <b>uint32_t</b> with one of the following values:
 PCRE2_NEWLINE_ANY Any Unicode line ending
 PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
 </pre>
-This specifies the default character sequence that will be recognized as
-meaning "newline" while matching.
-<pre>
-PCRE2_INFO_RECURSIONLIMIT
-</pre>
-If the pattern set a recursion limit by including an item of the form
-(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
-argument should point to an unsigned 32-bit integer. If no such value has been
-set, the call to <b>pcre2_pattern_info()</b> returns the error PCRE2_ERROR_UNSET.
+This identifies the character sequence that will be recognized as meaning
+"newline" while matching.
 <pre>
 PCRE2_INFO_SIZE
 </pre>
@@ -1933,7 +1937,7 @@ value returned by this option, because there are cases where the code that
 calculates the size has to over-estimate. Processing a pattern with the JIT
 compiler does not alter the value returned by this option.
 <a name="infoaboutcallouts"></a></P>
-<br><a name="SEC23" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
+<br><a name="SEC24" href="#TOC1">INFORMATION ABOUT A PATTERN'S CALLOUTS</a><br>
 <P>
 <b>int pcre2_callout_enumerate(const pcre2_code *<i>code</i>,</b>
 <b> int (*<i>callback</i>)(pcre2_callout_enumerate_block *, void *),</b>
@@ -1952,7 +1956,7 @@ contents of the callout enumeration block are described in the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation, which also gives further details about callouts.
 </P>
-<br><a name="SEC24" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
+<br><a name="SEC25" href="#TOC1">SERIALIZATION AND PRECOMPILING</a><br>
 <P>
 It is possible to save compiled patterns on disc or elsewhere, and reload them
 later, subject to a number of restrictions. The functions whose names begin
@@ -1961,7 +1965,7 @@ the
 <a href="pcre2serialize.html"><b>pcre2serialize</b></a>
 documentation.
 <a name="matchdatablock"></a></P>
-<br><a name="SEC25" href="#TOC1">THE MATCH DATA BLOCK</a><br>
+<br><a name="SEC26" href="#TOC1">THE MATCH DATA BLOCK</a><br>
 <P>
 <b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
 <b> pcre2_general_context *<i>gcontext</i>);</b>
@@ -1986,9 +1990,9 @@ Before calling <b>pcre2_match()</b>, <b>pcre2_dfa_match()</b>, or
 the creation functions above. For <b>pcre2_match_data_create()</b>, the first
 argument is the number of pairs of offsets in the <i>ovector</i>. One pair of
 offsets is required to identify the string that matched the whole pattern, with
-another pair for each captured substring. For example, a value of 4 creates
-enough space to record the matched portion of the subject plus three captured
-substrings. A minimum of at least 1 pair is imposed by
+an additional pair for each captured substring. For example, a value of 4
+creates enough space to record the matched portion of the subject plus three
+captured substrings. A minimum of at least 1 pair is imposed by
 <b>pcre2_match_data_create()</b>, so it is always possible to return the overall
 matched string.
 </P>
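Illustrative sketch, assuming the capture count is already known (for example from pcre2_pattern_info() with PCRE2_INFO_CAPTURECOUNT). The hunk above describes the sizing rule for the ovector: one pair for the overall match, one more for each capturing group, and never fewer than one pair. Each pair holds two offsets, a start and an end. The helper names below are hypothetical. The code does plain arithmetic and does not call PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): pairs needed to record the
   overall match plus every capturing group. */
static uint32_t pairs_for_captures(uint32_t capture_count)
{
  assert(capture_count < UINT32_MAX);  /* avoid wrap-around */
  return capture_count + 1;            /* pair 0 is the whole match */
}

/* Mirrors the documented floor: pcre2_match_data_create() imposes a
   minimum of one pair, so the overall match can always be returned. */
static uint32_t effective_pairs(uint32_t requested)
{
  return requested < 1 ? 1 : requested;
}

/* Each pair is a start offset and an end offset. */
static size_t ovector_elements(uint32_t pairs)
{
  return 2 * (size_t)pairs;
}
```

With three capturing groups, pairs_for_captures(3) gives 4, which matches the "value of 4" example in the text. ovector_elements(4) then gives 8 offsets.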
@@ -2032,7 +2036,7 @@ match data block (for that match) have taken place.
 When a match data block itself is no longer needed, it should be freed by
 calling <b>pcre2_match_data_free()</b>.
 </P>
-<br><a name="SEC26" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
+<br><a name="SEC27" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
 <P>
 <b>int pcre2_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
 <b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@@ -2126,9 +2130,11 @@ character is CR followed by LF, advance the starting offset by two characters
 instead of one.
 </P>
 <P>
-If a non-zero starting offset is passed when the pattern is anchored, one
+If a non-zero starting offset is passed when the pattern is anchored, an single
 attempt to match at the given offset is made. This can only succeed if the
-pattern does not require the match to be at the start of the subject.
+pattern does not require the match to be at the start of the subject. In other
+words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
+the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A.
 <a name="matchoptions"></a></P>
 <br><b>
 Option bits for <b>pcre2_match()</b>
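A minimal sketch of the CRLF rule quoted at the top of the hunk above. When CRLF is a recognized newline and the current character is CR followed by LF, the starting offset advances by two instead of one. The sketch assumes an 8-bit, non-UTF subject, so one character is one code unit. The function name and the crlf_is_newline flag are hypothetical. The pcre2demo.c program in the PCRE2 distribution applies the same idea and also handles UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2. Assumes an 8-bit non-UTF
   subject. Returns the offset for the next attempt after advancing past
   the character at 'offset'. */
static size_t advance_start_offset(const unsigned char *subject,
                                   size_t length, size_t offset,
                                   bool crlf_is_newline)
{
  assert(offset < length);  /* caller stops at the end of the subject */
  if (crlf_is_newline && subject[offset] == '\r' &&
      offset + 1 < length && subject[offset + 1] == '\n')
    return offset + 2;      /* step over the whole CRLF pair */
  return offset + 1;        /* otherwise advance by one character */
}
```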
@@ -2142,9 +2148,9 @@ described below.
 </P>
 <P>
 Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
-compiler. If it is set, JIT matching is disabled and the normal interpretive
-code in <b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the
-remaining options are supported for JIT matching.
+compiler. If it is set, JIT matching is disabled and the interpretive code in
+<b>pcre2_match()</b> is run. Apart from PCRE2_NO_JIT (obviously), the remaining
+options are supported for JIT matching.
 <pre>
 PCRE2_ANCHORED
 </pre>
@@ -2229,13 +2235,13 @@ page.
 If you know that your subject is valid, and you want to skip these checks for
 performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
 <b>pcre2_match()</b>. You might want to do this for the second and subsequent
-calls to <b>pcre2_match()</b> if you are making repeated calls to find all the
-matches in a single subject string.
+calls to <b>pcre2_match()</b> if you are making repeated calls to find other
+matches in the same subject string.
 </P>
 <P>
-NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
-as a subject, or an invalid value of <i>startoffset</i>, is undefined. Your
-program may crash or loop indefinitely.
+WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
+string as a subject, or an invalid value of <i>startoffset</i>, is undefined.
+Your program may crash or loop indefinitely.
 <pre>
 PCRE2_PARTIAL_HARD
 PCRE2_PARTIAL_SOFT
@@ -2262,7 +2268,7 @@ examples, in the
 <a href="pcre2partial.html"><b>pcre2partial</b></a>
 documentation.
 </P>
-<br><a name="SEC27" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
+<br><a name="SEC28" href="#TOC1">NEWLINE HANDLING WHEN MATCHING</a><br>
 <P>
 When PCRE2 is built, a default newline convention is set; this is usually the
 standard convention for the operating system. The default can be overridden in
@@ -2294,15 +2300,15 @@ reference, and so advances only by one character after the first failure.
 </P>
 <P>
 An explicit match for CR of LF is either a literal appearance of one of those
-characters in the pattern, or one of the \r or \n escape sequences. Implicit
-matches such as [^X] do not count, nor does \s, even though it includes CR and
-LF in the characters that it matches.
+characters in the pattern, or one of the \r or \n or equivalent octal or
+hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
+does \s, even though it includes CR and LF in the characters that it matches.
 </P>
 <P>
 Notwithstanding the above, anomalous effects may still occur when CRLF is a
 valid newline sequence and explicit \r or \n escapes appear in the pattern.
 <a name="matchedstrings"></a></P>
-<br><a name="SEC28" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
+<br><a name="SEC29" href="#TOC1">HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS</a><br>
 <P>
 <b>uint32_t pcre2_get_ovector_count(pcre2_match_data *<i>match_data</i>);</b>
 <br>
@@ -2352,12 +2358,12 @@ identify the part of the subject that was partially matched. See the
 documentation for details of partial matching.
 </P>
 <P>
-After a successful match, the first pair of offsets identifies the portion of
-the subject string that was matched by the entire pattern. The next pair is
-used for the first capturing subpattern, and so on. The value returned by
+After a fully successful match, the first pair of offsets identifies the
+portion of the subject string that was matched by the entire pattern. The next
+pair is used for the first captured substring, and so on. The value returned by
 <b>pcre2_match()</b> is one more than the highest numbered pair that has been
 set. For example, if two substrings have been captured, the returned value is
-3. If there are no capturing subpatterns, the return value from a successful
+3. If there are no captured substrings, the return value from a successful
 match is 1, indicating that just the first pair of offsets has been set.
 </P>
 <P>
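The hunk above describes how to read the return value of pcre2_match(). The sketch below turns that value into the number of leading ovector pairs that hold offsets from the match. The helper name is hypothetical and the code does not link against PCRE2. In real code the pair count would come from pcre2_get_ovector_count(). The rc == 0 branch follows the next paragraph of the same page: when the ovector is too small, as much as possible is filled in and zero is returned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a PCRE2 function: how many leading ovector
   pairs hold offsets after a call to pcre2_match() that returned rc. */
static uint32_t pairs_set(int rc, uint32_t ovector_pairs)
{
  assert(ovector_pairs >= 1);  /* match data always has at least one pair */
  if (rc > 0)
    return (uint32_t)rc;       /* one more than the highest pair set */
  if (rc == 0)
    return ovector_pairs;      /* ovector too small: every pair was filled */
  return 0;                    /* no match or error; partial matches, which
                                  also set pair 0, are not handled here */
}
```

A pair below the returned count can still be unset if its group did not take part in the match, so check each pair before using it.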
@@ -2375,11 +2381,7 @@ returned.
 If the ovector is too small to hold all the captured substring offsets, as much
 as possible is filled in, and the function returns a value of zero. If captured
 substrings are not of interest, <b>pcre2_match()</b> may be called with a match
-data block whose ovector is of minimum length (that is, one pair). However, if
-the pattern contains back references and the <i>ovector</i> is not big enough to
-remember the related substrings, PCRE2 has to get additional memory for use
-during matching. Thus it is usually advisable to set up a match data block
-containing an ovector of reasonable size.
+data block whose ovector is of minimum length (that is, one pair).
 </P>
 <P>
 It is possible for capturing subpattern number <i>n+1</i> to match some part of
@@ -2405,7 +2407,7 @@ parentheses, no more than <i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by
 <b>pcre2_match()</b>. The other elements retain whatever values they previously
 had.
 <a name="matchotherdata"></a></P>
-<br><a name="SEC29" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
+<br><a name="SEC30" href="#TOC1">OTHER INFORMATION ABOUT A MATCH</a><br>
 <P>
 <b>PCRE2_SPTR pcre2_get_mark(pcre2_match_data *<i>match_data</i>);</b>
 <br>
@@ -2455,7 +2457,7 @@ the code unit offset of the invalid UTF character. Details are given in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
 page.
 <a name="errorlist"></a></P>
-<br><a name="SEC30" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
+<br><a name="SEC31" href="#TOC1">ERROR RETURNS FROM <b>pcre2_match()</b></a><br>
 <P>
 If <b>pcre2_match()</b> fails, it returns a negative number. This can be
 converted to a text string by calling the <b>pcre2_get_error_message()</b>
@@ -2487,8 +2489,9 @@ returned when the magic number is not present.
 <pre>
 PCRE2_ERROR_BADMODE
 </pre>
-This error is given when a pattern that was compiled by the 8-bit library is
-passed to a 16-bit or 32-bit library function, or vice versa.
+This error is given when a compiled pattern is passed to a function in a
+library of a different code unit width, for example, a pattern compiled by
+the 8-bit library is passed to a 16-bit or 32-bit library function.
 <pre>
 PCRE2_ERROR_BADOFFSET
 </pre>
@@ -2512,20 +2515,15 @@ use by callout functions that want to cause <b>pcre2_match()</b> or
 <b>pcre2_callout_enumerate()</b> to return a distinctive error code. See the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation for details.
+<pre>
+PCRE2_ERROR_DEPTHLIMIT
+</pre>
+The nested backtracking depth limit was reached.
 <pre>
 PCRE2_ERROR_INTERNAL
 </pre>
 An unexpected internal error has occurred. This error could be caused by a bug
 in PCRE2 or by overwriting of the compiled pattern.
-<pre>
-PCRE2_ERROR_JIT_BADOPTION
-</pre>
-This error is returned when a pattern that was successfully studied using JIT
-is being matched, but the matching mode (partial or complete match) does not
-correspond to any JIT compilation mode. When the JIT fast path function is
-used, this error may be also given for invalid options. See the
-<a href="pcre2jit.html"><b>pcre2jit</b></a>
-documentation for more details.
 <pre>
 PCRE2_ERROR_JIT_STACKLIMIT
 </pre>
@@ -2537,15 +2535,13 @@ documentation for more details.
 <pre>
 PCRE2_ERROR_MATCHLIMIT
 </pre>
-The backtracking limit was reached.
+The backtracking match limit was reached.
 <pre>
 PCRE2_ERROR_NOMEMORY
 </pre>
-If a pattern contains back references, but the ovector is not big enough to
-remember the referenced substrings, PCRE2 gets a block of memory at the start
-of matching to use for this purpose. There are some other special cases where
-extra memory is needed during matching. This error is given when memory cannot
-be obtained.
+If a pattern contains many nested backtracking points, heap memory is used to
+remember them. This error is given when the memory allocation function (default
+or custom) fails.
 <pre>
 PCRE2_ERROR_NULL
 </pre>
@@ -2561,12 +2557,8 @@ in the subject string. Some simple patterns that might do this are detected and
 faulted at compile time, but more complicated cases, in particular mutual
 recursions between two different subpatterns, cannot be detected until matching
 is attempted.
-<pre>
-PCRE2_ERROR_RECURSIONLIMIT
-</pre>
-The internal recursion limit was reached.
 <a name="geterrormessage"></a></P>
-<br><a name="SEC31" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
+<br><a name="SEC32" href="#TOC1">OBTAINING A TEXTUAL ERROR MESSAGE</a><br>
 <P>
 <b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
 <b> PCRE2_SIZE <i>bufflen</i>);</b>
@@ -2587,7 +2579,7 @@ returned. If the buffer is too small, the message is truncated (but still with
 a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
 None of the messages are very long; a buffer size of 120 code units is ample.
 <a name="extractbynumber"></a></P>
-<br><a name="SEC32" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
+<br><a name="SEC33" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
 <P>
 <b>int pcre2_substring_length_bynumber(pcre2_match_data *<i>match_data</i>,</b>
 <b> uint32_t <i>number</i>, PCRE2_SIZE *<i>length</i>);</b>
@@ -2684,7 +2676,7 @@ The substring did not participate in the match. For example, if the pattern is
 (abc)|(def) and the subject is "def", and the ovector contains at least two
 capturing slots, substring number 1 is unset.
 </P>
-<br><a name="SEC33" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
+<br><a name="SEC34" href="#TOC1">EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS</a><br>
 <P>
 <b>int pcre2_substring_list_get(pcre2_match_data *<i>match_data</i>,</b>
 <b>" PCRE2_UCHAR ***<i>listptr</i>, PCRE2_SIZE **<i>lengthsptr</i>);</b>
@@ -2723,7 +2715,7 @@ can be distinguished from a genuine zero-length substring by inspecting the
 appropriate offset in the ovector, which contain PCRE2_UNSET for unset
 substrings, or by calling <b>pcre2_substring_length_bynumber()</b>.
 <a name="extractbyname"></a></P>
-<br><a name="SEC34" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
+<br><a name="SEC35" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
 <P>
 <b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
 <b> PCRE2_SPTR <i>name</i>);</b>
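The context lines of the hunk above note two things. An unset substring can be told apart from a genuine zero-length one by checking the ovector for PCRE2_UNSET. Given a substring number, the substring can also be read directly from the ovector. The sketch below does that check.

It rests on these assumptions:
- PCRE2_SIZE has its default type, size_t.
- SKETCH_UNSET is a local stand-in for PCRE2_UNSET, which pcre2.h defines as ~(PCRE2_SIZE)0. It lets the code compile without pcre2.h.
- The helper name is hypothetical.

The test data reuse the page's (abc)|(def) example, where substring 1 is unset for the subject "def".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for PCRE2_UNSET, assuming PCRE2_SIZE is size_t. */
#define SKETCH_UNSET (~(size_t)0)

/* Hypothetical helper: find captured substring n in an ovector whose first
   npairs pairs were written by the match. Returns 0 and sets *start and
   *len on success, or -1 if n is out of range or the group did not
   participate in the match. */
static int substring_span(const size_t *ovector, uint32_t npairs,
                          uint32_t n, size_t *start, size_t *len)
{
  assert(ovector != NULL && start != NULL && len != NULL);
  if (n >= npairs || ovector[(size_t)2 * n] == SKETCH_UNSET)
    return -1;                                  /* unset, not zero-length */
  *start = ovector[(size_t)2 * n];
  *len = ovector[(size_t)2 * n + 1] - *start;   /* may legitimately be 0 */
  return 0;
}
```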
@@ -2755,8 +2747,8 @@ calling <b>pcre2_substring_number_from_name()</b>. The first argument is the
 compiled pattern, and the second is the name. The yield of the function is the
 subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
 name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
-that name. Given the number, you can extract the substring directly, or use one
-of the functions described above.
+that name. Given the number, you can extract the substring directly from the
+ovector, or use one of the "bynumber" functions described above.
 </P>
 <P>
 For convenience, there are also "byname" functions that correspond to the
@@ -2783,7 +2775,7 @@ names are not included in the compiled code. The matching process uses only
 numbers. For this reason, the use of different names for subpatterns of the
 same number causes an error at compile time.
 </P>
-<br><a name="SEC35" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
+<br><a name="SEC36" href="#TOC1">CREATING A NEW STRING WITH SUBSTITUTIONS</a><br>
 <P>
 <b>int pcre2_substitute(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
 <b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@@ -2990,7 +2982,7 @@ obtained by calling the <b>pcre2_get_error_message()</b> function (see
 "Obtaining a textual error message"
 <a href="#geterrormessage">above).</a>
 </P>
-<br><a name="SEC36" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
+<br><a name="SEC37" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
 <P>
 <b>int pcre2_substring_nametable_scan(const pcre2_code *<i>code</i>,</b>
 <b> PCRE2_SPTR <i>name</i>, PCRE2_SPTR *<i>first</i>, PCRE2_SPTR *<i>last</i>);</b>
@@ -3035,7 +3027,7 @@ in the section entitled <i>Information about a pattern</i>. Given all the
 relevant entries for the name, you can extract each of their numbers, and hence
 the captured data.
 </P>
-<br><a name="SEC37" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
+<br><a name="SEC38" href="#TOC1">FINDING ALL POSSIBLE MATCHES AT ONE POSITION</a><br>
 <P>
 The traditional matching function uses a similar algorithm to Perl, which stops
 when it finds the first match at a given point in the subject. If you want to
@@ -3053,7 +3045,7 @@ substring. Then return 1, which forces <b>pcre2_match()</b> to backtrack and try
 other alternatives. Ultimately, when it runs out of matches,
 <b>pcre2_match()</b> will yield PCRE2_ERROR_NOMATCH.
 <a name="dfamatch"></a></P>
-<br><a name="SEC38" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
+<br><a name="SEC39" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
 <P>
 <b>int pcre2_dfa_match(const pcre2_code *<i>code</i>, PCRE2_SPTR <i>subject</i>,</b>
 <b> PCRE2_SIZE <i>length</i>, PCRE2_SIZE <i>startoffset</i>,</b>
@@ -3064,11 +3056,12 @@ other alternatives. Ultimately, when it runs out of matches,
 <P>
 The function <b>pcre2_dfa_match()</b> is called to match a subject string
 against a compiled pattern, using a matching algorithm that scans the subject
-string just once, and does not backtrack. This has different characteristics to
-the normal algorithm, and is not compatible with Perl. Some of the features of
-PCRE2 patterns are not supported. Nevertheless, there are times when this kind
-of matching can be useful. For a discussion of the two matching algorithms, and
-a list of features that <b>pcre2_dfa_match()</b> does not support, see the
+string just once (not counting lookaround assertions), and does not backtrack.
+This has different characteristics to the normal algorithm, and is not
+compatible with Perl. Some of the features of PCRE2 patterns are not supported.
+Nevertheless, there are times when this kind of matching can be useful. For a
+discussion of the two matching algorithms, and a list of features that
+<b>pcre2_dfa_match()</b> does not support, see the
 <a href="pcre2matching.html"><b>pcre2matching</b></a>
 documentation.
 </P>
@@ -3248,13 +3241,13 @@ some plausibility checks are made on the contents of the workspace, which
 should contain data about the previous partial match. If any of these checks
 fail, this error is given.
 </P>
-<br><a name="SEC39" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC40" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2build</b>(3), <b>pcre2callout</b>(3), <b>pcre2demo(3)</b>,
 <b>pcre2matching</b>(3), <b>pcre2partial</b>(3), <b>pcre2posix</b>(3),
 <b>pcre2sample</b>(3), <b>pcre2stack</b>(3), <b>pcre2unicode</b>(3).
 </P>
-<br><a name="SEC40" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC41" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -3263,9 +3256,9 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC41" href="#TOC1">REVISION</a><br>
+<br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 21 March 2017
+Last updated: 27 March 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
421 doc/pcre2.txt
@@ -281,19 +281,14 @@ PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
 int (*callout_function)(pcre2_callout_block *, void *),
 void *callout_data);
 
-int pcre2_set_match_limit(pcre2_match_context *mcontext,
-uint32_t value);
-
 int pcre2_set_offset_limit(pcre2_match_context *mcontext,
 PCRE2_SIZE value);
 
-int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
+int pcre2_set_match_limit(pcre2_match_context *mcontext,
 uint32_t value);
 
-int pcre2_set_recursion_memory_management(
-pcre2_match_context *mcontext,
-void *(*private_malloc)(PCRE2_SIZE, void *),
-void (*private_free)(void *, void *), void *memory_data);
+int pcre2_set_depth_limit(pcre2_match_context *mcontext,
+uint32_t value);
 
 
 PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS
@ -397,6 +392,22 @@ PCRE2 NATIVE API AUXILIARY FUNCTIONS
|
|||
int pcre2_config(uint32_t what, void *where);
|
||||
|
||||
|
||||
PCRE2 NATIVE API OBSOLETE FUNCTIONS
|
||||
|
||||
int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
|
||||
uint32_t value);
|
||||
|
||||
int pcre2_set_recursion_memory_management(
|
||||
pcre2_match_context *mcontext,
|
||||
void *(*private_malloc)(PCRE2_SIZE, void *),
|
||||
void (*private_free)(void *, void *), void *memory_data);
|
||||
|
||||
These functions became obsolete at release 10.30 and are retained only
|
||||
for backward compatibility. They should not be used in new code. The
|
||||
first is replaced by pcre2_set_depth_limit(); the second is no longer
|
||||
needed and no longer has any effect (it always returns zero).


PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES

There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit
@@ -449,7 +460,7 @@ PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
when processing any particular pattern to use only functions from a
single library. For example, if you want to run a match using a pat-
tern that was compiled with pcre2_compile_16(), you must do so with
pcre2_match_16(), not pcre2_match_8().
pcre2_match_16(), not pcre2_match_8() or pcre2_match_32().

In the function summaries above, and in the rest of this document and
other PCRE2 documents, functions and data types are described using
@@ -474,19 +485,26 @@ PCRE2 API OVERVIEW
program against a non-dll PCRE2 library, you must define PCRE2_STATIC
before including pcre2.h.

The functions pcre2_compile(), and pcre2_match() are used for compiling
The functions pcre2_compile() and pcre2_match() are used for compiling
and matching regular expressions in a Perl-compatible manner. A sample
program that demonstrates the simplest way of using them is provided in
the file called pcre2demo.c in the PCRE2 source distribution. A listing
of this program is given in the pcre2demo documentation, and the
pcre2sample documentation describes how to compile and run it.

Just-in-time compiler support is an optional feature of PCRE2 that can
be built in appropriate hardware environments. It greatly speeds up the
matching performance of many patterns. Programs can request that it be
used if available, by calling pcre2_jit_compile() after a pattern has
been successfully compiled by pcre2_compile(). This does nothing if JIT
support is not available.
The compiling and matching functions recognize various options that are
passed as bits in an options argument. There are also some more compli-
cated parameters such as custom memory management functions and
resource limits that are passed in "contexts" (which are just memory
blocks, described below). Simple applications do not need to make use
of contexts.

Just-in-time (JIT) compiler support is an optional feature of PCRE2
that can be built in appropriate hardware environments. It greatly
speeds up the matching performance of many patterns. Programs can
request that it be used if available by calling pcre2_jit_compile()
after a pattern has been successfully compiled by pcre2_compile(). This
does nothing if JIT support is not available.

More complicated programs might need to make use of the specialist
functions pcre2_jit_stack_create(), pcre2_jit_stack_free(), and
@@ -495,14 +513,15 @@ PCRE2 API OVERVIEW

JIT matching is automatically used by pcre2_match() if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface
for JIT matching, which gives improved performance. The JIT-specific
functions are discussed in the pcre2jit documentation.
for JIT matching, which gives improved performance at the expense of
less sanity checking. The JIT-specific functions are discussed in the
pcre2jit documentation.

A second matching function, pcre2_dfa_match(), which is not Perl-com-
patible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a
given point in the subject), and scans the subject just once (unless
there are lookbehind assertions). However, this algorithm does not
there are lookaround assertions). However, this algorithm does not
return captured substrings. A description of the two matching algo-
rithms and their advantages and disadvantages is given in the
pcre2matching documentation. There is no JIT support for
@@ -603,9 +622,9 @@ MULTITHREADING
is thread-safe, that is, the same compiled pattern can be used by more
than one thread simultaneously. For example, an application can compile
all its patterns at the start, before forking off multiple threads that
use them. However, if the just-in-time optimization feature is being
used, it needs separate memory stack areas for each thread. See the
pcre2jit documentation for more details.
use them. However, if the just-in-time (JIT) optimization feature is
being used, it needs separate memory stack areas for each thread. See
the pcre2jit documentation for more details.

In a more complicated situation, where patterns are compiled only when
they are first needed, but are still shared between threads, pointers
@@ -650,10 +669,10 @@ MULTITHREADING

Match blocks

The matching functions need a block of memory for working space and for
storing the results of a match. This includes details of what was
matched, as well as additional information such as the name of a
(*MARK) setting. Each thread must provide its own copy of this memory.
The matching functions need a block of memory for storing the results
of a match. This includes details of what was matched, as well as addi-
tional information such as the name of a (*MARK) setting. Each thread
must provide its own copy of this memory.


PCRE2 CONTEXTS
@@ -718,15 +737,15 @@ PCRE2 CONTEXTS

The compile context

A compile context is required if you want to change the default values
of any of the following compile-time parameters:
A compile context is required if you want to provide an external func-
tion for stack checking during compilation or to change the default
values of any of the following compile-time parameters:

What \R matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables
The newline character sequence
The compile time nested parentheses limit
The maximum length of the pattern string
An external function for stack checking

A compile context is also required if you are using custom memory man-
agement. If none of these apply, just pass NULL as the context argu-
@@ -766,12 +785,12 @@ PCRE2 CONTEXTS
int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
PCRE2_SIZE value);

This sets a maximum length, in code units, for the pattern string that
is to be compiled. If the pattern is longer, an error is generated.
This facility is provided so that applications that accept patterns
from external sources can limit their size. The default is the largest
number that a PCRE2_SIZE variable can hold, which is effectively unlim-
ited.
This sets a maximum length, in code units, for any pattern string that
is compiled with this context. If the pattern is longer, an error is
generated. This facility is provided so that applications that accept
patterns from external sources can limit their size. The default is the
largest number that a PCRE2_SIZE variable can hold, which is effec-
tively unlimited.

int pcre2_set_newline(pcre2_compile_context *ccontext,
uint32_t value);
@@ -782,11 +801,14 @@ PCRE2 CONTEXTS
two-character sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any
of the above), or PCRE2_NEWLINE_ANY (any Unicode newline sequence).

When a pattern is compiled with the PCRE2_EXTENDED option, the value of
this parameter affects the recognition of white space and the end of
internal comments starting with #. The value is saved with the compiled
pattern for subsequent use by the JIT compiler and by the two inter-
preted matching functions, pcre2_match() and pcre2_dfa_match().
A pattern can override the value set in the compile context by starting
with a sequence such as (*CRLF). See the pcre2pattern page for details.

When a pattern is compiled with the PCRE2_EXTENDED option, the newline
convention affects the recognition of white space and the end of inter-
nal comments starting with #. The value is saved with the compiled pat-
tern for subsequent use by the JIT compiler and by the two interpreted
matching functions, pcre2_match() and pcre2_dfa_match().

int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
uint32_t value);
@@ -815,17 +837,16 @@ PCRE2 CONTEXTS

The match context

A match context is required if you want to change the default values of
any of the following match-time parameters:
A match context is required if you want to:

A callout function
The offset limit for matching an unanchored pattern
The limit for calling match() (see below)
The limit for calling match() recursively
Set up a callout function
Set an offset limit for matching an unanchored pattern
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match

A match context is also required if you are using custom memory manage-
ment. If none of these apply, just pass NULL as the context argument
of pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().
If none of these apply, just pass NULL as the context argument of
pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().

A match context is created, copied, and freed by the following func-
tions:
@@ -846,9 +867,9 @@ PCRE2 CONTEXTS
int (*callout_function)(pcre2_callout_block *, void *),
void *callout_data);

This sets up a "callout" function, which PCRE2 will call at specified
points during a matching operation. Details are given in the pcre2call-
out documentation.
This sets up a "callout" function for PCRE2 to call at specified points
during a matching operation. Details are given in the pcre2callout doc-
umentation.

int pcre2_set_offset_limit(pcre2_match_context *mcontext,
PCRE2_SIZE value);
@@ -863,10 +884,11 @@ PCRE2 CONTEXTS
argument of pcre2_match() or pcre2_dfa_match() is greater than the off-
set limit.

When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when
calling pcre2_compile() so that when JIT is in use, different code can
be compiled. If a match is started with a non-default match limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT
option when calling pcre2_compile() so that when JIT is in use, differ-
ent code can be compiled. If a match is started with a non-default
offset limit when PCRE2_USE_OFFSET_LIMIT is not set, an error is gener-
ated.

The offset limit facility can be used to track progress when searching
large subject strings. See also the PCRE2_FIRSTLINE option, which
@@ -884,13 +906,13 @@ PCRE2 CONTEXTS
search trees. The classic example is a pattern that uses nested unlim-
ited repeats.

Internally, pcre2_match() uses a function called match(), which it
calls repeatedly (sometimes recursively). The limit set by match_limit
is imposed on the number of times this function is called during a
match, which has the effect of limiting the amount of backtracking that
can take place. For patterns that are not anchored, the count restarts
from zero for each position in the subject string. This limit is not
relevant to pcre2_dfa_match(), which ignores it.
There is an internal counter in pcre2_match() that is incremented each
time round its main matching loop. If this value reaches the match
limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT.
This has the effect of limiting the amount of backtracking that can
take place. For patterns that are not anchored, the count restarts from
zero for each position in the subject string. This limit is not rele-
vant to pcre2_dfa_match(), which ignores it.

When pcre2_match() is called with a pattern that was successfully pro-
cessed by pcre2_jit_compile(), the way in which matching is executed is
@@ -901,9 +923,8 @@ PCRE2 CONTEXTS

The default value for the limit can be set when PCRE2 is built; the
default default is 10 million, which handles all but the most extreme
cases. If the limit is exceeded, pcre2_match() returns
PCRE2_ERROR_MATCHLIMIT. A value for the match limit may also be sup-
plied by an item at the start of a pattern of the form
cases. A value for the match limit may also be supplied by an item at
the start of a pattern of the form

(*LIMIT_MATCH=ddd)

@@ -911,59 +932,35 @@ PCRE2 CONTEXTS
unless ddd is less than the limit set by the caller of pcre2_match()
or, if no such limit is set, less than the default.

int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
int pcre2_set_depth_limit(pcre2_match_context *mcontext,
uint32_t value);

The recursion_limit parameter is similar to match_limit, but instead of
limiting the total number of times that match() is called, it limits
the depth of recursion. The recursion depth is a smaller number than
the total number of calls, because not all calls to match() are recur-
sive. This limit is of use only if it is set smaller than match_limit.
This parameter limits the depth of nested backtracking in
pcre2_match(). Each time a nested backtracking point is passed, a new
memory "frame" is used to remember the state of matching at that point.
Thus, this parameter indirectly limits the amount of memory that is
used in a match.

Limiting the recursion depth limits the amount of system stack that can
be used, or, when PCRE2 has been compiled to use memory on the heap
instead of the stack, the amount of heap memory that can be used. This
limit is not relevant, and is ignored, when matching is done using JIT
compiled code. However, it is supported by pcre2_dfa_match(), which
uses recursive function calls less frequently than pcre2_match(), but
which can be caused to use a lot of stack by a recursive pattern such
as /(.)(?1)/ matched to a very long string.
This limit is not relevant, and is ignored, when matching is done using
JIT compiled code. However, it is supported by pcre2_dfa_match(), which
uses it to limit the depth of internal recursive function calls that
implement lookaround assertions and pattern recursions. This is, there-
fore, an indirect limit on the amount of system stack that is used. A
recursive pattern such as /(.)(?1)/, when matched to a very long string
using pcre2_dfa_match(), can use a great deal of stack.

The default value for recursion_limit can be set when PCRE2 is built;
the default default is the same value as the default for match_limit.
If the limit is exceeded, pcre2_match() and pcre2_dfa_match() return
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
The default value for the depth limit can be set when PCRE2 is built;
the default default is the same value as the default for the match
limit. If the limit is exceeded, pcre2_match() or pcre2_dfa_match()
returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be
supplied by an item at the start of a pattern of the form

(*LIMIT_RECURSION=ddd)
(*LIMIT_DEPTH=ddd)

where ddd is a decimal number. However, such a setting is ignored
unless ddd is less than the limit set by the caller of pcre2_match() or
pcre2_dfa_match() or, if no such limit is set, less than the default.

int pcre2_set_recursion_memory_management(
pcre2_match_context *mcontext,
void *(*private_malloc)(PCRE2_SIZE, void *),
void (*private_free)(void *, void *), void *memory_data);

This function sets up two additional custom memory management functions
for use by pcre2_match() when PCRE2 is compiled to use the heap for
remembering backtracking data, instead of recursive function calls that
use the system stack. There is a discussion about PCRE2's stack usage
in the pcre2stack documentation. See the pcre2build documentation for
details of how to build PCRE2.

Using the heap for recursion is a non-standard way of building PCRE2,
for use in environments that have limited stacks. Because of the
greater use of memory management, pcre2_match() runs more slowly. Func-
tions that are different to the general custom memory functions are
provided so that special-purpose external code can be used for this
case, because the memory blocks are all the same size. The blocks are
retained by pcre2_match() until it is about to exit so that they can be
re-used when possible during the match. In the absence of these func-
tions, the normal custom memory management functions are used, if sup-
plied, otherwise the system functions.


CHECKING BUILD-TIME OPTIONS

@@ -996,6 +993,13 @@ CHECKING BUILD-TIME OPTIONS
sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
LF, or CRLF. The default can be overridden when a pattern is compiled.

PCRE2_CONFIG_DEPTHLIMIT

The output is a uint32_t integer that gives the default limit for the
depth of nested backtracking in pcre2_match() or the depth of nested
recursions and lookarounds in pcre2_dfa_match(). Further details are
given with pcre2_set_depth_limit() above.

PCRE2_CONFIG_JIT

The output is a uint32_t integer that is set to one if support for
@@ -1030,9 +1034,9 @@ CHECKING BUILD-TIME OPTIONS

PCRE2_CONFIG_MATCHLIMIT

The output is a uint32_t integer that gives the default limit for the
number of internal matching function calls in a pcre2_match() execu-
tion. Further details are given with pcre2_match() below.
The output is a uint32_t integer that gives the default match limit for
pcre2_match(). Further details are given with pcre2_set_match_limit()
above.

PCRE2_CONFIG_NEWLINE

@@ -1059,21 +1063,10 @@ CHECKING BUILD-TIME OPTIONS
application. For finer control over compilation stack usage, see
pcre2_set_compile_recursion_guard().

PCRE2_CONFIG_RECURSIONLIMIT

The output is a uint32_t integer that gives the default limit for the
depth of recursion when calling the internal matching function in a
pcre2_match() execution. Further details are given with pcre2_match()
below.

PCRE2_CONFIG_STACKRECURSE

The output is a uint32_t integer that is set to one if internal recur-
sion when running pcre2_match() is implemented by recursive function
calls that use the system stack to remember their state. This is the
usual way that PCRE2 is compiled. The output is zero if PCRE2 was com-
piled to use blocks of data on the heap instead of recursive function
calls.
This parameter is obsolete and should not be used in new code. The out-
put is a uint32_t integer that is always set to zero.

PCRE2_CONFIG_UNICODE_VERSION

@@ -1093,7 +1086,7 @@ CHECKING BUILD-TIME OPTIONS

PCRE2_CONFIG_VERSION

The where argument should point to a buffer that is at least 12 code
The where argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling
pcre2_config() with where set to NULL.) The buffer is filled with the
PCRE2 version string, zero-terminated. The number of code units used is
@@ -1267,14 +1260,15 @@ COMPILING A PATTERN
parenthesis terminates the name. A closing parenthesis can be included
in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED
option is set, unescaped whitespace in verb names is skipped and #-com-
ments are recognized, exactly as in the rest of the pattern.
ments are recognized in this mode, exactly as in the rest of the pat-
tern.

PCRE2_AUTO_CALLOUT

If this bit is set, pcre2_compile() automatically inserts callout
items, all with number 255, before each pattern item, except immedi-
ately before or after a callout in the pattern. For discussion of the
callout facility, see the pcre2callout documentation.
ately before or after an explicit callout in the pattern. For discus-
sion of the callout facility, see the pcre2callout documentation.

PCRE2_CASELESS

@@ -1517,7 +1511,7 @@ COMPILING A PATTERN
section on generic character types in the pcre2pattern page. If you set
PCRE2_UCP, matching one of the items it affects takes much longer. The
option is available only if PCRE2 has been compiled with Unicode sup-
port.
port (which is the default).

PCRE2_UNGREEDY

@@ -1548,13 +1542,13 @@ COMPILING A PATTERN

COMPILATION ERROR CODES

There are over 80 positive error codes that pcre2_compile() may return
(via errorcode) if it finds an error in the pattern. There are also
some negative error codes that are used for invalid UTF strings. These
are the same as given by pcre2_match() and pcre2_dfa_match(), and are
described in the pcre2unicode page. The pcre2_get_error_message() func-
tion (see "Obtaining a textual error message" below) can be called to
obtain a textual error message from any error code.
There are nearly 100 positive error codes that pcre2_compile() may
return (via errorcode) if it finds an error in the pattern. There are
also some negative error codes that are used for invalid UTF strings.
These are the same as given by pcre2_match() and pcre2_dfa_match(), and
are described in the pcre2unicode page. The pcre2_get_error_message()
function (see "Obtaining a textual error message" below) can be called
to obtain a textual error message from any error code.


JUST-IN-TIME (JIT) COMPILATION
@@ -1585,7 +1579,7 @@ JUST-IN-TIME (JIT) COMPILATION
JIT compilation is a heavyweight optimization. It can take some time
for patterns to be analyzed, and for one-off matches and simple pat-
terns the benefit of faster execution might be offset by a much slower
compilation time. Most, but not all patterns can be optimized by the
compilation time. Most (but not all) patterns can be optimized by the
JIT compiler.


@@ -1595,8 +1589,8 @@ LOCALE SUPPORT
letters, digits, or whatever, by reference to a set of tables, indexed
by character code point. This applies only to characters whose code
points are less than 256. By default, higher-valued code points never
match escapes such as \w or \d. However, if PCRE2 is built with UTF
support, all characters can be tested with \p and \P, or, alterna-
match escapes such as \w or \d. However, if PCRE2 is built with Uni-
code support, all characters can be tested with \p and \P, or, alterna-
tively, the PCRE2_UCP option can be set when a pattern is compiled;
this causes \w and friends to use Unicode property support instead of
the built-in tables.
@@ -1639,7 +1633,7 @@ LOCALE SUPPORT
The pointer that is passed (via the compile context) to pcre2_compile()
is saved with the compiled pattern, and the same tables are used by
pcre2_match() and pcre2_dfa_match(). Thus, for any single pattern, com-
pilation, and matching all happen in the same locale, but different
pilation and matching both happen in the same locale, but different
patterns can be processed in different locales.


@@ -1654,7 +1648,7 @@ INFORMATION ABOUT A COMPILED PATTERN
is required, and the third argument is a pointer to a variable to
receive the data. If the third argument is NULL, the first argument is
ignored, and the function returns the size in bytes of the variable
that is required for the information requested. Otherwise, The yield of
that is required for the information requested. Otherwise, the yield of
the function is zero for success, or one of the following negative num-
bers:

@@ -1710,8 +1704,8 @@ INFORMATION ABOUT A COMPILED PATTERN
.* is not in a capturing group that is the subject
of a back reference
PCRE2_DOTALL is in force for .*
Neither (*PRUNE) nor (*SKIP) appears in the pattern.
PCRE2_NO_DOTSTAR_ANCHOR is not set.
Neither (*PRUNE) nor (*SKIP) appears in the pattern
PCRE2_NO_DOTSTAR_ANCHOR is not set

For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in
the options returned for PCRE2_INFO_ALLOPTIONS.
@@ -1740,6 +1734,14 @@ INFORMATION ABOUT A COMPILED PATTERN
terns where (?| is not used, this is also the total number of capturing
subpatterns. The third argument should point to a uint32_t variable.

PCRE2_INFO_DEPTHLIMIT

If the pattern set a backtracking depth limit by including an item of
the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The
third argument should point to an unsigned 32-bit integer. If no such
value has been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET.

PCRE2_INFO_FIRSTBITMAP

In the absence of a single first code unit for a non-anchored pattern,
@@ -1772,6 +1774,15 @@ INFORMATION ABOUT A COMPILED PATTERN
value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32
mode.

PCRE2_INFO_FRAMESIZE

Return the size (in bytes) of the data frames that are used to remember
backtracking positions when the pattern is processed by pcre2_match()
without the use of JIT. The third argument should point to a size_t
variable. The frame size depends on the number of capturing parentheses
in the pattern. Each additional capturing group adds two PCRE2_SIZE
variables.

PCRE2_INFO_HASBACKSLASHC

Return 1 if the pattern contains any instances of \C, otherwise 0. The
@@ -1782,7 +1793,8 @@ INFORMATION ABOUT A COMPILED PATTERN
Return 1 if the pattern contains any explicit matches for CR or LF
characters, otherwise 0. The third argument should point to a uint32_t
variable. An explicit match is either a literal CR or LF character, or
\r or \n.
\r or \n or one of the equivalent hexadecimal or octal escape
sequences.

PCRE2_INFO_JCHANGED

@@ -1918,7 +1930,7 @@ INFORMATION ABOUT A COMPILED PATTERN

PCRE2_INFO_NEWLINE

The output is a uint32_t with one of the following values:
The output is one of the following uint32_t values:

PCRE2_NEWLINE_CR Carriage return (CR)
PCRE2_NEWLINE_LF Linefeed (LF)
@@ -1926,16 +1938,8 @@ INFORMATION ABOUT A COMPILED PATTERN
PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF

This specifies the default character sequence that will be recognized
as meaning "newline" while matching.

PCRE2_INFO_RECURSIONLIMIT

If the pattern set a recursion limit by including an item of the form
(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
argument should point to an unsigned 32-bit integer. If no such value
has been set, the call to pcre2_pattern_info() returns the error
PCRE2_ERROR_UNSET.
This identifies the character sequence that will be recognized as mean-
ing "newline" while matching.

PCRE2_INFO_SIZE

@@ -1998,8 +2002,8 @@ THE MATCH DATA BLOCK
you must create a match data block by calling one of the creation func-
tions above. For pcre2_match_data_create(), the first argument is the
number of pairs of offsets in the ovector. One pair of offsets is
required to identify the string that matched the whole pattern, with
another pair for each captured substring. For example, a value of 4
required to identify the string that matched the whole pattern, with an
additional pair for each captured substring. For example, a value of 4
creates enough space to record the matched portion of the subject plus
three captured substrings. A minimum of at least 1 pair is imposed by
pcre2_match_data_create(), so it is always possible to return the over-
@@ -2124,9 +2128,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
ing offset by two characters instead of one.

If a non-zero starting offset is passed when the pattern is anchored,
one attempt to match at the given offset is made. This can only succeed
if the pattern does not require the match to be at the start of the
subject.
a single attempt to match at the given offset is made. This can only
succeed if the pattern does not require the match to be at the start of
the subject. In other words, the anchoring must be the result of set-
ting the PCRE2_ANCHORED option or the use of .* with PCRE2_DOTALL, not
by starting the pattern with ^ or \A.

Option bits for pcre2_match()

@@ -2138,9 +2144,8 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION

Setting PCRE2_ANCHORED at match time is not supported by the just-in-
time (JIT) compiler. If it is set, JIT matching is disabled and the
normal interpretive code in pcre2_match() is run. Apart from
PCRE2_NO_JIT (obviously), the remaining options are supported for JIT
matching.
interpretive code in pcre2_match() is run. Apart from PCRE2_NO_JIT
(obviously), the remaining options are supported for JIT matching.

PCRE2_ANCHORED

@@ -2221,11 +2226,11 @@ MATCHING A PATTERN: THE TRADITIONAL FUNCTION
checks for performance reasons, you can set the PCRE2_NO_UTF_CHECK
option when calling pcre2_match(). You might want to do this for the
second and subsequent calls to pcre2_match() if you are making repeated
calls to find all the matches in a single subject string.
calls to find other matches in the same subject string.

NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
string as a subject, or an invalid value of startoffset, is undefined.
Your program may crash or loop indefinitely.
WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an
invalid string as a subject, or an invalid value of startoffset, is
undefined. Your program may crash or loop indefinitely.
|
||||
|
||||
PCRE2_PARTIAL_HARD
|
||||
PCRE2_PARTIAL_SOFT
|
||||
|
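PCRE2_NO_UTF_CHECK is safe only when the subject is already known to be valid UTF. For an 8-bit subject that is validated once and then searched repeatedly, a check such as the minimal sketch below could supply that knowledge. It is an independent implementation of the RFC 3629 rules (no overlong forms, no surrogates U+D800 to U+DFFF, nothing above U+10FFFF), not PCRE2's own checker, and utf8_is_valid() is a hypothetical name; PCRE2's internal check reports more specific error codes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if s[0..len) is well-formed UTF-8 as
   defined by RFC 3629 (no overlong forms, no UTF-16 surrogates
   U+D800..U+DFFF, no code points above U+10FFFF). */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp;      /* decoded code point */

        if (c < 0x80) { i++; continue; }                   /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { extra = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { extra = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { extra = 3; cp = c & 0x07; }
        else return false;     /* 0x80..0xC1 and 0xF5..0xFF never start a character */

        if (len - i <= extra) return false;                /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;   /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000))
            return false;                                  /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    /* surrogate */
        if (cp > 0x10FFFF) return false;                   /* out of range */
        i += extra + 1;
    }
    return true;
}
```

When it returns true, PCRE2_NO_UTF_CHECK can be passed on the calls for that subject. As the warning above says, the startoffset must still be valid, that is, it must point to the start of a character.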
@@ -2278,9 +2283,10 @@ NEWLINE HANDLING WHEN MATCHING
acter after the first failure.

An explicit match for CR or LF is either a literal appearance of one of
those characters in the pattern, or one of the \r or \n escape
sequences. Implicit matches such as [^X] do not count, nor does \s,
even though it includes CR and LF in the characters that it matches.
those characters in the pattern, or one of the \r or \n or equivalent
octal or hexadecimal escape sequences. Implicit matches such as [^X] do
not count, nor does \s, even though it includes CR and LF in the char-
acters that it matches.

Notwithstanding the above, anomalous effects may still occur when CRLF
is a valid newline sequence and explicit \r or \n escapes appear in the
@@ -2325,14 +2331,14 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
They identify the part of the subject that was partially matched. See
the pcre2partial documentation for details of partial matching.

After a successful match, the first pair of offsets identifies the por-
tion of the subject string that was matched by the entire pattern. The
next pair is used for the first capturing subpattern, and so on. The
value returned by pcre2_match() is one more than the highest numbered
pair that has been set. For example, if two substrings have been cap-
tured, the returned value is 3. If there are no capturing subpatterns,
the return value from a successful match is 1, indicating that just the
first pair of offsets has been set.
After a fully successful match, the first pair of offsets identifies
the portion of the subject string that was matched by the entire pat-
tern. The next pair is used for the first captured substring, and so
on. The value returned by pcre2_match() is one more than the highest
numbered pair that has been set. For example, if two substrings have
been captured, the returned value is 3. If there are no captured sub-
strings, the return value from a successful match is 1, indicating that
just the first pair of offsets has been set.

If a pattern uses the \K escape sequence within a positive assertion,
the reported start of a successful match can be greater than the end of
@@ -2347,11 +2353,7 @@ HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS
as much as possible is filled in, and the function returns a value of
zero. If captured substrings are not of interest, pcre2_match() may be
called with a match data block whose ovector is of minimum length (that
is, one pair). However, if the pattern contains back references and the
ovector is not big enough to remember the related substrings, PCRE2 has
to get additional memory for use during matching. Thus it is usually
advisable to set up a match data block containing an ovector of reason-
able size.
is, one pair).

It is possible for capturing subpattern number n+1 to match some part
of the subject when subpattern n has not been used at all. For example,
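The relationship between the ovector pairs and the value returned by pcre2_match() after a fully successful match, including the case where a later group is set while an earlier one is not, can be modelled without the library. In the sketch below UNSET_OFFSET is a local stand-in for PCRE2_UNSET (which pcre2.h defines as ~(PCRE2_SIZE)0, PCRE2_SIZE being size_t), and both helper names are hypothetical. The sketch ignores the case where the ovector is too small, for which pcre2_match() returns zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for PCRE2_UNSET, assuming PCRE2_SIZE is size_t. */
#define UNSET_OFFSET (~(size_t)0)

/* Hypothetical: the number of ovector pairs a match data block gets when
   `requested` pairs are asked for; a minimum of one pair is imposed. */
uint32_t ovector_pairs_for(uint32_t requested)
{
    return requested < 1 ? 1 : requested;
}

/* Hypothetical: given an ovector of `pairs` (start, end) pairs after a fully
   successful match, return one more than the highest-numbered pair that is
   set, which is the value pcre2_match() returns. Pair 0 is the whole match. */
int return_value_from_ovector(const size_t *ovector, uint32_t pairs)
{
    for (uint32_t i = pairs; i > 0; i--)
        if (ovector[2 * (i - 1)] != UNSET_OFFSET)
            return (int)i;
    return 0;   /* nothing set: not a successful match */
}
```

With the real library, the pairs would come from pcre2_get_ovector_pointer() and their number from pcre2_get_ovector_count().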
@@ -2450,9 +2452,10 @@ ERROR RETURNS FROM pcre2_match()

PCRE2_ERROR_BADMODE

This error is given when a pattern that was compiled by the 8-bit
library is passed to a 16-bit or 32-bit library function, or vice
versa.
This error is given when a compiled pattern is passed to a function in
a library of a different code unit width, for example, a pattern com-
piled by the 8-bit library is passed to a 16-bit or 32-bit library
function.

PCRE2_ERROR_BADOFFSET

@@ -2476,19 +2479,15 @@ ERROR RETURNS FROM pcre2_match()
pcre2_callout_enumerate() to return a distinctive error code. See the
pcre2callout documentation for details.

PCRE2_ERROR_DEPTHLIMIT

The nested backtracking depth limit was reached.

PCRE2_ERROR_INTERNAL

An unexpected internal error has occurred. This error could be caused
by a bug in PCRE2 or by overwriting of the compiled pattern.

PCRE2_ERROR_JIT_BADOPTION

This error is returned when a pattern that was successfully studied
using JIT is being matched, but the matching mode (partial or complete
match) does not correspond to any JIT compilation mode. When the JIT
fast path function is used, this error may be also given for invalid
options. See the pcre2jit documentation for more details.

PCRE2_ERROR_JIT_STACKLIMIT

This error is returned when a pattern that was successfully studied
@@ -2498,15 +2497,13 @@ ERROR RETURNS FROM pcre2_match()

PCRE2_ERROR_MATCHLIMIT

The backtracking limit was reached.
The backtracking match limit was reached.

PCRE2_ERROR_NOMEMORY

If a pattern contains back references, but the ovector is not big
enough to remember the referenced substrings, PCRE2 gets a block of
memory at the start of matching to use for this purpose. There are some
other special cases where extra memory is needed during matching. This
error is given when memory cannot be obtained.
If a pattern contains many nested backtracking points, heap memory is
used to remember them. This error is given when the memory allocation
function (default or custom) fails.

PCRE2_ERROR_NULL

@@ -2522,10 +2519,6 @@ ERROR RETURNS FROM pcre2_match()
plicated cases, in particular mutual recursions between two different
subpatterns, cannot be detected until matching is attempted.

PCRE2_ERROR_RECURSIONLIMIT

The internal recursion limit was reached.


OBTAINING A TEXTUAL ERROR MESSAGE

@@ -2703,8 +2696,8 @@ EXTRACTING CAPTURED SUBSTRINGS BY NAME
the function is the subpattern number, PCRE2_ERROR_NOSUBSTRING if there
is no subpattern of that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if
there is more than one subpattern of that name. Given the number, you
can extract the substring directly, or use one of the functions
described above.
can extract the substring directly from the ovector, or use one of the
"bynumber" functions described above.

For convenience, there are also "byname" functions that correspond to
the "bynumber" functions, the only difference being that the second
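pcre2_substring_number_from_name() performs this lookup itself. To show the data it works from, the sketch below scans a name table laid out as documented for the 8-bit library (the table obtained with PCRE2_INFO_NAMETABLE, with the entry count and size from PCRE2_INFO_NAMECOUNT and PCRE2_INFO_NAMEENTRYSIZE): each fixed-size entry holds the group number in two bytes, most significant first, followed by the zero-terminated name. The function and its return convention (0 for "no such name", -1 for "not unique") are hypothetical and only echo PCRE2_ERROR_NOSUBSTRING and PCRE2_ERROR_NOUNIQUESUBSTRING; the real function returns those negative error codes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical linear scan of an 8-bit-library name table: `count` entries,
   each `entry_size` bytes long. Returns the group number, 0 if the name is
   absent, or -1 if it appears more than once (duplicate names). */
int group_number_from_name(const unsigned char *table, unsigned count,
                           unsigned entry_size, const char *name)
{
    int number = 0;
    for (unsigned i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        /* bytes 0-1: group number, most significant byte first;
           the zero-terminated name starts at byte 2 */
        if (strcmp((const char *)(entry + 2), name) == 0) {
            if (number != 0)
                return -1;            /* name is not unique */
            number = (entry[0] << 8) | entry[1];
        }
    }
    return number;
}
```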
@@ -2991,13 +2984,13 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION

The function pcre2_dfa_match() is called to match a subject string
against a compiled pattern, using a matching algorithm that scans the
subject string just once, and does not backtrack. This has different
characteristics to the normal algorithm, and is not compatible with
Perl. Some of the features of PCRE2 patterns are not supported. Never-
theless, there are times when this kind of matching can be useful. For
a discussion of the two matching algorithms, and a list of features
that pcre2_dfa_match() does not support, see the pcre2matching documen-
tation.
subject string just once (not counting lookaround assertions), and does
not backtrack. This has different characteristics to the normal algo-
rithm, and is not compatible with Perl. Some of the features of PCRE2
patterns are not supported. Nevertheless, there are times when this
kind of matching can be useful. For a discussion of the two matching
algorithms, and a list of features that pcre2_dfa_match() does not sup-
port, see the pcre2matching documentation.

The arguments for the pcre2_dfa_match() function are the same as for
pcre2_match(), plus two extras. The ovector within the match data block
@@ -3181,7 +3174,7 @@ AUTHOR

REVISION

Last updated: 21 March 2017
Last updated: 27 March 2017
Copyright (c) 1997-2017 University of Cambridge.
------------------------------------------------------------------------------

@@ -34,7 +34,7 @@ A match context is needed only if you want to:
Set a matching offset limit
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management in the match context
Set custom memory management specifically for the match
.sp
The \fIlength\fP and \fIstartoffset\fP values are code
units, not characters. The length may be given as PCRE2_ZERO_TERMINATE for a
380
doc/pcre2api.3
@@ -1,4 +1,4 @@
.TH PCRE2API 3 "21 March 2017" "PCRE2 10.30"
.TH PCRE2API 3 "27 March 2017" "PCRE2 10.30"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@@ -120,19 +120,14 @@ document for an overview of all the PCRE2 documentation.
.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *),"
.B " void *\fIcallout_data\fP);"
.sp
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
.B " PCRE2_SIZE \fIvalue\fP);"
.sp
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.fi
.
.
@@ -252,6 +247,25 @@ document for an overview of all the PCRE2 documentation.
.fi
.
.
.SH "PCRE2 NATIVE API OBSOLETE FUNCTIONS"
.rs
.sp
.nf
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.sp
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi
.sp
These functions became obsolete at release 10.30 and are retained only for
backward compatibility. They should not be used in new code. The first is
replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
no longer has any effect (it always returns zero).
.
.
.SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES"
.rs
.sp
@@ -302,7 +316,7 @@ When using multiple libraries in an application, you must take care when
processing any particular pattern to use only functions from a single library.
For example, if you want to run a match using a pattern that was compiled with
\fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not
\fBpcre2_match_8()\fP.
\fBpcre2_match_8()\fP or \fBpcre2_match_32()\fP.
.P
In the function summaries above, and in the rest of this document and other
PCRE2 documents, functions and data types are described using their generic
@@ -331,7 +345,7 @@ In a Windows environment, if you want to statically link an application program
against a non-dll PCRE2 library, you must define PCRE2_STATIC before including
\fBpcre2.h\fP.
.P
The functions \fBpcre2_compile()\fP, and \fBpcre2_match()\fP are used for
The functions \fBpcre2_compile()\fP and \fBpcre2_match()\fP are used for
compiling and matching regular expressions in a Perl-compatible manner. A
sample program that demonstrates the simplest way of using them is provided in
the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing
@@ -345,10 +359,16 @@ documentation, and the
.\"
documentation describes how to compile and run it.
.P
Just-in-time compiler support is an optional feature of PCRE2 that can be built
in appropriate hardware environments. It greatly speeds up the matching
The compiling and matching functions recognize various options that are passed
as bits in an options argument. There are also some more complicated parameters
such as custom memory management functions and resource limits that are passed
in "contexts" (which are just memory blocks, described below). Simple
applications do not need to make use of contexts.
.P
Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
built in appropriate hardware environments. It greatly speeds up the matching
performance of many patterns. Programs can request that it be used if
available, by calling \fBpcre2_jit_compile()\fP after a pattern has been
available by calling \fBpcre2_jit_compile()\fP after a pattern has been
successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT
support is not available.
.P
@@ -358,8 +378,8 @@ More complicated programs might need to make use of the specialist functions
.P
JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
matching, which gives improved performance. The JIT-specific functions are
discussed in the
matching, which gives improved performance at the expense of less sanity
checking. The JIT-specific functions are discussed in the
.\" HREF
\fBpcre2jit\fP
.\"
@@ -369,7 +389,7 @@ A second matching function, \fBpcre2_dfa_match()\fP, which is not
Perl-compatible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a given
point in the subject), and scans the subject just once (unless there are
lookbehind assertions). However, this algorithm does not return captured
lookaround assertions). However, this algorithm does not return captured
substrings. A description of the two matching algorithms and their advantages
and disadvantages is given in the
.\" HREF
@@ -484,8 +504,8 @@ and does not change when the pattern is matched. Therefore, it is thread-safe,
that is, the same compiled pattern can be used by more than one thread
simultaneously. For example, an application can compile all its patterns at the
start, before forking off multiple threads that use them. However, if the
just-in-time optimization feature is being used, it needs separate memory stack
areas for each thread. See the
just-in-time (JIT) optimization feature is being used, it needs separate memory
stack areas for each thread. See the
.\" HREF
\fBpcre2jit\fP
.\"
@@ -536,10 +556,10 @@ thread-specific copy.
.SS "Match blocks"
.rs
.sp
The matching functions need a block of memory for working space and for storing
the results of a match. This includes details of what was matched, as well as
additional information such as the name of a (*MARK) setting. Each thread must
provide its own copy of this memory.
The matching functions need a block of memory for storing the results of a
match. This includes details of what was matched, as well as additional
information such as the name of a (*MARK) setting. Each thread must provide its
own copy of this memory.
.
.
.SH "PCRE2 CONTEXTS"
@@ -611,15 +631,15 @@ The memory used for a general context should be freed by calling:
.SS "The compile context"
.rs
.sp
A compile context is required if you want to change the default values of any
of the following compile-time parameters:
A compile context is required if you want to provide an external function for
stack checking during compilation or to change the default values of any of the
following compile-time parameters:
.sp
What \eR matches (Unicode newlines or CR, LF, CRLF only)
PCRE2's character tables
The newline character sequence
The compile time nested parentheses limit
The maximum length of the pattern string
An external function for stack checking
.sp
A compile context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of
@@ -666,11 +686,11 @@ in the current locale.
.B " PCRE2_SIZE \fIvalue\fP);"
.fi
.sp
This sets a maximum length, in code units, for the pattern string that is to be
compiled. If the pattern is longer, an error is generated. This facility is
provided so that applications that accept patterns from external sources can
limit their size. The default is the largest number that a PCRE2_SIZE variable
can hold, which is effectively unlimited.
This sets a maximum length, in code units, for any pattern string that is
compiled with this context. If the pattern is longer, an error is generated.
This facility is provided so that applications that accept patterns from
external sources can limit their size. The default is the largest number that a
PCRE2_SIZE variable can hold, which is effectively unlimited.
.sp
.nf
.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP,
@@ -683,8 +703,15 @@ PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
PCRE2_NEWLINE_ANY (any Unicode newline sequence).
.P
When a pattern is compiled with the PCRE2_EXTENDED option, the value of this
parameter affects the recognition of white space and the end of internal
A pattern can override the value set in the compile context by starting with a
sequence such as (*CRLF). See the
.\" HREF
\fBpcre2pattern\fP
.\"
page for details.
.P
When a pattern is compiled with the PCRE2_EXTENDED option, the newline
convention affects the recognition of white space and the end of internal
comments starting with #. The value is saved with the compiled pattern for
subsequent use by the JIT compiler and by the two interpreted matching
functions, \fIpcre2_match()\fP and \fIpcre2_dfa_match()\fP.
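The override described in this hunk can be modelled as a scan over the items at the very start of the pattern, where each newline item replaces the value from the compile context; the sketch keeps the last one, following the pcre2pattern rule that the last of several such settings is used. It is a minimal sketch with stated simplifications: only the five conventions named in this section are recognized, the enum is a hypothetical stand-in for the PCRE2_NEWLINE_* values, and, unlike PCRE2, it stops at the first leading item it does not know (so a pattern that begins with another start-of-pattern item such as (*UTF) is not handled).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the PCRE2_NEWLINE_* values. */
enum newline_kind { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Return the newline convention that applies to `pattern` when the compile
   context says `context_value`. Simplified: only leading newline items are
   examined, and scanning stops at the first item that is not one. */
enum newline_kind effective_newline(const char *pattern,
                                    enum newline_kind context_value)
{
    static const struct { const char *item; enum newline_kind kind; } items[] = {
        { "(*CR)", NL_CR },       { "(*LF)", NL_LF },
        { "(*CRLF)", NL_CRLF },   { "(*ANYCRLF)", NL_ANYCRLF },
        { "(*ANY)", NL_ANY },
    };
    const size_t n_items = sizeof items / sizeof items[0];
    enum newline_kind result = context_value;

    for (;;) {
        size_t i;
        for (i = 0; i < n_items; i++) {
            size_t len = strlen(items[i].item);
            if (strncmp(pattern, items[i].item, len) == 0) {
                result = items[i].kind;   /* a later item overrides an earlier one */
                pattern += len;
                break;
            }
        }
        if (i == n_items)
            return result;                /* no further leading newline item */
    }
}
```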
@@ -722,15 +749,14 @@ zero if all is well, or non-zero to force an error.
.SS "The match context"
.rs
.sp
A match context is required if you want to change the default values of any
of the following match-time parameters:
A match context is required if you want to:
.sp
A callout function
The offset limit for matching an unanchored pattern
The limit for calling \fBmatch()\fP (see below)
The limit for calling \fBmatch()\fP recursively
Set up a callout function
Set an offset limit for matching an unanchored pattern
Change the backtracking match limit
Change the backtracking depth limit
Set custom memory management specifically for the match
.sp
A match context is also required if you are using custom memory management.
If none of these apply, just pass NULL as the context argument of
\fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP.
.P
@@ -756,7 +782,7 @@ PCRE2_ERROR_BADDATA if invalid data is detected.
.B " void *\fIcallout_data\fP);"
.fi
.sp
This sets up a "callout" function, which PCRE2 will call at specified points
This sets up a "callout" function for PCRE2 to call at specified points
during a matching operation. Details are given in the
.\" HREF
\fBpcre2callout\fP
@@ -778,8 +804,8 @@ A match can never be found if the \fIstartoffset\fP argument of
\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP is greater than the offset
limit.
.P
When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when calling
\fBpcre2_compile()\fP so that when JIT is in use, different code can be
When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when
calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
.P
@@ -799,10 +825,10 @@ up too many resources when processing patterns that are not going to match, but
which have a very large number of possibilities in their search trees. The
classic example is a pattern that uses nested unlimited repeats.
.P
Internally, \fBpcre2_match()\fP uses a function called \fBmatch()\fP, which it
calls repeatedly (sometimes recursively). The limit set by \fImatch_limit\fP is
imposed on the number of times this function is called during a match, which
has the effect of limiting the amount of backtracking that can take place. For
There is an internal counter in \fBpcre2_match()\fP that is incremented each
time round its main matching loop. If this value reaches the match limit,
\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has
the effect of limiting the amount of backtracking that can take place. For
patterns that are not anchored, the count restarts from zero for each position
in the subject string. This limit is not relevant to \fBpcre2_dfa_match()\fP,
which ignores it.
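The counting rule in this hunk (a counter compared with the match limit, which restarts from zero at each starting position of an unanchored match) can be illustrated with a toy backtracking matcher. The sketch below is not PCRE2's matcher: it is a tiny recursive matcher for literals, '.' and 'c*' in the style of the Kernighan and Pike example matcher, its counter (one tick per call of the hypothetical toy_here()) only loosely corresponds to PCRE2's main-loop counter, and the result codes are hypothetical stand-ins for PCRE2_ERROR_NOMATCH and PCRE2_ERROR_MATCHLIMIT.

```c
#include <stddef.h>

/* Hypothetical result codes echoing PCRE2_ERROR_NOMATCH and
   PCRE2_ERROR_MATCHLIMIT; a non-negative result is the match start offset. */
enum { TOY_NOMATCH = -1, TOY_MATCHLIMIT = -2 };

struct toy_state {
    unsigned long count;   /* steps taken from the current start position */
    unsigned long limit;   /* plays the role of the match limit */
    int hit_limit;
};

static int toy_here(const char *re, const char *text, struct toy_state *st);

/* c* : try zero repetitions first, then one more c each time round. */
static int toy_star(int c, const char *re, const char *text,
                    struct toy_state *st)
{
    do {
        if (toy_here(re, text, st))
            return 1;
        if (st->hit_limit)
            return 0;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Anchored attempt at `text`; every call counts as one step. */
static int toy_here(const char *re, const char *text, struct toy_state *st)
{
    if (++st->count > st->limit) {
        st->hit_limit = 1;
        return 0;
    }
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return toy_star(re[0], re + 2, text, st);
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return toy_here(re + 1, text + 1, st);
    return 0;
}

/* Unanchored search: the step counter restarts at each start position. */
int toy_match(const char *re, const char *text, unsigned long limit)
{
    for (const char *start = text; ; start++) {
        struct toy_state st = { 0, limit, 0 };
        if (toy_here(re, start, &st))
            return (int)(start - text);
        if (st.hit_limit)
            return TOY_MATCHLIMIT;
        if (*start == '\0')
            return TOY_NOMATCH;
    }
}
```

For example, toy_match("xy", "aaaaxy", 3) finds the match at offset 4 because each start position gets its own budget of 3 steps; a single counter for the whole search would have needed 7.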
@@ -815,8 +841,7 @@ is also used in this case (but in a different way) to limit how long the
matching can continue.
.P
The default value for the limit can be set when PCRE2 is built; the default
default is 10 million, which handles all but the most extreme cases. If the
limit is exceeded, \fBpcre2_match()\fP returns PCRE2_ERROR_MATCHLIMIT. A value
default is 10 million, which handles all but the most extreme cases. A value
for the match limit may also be supplied by an item at the start of a pattern
of the form
.sp
@@ -827,65 +852,34 @@ less than the limit set by the caller of \fBpcre2_match()\fP or, if no such
limit is set, less than the default.
.sp
.nf
.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP,
.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP,
.B " uint32_t \fIvalue\fP);"
.fi
.sp
The \fIrecursion_limit\fP parameter is similar to \fImatch_limit\fP, but
instead of limiting the total number of times that \fBmatch()\fP is called, it
limits the depth of recursion. The recursion depth is a smaller number than the
total number of calls, because not all calls to \fBmatch()\fP are recursive.
This limit is of use only if it is set smaller than \fImatch_limit\fP.
This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP.
Each time a nested backtracking point is passed, a new memory "frame" is used
to remember the state of matching at that point. Thus, this parameter
indirectly limits the amount of memory that is used in a match.
.P
Limiting the recursion depth limits the amount of system stack that can be
used, or, when PCRE2 has been compiled to use memory on the heap instead of the
stack, the amount of heap memory that can be used. This limit is not relevant,
and is ignored, when matching is done using JIT compiled code. However, it is
supported by \fBpcre2_dfa_match()\fP, which uses recursive function calls less
frequently than \fBpcre2_match()\fP, but which can be caused to use a lot of
stack by a recursive pattern such as /(.)(?1)/ matched to a very long string.
This limit is not relevant, and is ignored, when matching is done using JIT
compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which uses
it to limit the depth of internal recursive function calls that implement
lookaround assertions and pattern recursions. This is, therefore, an indirect
limit on the amount of system stack that is used. A recursive pattern such as
/(.)(?1)/, when matched to a very long string using \fBpcre2_dfa_match()\fP,
can use a great deal of stack.
.P
The default value for \fIrecursion_limit\fP can be set when PCRE2 is built; the
default default is the same value as the default for \fImatch_limit\fP. If the
limit is exceeded, \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP return
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
supplied by an item at the start of a pattern of the form
The default value for the depth limit can be set when PCRE2 is built; the
default default is the same value as the default for the match limit. If the
limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP returns
PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be supplied by an
item at the start of a pattern of the form
.sp
(*LIMIT_RECURSION=ddd)
(*LIMIT_DEPTH=ddd)
.sp
where ddd is a decimal number. However, such a setting is ignored unless ddd is
less than the limit set by the caller of \fBpcre2_match()\fP or
\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
.sp
.nf
.B int pcre2_set_recursion_memory_management(
.B " pcre2_match_context *\fImcontext\fP,"
.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *),"
.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);"
.fi
.sp
This function sets up two additional custom memory management functions for use
by \fBpcre2_match()\fP when PCRE2 is compiled to use the heap for remembering
backtracking data, instead of recursive function calls that use the system
stack. There is a discussion about PCRE2's stack usage in the
.\" HREF
\fBpcre2stack\fP
.\"
documentation. See the
.\" HREF
\fBpcre2build\fP
.\"
documentation for details of how to build PCRE2.
.P
Using the heap for recursion is a non-standard way of building PCRE2, for use
in environments that have limited stacks. Because of the greater use of memory
management, \fBpcre2_match()\fP runs more slowly. Functions that are different
to the general custom memory functions are provided so that special-purpose
external code can be used for this case, because the memory blocks are all the
same size. The blocks are retained by \fBpcre2_match()\fP until it is about to
exit so that they can be re-used when possible during the match. In the absence
of these functions, the normal custom memory management functions are used, if
supplied, otherwise the system functions.
.
.
.SH "CHECKING BUILD-TIME OPTIONS"
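The rule at the end of the depth-limit paragraph in the hunk above (an in-pattern (*LIMIT_DEPTH=ddd) item, like the corresponding (*LIMIT_MATCH=ddd) item for the match limit, can only lower the limit that is otherwise in force) amounts to taking the smaller of the two values. A minimal sketch follows, with hypothetical function names; it parses one such item at the very start of a pattern and combines it with the caller's limit, and it does not attempt to handle several leading items.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: parse a leading "(*NAME=ddd)" item such as
   "(*LIMIT_DEPTH=500)". Only the very first item is examined. */
bool parse_leading_limit(const char *pattern, const char *name, uint32_t *value)
{
    size_t n = strlen(name);
    uint64_t v = 0;
    const char *p;

    if (strncmp(pattern, "(*", 2) != 0 || strncmp(pattern + 2, name, n) != 0 ||
        pattern[2 + n] != '=')
        return false;
    p = pattern + 3 + n;
    if (!isdigit((unsigned char)*p))
        return false;                       /* at least one digit required */
    for (; isdigit((unsigned char)*p); p++) {
        v = v * 10 + (uint64_t)(*p - '0');
        if (v > UINT32_MAX)
            return false;                   /* does not fit in a uint32_t */
    }
    if (*p != ')')
        return false;
    *value = (uint32_t)v;
    return true;
}

/* caller_limit: the value set in the match context, or the build-time
   default if none was set. An in-pattern value is honoured only if it is
   smaller, so the result is the minimum of the two. */
uint32_t effective_limit(uint32_t caller_limit, bool has_pattern_value,
                         uint32_t pattern_value)
{
    if (has_pattern_value && pattern_value < caller_limit)
        return pattern_value;
    return caller_limit;
}
```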
@@ -920,6 +914,13 @@ sequences the \eR escape sequence matches by default. A value of
PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a
value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The
default can be overridden when a pattern is compiled.
.sp
PCRE2_CONFIG_DEPTHLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the depth of
nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions
and lookarounds in \fBpcre2_dfa_match()\fP. Further details are given with
\fBpcre2_set_depth_limit()\fP above.
.sp
PCRE2_CONFIG_JIT
.sp
@@ -954,9 +955,9 @@ be compiled by those two libraries, but at the expense of slower matching.
.sp
PCRE2_CONFIG_MATCHLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the number of
internal matching function calls in a \fBpcre2_match()\fP execution. Further
details are given with \fBpcre2_match()\fP below.
The output is a uint32_t integer that gives the default match limit for
\fBpcre2_match()\fP. Further details are given with
\fBpcre2_set_match_limit()\fP above.
.sp
PCRE2_CONFIG_NEWLINE
.sp
@@ -980,20 +981,11 @@ amount of system stack used when a pattern is compiled. It is specified when
PCRE2 is built; the default is 250. This limit does not take into account the
stack that may already be used by the calling application. For finer control
over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP.
.sp
PCRE2_CONFIG_RECURSIONLIMIT
.sp
The output is a uint32_t integer that gives the default limit for the depth of
recursion when calling the internal matching function in a \fBpcre2_match()\fP
execution. Further details are given with \fBpcre2_match()\fP below.
.sp
PCRE2_CONFIG_STACKRECURSE
.sp
The output is a uint32_t integer that is set to one if internal recursion when
running \fBpcre2_match()\fP is implemented by recursive function calls that use
the system stack to remember their state. This is the usual way that PCRE2 is
compiled. The output is zero if PCRE2 was compiled to use blocks of data on the
heap instead of recursive function calls.
This parameter is obsolete and should not be used in new code. The output is a
uint32_t integer that is always set to zero.
.sp
PCRE2_CONFIG_UNICODE_VERSION
.sp
@@ -1012,7 +1004,7 @@ available; otherwise it is set to zero. Unicode support implies UTF support.
.sp
PCRE2_CONFIG_VERSION
.sp
The \fIwhere\fP argument should point to a buffer that is at least 12 code
The \fIwhere\fP argument should point to a buffer that is at least 24 code
units long. (The exact length required can be found by calling
\fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with
the PCRE2 version string, zero-terminated. The number of code units used is
@ -1208,13 +1200,14 @@ option is set, normal backslash processing is applied to verb names and only an
|
|||
unescaped closing parenthesis terminates the name. A closing parenthesis can be
|
||||
included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED
|
||||
option is set, unescaped whitespace in verb names is skipped and #-comments are
|
||||
recognized, exactly as in the rest of the pattern.
|
||||
recognized in this mode, exactly as in the rest of the pattern.
|
||||
.sp
|
||||
PCRE2_AUTO_CALLOUT
|
||||
.sp
|
||||
If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
|
||||
all with number 255, before each pattern item, except immediately before or
|
||||
after a callout in the pattern. For discussion of the callout facility, see the
|
||||
after an explicit callout in the pattern. For discussion of the callout
|
||||
facility, see the
|
||||
.\" HREF
|
||||
\fBpcre2callout\fP
|
||||
.\"
@@ -1452,9 +1445,8 @@ in the
 .\" HREF
 \fBpcre2unicode\fP
 .\"
-document.
-If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a negative
-error code.
+document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
+negative error code.
 .P
 If you know that your pattern is valid, and you want to skip this check for
 performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When it is set,
@@ -1479,7 +1471,7 @@ in the
 .\"
 page. If you set PCRE2_UCP, matching one of the items it affects takes much
 longer. The option is available only if PCRE2 has been compiled with Unicode
-support.
+support (which is the default).
 .sp
 PCRE2_UNGREEDY
 .sp
@@ -1518,7 +1510,7 @@ page.
 .SH "COMPILATION ERROR CODES"
 .rs
 .sp
-There are over 80 positive error codes that \fBpcre2_compile()\fP may return
+There are nearly 100 positive error codes that \fBpcre2_compile()\fP may return
 (via \fIerrorcode\fP) if it finds an error in the pattern. There are also some
 negative error codes that are used for invalid UTF strings. These are the same
 as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and are described
@@ -1570,7 +1562,7 @@ documentation.
 JIT compilation is a heavyweight optimization. It can take some time for
 patterns to be analyzed, and for one-off matches and simple patterns the
 benefit of faster execution might be offset by a much slower compilation time.
-Most, but not all patterns can be optimized by the JIT compiler.
+Most (but not all) patterns can be optimized by the JIT compiler.
 .
 .
 .\" HTML <a name="localesupport"></a>
@@ -1581,10 +1573,10 @@ PCRE2 handles caseless matching, and determines whether characters are letters,
 digits, or whatever, by reference to a set of tables, indexed by character code
 point. This applies only to characters whose code points are less than 256. By
 default, higher-valued code points never match escapes such as \ew or \ed.
-However, if PCRE2 is built with UTF support, all characters can be tested with
-\ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a pattern
-is compiled; this causes \ew and friends to use Unicode property support
-instead of the built-in tables.
+However, if PCRE2 is built with Unicode support, all characters can be tested
+with \ep and \eP, or, alternatively, the PCRE2_UCP option can be set when a
+pattern is compiled; this causes \ew and friends to use Unicode property
+support instead of the built-in tables.
 .P
 The use of locales with Unicode is discouraged. If you are handling characters
 with code points greater than 128, you should either use Unicode support, or
@@ -1623,7 +1615,7 @@ available for as long as it is needed.
 The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
 is saved with the compiled pattern, and the same tables are used by
-\fBpcre2_match()\fP and \fBpcre_dfa_match()\fP. Thus, for any single pattern,
-compilation, and matching all happen in the same locale, but different patterns
+\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP. Thus, for any single pattern,
+compilation and matching both happen in the same locale, but different patterns
 can be processed in different locales.
 .
 .
@@ -1646,7 +1638,7 @@ pattern. The second argument specifies which piece of information is required,
 and the third argument is a pointer to a variable to receive the data. If the
 third argument is NULL, the first argument is ignored, and the function returns
 the size in bytes of the variable that is required for the information
-requested. Otherwise, The yield of the function is zero for success, or one of
+requested. Otherwise, the yield of the function is zero for success, or one of
 the following negative numbers:
 .sp
 PCRE2_ERROR_NULL the argument \fIcode\fP was NULL
@@ -1699,8 +1691,8 @@ following are true:
 .* is not in a capturing group that is the subject
 of a back reference
 PCRE2_DOTALL is in force for .*
-Neither (*PRUNE) nor (*SKIP) appears in the pattern.
-PCRE2_NO_DOTSTAR_ANCHOR is not set.
+Neither (*PRUNE) nor (*SKIP) appears in the pattern
+PCRE2_NO_DOTSTAR_ANCHOR is not set
 .sp
 For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
 options returned for PCRE2_INFO_ALLOPTIONS.
@@ -1727,6 +1719,13 @@ matches only CR, LF, or CRLF.
 Return the highest capturing subpattern number in the pattern. In patterns
 where (?| is not used, this is also the total number of capturing subpatterns.
 The third argument should point to an \fBuint32_t\fP variable.
+.sp
+PCRE2_INFO_DEPTHLIMIT
+.sp
+If the pattern set a backtracking depth limit by including an item of the form
+(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
+should point to an unsigned 32-bit integer. If no such value has been set, the
+call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
 .sp
 PCRE2_INFO_FIRSTBITMAP
 .sp
@@ -1758,6 +1757,14 @@ argument should point to an \fBuint32_t\fP variable. In the 8-bit library, the
 value is always less than 256. In the 16-bit library the value can be up to
 0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff,
 and up to 0xffffffff when not using UTF-32 mode.
+.sp
+PCRE2_INFO_FRAMESIZE
+.sp
+Return the size (in bytes) of the data frames that are used to remember
+backtracking positions when the pattern is processed by \fBpcre2_match()\fP
+without the use of JIT. The third argument should point to a \fBsize_t\fP
+variable. The frame size depends on the number of capturing parentheses in the
+pattern. Each additional capturing group adds two PCRE2_SIZE variables.
 .sp
 PCRE2_INFO_HASBACKSLASHC
 .sp
@@ -1768,7 +1775,8 @@ argument should point to an \fBuint32_t\fP variable.
 .sp
 Return 1 if the pattern contains any explicit matches for CR or LF characters,
 otherwise 0. The third argument should point to an \fBuint32_t\fP variable. An
-explicit match is either a literal CR or LF character, or \er or \en.
+explicit match is either a literal CR or LF character, or \er or \en or one of
+the equivalent hexadecimal or octal escape sequences.
 .sp
 PCRE2_INFO_JCHANGED
 .sp
@@ -1907,7 +1915,7 @@ different for each compiled pattern.
 .sp
 PCRE2_INFO_NEWLINE
 .sp
-The output is a \fBuint32_t\fP with one of the following values:
+The output is one of the following \fBuint32_t\fP values:
 .sp
 PCRE2_NEWLINE_CR Carriage return (CR)
 PCRE2_NEWLINE_LF Linefeed (LF)
@@ -1915,15 +1923,8 @@ The output is a \fBuint32_t\fP with one of the following values:
 PCRE2_NEWLINE_ANY Any Unicode line ending
 PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
 .sp
-This specifies the default character sequence that will be recognized as
-meaning "newline" while matching.
-.sp
-PCRE2_INFO_RECURSIONLIMIT
-.sp
-If the pattern set a recursion limit by including an item of the form
-(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The third
-argument should point to an unsigned 32-bit integer. If no such value has been
-set, the call to \fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET.
+This identifies the character sequence that will be recognized as meaning
+"newline" while matching.
 .sp
 PCRE2_INFO_SIZE
 .sp
@@ -2000,9 +2001,9 @@ Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or
 the creation functions above. For \fBpcre2_match_data_create()\fP, the first
 argument is the number of pairs of offsets in the \fIovector\fP. One pair of
 offsets is required to identify the string that matched the whole pattern, with
-another pair for each captured substring. For example, a value of 4 creates
-enough space to record the matched portion of the subject plus three captured
-substrings. A minimum of at least 1 pair is imposed by
+an additional pair for each captured substring. For example, a value of 4
+creates enough space to record the matched portion of the subject plus three
+captured substrings. A minimum of one pair is imposed by
 \fBpcre2_match_data_create()\fP, so it is always possible to return the overall
 matched string.
 .P
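A minimal C sketch of the pair arithmetic described in this hunk, for readers sizing a match data block by hand. The helper names are hypothetical and not part of PCRE2, and the highest group number is assumed to come from `pcre2_pattern_info()` with PCRE2_INFO_CAPTURECOUNT. In real code, `pcre2_match_data_create_from_pattern()` does this sizing itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2: the number of offset pairs needed
   to record the whole match plus every capturing group, given the highest
   group number (for example, from PCRE2_INFO_CAPTURECOUNT). */
static uint32_t ovector_pairs_needed(uint32_t highest_group)
{
    return highest_group + 1;  /* pair 0 holds the overall match */
}

/* Hypothetical helper: each pair holds a start and an end offset, and
   pcre2_match_data_create() never uses fewer than one pair. */
static size_t ovector_elements(uint32_t pairs)
{
    if (pairs == 0)
        pairs = 1;             /* mirror the documented minimum of one pair */
    return (size_t)pairs * 2;
}
```

For the example in the text, a pattern with three capturing groups needs 4 pairs, which is 8 offsets.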
@@ -2145,9 +2146,11 @@ newline convention recognizes CRLF as a newline, and if so, and the current
 character is CR followed by LF, advance the starting offset by two characters
 instead of one.
 .P
-If a non-zero starting offset is passed when the pattern is anchored, one
+If a non-zero starting offset is passed when the pattern is anchored, a single
 attempt to match at the given offset is made. This can only succeed if the
-pattern does not require the match to be at the start of the subject.
+pattern does not require the match to be at the start of the subject. In other
+words, the anchoring must be the result of setting the PCRE2_ANCHORED option or
+the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA.
 .
 .
 .\" HTML <a name="matchoptions"></a>
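The advance rule at the top of this hunk can be sketched as a small helper: advance by two rather than one when the current position is a CR followed by an LF and CRLF is a recognized newline. This sketch assumes an 8-bit subject in non-UTF mode, so one character is one code unit. In UTF-8 mode the advance must also skip continuation bytes. The function name is hypothetical. `crlf_is_newline` is assumed to be derived by the caller from PCRE2_INFO_NEWLINE and to be true for PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANY, and PCRE2_NEWLINE_ANYCRLF, as the pcre2demo program does.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the PCRE2 API): compute where the next
   match attempt should start after an empty match at `offset` could not be
   extended. Assumes an 8-bit, non-UTF subject: one character = one unit. */
static size_t advance_after_empty_match(const unsigned char *subject,
                                        size_t length, size_t offset,
                                        int crlf_is_newline)
{
    /* Skip CR and LF together so the next attempt does not start
       between them. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;
    return offset + 1;  /* otherwise advance by one character */
}
```

With the subject "a\r\nb", an advance from offset 1 goes to offset 3 when CRLF is a newline, and to offset 2 otherwise.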
@@ -2161,9 +2164,9 @@ PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is
 described below.
 .P
 Setting PCRE2_ANCHORED at match time is not supported by the just-in-time (JIT)
-compiler. If it is set, JIT matching is disabled and the normal interpretive
-code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the
-remaining options are supported for JIT matching.
+compiler. If it is set, JIT matching is disabled and the interpretive code in
+\fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT (obviously), the remaining
+options are supported for JIT matching.
 .sp
 PCRE2_ANCHORED
 .sp
@@ -2257,12 +2260,12 @@ page.
 If you know that your subject is valid, and you want to skip these checks for
 performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling
 \fBpcre2_match()\fP. You might want to do this for the second and subsequent
-calls to \fBpcre2_match()\fP if you are making repeated calls to find all the
-matches in a single subject string.
+calls to \fBpcre2_match()\fP if you are making repeated calls to find other
+matches in the same subject string.
 .P
-NOTE: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid string
-as a subject, or an invalid value of \fIstartoffset\fP, is undefined. Your
-program may crash or loop indefinitely.
+WARNING: When PCRE2_NO_UTF_CHECK is set, the effect of passing an invalid
+string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
+Your program may crash or loop indefinitely.
 .sp
 PCRE2_PARTIAL_HARD
 PCRE2_PARTIAL_SOFT
@@ -2329,9 +2332,9 @@ start, it skips both the CR and the LF before retrying. However, the pattern
 reference, and so advances only by one character after the first failure.
 .P
-An explicit match for CR of LF is either a literal appearance of one of those
-characters in the pattern, or one of the \er or \en escape sequences. Implicit
-matches such as [^X] do not count, nor does \es, even though it includes CR and
-LF in the characters that it matches.
+An explicit match for CR or LF is either a literal appearance of one of those
+characters in the pattern, or one of the \er or \en or equivalent octal or
+hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor
+does \es, even though it includes CR and LF in the characters that it matches.
 .P
 Notwithstanding the above, anomalous effects may still occur when CRLF is a
 valid newline sequence and explicit \er or \en escapes appear in the pattern.
@@ -2395,12 +2398,12 @@ identify the part of the subject that was partially matched. See the
 .\"
 documentation for details of partial matching.
 .P
-After a successful match, the first pair of offsets identifies the portion of
-the subject string that was matched by the entire pattern. The next pair is
-used for the first capturing subpattern, and so on. The value returned by
+After a fully successful match, the first pair of offsets identifies the
+portion of the subject string that was matched by the entire pattern. The next
+pair is used for the first captured substring, and so on. The value returned by
 \fBpcre2_match()\fP is one more than the highest numbered pair that has been
 set. For example, if two substrings have been captured, the returned value is
-3. If there are no capturing subpatterns, the return value from a successful
+3. If there are no captured substrings, the return value from a successful
 match is 1, indicating that just the first pair of offsets has been set.
 .P
 If a pattern uses the \eK escape sequence within a positive assertion, the
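Under the return-value rule in this hunk, only the first `rc` pairs of the ovector need to be examined. A pair below `rc` can still be unset when its group did not take part in the match. This is a minimal sketch that assumes PCRE2_SIZE is `size_t`, its usual definition. `UNSET_OFFSET` is a local stand-in for PCRE2_UNSET, which pcre2.h defines as `~(PCRE2_SIZE)0`. The helper name is hypothetical.

```c
#include <stddef.h>

/* Local stand-in for PCRE2_UNSET, assuming PCRE2_SIZE is size_t. */
#define UNSET_OFFSET (~(size_t)0)

/* Hypothetical helper: given the ovector and the positive return code rc of
   a successful match, count the pairs that hold a captured substring. Pairs
   at index rc and above lie beyond the highest pair set and are not read. */
static int count_set_pairs(const size_t *ovector, int rc)
{
    int set = 0;
    for (int i = 0; i < rc; i++) {
        if (ovector[2 * i] != UNSET_OFFSET)
            set++;
    }
    return set;
}
```

A return value of zero, which means the ovector was too small, falls outside this sketch. In that case the loop bound would instead come from the number of pairs in the match data block, as reported by `pcre2_get_ovector_count()`.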
@@ -2415,11 +2418,7 @@ returned.
 If the ovector is too small to hold all the captured substring offsets, as much
 as possible is filled in, and the function returns a value of zero. If captured
 substrings are not of interest, \fBpcre2_match()\fP may be called with a match
-data block whose ovector is of minimum length (that is, one pair). However, if
-the pattern contains back references and the \fIovector\fP is not big enough to
-remember the related substrings, PCRE2 has to get additional memory for use
-during matching. Thus it is usually advisable to set up a match data block
-containing an ovector of reasonable size.
+data block whose ovector is of minimum length (that is, one pair).
 .P
 It is possible for capturing subpattern number \fIn+1\fP to match some part of
 the subject when subpattern \fIn\fP has not been used at all. For example, if
@@ -2535,8 +2534,9 @@ returned when the magic number is not present.
 .sp
 PCRE2_ERROR_BADMODE
 .sp
-This error is given when a pattern that was compiled by the 8-bit library is
-passed to a 16-bit or 32-bit library function, or vice versa.
+This error is given when a compiled pattern is passed to a function in a
+library of a different code unit width, for example, a pattern compiled by
+the 8-bit library is passed to a 16-bit or 32-bit library function.
 .sp
 PCRE2_ERROR_BADOFFSET
 .sp
@@ -2562,22 +2562,15 @@ use by callout functions that want to cause \fBpcre2_match()\fP or
 \fBpcre2callout\fP
 .\"
 documentation for details.
 .sp
+PCRE2_ERROR_DEPTHLIMIT
+.sp
+The nested backtracking depth limit was reached.
+.sp
 PCRE2_ERROR_INTERNAL
 .sp
 An unexpected internal error has occurred. This error could be caused by a bug
 in PCRE2 or by overwriting of the compiled pattern.
-.sp
-PCRE2_ERROR_JIT_BADOPTION
-.sp
-This error is returned when a pattern that was successfully studied using JIT
-is being matched, but the matching mode (partial or complete match) does not
-correspond to any JIT compilation mode. When the JIT fast path function is
-used, this error may be also given for invalid options. See the
-.\" HREF
-\fBpcre2jit\fP
-.\"
-documentation for more details.
 .sp
 PCRE2_ERROR_JIT_STACKLIMIT
 .sp
@@ -2591,15 +2584,13 @@ documentation for more details.
 .sp
 PCRE2_ERROR_MATCHLIMIT
 .sp
-The backtracking limit was reached.
+The backtracking match limit was reached.
 .sp
 PCRE2_ERROR_NOMEMORY
 .sp
-If a pattern contains back references, but the ovector is not big enough to
-remember the referenced substrings, PCRE2 gets a block of memory at the start
-of matching to use for this purpose. There are some other special cases where
-extra memory is needed during matching. This error is given when memory cannot
-be obtained.
+If a pattern contains many nested backtracking points, heap memory is used to
+remember them. This error is given when the memory allocation function (default
+or custom) fails.
 .sp
 PCRE2_ERROR_NULL
 .sp
@@ -2615,10 +2606,6 @@ in the subject string. Some simple patterns that might do this are detected and
 faulted at compile time, but more complicated cases, in particular mutual
 recursions between two different subpatterns, cannot be detected until matching
 is attempted.
-.sp
-PCRE2_ERROR_RECURSIONLIMIT
-.sp
-The internal recursion limit was reached.
 .
 .
 .\" HTML <a name="geterrormessage"></a>
@@ -2808,8 +2795,8 @@ calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
 compiled pattern, and the second is the name. The yield of the function is the
 subpattern number, PCRE2_ERROR_NOSUBSTRING if there is no subpattern of that
 name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one subpattern of
-that name. Given the number, you can extract the substring directly, or use one
-of the functions described above.
+that name. Given the number, you can extract the substring directly from the
+ovector, or use one of the "bynumber" functions described above.
 .P
 For convenience, there are also "byname" functions that correspond to the
 "bynumber" functions, the only difference being that the second argument is a
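The revised text suggests extracting a numbered substring directly from the ovector. For the 8-bit library, this means taking the slice of the subject between the pair's two offsets. The following is a hedged sketch. The helper is hypothetical; the library's own equivalent is `pcre2_substring_copy_bynumber()`. The sketch assumes PCRE2_SIZE is `size_t`. It rejects unset groups, and it also rejects pairs whose start lies beyond their end, which can occur when `\K` is used inside a positive assertion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy captured substring `number` from an 8-bit
   subject into `buffer`, using offsets taken from the ovector. Returns the
   substring length, or -1 if the group is unset, its start lies beyond its
   end, or `buffer` cannot hold the text plus a terminating zero. The caller
   must ensure `number` is below the match's return code. */
static long copy_group_from_ovector(const char *subject, const size_t *ovector,
                                    unsigned number, char *buffer,
                                    size_t bufsize)
{
    size_t start = ovector[2 * number];
    size_t end = ovector[2 * number + 1];

    if (start == ~(size_t)0 || end < start)
        return -1;                  /* unset group, or inverted pair */
    size_t len = end - start;
    if (bufsize == 0 || len > bufsize - 1)
        return -1;                  /* no room for text plus zero */
    memcpy(buffer, subject + start, len);
    buffer[len] = '\0';
    return (long)len;
}
```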
@@ -3113,11 +3100,12 @@ other alternatives. Ultimately, when it runs out of matches,
 .P
 The function \fBpcre2_dfa_match()\fP is called to match a subject string
 against a compiled pattern, using a matching algorithm that scans the subject
-string just once, and does not backtrack. This has different characteristics to
-the normal algorithm, and is not compatible with Perl. Some of the features of
-PCRE2 patterns are not supported. Nevertheless, there are times when this kind
-of matching can be useful. For a discussion of the two matching algorithms, and
-a list of features that \fBpcre2_dfa_match()\fP does not support, see the
+string just once (not counting lookaround assertions), and does not backtrack.
+This has different characteristics to the normal algorithm, and is not
+compatible with Perl. Some of the features of PCRE2 patterns are not supported.
+Nevertheless, there are times when this kind of matching can be useful. For a
+discussion of the two matching algorithms, and a list of features that
+\fBpcre2_dfa_match()\fP does not support, see the
 .\" HREF
 \fBpcre2matching\fP
 .\"
@@ -3321,6 +3309,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 March 2017
+Last updated: 27 March 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi