Update HTML docs.
parent 96926349bc
commit be1ac011ec
|
@@ -91,6 +91,9 @@ in the library.
|
||||||
<tr><td><a href="pcre2_callout_enumerate.html">pcre2_callout_enumerate</a></td>
|
<tr><td><a href="pcre2_callout_enumerate.html">pcre2_callout_enumerate</a></td>
|
||||||
<td> Enumerate callouts in a compiled pattern</td></tr>
|
<td> Enumerate callouts in a compiled pattern</td></tr>
|
||||||
|
|
||||||
|
<tr><td><a href="pcre2_code_copy.html">pcre2_code_copy</a></td>
|
||||||
|
<td> Copy a compiled pattern</td></tr>
|
||||||
|
|
||||||
<tr><td><a href="pcre2_code_free.html">pcre2_code_free</a></td>
|
<tr><td><a href="pcre2_code_free.html">pcre2_code_free</a></td>
|
||||||
<td> Free a compiled pattern</td></tr>
|
<td> Free a compiled pattern</td></tr>
|
||||||
|
|
||||||
|
|
|
@@ -0,0 +1,42 @@
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>pcre2_code_copy specification</title>
|
||||||
|
</head>
|
||||||
|
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||||
|
<h1>pcre2_code_copy man page</h1>
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
This page is part of the PCRE2 HTML documentation. It was generated
|
||||||
|
automatically from the original man page. If there is any nonsense in it,
|
||||||
|
please consult the man page, in case the conversion went wrong.
|
||||||
|
<br>
|
||||||
|
<br><b>
|
||||||
|
SYNOPSIS
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
<b>#include <pcre2.h></b>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
DESCRIPTION
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
This function makes a copy of the memory used for a compiled pattern, excluding
|
||||||
|
any memory used by the JIT compiler. Without a subsequent call to
|
||||||
|
<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching. The
|
||||||
|
yield of the function is NULL if <i>code</i> is NULL or if sufficient memory
|
||||||
|
cannot be obtained.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
There is a complete description of the PCRE2 native API in the
|
||||||
|
<a href="pcre2api.html"><b>pcre2api</b></a>
|
||||||
|
page and a description of the POSIX API in the
|
||||||
|
<a href="pcre2posix.html"><b>pcre2posix</b></a>
|
||||||
|
page.
|
||||||
|
<p>
|
||||||
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
</p>
|
|
@@ -290,6 +290,9 @@ document for an overview of all the PCRE2 documentation.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC10" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
<br><a name="SEC10" href="#TOC1">PCRE2 NATIVE API AUXILIARY FUNCTIONS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
<b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
<b>int pcre2_get_error_message(int <i>errorcode</i>, PCRE2_UCHAR *<i>buffer</i>,</b>
|
||||||
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
<b> PCRE2_SIZE <i>bufflen</i>);</b>
|
||||||
<br>
|
<br>
|
||||||
|
@@ -455,10 +458,19 @@ return a copy of the subject string with substitutions for parts that were
|
||||||
matched.
|
matched.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
Functions whose names begin with <b>pcre2_serialize_</b> are used for saving
|
||||||
|
compiled patterns on disc or elsewhere, and reloading them later.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
Finally, there are functions for finding out information about a compiled
|
Finally, there are functions for finding out information about a compiled
|
||||||
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
|
pattern (<b>pcre2_pattern_info()</b>) and about the configuration with which
|
||||||
PCRE2 was built (<b>pcre2_config()</b>).
|
PCRE2 was built (<b>pcre2_config()</b>).
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
Functions with names ending with <b>_free()</b> are used for freeing memory
|
||||||
|
blocks of various sorts. In all cases, if one of these functions is called with
|
||||||
|
a NULL argument, it does nothing.
|
||||||
|
</P>
|
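Every _free() function ignores a NULL argument, so cleanup code can release everything unconditionally, even when some allocations never happened. The sketch below shows that idiom with a hypothetical widget type and a widget_free() function that follows the same rule. With PCRE2 itself, the calls would be functions such as pcre2_code_free() and pcre2_match_data_free().

```c
#include <stdlib.h>

/* Hypothetical object type, used only to illustrate the convention. */
typedef struct { char *buffer; } widget;

/* Like the PCRE2 *_free() functions, this does nothing when given NULL. */
void widget_free(widget *w)
{
  if (w == NULL) return;
  free(w->buffer);                 /* free(NULL) is also a no-op */
  free(w);
}

/* One cleanup path releases everything, whether or not each allocation
   succeeded, because widget_free() tolerates NULL. Returns 1 on success. */
int make_two_widgets(int fail_second)
{
  int ok = 0;
  widget *a = calloc(1, sizeof *a);
  widget *b = fail_second ? NULL : calloc(1, sizeof *b);

  if (a == NULL || b == NULL) goto cleanup;
  ok = 1;                          /* ... use a and b here ... */

cleanup:
  widget_free(a);
  widget_free(b);
  return ok;
}
```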
||||||
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
<br><a name="SEC13" href="#TOC1">STRING LENGTHS AND OFFSETS</a><br>
|
||||||
<P>
|
<P>
|
||||||
The PCRE2 API uses string lengths and offsets into strings of code units in
|
The PCRE2 API uses string lengths and offsets into strings of code units in
|
||||||
|
@@ -516,20 +528,51 @@ time ensuring that multithreaded applications can use it.
|
||||||
There are several different blocks of data that are used to pass information
|
There are several different blocks of data that are used to pass information
|
||||||
between the application and the PCRE2 libraries.
|
between the application and the PCRE2 libraries.
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
The compiled pattern
|
||||||
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
(1) A pointer to the compiled form of a pattern is returned to the user when
|
A pointer to the compiled form of a pattern is returned to the user when
|
||||||
<b>pcre2_compile()</b> is successful. The data in the compiled pattern is fixed,
|
<b>pcre2_compile()</b> is successful. The data in the compiled pattern is fixed,
|
||||||
and does not change when the pattern is matched. Therefore, it is thread-safe,
|
and does not change when the pattern is matched. Therefore, it is thread-safe,
|
||||||
that is, the same compiled pattern can be used by more than one thread
|
that is, the same compiled pattern can be used by more than one thread
|
||||||
simultaneously. An application can compile all its patterns at the start,
|
simultaneously. For example, an application can compile all its patterns at the
|
||||||
before forking off multiple threads that use them. However, if the just-in-time
|
start, before forking off multiple threads that use them. However, if the
|
||||||
optimization feature is being used, it needs separate memory stack areas for
|
just-in-time optimization feature is being used, it needs separate memory stack
|
||||||
each thread. See the
|
areas for each thread. See the
|
||||||
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
<a href="pcre2jit.html"><b>pcre2jit</b></a>
|
||||||
documentation for more details.
|
documentation for more details.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
(2) The next section below introduces the idea of "contexts" in which PCRE2
|
In a more complicated situation, where patterns are compiled only when they are
|
||||||
|
first needed, but are still shared between threads, pointers to compiled
|
||||||
|
patterns must be protected from simultaneous writing by multiple threads, at
|
||||||
|
least until a pattern has been compiled. The logic can be something like this:
|
||||||
|
<pre>
|
||||||
|
Get a read-only (shared) lock (mutex) for pointer
|
||||||
|
if (pointer == NULL)
|
||||||
|
{
|
||||||
|
Get a write (unique) lock for pointer
|
||||||
|
if (pointer == NULL) pointer = pcre2_compile(...
|
||||||
|
}
|
||||||
|
Release the lock
|
||||||
|
Use pointer in pcre2_match()
|
||||||
|
</pre>
|
||||||
|
Of course, testing for compilation errors should also be included in the code.
|
||||||
|
</P>
|
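The lock-protected lazy compilation described above can be sketched in C with a POSIX read-write lock. This is a minimal sketch, not PCRE2 code. compiled_pattern and expensive_compile() are hypothetical stand-ins for pcre2_code and pcre2_compile(), chosen so that the example builds without the library. It differs from the pseudocode in two ways:

- The shared lock is released before the unique lock is requested. Requesting a write lock while holding a read lock on the same pthread read-write lock may deadlock.
- The pointer is tested again under the unique lock, because another thread may have compiled the pattern in the meantime.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for pcre2_code; real code would keep the
   pcre2_code * returned by pcre2_compile(). */
typedef struct { int id; } compiled_pattern;

static compiled_pattern *shared_pattern = NULL;       /* set on first use */
static pthread_rwlock_t pattern_lock = PTHREAD_RWLOCK_INITIALIZER;
static int compile_calls = 0;                          /* for checking only */

/* Hypothetical stand-in for pcre2_compile(); returns NULL on failure. */
static compiled_pattern *expensive_compile(void)
{
  compiled_pattern *p = malloc(sizeof *p);
  if (p != NULL) {
    p->id = 42;
    compile_calls++;                  /* only ever called under the write lock */
  }
  return p;
}

/* Return the shared compiled pattern, compiling it at most once
   (if compilation fails, NULL is returned and a later call retries). */
compiled_pattern *get_pattern(void)
{
  compiled_pattern *p;

  /* Fast path: a shared lock is enough to read the pointer. */
  pthread_rwlock_rdlock(&pattern_lock);
  p = shared_pattern;
  pthread_rwlock_unlock(&pattern_lock);
  if (p != NULL) return p;

  /* Slow path: take the unique lock (the shared one is already released)
     and test again, because another thread may have compiled meanwhile. */
  pthread_rwlock_wrlock(&pattern_lock);
  if (shared_pattern == NULL)
    shared_pattern = expensive_compile();
  p = shared_pattern;
  pthread_rwlock_unlock(&pattern_lock);
  return p;                           /* NULL means compilation failed */
}
```

The pattern is intentionally never freed here. A real program would call pcre2_code_free() at shutdown, once no thread can still be using it.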
||||||
|
<P>
|
||||||
|
If JIT is being used, but the JIT compilation is not being done immediately
|
||||||

(perhaps waiting to see if the pattern is used often enough), similar logic is
|
||||||
|
required. JIT compilation updates a pointer within the compiled code block, so
|
||||||
|
a thread must gain unique write access to the pointer before calling
|
||||||
|
<b>pcre2_jit_compile()</b>. Alternatively, <b>pcre2_code_copy()</b> can be used
|
||||||
|
to obtain a private copy of the compiled code.
|
||||||
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Context blocks
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
The next main section below introduces the idea of "contexts" in which PCRE2
|
||||||
functions are called. A context is nothing more than a collection of parameters
|
functions are called. A context is nothing more than a collection of parameters
|
||||||
that control the way PCRE2 operates. Grouping a number of parameters together
|
that control the way PCRE2 operates. Grouping a number of parameters together
|
||||||
in a context is a convenient way of passing them to a PCRE2 function without
|
in a context is a convenient way of passing them to a PCRE2 function without
|
||||||
|
@@ -543,11 +586,14 @@ are never changed, the same context can be used by all the threads. However, if
|
||||||
any thread needs to change any value in a context, it must make its own
|
any thread needs to change any value in a context, it must make its own
|
||||||
thread-specific copy.
|
thread-specific copy.
|
||||||
</P>
|
</P>
|
||||||
|
<br><b>
|
||||||
|
Match blocks
|
||||||
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
(3) The matching functions need a block of memory for working space and for
|
The matching functions need a block of memory for working space and for storing
|
||||||
storing the results of a match. This includes details of what was matched, as
|
the results of a match. This includes details of what was matched, as well as
|
||||||
well as additional information such as the name of a (*MARK) setting. Each
|
additional information such as the name of a (*MARK) setting. Each thread must
|
||||||
thread must provide its own version of this memory.
|
provide its own copy of this memory.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
<br><a name="SEC16" href="#TOC1">PCRE2 CONTEXTS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@@ -1007,14 +1053,33 @@ zero.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
<b>void pcre2_code_free(pcre2_code *<i>code</i>);</b>
|
<b>void pcre2_code_free(pcre2_code *<i>code</i>);</b>
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
<b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>pcre2_compile()</b> function compiles a pattern into an internal form.
|
The <b>pcre2_compile()</b> function compiles a pattern into an internal form.
|
||||||
The pattern is defined by a pointer to a string of code units and a length, If
|
The pattern is defined by a pointer to a string of code units and a length. If
|
||||||
the pattern is zero-terminated, the length can be specified as
|
the pattern is zero-terminated, the length can be specified as
|
||||||
PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
|
PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that
|
||||||
contains the compiled pattern and related data. The caller must free the memory
|
contains the compiled pattern and related data.
|
||||||
by calling <b>pcre2_code_free()</b> when it is no longer needed.
|
</P>
|
||||||
|
<P>
|
||||||
|
If the compile context argument <i>ccontext</i> is NULL, memory for the compiled
|
||||||
|
pattern is obtained by calling <b>malloc()</b>. Otherwise, it is obtained from
|
||||||
|
the same memory function that was used for the compile context. The caller must
|
||||||
|
free the memory by calling <b>pcre2_code_free()</b> when it is no longer needed.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The function <b>pcre2_code_copy()</b> makes a copy of the compiled code in new
|
||||||
|
memory, using the same memory allocator as was used for the original. However,
|
||||||
|
if the code has been processed by the JIT compiler (see
|
||||||
|
<a href="#jitcompiling">below</a>),
|
||||||
|
the JIT information cannot be copied (because it is position-dependent).
|
||||||
|
The new copy can initially be used only for non-JIT matching, though it can be
|
||||||
|
passed to <b>pcre2_jit_compile()</b> if required. The <b>pcre2_code_copy()</b>
|
||||||
|
function provides a way for individual threads in a multithreaded application
|
||||||
|
to acquire a private copy of shared compiled code.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
NOTE: When one of the matching functions is called, pointers to the compiled
|
NOTE: When one of the matching functions is called, pointers to the compiled
|
||||||
|
@@ -1025,16 +1090,12 @@ free a compiled pattern (or a subject string) until after all operations on the
|
||||||
have taken place.
|
have taken place.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If the compile context argument <i>ccontext</i> is NULL, memory for the compiled
|
The <i>options</i> argument for <b>pcre2_compile()</b> contains various bit
|
||||||
pattern is obtained by calling <b>malloc()</b>. Otherwise, it is obtained from
|
settings that affect the compilation. It should be zero if no options are
|
||||||
the same memory function that was used for the compile context.
|
required. The available options are described below. Some of them (in
|
||||||
</P>
|
particular, those that are compatible with Perl, but some others as well) can
|
||||||
<P>
|
also be set and unset from within the pattern (see the detailed description in
|
||||||
The <i>options</i> argument contains various bit settings that affect the
|
the
|
||||||
compilation. It should be zero if no options are required. The available
|
|
||||||
options are described below. Some of them (in particular, those that are
|
|
||||||
compatible with Perl, but some others as well) can also be set and unset from
|
|
||||||
within the pattern (see the detailed description in the
|
|
||||||
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||||
documentation).
|
documentation).
|
||||||
</P>
|
</P>
|
||||||
|
@@ -1433,7 +1494,7 @@ are used for invalid UTF strings. These are the same as given by
|
||||||
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
<a href="pcre2unicode.html"><b>pcre2unicode</b></a>
|
||||||
page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
|
page. The <b>pcre2_get_error_message()</b> function can be called to obtain a
|
||||||
textual error message from any error code.
|
textual error message from any error code.
|
||||||
</P>
|
<a name="jitcompiling"></a></P>
|
||||||
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
<br><a name="SEC20" href="#TOC1">JUST-IN-TIME (JIT) COMPILATION</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
<b>int pcre2_jit_compile(pcre2_code *<i>code</i>, uint32_t <i>options</i>);</b>
|
||||||
|
@@ -3123,7 +3184,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC40" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC40" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 31 January 2016
|
Last updated: 26 February 2016
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2016 University of Cambridge.
|
Copyright © 1997-2016 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@@ -353,9 +353,10 @@ test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
|
||||||
command helps detect tests that are accidentally put in the wrong file.
|
command helps detect tests that are accidentally put in the wrong file.
|
||||||
<pre>
|
<pre>
|
||||||
#pop [<modifiers>]
|
#pop [<modifiers>]
|
||||||
|
#popcopy [<modifiers>]
|
||||||
</pre>
|
</pre>
|
||||||
This command is used to manipulate the stack of compiled patterns, as described
|
These commands are used to manipulate the stack of compiled patterns, as
|
||||||
in the section entitled "Saving and restoring compiled patterns"
|
described in the section entitled "Saving and restoring compiled patterns"
|
||||||
<a href="#saverestore">below.</a>
|
<a href="#saverestore">below.</a>
|
||||||
<pre>
|
<pre>
|
||||||
#save <filename>
|
#save <filename>
|
||||||
|
@@ -573,6 +574,7 @@ about the pattern:
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
posix_nosub use the POSIX API with REG_NOSUB
|
posix_nosub use the POSIX API with REG_NOSUB
|
||||||
push push compiled pattern onto the stack
|
push push compiled pattern onto the stack
|
||||||
|
pushcopy push a copy onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
</pre>
|
</pre>
|
||||||
|
@@ -932,12 +934,16 @@ pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
|
||||||
line to contain a new pattern (or a command) instead of a subject line. This
|
line to contain a new pattern (or a command) instead of a subject line. This
|
||||||
facility is used when saving compiled patterns to a file, as described in the
|
facility is used when saving compiled patterns to a file, as described in the
|
||||||
section entitled "Saving and restoring compiled patterns"
|
section entitled "Saving and restoring compiled patterns"
|
||||||
<a href="#saverestore">below.</a>
|
<a href="#saverestore">below.</a> If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
|
||||||
The <b>push</b> modifier is incompatible with compilation modifiers such as
|
pattern is stacked, leaving the original as current, ready to match the
|
||||||
<b>global</b> that act at match time. Any that are specified are ignored, with a
|
following input lines. This provides a way of testing the
|
||||||
warning message, except for <b>replace</b>, which causes an error. Note that,
|
<b>pcre2_code_copy()</b> function.
|
||||||
<b>jitverify</b>, which is allowed, does not carry through to any subsequent
|
The <b>push</b> and <b>pushcopy</b> modifiers are incompatible with compilation
|
||||||
matching that uses this pattern.
|
modifiers such as <b>global</b> that act at match time. Any that are specified
|
||||||
|
are ignored (for the stacked copy), with a warning message, except for
|
||||||
|
<b>replace</b>, which causes an error. Note that <b>jitverify</b>, which is
|
||||||
|
allowed, does not carry through to any subsequent matching that uses a stacked
|
||||||
|
pattern.
|
||||||
<a name="subjectmodifiers"></a></P>
|
<a name="subjectmodifiers"></a></P>
|
||||||
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
|
<br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@@ -1530,7 +1536,9 @@ item to be tested. For example:
|
||||||
This output indicates that callout number 0 occurred for a match attempt
|
This output indicates that callout number 0 occurred for a match attempt
|
||||||
starting at the fourth character of the subject string, when the pointer was at
|
starting at the fourth character of the subject string, when the pointer was at
|
||||||
the seventh character, and when the next pattern item was \d. Just
|
the seventh character, and when the next pattern item was \d. Just
|
||||||
one circumflex is output if the start and current positions are the same.
|
one circumflex is output if the start and current positions are the same, or if
|
||||||
|
the current position precedes the start position, which can happen if the
|
||||||
|
callout is in a lookbehind assertion.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
||||||
|
@@ -1622,11 +1630,16 @@ can be used to test these functions.
|
||||||
<P>
|
<P>
|
||||||
When a pattern with <b>push</b> modifier is successfully compiled, it is pushed
|
When a pattern with <b>push</b> modifier is successfully compiled, it is pushed
|
||||||
onto a stack of compiled patterns, and <b>pcre2test</b> expects the next line to
|
onto a stack of compiled patterns, and <b>pcre2test</b> expects the next line to
|
||||||
contain a new pattern (or command) instead of a subject line. By this means, a
|
contain a new pattern (or command) instead of a subject line. By contrast,
|
||||||
number of patterns can be compiled and retained. The <b>push</b> modifier is
|
the <b>pushcopy</b> modifier causes a copy of the compiled pattern to be
|
||||||
incompatible with <b>posix</b>, and control modifiers that act at match time are
|
stacked, leaving the original available for immediate matching. By using
|
||||||
ignored (with a message). The <b>jitverify</b> modifier applies only at compile
|
<b>push</b> and/or <b>pushcopy</b>, a number of patterns can be compiled and
|
||||||
time. The command
|
retained. These modifiers are incompatible with <b>posix</b>, and control
|
||||||
|
modifiers that act at match time are ignored (with a message) for the stacked
|
||||||
|
patterns. The <b>jitverify</b> modifier applies only at compile time.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The command
|
||||||
<pre>
|
<pre>
|
||||||
#save <filename>
|
#save <filename>
|
||||||
</pre>
|
</pre>
|
||||||
|
@@ -1643,7 +1656,8 @@ usual by an empty line or end of file. This command may be followed by a
|
||||||
modifier list containing only
|
modifier list containing only
|
||||||
<a href="#controlmodifiers">control modifiers</a>
|
<a href="#controlmodifiers">control modifiers</a>
|
||||||
that act after a pattern has been compiled. In particular, <b>hex</b>,
|
that act after a pattern has been compiled. In particular, <b>hex</b>,
|
||||||
<b>posix</b>, <b>posix_nosub</b>, and <b>push</b> are not allowed, nor are any
|
<b>posix</b>, <b>posix_nosub</b>, <b>push</b>, and <b>pushcopy</b> are not allowed,
|
||||||
|
nor are any
|
||||||
<a href="#optionmodifiers">option-setting modifiers.</a>
|
<a href="#optionmodifiers">option-setting modifiers.</a>
|
||||||
The JIT modifiers are, however, permitted. Here is an example that saves and
|
The JIT modifiers are, however, permitted. Here is an example that saves and
|
||||||
reloads two patterns.
|
reloads two patterns.
|
||||||
|
@@ -1661,6 +1675,11 @@ reloads two patterns.
|
||||||
If <b>jitverify</b> is used with #pop, it does not automatically imply
|
If <b>jitverify</b> is used with #pop, it does not automatically imply
|
||||||
<b>jit</b>, which is different behaviour from when it is used on a pattern.
|
<b>jit</b>, which is different behaviour from when it is used on a pattern.
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
The #popcopy command is analogous to the <b>pushcopy</b> modifier in that it
|
||||||
|
makes current a copy of the topmost stack pattern, leaving the original still
|
||||||
|
on the stack.
|
||||||
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
|
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
<b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
|
||||||
|
@@ -1678,7 +1697,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 31 January 2016
|
Last updated: 06 February 2016
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2016 University of Cambridge.
|
Copyright © 1997-2016 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
503
doc/pcre2.txt
|
@@ -377,6 +377,8 @@ PCRE2 NATIVE API SERIALIZATION FUNCTIONS
|
||||||
|
|
||||||
PCRE2 NATIVE API AUXILIARY FUNCTIONS
|
PCRE2 NATIVE API AUXILIARY FUNCTIONS
|
||||||
|
|
||||||
|
pcre2_code *pcre2_code_copy(const pcre2_code *code);
|
||||||
|
|
||||||
int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
|
int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer,
|
||||||
PCRE2_SIZE bufflen);
|
PCRE2_SIZE bufflen);
|
||||||
|
|
||||||
|
@@ -523,76 +525,113 @@ PCRE2 API OVERVIEW
|
||||||
return a copy of the subject string with substitutions for parts that
|
return a copy of the subject string with substitutions for parts that
|
||||||
were matched.
|
were matched.
|
||||||
|
|
||||||
Finally, there are functions for finding out information about a com-
|
Functions whose names begin with pcre2_serialize_ are used for saving
|
||||||
piled pattern (pcre2_pattern_info()) and about the configuration with
|
compiled patterns on disc or elsewhere, and reloading them later.
|
||||||
|
|
||||||
|
Finally, there are functions for finding out information about a com-
|
||||||
|
piled pattern (pcre2_pattern_info()) and about the configuration with
|
||||||
which PCRE2 was built (pcre2_config()).
|
which PCRE2 was built (pcre2_config()).
|
||||||
|
|
||||||
|
Functions with names ending with _free() are used for freeing memory
|
||||||
|
blocks of various sorts. In all cases, if one of these functions is
|
||||||
|
called with a NULL argument, it does nothing.
|
||||||
|
|
||||||
|
|
||||||
STRING LENGTHS AND OFFSETS
|
STRING LENGTHS AND OFFSETS
|
||||||
|
|
||||||
The PCRE2 API uses string lengths and offsets into strings of code
|
The PCRE2 API uses string lengths and offsets into strings of code
|
||||||
units in several places. These values are always of type PCRE2_SIZE,
|
units in several places. These values are always of type PCRE2_SIZE,
|
||||||
which is an unsigned integer type, currently always defined as size_t.
|
which is an unsigned integer type, currently always defined as size_t.
|
||||||
The largest value that can be stored in such a type (that is
|
The largest value that can be stored in such a type (that is
|
||||||
~(PCRE2_SIZE)0) is reserved as a special indicator for zero-terminated
|
~(PCRE2_SIZE)0) is reserved as a special indicator for zero-terminated
|
||||||
strings and unset offsets. Therefore, the longest string that can be
|
strings and unset offsets. Therefore, the longest string that can be
|
||||||
handled is one less than this maximum.
|
handled is one less than this maximum.
|
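The reserved-maximum convention can be illustrated with a small helper that turns a (pointer, length) pair into an explicit length. This is what lets a caller pass PCRE2_ZERO_TERMINATED instead of computing the length itself. The names my_size, MY_ZERO_TERMINATED, MY_MAX_LENGTH and resolve_length() are hypothetical. This is a sketch of the convention, not PCRE2's internal code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names mirroring the PCRE2 convention: the length type is
   size_t, and its largest value is reserved to mean "zero-terminated". */
typedef size_t my_size;
#define MY_ZERO_TERMINATED (~(my_size)0)
#define MY_MAX_LENGTH      (MY_ZERO_TERMINATED - 1)  /* longest usable string */

/* Resolve a (pointer, length) pair into an explicit length. */
my_size resolve_length(const char *s, my_size len)
{
  return (len == MY_ZERO_TERMINATED) ? strlen(s) : len;
}
```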
||||||
|
|
||||||
|
|
||||||
NEWLINES
|
NEWLINES
|
||||||
|
|
||||||
PCRE2 supports five different conventions for indicating line breaks in
|
PCRE2 supports five different conventions for indicating line breaks in
|
||||||
strings: a single CR (carriage return) character, a single LF (line-
|
strings: a single CR (carriage return) character, a single LF (line-
|
||||||
feed) character, the two-character sequence CRLF, any of the three pre-
|
feed) character, the two-character sequence CRLF, any of the three pre-
|
||||||
ceding, or any Unicode newline sequence. The Unicode newline sequences
|
ceding, or any Unicode newline sequence. The Unicode newline sequences
|
||||||
are the three just mentioned, plus the single characters VT (vertical
|
are the three just mentioned, plus the single characters VT (vertical
|
||||||
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
|
tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
|
||||||
separator, U+2028), and PS (paragraph separator, U+2029).
|
separator, U+2028), and PS (paragraph separator, U+2029).
|
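To make the list of conventions concrete, the following sketch reports the length of the newline sequence at a given position. It covers two of the conventions: CR, LF and CRLF only, or any Unicode newline sequence (PCRE2 calls these PCRE2_NEWLINE_ANYCRLF and PCRE2_NEWLINE_ANY). It works on an array of code points to avoid UTF decoding. newline_length() is a hypothetical helper, not a PCRE2 function.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the number of code points in the newline sequence starting at
   s[i], or 0 if there is none. With any_unicode == 0 only CR, LF and CRLF
   are recognized; otherwise VT, FF, NEL, LS and PS are recognized too. */
size_t newline_length(const uint32_t *s, size_t len, size_t i, int any_unicode)
{
  if (i >= len) return 0;
  switch (s[i]) {
    case 0x000D:                      /* CR, or CRLF if LF follows */
      return (i + 1 < len && s[i + 1] == 0x000A) ? 2 : 1;
    case 0x000A:                      /* LF */
      return 1;
    case 0x000B:                      /* VT */
    case 0x000C:                      /* FF */
    case 0x0085:                      /* NEL */
    case 0x2028:                      /* LS */
    case 0x2029:                      /* PS */
      return any_unicode ? 1 : 0;
    default:
      return 0;
  }
}
```

A CR immediately followed by LF is treated as one two-character newline, so a scanner that advances by the returned length never sees the LF on its own.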
||||||
|
|
||||||
Each of the first three conventions is used by at least one operating
|
Each of the first three conventions is used by at least one operating
|
||||||
system as its standard newline sequence. When PCRE2 is built, a default
|
system as its standard newline sequence. When PCRE2 is built, a default
|
||||||
can be specified. The default default is LF, which is the Unix stan-
|
can be specified. The default default is LF, which is the Unix stan-
|
||||||
dard. However, the newline convention can be changed by an application
|
dard. However, the newline convention can be changed by an application
|
||||||
when calling pcre2_compile(), or it can be specified by special text at
|
when calling pcre2_compile(), or it can be specified by special text at
|
||||||
the start of the pattern itself; this overrides any other settings. See
|
the start of the pattern itself; this overrides any other settings. See
|
||||||
the pcre2pattern page for details of the special character sequences.
|
the pcre2pattern page for details of the special character sequences.
|
||||||
|
|
||||||
In the PCRE2 documentation the word "newline" is used to mean "the
|
In the PCRE2 documentation the word "newline" is used to mean "the
|
||||||
character or pair of characters that indicate a line break". The choice
|
character or pair of characters that indicate a line break". The choice
|
||||||
of newline convention affects the handling of the dot, circumflex, and
|
of newline convention affects the handling of the dot, circumflex, and
|
||||||
dollar metacharacters, the handling of #-comments in /x mode, and, when
|
dollar metacharacters, the handling of #-comments in /x mode, and, when
|
||||||
CRLF is a recognized line ending sequence, the match position advance-
|
CRLF is a recognized line ending sequence, the match position advance-
|
||||||
ment for a non-anchored pattern. There is more detail about this in the
|
ment for a non-anchored pattern. There is more detail about this in the
|
||||||
section on pcre2_match() options below.
|
section on pcre2_match() options below.
|
||||||
|
|
||||||
The choice of newline convention does not affect the interpretation of
|
The choice of newline convention does not affect the interpretation of
|
||||||
the \n or \r escape sequences, nor does it affect what \R matches; this
|
the \n or \r escape sequences, nor does it affect what \R matches; this
|
||||||
has its own separate convention.
|
has its own separate convention.
|
||||||
|
|
||||||
|
|
||||||
MULTITHREADING
|
MULTITHREADING
|
||||||
|
|
||||||
In a multithreaded application it is important to keep thread-specific
|
In a multithreaded application it is important to keep thread-specific
|
||||||
data separate from data that can be shared between threads. The PCRE2
|
data separate from data that can be shared between threads. The PCRE2
|
||||||
library code itself is thread-safe: it contains no static or global
|
library code itself is thread-safe: it contains no static or global
|
||||||
variables. The API is designed to be fairly simple for non-threaded
applications while at the same time ensuring that multithreaded
applications can use it.

There are several different blocks of data that are used to pass
information between the application and the PCRE2 libraries.

The compiled pattern

A pointer to the compiled form of a pattern is returned to the user
when pcre2_compile() is successful. The data in the compiled pattern is
fixed, and does not change when the pattern is matched. Therefore, it
is thread-safe; that is, the same compiled pattern can be used by more
than one thread simultaneously. For example, an application can compile
all its patterns at the start, before forking off multiple threads that
use them. However, if the just-in-time optimization feature is being
used, it needs separate memory stack areas for each thread. See the
pcre2jit documentation for more details.

In a more complicated situation, where patterns are compiled only when
they are first needed but are still shared between threads, pointers to
compiled patterns must be protected from simultaneous writing by
multiple threads, at least until a pattern has been compiled. The logic
can be something like this:

  Get a read-only (shared) lock (mutex) for pointer
  if (pointer == NULL)
    {
    Get a write (unique) lock for pointer
    if (pointer == NULL) pointer = pcre2_compile(...
    }
  Release the lock
  Use pointer in pcre2_match()

The pointer is tested again after the write lock has been obtained,
because another thread may have compiled the pattern in the meantime.
Of course, testing for compilation errors should also be included in
the code.


If JIT is being used, but the JIT compilation is not being done
immediately (perhaps waiting to see if the pattern is used often
enough), similar logic is required. JIT compilation updates a pointer
within the compiled code block, so a thread must gain unique write
access to the pointer before calling pcre2_jit_compile(). Alternatively,
pcre2_code_copy() can be used to obtain a private copy of the compiled
code.


Context blocks

The next main section below introduces the idea of "contexts" in which
PCRE2 functions are called. A context is nothing more than a collection
of parameters that control the way PCRE2 operates. Grouping a number of
parameters together in a context is a convenient way of passing them to
@@ -605,44 +644,45 @@ MULTITHREADING
threads. However, if any thread needs to change any value in a context,
it must make its own thread-specific copy.

Match blocks

The matching functions need a block of memory for working space and for
storing the results of a match. This includes details of what was
matched, as well as additional information such as the name of a
(*MARK) setting. Each thread must provide its own copy of this memory.


PCRE2 CONTEXTS

Some PCRE2 functions have a lot of parameters, many of which are used
only by specialist applications, for example, those that use custom
memory management or non-standard character tables. To keep function
argument lists at a reasonable size, and at the same time to keep the
API extensible, "uncommon" parameters are passed to certain functions
in a context instead of directly. A context is just a block of memory
that holds the parameter values. Applications that do not need to
adjust any of the context parameters can pass NULL when a context
pointer is required.

There are three different types of context: a general context that is
relevant for several PCRE2 operations, a compile-time context, and a
match-time context.

The general context

At present, this context just contains pointers to (and data for)
external memory management functions that are called from several
places in the PCRE2 library. The context is named `general' rather than
specifically `memory' because in future other fields may be added. If
you do not want to supply your own custom memory management functions,
you do not need to bother with a general context. A general context is
created by:

  pcre2_general_context *pcre2_general_context_create(
    void *(*private_malloc)(PCRE2_SIZE, void *),
    void (*private_free)(void *, void *), void *memory_data);

The two function pointers specify custom memory management functions,
whose prototypes are:

  void *private_malloc(PCRE2_SIZE, void *);
@@ -650,16 +690,16 @@ PCRE2 CONTEXTS

Whenever code in PCRE2 calls these functions, the final argument is the
value of memory_data. Either of the first two arguments of the creation
function may be NULL, in which case the system memory management
functions malloc() and free() are used. (This is not currently useful, as
there are no other fields in a general context, but in future there
might be.) The private_malloc() function is used (if supplied) to
obtain memory for storing the context, and all three values are saved
as part of the context.

Whenever PCRE2 creates a data block of any kind, the block contains a
pointer to the free() function that matches the malloc() function that
was used. When the time comes to free the block, this function is
called.

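The two hook prototypes can be satisfied by ordinary C functions that treat memory_data as a pointer to application state. In this hypothetical sketch, struct alloc_stats, counting_malloc() and counting_free() are illustrative names; size_t is used in place of PCRE2_SIZE, which pcre2.h defines as size_t, so the sketch compiles without the PCRE2 header.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping that is passed to PCRE2 as memory_data. */
struct alloc_stats {
  size_t live_blocks;   /* blocks handed out and not yet freed */
  size_t total_bytes;   /* total bytes ever requested */
};

/* Same shape as void *private_malloc(PCRE2_SIZE, void *). */
void *counting_malloc(size_t size, void *memory_data)
{
  struct alloc_stats *st = memory_data;
  void *block = malloc(size);
  if (block != NULL) {
    st->live_blocks++;
    st->total_bytes += size;
  }
  return block;
}

/* Same shape as void private_free(void *, void *). */
void counting_free(void *block, void *memory_data)
{
  struct alloc_stats *st = memory_data;
  if (block == NULL) return;
  st->live_blocks--;
  free(block);
}
```

With these in place, an application would call pcre2_general_context_create(counting_malloc, counting_free, &stats) and pass the resulting general context to the other PCRE2 creation functions. Since each block records the free function that matches its allocator, live_blocks should drop back to zero once everything, including the general context itself, has been freed.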

A general context can be copied by calling:
@@ -674,7 +714,7 @@ PCRE2 CONTEXTS

The compile context

A compile context is required if you want to change the default values
of any of the following compile-time parameters:

  What \R matches (Unicode newlines or CR, LF, CRLF only)
@@ -684,11 +724,11 @@ PCRE2 CONTEXTS
  The maximum length of the pattern string
  An external function for stack checking

A compile context is also required if you are using custom memory
management. If none of these apply, just pass NULL as the context
argument of pcre2_compile().

A compile context is created, copied, and freed by the following
functions:

  pcre2_compile_context *pcre2_compile_context_create(
@@ -699,33 +739,33 @@ PCRE2 CONTEXTS

  void pcre2_compile_context_free(pcre2_compile_context *ccontext);

A compile context is created with default values for its parameters.
These can be changed by calling the following functions, which return 0
on success, or PCRE2_ERROR_BADDATA if invalid data is detected.

  int pcre2_set_bsr(pcre2_compile_context *ccontext,
    uint32_t value);

The value must be PCRE2_BSR_ANYCRLF, to specify that \R matches only
CR, LF, or CRLF, or PCRE2_BSR_UNICODE, to specify that \R matches any
Unicode line ending sequence. The value is used by the JIT compiler and
by the two interpreted matching functions, pcre2_match() and
pcre2_dfa_match().


  int pcre2_set_character_tables(pcre2_compile_context *ccontext,
    const unsigned char *tables);

The value must be the result of a call to pcre2_maketables(), whose
only argument is a general context. This function builds a set of
character tables in the current locale.


  int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext,
    PCRE2_SIZE value);

This sets a maximum length, in code units, for the pattern string that
is to be compiled. If the pattern is longer, an error is generated.
This facility is provided so that applications that accept patterns
from external sources can limit their size. The default is the largest
number that a PCRE2_SIZE variable can hold, which is effectively
unlimited.


@@ -733,38 +773,38 @@ PCRE2 CONTEXTS
    uint32_t value);

This specifies which characters or character sequences are to be
recognized as newlines. The value must be one of PCRE2_NEWLINE_CR
(carriage return only), PCRE2_NEWLINE_LF (linefeed only),
PCRE2_NEWLINE_CRLF (the two-character sequence CR followed by LF),
PCRE2_NEWLINE_ANYCRLF (any of the above), or PCRE2_NEWLINE_ANY (any
Unicode newline sequence).

When a pattern is compiled with the PCRE2_EXTENDED option, the value of
this parameter affects the recognition of white space and the end of
internal comments starting with #. The value is saved with the compiled
pattern for subsequent use by the JIT compiler and by the two
interpreted matching functions, pcre2_match() and pcre2_dfa_match().


  int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext,
    uint32_t value);

This parameter adjusts the limit, set when PCRE2 is built (default 250),
on the depth of parenthesis nesting in a pattern. This limit stops
rogue patterns from using up too much system stack when being compiled.


  int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext,
    int (*guard_function)(uint32_t, void *), void *user_data);

There is at least one application that runs PCRE2 in threads with very
limited system stack, where running out of stack is to be avoided at
all costs. The parenthesis limit above cannot take account of how much
stack is actually available. For finer control, you can supply a
function that is called whenever pcre2_compile() starts to compile a
parenthesized part of a pattern. This function can check the actual
stack size (or anything else that it wants to, of course).

The first argument to the guard function gives the current depth of
nesting, and the second is the user data that is set up by the last
argument of pcre2_set_compile_recursion_guard(). The guard function
should return zero if all is well, or non-zero to force an error.

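A guard function is ordinary C code. The hypothetical sketch below simply enforces a fixed depth passed through user_data; a production guard would more likely measure the remaining stack of the current thread, which is platform-specific and not shown.

```c
#include <stdint.h>

/* Hypothetical guard for pcre2_set_compile_recursion_guard(): user_data
   points to the deepest nesting the application will accept. */
int depth_guard(uint32_t depth, void *user_data)
{
  const uint32_t *max_depth = user_data;
  return depth > *max_depth;   /* 0 = carry on, non-zero = force an error */
}
```

It would be installed with pcre2_set_compile_recursion_guard(ccontext, depth_guard, &limit), where limit must remain valid for as long as pcre2_compile() may call the guard.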

The match context
@@ -778,10 +818,10 @@ PCRE2 CONTEXTS
  The limit for calling match() recursively

A match context is also required if you are using custom memory
management. If none of these apply, just pass NULL as the context
argument of pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match().

A match context is created, copied, and freed by the following
functions:

  pcre2_match_context *pcre2_match_context_create(
@@ -792,7 +832,7 @@ PCRE2 CONTEXTS

  void pcre2_match_context_free(pcre2_match_context *mcontext);

A match context is created with default values for its parameters.
These can be changed by calling the following functions, which return 0
on success, or PCRE2_ERROR_BADDATA if invalid data is detected.

@@ -800,96 +840,96 @@ PCRE2 CONTEXTS
    int (*callout_function)(pcre2_callout_block *, void *),
    void *callout_data);

This sets up a "callout" function, which PCRE2 will call at specified
points during a matching operation. Details are given in the
pcre2callout documentation.


  int pcre2_set_offset_limit(pcre2_match_context *mcontext,
    PCRE2_SIZE value);

The offset_limit parameter limits how far an unanchored search can
advance in the subject string. The default value is PCRE2_UNSET. The
pcre2_match() and pcre2_dfa_match() functions return
PCRE2_ERROR_NOMATCH if a match with a starting point before or at the
given offset is not found. For example, if the pattern /abc/ is matched
against "123abc" with an offset limit less than 3, the result is
PCRE2_ERROR_NOMATCH. A match can never be found if the startoffset
argument of pcre2_match() or pcre2_dfa_match() is greater than the
offset limit.

When using this facility, you must set PCRE2_USE_OFFSET_LIMIT when
calling pcre2_compile() so that when JIT is in use, different code can
be compiled. If a match is started with a non-default offset limit when
PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.

The offset limit facility can be used to track progress when searching
large subject strings. See also the PCRE2_FIRSTLINE option, which
requires a match to start within the first line of the subject. If this
is set with an offset limit, a match must occur in the first line and
also within the offset limit. In other words, whichever limit comes
first is used.


  int pcre2_set_match_limit(pcre2_match_context *mcontext,
    uint32_t value);

The match_limit parameter provides a means of preventing PCRE2 from
using up too many resources when processing patterns that are not going
to match, but which have a very large number of possibilities in their
search trees. The classic example is a pattern that uses nested
unlimited repeats.

Internally, pcre2_match() uses a function called match(), which it
calls repeatedly (sometimes recursively). The limit set by match_limit
is imposed on the number of times this function is called during a
match, which has the effect of limiting the amount of backtracking that
can take place. For patterns that are not anchored, the count restarts
from zero for each position in the subject string. This limit is not
relevant to pcre2_dfa_match(), which ignores it.

When pcre2_match() is called with a pattern that was successfully
processed by pcre2_jit_compile(), the way in which matching is executed
is entirely different. However, there is still the possibility of
runaway matching that goes on for a very long time, and so the
match_limit value is also used in this case (but in a different way) to
limit how long the matching can continue.

The default value for the limit can be set when PCRE2 is built; the
default default is 10 million, which handles all but the most extreme
cases. If the limit is exceeded, pcre2_match() returns
PCRE2_ERROR_MATCHLIMIT. A value for the match limit may also be
supplied by an item at the start of a pattern of the form

  (*LIMIT_MATCH=ddd)

where ddd is a decimal number. However, such a setting is ignored
unless ddd is less than the limit set by the caller of pcre2_match()
or, if no such limit is set, less than the default.


  int pcre2_set_recursion_limit(pcre2_match_context *mcontext,
    uint32_t value);

The recursion_limit parameter is similar to match_limit, but instead of
limiting the total number of times that match() is called, it limits
the depth of recursion. The recursion depth is a smaller number than
the total number of calls, because not all calls to match() are
recursive. This limit is of use only if it is set smaller than
match_limit.

Limiting the recursion depth limits the amount of system stack that can
be used, or, when PCRE2 has been compiled to use memory on the heap
instead of the stack, the amount of heap memory that can be used. This
limit is not relevant, and is ignored, when matching is done using JIT
compiled code or by the pcre2_dfa_match() function.

The default value for recursion_limit can be set when PCRE2 is built;
the default default is the same value as the default for match_limit.
If the limit is exceeded, pcre2_match() returns
PCRE2_ERROR_RECURSIONLIMIT. A value for the recursion limit may also be
supplied by an item at the start of a pattern of the form

  (*LIMIT_RECURSION=ddd)

where ddd is a decimal number. However, such a setting is ignored
unless ddd is less than the limit set by the caller of pcre2_match()
or, if no such limit is set, less than the default.

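The rule for (*LIMIT_MATCH=ddd) and (*LIMIT_RECURSION=ddd) amounts to taking the smaller value, so a pattern can lower a limit but never raise it. The helper below is a hypothetical model of that rule for illustration only; it is not a PCRE2 function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the rule: a limit embedded in a pattern can
   lower the effective limit but never raise it. caller_limit and
   pattern_limit are NULL when no such limit was given. */
uint32_t effective_limit(const uint32_t *caller_limit,
                         uint32_t build_default,
                         const uint32_t *pattern_limit)
{
  uint32_t limit = (caller_limit != NULL) ? *caller_limit : build_default;
  if (pattern_limit != NULL && *pattern_limit < limit)
    limit = *pattern_limit;
  return limit;
}
```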

  int pcre2_set_recursion_memory_management(
@@ -898,21 +938,21 @@ PCRE2 CONTEXTS
    void (*private_free)(void *, void *), void *memory_data);

This function sets up two additional custom memory management functions
for use by pcre2_match() when PCRE2 is compiled to use the heap for
remembering backtracking data, instead of recursive function calls that
use the system stack. There is a discussion about PCRE2's stack usage
in the pcre2stack documentation. See the pcre2build documentation for
details of how to build PCRE2.

Using the heap for recursion is a non-standard way of building PCRE2,
for use in environments that have limited stacks. Because of the
greater use of memory management, pcre2_match() runs more slowly.
Functions that are different to the general custom memory functions are
provided so that special-purpose external code can be used for this
case, because the memory blocks are all the same size. The blocks are
retained by pcre2_match() until it is about to exit so that they can be
re-used when possible during the match. In the absence of these
functions, the normal custom memory management functions are used, if
supplied, otherwise the system functions.

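Because the blocks requested in this mode are all the same size, a free list is one simple kind of special-purpose allocator. The sketch below is hypothetical (struct block_pool and the pool_* functions are illustrative names), uses size_t in place of PCRE2_SIZE, and is not thread-safe, so each thread would need its own pool. It also assumes a PCRE2 release before 10.30; from 10.30 onwards pcre2_set_recursion_memory_management() is kept only for compatibility and has no effect.

```c
#include <stdlib.h>

/* Hypothetical pool passed to PCRE2 as memory_data. Freed blocks are
   kept on a singly linked list threaded through their first bytes. */
struct block_pool {
  void *free_list;     /* recycled blocks, or NULL */
  size_t block_size;   /* fixed by the first request */
  size_t mallocs;      /* real malloc() calls, for illustration */
};

void *pool_malloc(size_t size, void *memory_data)
{
  struct block_pool *pool = memory_data;
  void *block;

  if (pool->block_size == 0)   /* first request sets the block size */
    pool->block_size = size < sizeof(void *) ? sizeof(void *) : size;
  if (size <= pool->block_size && pool->free_list != NULL) {
    block = pool->free_list;                /* reuse a freed block */
    pool->free_list = *(void **)block;
    return block;
  }
  pool->mallocs++;
  return malloc(size > pool->block_size ? size : pool->block_size);
}

void pool_free(void *block, void *memory_data)
{
  struct block_pool *pool = memory_data;
  if (block == NULL) return;
  *(void **)block = pool->free_list;        /* keep it for reuse */
  pool->free_list = block;
}

/* Return all recycled blocks to the system, e.g. when the thread ends. */
void pool_destroy(struct block_pool *pool)
{
  while (pool->free_list != NULL) {
    void *next = *(void **)pool->free_list;
    free(pool->free_list);
    pool->free_list = next;
  }
}
```

It would be installed with a call such as pcre2_set_recursion_memory_management(mcontext, pool_malloc, pool_free, &pool), with pool_destroy() called once no more matches will use the pool.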

@@ -920,75 +960,75 @@ CHECKING BUILD-TIME OPTIONS

  int pcre2_config(uint32_t what, void *where);

The function pcre2_config() makes it possible for a PCRE2 client to
discover which optional features have been compiled into the PCRE2
library. The pcre2build documentation has more details about these
optional features.

The first argument for pcre2_config() specifies which information is
required. The second argument is a pointer to memory into which the
information is placed. If NULL is passed, the function returns the
amount of memory that is needed for the requested information. For
calls that return numerical values, the value is in bytes; when
requesting these values, the where argument should point to
appropriately aligned memory. For calls that return strings, the
required length is given in code units, not counting the terminating
zero.

When requesting information, the returned value from pcre2_config() is
|
When requesting information, the returned value from pcre2_config() is
|
||||||
non-negative on success, or the negative error code PCRE2_ERROR_BADOP-
|
non-negative on success, or the negative error code PCRE2_ERROR_BADOP-
|
||||||
TION if the value in the first argument is not recognized. The follow-
|
TION if the value in the first argument is not recognized. The follow-
|
||||||
ing information is available:
|
ing information is available:
|
||||||
|
|
||||||
PCRE2_CONFIG_BSR
|
PCRE2_CONFIG_BSR
|
||||||
|
|
||||||
The output is a uint32_t integer whose value indicates what character
|
The output is a uint32_t integer whose value indicates what character
|
||||||
sequences the \R escape sequence matches by default. A value of
|
sequences the \R escape sequence matches by default. A value of
|
||||||
PCRE2_BSR_UNICODE means that \R matches any Unicode line ending
|
PCRE2_BSR_UNICODE means that \R matches any Unicode line ending
|
||||||
sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
|
sequence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR,
|
||||||
LF, or CRLF. The default can be overridden when a pattern is compiled.
|
LF, or CRLF. The default can be overridden when a pattern is compiled.
|
||||||
|
|
||||||
PCRE2_CONFIG_JIT
|
PCRE2_CONFIG_JIT
|
||||||
|
|
||||||
The output is a uint32_t integer that is set to one if support for
|
The output is a uint32_t integer that is set to one if support for
|
||||||
just-in-time compiling is available; otherwise it is set to zero.
|
just-in-time compiling is available; otherwise it is set to zero.
|
||||||
|
|
||||||
PCRE2_CONFIG_JITTARGET
|
PCRE2_CONFIG_JITTARGET
|
||||||
|
|
||||||
The where argument should point to a buffer that is at least 48 code
|
The where argument should point to a buffer that is at least 48 code
|
||||||
units long. (The exact length required can be found by calling
|
units long. (The exact length required can be found by calling
|
||||||
pcre2_config() with where set to NULL.) The buffer is filled with a
|
pcre2_config() with where set to NULL.) The buffer is filled with a
|
||||||
string that contains the name of the architecture for which the JIT
|
string that contains the name of the architecture for which the JIT
|
||||||
compiler is configured, for example "x86 32bit (little endian +
|
compiler is configured, for example "x86 32bit (little endian +
|
||||||
unaligned)". If JIT support is not available, PCRE2_ERROR_BADOPTION is
|
unaligned)". If JIT support is not available, PCRE2_ERROR_BADOPTION is
|
||||||
returned, otherwise the number of code units used is returned. This is
|
returned, otherwise the number of code units used is returned. This is
|
||||||
the length of the string, plus one unit for the terminating zero.
|
the length of the string, plus one unit for the terminating zero.
|
||||||
|
|
||||||
PCRE2_CONFIG_LINKSIZE
|
PCRE2_CONFIG_LINKSIZE
|
||||||
|
|
||||||
The output is a uint32_t integer that contains the number of bytes used
|
The output is a uint32_t integer that contains the number of bytes used
|
||||||
for internal linkage in compiled regular expressions. When PCRE2 is
|
for internal linkage in compiled regular expressions. When PCRE2 is
|
||||||
configured, the value can be set to 2, 3, or 4, with the default being
|
configured, the value can be set to 2, 3, or 4, with the default being
|
||||||
2. This is the value that is returned by pcre2_config(). However, when
|
2. This is the value that is returned by pcre2_config(). However, when
|
||||||
the 16-bit library is compiled, a value of 3 is rounded up to 4, and
|
the 16-bit library is compiled, a value of 3 is rounded up to 4, and
|
||||||
when the 32-bit library is compiled, internal linkages always use 4
|
when the 32-bit library is compiled, internal linkages always use 4
|
||||||
bytes, so the configured value is not relevant.
|
bytes, so the configured value is not relevant.
|
||||||
|
|
||||||
The default value of 2 for the 8-bit and 16-bit libraries is sufficient
|
The default value of 2 for the 8-bit and 16-bit libraries is sufficient
|
||||||
for all but the most massive patterns, since it allows the size of the
|
for all but the most massive patterns, since it allows the size of the
|
||||||
compiled pattern to be up to 64K code units. Larger values allow larger
|
compiled pattern to be up to 64K code units. Larger values allow larger
|
||||||
regular expressions to be compiled by those two libraries, but at the
|
regular expressions to be compiled by those two libraries, but at the
|
||||||
expense of slower matching.
|
expense of slower matching.
|
||||||
|
|
||||||
PCRE2_CONFIG_MATCHLIMIT
|
PCRE2_CONFIG_MATCHLIMIT
|
||||||
|
|
||||||
The output is a uint32_t integer that gives the default limit for the
|
The output is a uint32_t integer that gives the default limit for the
|
||||||
number of internal matching function calls in a pcre2_match() execu-
|
number of internal matching function calls in a pcre2_match() execu-
|
||||||
tion. Further details are given with pcre2_match() below.
|
tion. Further details are given with pcre2_match() below.
|
||||||
|
|
||||||
PCRE2_CONFIG_NEWLINE
|
PCRE2_CONFIG_NEWLINE
|
||||||
|
|
||||||
The output is a uint32_t integer whose value specifies the default
|
The output is a uint32_t integer whose value specifies the default
|
||||||
character sequence that is recognized as meaning "newline". The values
|
character sequence that is recognized as meaning "newline". The values
|
||||||
are:
|
are:
|
||||||
|
|
||||||
PCRE2_NEWLINE_CR Carriage return (CR)
|
PCRE2_NEWLINE_CR Carriage return (CR)
|
||||||
|
@@ -997,56 +1037,56 @@ CHECKING BUILD-TIME OPTIONS
|
||||||
PCRE2_NEWLINE_ANY Any Unicode line ending
|
PCRE2_NEWLINE_ANY Any Unicode line ending
|
||||||
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
|
||||||
|
|
||||||
The default should normally correspond to the standard sequence for
|
The default should normally correspond to the standard sequence for
|
||||||
your operating system.
|
your operating system.
|
||||||
|
|
||||||
PCRE2_CONFIG_PARENSLIMIT
|
PCRE2_CONFIG_PARENSLIMIT
|
||||||
|
|
||||||
The output is a uint32_t integer that gives the maximum depth of nest-
|
The output is a uint32_t integer that gives the maximum depth of nest-
|
||||||
ing of parentheses (of any kind) in a pattern. This limit is imposed to
|
ing of parentheses (of any kind) in a pattern. This limit is imposed to
|
||||||
cap the amount of system stack used when a pattern is compiled. It is
|
cap the amount of system stack used when a pattern is compiled. It is
|
||||||
specified when PCRE2 is built; the default is 250. This limit does not
|
specified when PCRE2 is built; the default is 250. This limit does not
|
||||||
take into account the stack that may already be used by the calling
|
take into account the stack that may already be used by the calling
|
||||||
application. For finer control over compilation stack usage, see
|
application. For finer control over compilation stack usage, see
|
||||||
pcre2_set_compile_recursion_guard().
|
pcre2_set_compile_recursion_guard().
|
||||||
|
|
||||||
PCRE2_CONFIG_RECURSIONLIMIT
|
PCRE2_CONFIG_RECURSIONLIMIT
|
||||||
|
|
||||||
The output is a uint32_t integer that gives the default limit for the
|
The output is a uint32_t integer that gives the default limit for the
|
||||||
depth of recursion when calling the internal matching function in a
|
depth of recursion when calling the internal matching function in a
|
||||||
pcre2_match() execution. Further details are given with pcre2_match()
|
pcre2_match() execution. Further details are given with pcre2_match()
|
||||||
below.
|
below.
|
||||||
|
|
||||||
PCRE2_CONFIG_STACKRECURSE
|
PCRE2_CONFIG_STACKRECURSE
|
||||||
|
|
||||||
The output is a uint32_t integer that is set to one if internal recur-
|
The output is a uint32_t integer that is set to one if internal recur-
|
||||||
sion when running pcre2_match() is implemented by recursive function
|
sion when running pcre2_match() is implemented by recursive function
|
||||||
calls that use the system stack to remember their state. This is the
|
calls that use the system stack to remember their state. This is the
|
||||||
usual way that PCRE2 is compiled. The output is zero if PCRE2 was com-
|
usual way that PCRE2 is compiled. The output is zero if PCRE2 was com-
|
||||||
piled to use blocks of data on the heap instead of recursive function
|
piled to use blocks of data on the heap instead of recursive function
|
||||||
calls.
|
calls.
|
||||||
|
|
||||||
PCRE2_CONFIG_UNICODE_VERSION
|
PCRE2_CONFIG_UNICODE_VERSION
|
||||||
|
|
||||||
The where argument should point to a buffer that is at least 24 code
|
The where argument should point to a buffer that is at least 24 code
|
||||||
units long. (The exact length required can be found by calling
|
units long. (The exact length required can be found by calling
|
||||||
pcre2_config() with where set to NULL.) If PCRE2 has been compiled
|
pcre2_config() with where set to NULL.) If PCRE2 has been compiled
|
||||||
without Unicode support, the buffer is filled with the text "Unicode
|
without Unicode support, the buffer is filled with the text "Unicode
|
||||||
not supported". Otherwise, the Unicode version string (for example,
|
not supported". Otherwise, the Unicode version string (for example,
|
||||||
"8.0.0") is inserted. The number of code units used is returned. This
|
"8.0.0") is inserted. The number of code units used is returned. This
|
||||||
is the length of the string plus one unit for the terminating zero.
|
is the length of the string plus one unit for the terminating zero.
|
||||||
|
|
||||||
PCRE2_CONFIG_UNICODE
|
PCRE2_CONFIG_UNICODE
|
||||||
|
|
||||||
The output is a uint32_t integer that is set to one if Unicode support
|
The output is a uint32_t integer that is set to one if Unicode support
|
||||||
is available; otherwise it is set to zero. Unicode support implies UTF
|
is available; otherwise it is set to zero. Unicode support implies UTF
|
||||||
support.
|
support.
|
||||||
|
|
||||||
PCRE2_CONFIG_VERSION
|
PCRE2_CONFIG_VERSION
|
||||||
|
|
||||||
The where argument should point to a buffer that is at least 12 code
|
The where argument should point to a buffer that is at least 12 code
|
||||||
units long. (The exact length required can be found by calling
|
units long. (The exact length required can be found by calling
|
||||||
pcre2_config() with where set to NULL.) The buffer is filled with the
|
pcre2_config() with where set to NULL.) The buffer is filled with the
|
||||||
PCRE2 version string, zero-terminated. The number of code units used is
|
PCRE2 version string, zero-terminated. The number of code units used is
|
||||||
returned. This is the length of the string plus one unit for the termi-
|
returned. This is the length of the string plus one unit for the termi-
|
||||||
nating zero.
|
nating zero.
|
||||||
|
@@ -1060,32 +1100,43 @@ COMPILING A PATTERN
|
||||||
|
|
||||||
void pcre2_code_free(pcre2_code *code);
|
void pcre2_code_free(pcre2_code *code);
|
||||||
|
|
||||||
The pcre2_compile() function compiles a pattern into an internal form.
|
pcre2_code *pcre2_code_copy(const pcre2_code *code);
|
||||||
The pattern is defined by a pointer to a string of code units and a
|
|
||||||
length, If the pattern is zero-terminated, the length can be specified
|
|
||||||
as PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of
|
|
||||||
memory that contains the compiled pattern and related data. The caller
|
|
||||||
must free the memory by calling pcre2_code_free() when it is no longer
|
|
||||||
needed.
|
|
||||||
|
|
||||||
NOTE: When one of the matching functions is called, pointers to the
|
The pcre2_compile() function compiles a pattern into an internal form.
|
||||||
compiled pattern and the subject string are set in the match data block
|
The pattern is defined by a pointer to a string of code units and a
|
||||||
so that they can be referenced by the extraction functions. After run-
|
length. If the pattern is zero-terminated, the length can be specified
|
||||||
ning a match, you must not free a compiled pattern (or a subject
|
as PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of
|
||||||
string) until after all operations on the match data block have taken
|
memory that contains the compiled pattern and related data.
|
||||||
place.
|
|
||||||
|
|
||||||
If the compile context argument ccontext is NULL, memory for the com-
|
If the compile context argument ccontext is NULL, memory for the com-
|
||||||
piled pattern is obtained by calling malloc(). Otherwise, it is
|
piled pattern is obtained by calling malloc(). Otherwise, it is
|
||||||
obtained from the same memory function that was used for the compile
|
obtained from the same memory function that was used for the compile
|
||||||
context.
|
context. The caller must free the memory by calling pcre2_code_free()
|
||||||
|
when it is no longer needed.
|
||||||
|
|
||||||
The options argument contains various bit settings that affect the com-
|
The function pcre2_code_copy() makes a copy of the compiled code in new
|
||||||
pilation. It should be zero if no options are required. The available
|
memory, using the same memory allocator as was used for the original.
|
||||||
options are described below. Some of them (in particular, those that
|
However, if the code has been processed by the JIT compiler (see
|
||||||
are compatible with Perl, but some others as well) can also be set and
|
below), the JIT information cannot be copied (because it is position-
|
||||||
unset from within the pattern (see the detailed description in the
|
dependent). The new copy can initially be used only for non-JIT match-
|
||||||
pcre2pattern documentation).
|
ing, though it can be passed to pcre2_jit_compile() if required. The
|
||||||
|
pcre2_code_copy() function provides a way for individual threads in a
|
||||||
|
multithreaded application to acquire a private copy of shared compiled
|
||||||
|
code.
|
||||||
|
|
||||||
|
NOTE: When one of the matching functions is called, pointers to the
|
||||||
|
compiled pattern and the subject string are set in the match data block
|
||||||
|
so that they can be referenced by the extraction functions. After run-
|
||||||
|
ning a match, you must not free a compiled pattern (or a subject
|
||||||
|
string) until after all operations on the match data block have taken
|
||||||
|
place.
|
||||||
|
|
||||||
|
The options argument for pcre2_compile() contains various bit settings
|
||||||
|
that affect the compilation. It should be zero if no options are
|
||||||
|
required. The available options are described below. Some of them (in
|
||||||
|
particular, those that are compatible with Perl, but some others as
|
||||||
|
well) can also be set and unset from within the pattern (see the
|
||||||
|
detailed description in the pcre2pattern documentation).
|
||||||
|
|
||||||
For those options that can be different in different parts of the pat-
|
For those options that can be different in different parts of the pat-
|
||||||
tern, the contents of the options argument specify their settings at
|
tern, the contents of the options argument specify their settings at
|
||||||
|
@@ -3058,7 +3109,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 31 January 2016
|
Last updated: 26 February 2016
|
||||||
Copyright (c) 1997-2016 University of Cambridge.
|
Copyright (c) 1997-2016 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@@ -296,10 +296,11 @@ COMMAND LINES
|
||||||
wrong file.
|
wrong file.
|
||||||
|
|
||||||
#pop [<modifiers>]
|
#pop [<modifiers>]
|
||||||
|
#popcopy [<modifiers>]
|
||||||
|
|
||||||
This command is used to manipulate the stack of compiled patterns, as
|
These commands are used to manipulate the stack of compiled patterns,
|
||||||
described in the section entitled "Saving and restoring compiled pat-
|
as described in the section entitled "Saving and restoring compiled
|
||||||
terns" below.
|
patterns" below.
|
||||||
|
|
||||||
#save <filename>
|
#save <filename>
|
||||||
|
|
||||||
|
@@ -518,6 +519,7 @@ PATTERN MODIFIERS
|
||||||
posix use the POSIX API
|
posix use the POSIX API
|
||||||
posix_nosub use the POSIX API with REG_NOSUB
|
posix_nosub use the POSIX API with REG_NOSUB
|
||||||
push push compiled pattern onto the stack
|
push push compiled pattern onto the stack
|
||||||
|
pushcopy push a copy onto the stack
|
||||||
stackguard=<number> test the stackguard feature
|
stackguard=<number> test the stackguard feature
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
|
|
||||||
|
@@ -833,11 +835,15 @@ PATTERN MODIFIERS
|
||||||
next line to contain a new pattern (or a command) instead of a subject
|
next line to contain a new pattern (or a command) instead of a subject
|
||||||
line. This facility is used when saving compiled patterns to a file, as
|
line. This facility is used when saving compiled patterns to a file, as
|
||||||
described in the section entitled "Saving and restoring compiled pat-
|
described in the section entitled "Saving and restoring compiled pat-
|
||||||
terns" below. The push modifier is incompatible with compilation modi-
|
terns" below. If pushcopy is used instead of push, a copy of the com-
|
||||||
fiers such as global that act at match time. Any that are specified are
|
piled pattern is stacked, leaving the original as current, ready to
|
||||||
ignored, with a warning message, except for replace, which causes an
|
match the following input lines. This provides a way of testing the
|
||||||
error. Note that, jitverify, which is allowed, does not carry through
|
pcre2_code_copy() function. The push and pushcopy modifiers are
|
||||||
to any subsequent matching that uses this pattern.
|
incompatible with compilation modifiers such as global that act at
|
||||||
|
match time. Any that are specified are ignored (for the stacked copy),
|
||||||
|
with a warning message, except for replace, which causes an error. Note
|
||||||
|
that jitverify, which is allowed, does not carry through to any subse-
|
||||||
|
quent matching that uses a stacked pattern.
|
||||||
|
|
||||||
|
|
||||||
SUBJECT MODIFIERS
|
SUBJECT MODIFIERS
|
||||||
|
@@ -1379,10 +1385,11 @@ CALLOUTS
|
||||||
attempt starting at the fourth character of the subject string, when
|
attempt starting at the fourth character of the subject string, when
|
||||||
the pointer was at the seventh character, and when the next pattern
|
the pointer was at the seventh character, and when the next pattern
|
||||||
item was \d. Just one circumflex is output if the start and current
|
item was \d. Just one circumflex is output if the start and current
|
||||||
positions are the same.
|
positions are the same, or if the current position precedes the start
|
||||||
|
position, which can happen if the callout is in a lookbehind assertion.
|
||||||
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the /auto_callout pattern modifier. In this case, instead
|
a result of the /auto_callout pattern modifier. In this case, instead
|
||||||
of showing the callout number, the offset in the pattern, preceded by a
|
of showing the callout number, the offset in the pattern, preceded by a
|
||||||
plus, is output. For example:
|
plus, is output. For example:
|
||||||
|
|
||||||
|
@@ -1396,7 +1403,7 @@ CALLOUTS
|
||||||
0: E*
|
0: E*
|
||||||
|
|
||||||
If a pattern contains (*MARK) items, an additional line is output when-
|
If a pattern contains (*MARK) items, an additional line is output when-
|
||||||
ever a change of latest mark is passed to the callout function. For
|
ever a change of latest mark is passed to the callout function. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
re> /a(*MARK:X)bc/auto_callout
|
re> /a(*MARK:X)bc/auto_callout
|
||||||
|
@@ -1410,17 +1417,17 @@ CALLOUTS
|
||||||
+12 ^ ^
|
+12 ^ ^
|
||||||
0: abc
|
0: abc
|
||||||
|
|
||||||
The mark changes between matching "a" and "b", but stays the same for
|
The mark changes between matching "a" and "b", but stays the same for
|
||||||
the rest of the match, so nothing more is output. If, as a result of
|
the rest of the match, so nothing more is output. If, as a result of
|
||||||
backtracking, the mark reverts to being unset, the text "<unset>" is
|
backtracking, the mark reverts to being unset, the text "<unset>" is
|
||||||
output.
|
output.
|
||||||
|
|
||||||
Callouts with string arguments
|
Callouts with string arguments
|
||||||
|
|
||||||
The output for a callout with a string argument is similar, except that
|
The output for a callout with a string argument is similar, except that
|
||||||
instead of outputting a callout number before the position indicators,
|
instead of outputting a callout number before the position indicators,
|
||||||
the callout string and its offset in the pattern string are output
|
the callout string and its offset in the pattern string are output
|
||||||
before the reflection of the subject string, and the subject string is
|
before the reflection of the subject string, and the subject string is
|
||||||
reflected for each callout. For example:
|
reflected for each callout. For example:
|
||||||
|
|
||||||
re> /^ab(?C'first')cd(?C"second")ef/
|
re> /^ab(?C'first')cd(?C"second")ef/
|
||||||
|
@@ -1437,41 +1444,46 @@ CALLOUTS
|
||||||
NON-PRINTING CHARACTERS
|
NON-PRINTING CHARACTERS
|
||||||
|
|
||||||
When pcre2test is outputting text in the compiled version of a pattern,
|
When pcre2test is outputting text in the compiled version of a pattern,
|
||||||
bytes other than 32-126 are always treated as non-printing characters
|
bytes other than 32-126 are always treated as non-printing characters
|
||||||
and are therefore shown as hex escapes.
|
and are therefore shown as hex escapes.
|
||||||
|
|
||||||
When pcre2test is outputting text that is a matched part of a subject
|
When pcre2test is outputting text that is a matched part of a subject
|
||||||
string, it behaves in the same way, unless a different locale has been
|
string, it behaves in the same way, unless a different locale has been
|
||||||
set for the pattern (using the /locale modifier). In this case, the
|
set for the pattern (using the /locale modifier). In this case, the
|
||||||
isprint() function is used to distinguish printing and non-printing
|
isprint() function is used to distinguish printing and non-printing
|
||||||
characters.
|
characters.
|
||||||
|
|
||||||
|
|
||||||
SAVING AND RESTORING COMPILED PATTERNS
|
SAVING AND RESTORING COMPILED PATTERNS
|
||||||
|
|
||||||
It is possible to save compiled patterns on disc or elsewhere, and
|
It is possible to save compiled patterns on disc or elsewhere, and
|
||||||
reload them later, subject to a number of restrictions. JIT data cannot
|
reload them later, subject to a number of restrictions. JIT data cannot
|
||||||
be saved. The host on which the patterns are reloaded must be running
|
be saved. The host on which the patterns are reloaded must be running
|
||||||
the same version of PCRE2, with the same code unit width, and must also
|
the same version of PCRE2, with the same code unit width, and must also
|
||||||
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
have the same endianness, pointer width and PCRE2_SIZE type. Before
|
||||||
compiled patterns can be saved they must be serialized, that is, con-
|
compiled patterns can be saved they must be serialized, that is, con-
|
||||||
verted to a stream of bytes. A single byte stream may contain any num-
|
verted to a stream of bytes. A single byte stream may contain any num-
|
||||||
ber of compiled patterns, but they must all use the same character
|
ber of compiled patterns, but they must all use the same character
|
||||||
tables. A single copy of the tables is included in the byte stream (its
|
tables. A single copy of the tables is included in the byte stream (its
|
||||||
size is 1088 bytes).
|
size is 1088 bytes).
|
||||||
|
|
||||||
The functions whose names begin with pcre2_serialize_ are used for
|
The functions whose names begin with pcre2_serialize_ are used for
|
||||||
serializing and de-serializing. They are described in the pcre2serial-
|
serializing and de-serializing. They are described in the pcre2serial-
|
||||||
ize documentation. In this section we describe the features of
|
ize documentation. In this section we describe the features of
|
||||||
pcre2test that can be used to test these functions.
|
pcre2test that can be used to test these functions.
|
||||||
|
|
||||||
When a pattern with push modifier is successfully compiled, it is
|
When a pattern with push modifier is successfully compiled, it is
|
||||||
pushed onto a stack of compiled patterns, and pcre2test expects the
|
pushed onto a stack of compiled patterns, and pcre2test expects the
|
||||||
next line to contain a new pattern (or command) instead of a subject
|
next line to contain a new pattern (or command) instead of a subject
|
||||||
line. By this means, a number of patterns can be compiled and retained.
|
line. By contrast, the pushcopy modifier causes a copy of the compiled
|
||||||
The push modifier is incompatible with posix, and control modifiers
|
pattern to be stacked, leaving the original available for immediate
|
||||||
that act at match time are ignored (with a message). The jitverify mod-
|
matching. By using push and/or pushcopy, a number of patterns can be
|
||||||
ifier applies only at compile time. The command
|
compiled and retained. These modifiers are incompatible with posix, and
|
||||||
|
control modifiers that act at match time are ignored (with a message)
|
||||||
|
for the stacked patterns. The jitverify modifier applies only at com-
|
||||||
|
pile time.
|
||||||
|
|
||||||
|
The command
|
||||||
|
|
||||||
#save <filename>
|
#save <filename>
|
||||||
|
|
||||||
|
@@ -1488,9 +1500,10 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
matched with the pattern, terminated as usual by an empty line or end
|
matched with the pattern, terminated as usual by an empty line or end
|
||||||
of file. This command may be followed by a modifier list containing
|
of file. This command may be followed by a modifier list containing
|
||||||
only control modifiers that act after a pattern has been compiled. In
|
only control modifiers that act after a pattern has been compiled. In
|
||||||
particular, hex, posix, posix_nosub, and push are not allowed, nor are
|
particular, hex, posix, posix_nosub, push, and pushcopy are not
|
||||||
any option-setting modifiers. The JIT modifiers are, however permit-
|
allowed, nor are any option-setting modifiers. The JIT modifiers are,
|
||||||
ted. Here is an example that saves and reloads two patterns.
|
however, permitted. Here is an example that saves and reloads two pat-
|
||||||
|
terns.
|
||||||
|
|
||||||
/abc/push
|
/abc/push
|
||||||
/xyz/push
|
/xyz/push
|
||||||
|
@@ -1502,9 +1515,13 @@ SAVING AND RESTORING COMPILED PATTERNS
|
||||||
#pop jit,bincode
|
#pop jit,bincode
|
||||||
abc
|
abc
|
||||||
|
|
||||||
If jitverify is used with #pop, it does not automatically imply jit,
|
If jitverify is used with #pop, it does not automatically imply jit,
|
||||||
which is different behaviour from when it is used on a pattern.
|
which is different behaviour from when it is used on a pattern.
|
||||||
|
|
||||||
|
The #popcopy command is analogous to the pushcopy modifier in that it
|
||||||
|
makes current a copy of the topmost stack pattern, leaving the original
|
||||||
|
still on the stack.
|
||||||
|
|
||||||
|
|
||||||
SEE ALSO
|
SEE ALSO
|
||||||
|
|
||||||
|
@@ -1521,5 +1538,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 31 January 2016
|
Last updated: 06 February 2016
|
||||||
Copyright (c) 1997-2016 University of Cambridge.
|
Copyright (c) 1997-2016 University of Cambridge.
|
||||||
|
|