Further substitution tests (code and data), and more documentation.
parent adc7be2d3a
commit 07f8372202

@@ -51,4 +51,6 @@ the currrent group as "unset". Thus, the ovector for those groups contained
 whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
 matched against "abcd".
 
+8. The pcre2_substitute() function has been implemented.
+
 ****

@@ -135,7 +135,7 @@ remaining sections, except for the <b>pcre2demo</b> section (which is a program
 listing), and the short pages for individual functions, are concatenated in
 <b>pcre2.txt</b>, for ease of searching. The sections are as follows:
 <pre>
-pcre2 this document FIXME CHECK THIS LIST
+pcre2 this document
 pcre2-config show PCRE2 installation configuration information
 pcre2api details of PCRE2's native C API
 pcre2build building PCRE2

@@ -1089,7 +1089,7 @@ equivalent to Perl's /x option, and it can be changed within a pattern by a
 Which characters are interpreted as newlines can be specified by a setting in
 the compile context that is passed to <b>pcre2_compile()</b> or by a special
 sequence at the start of the pattern, as described in the section entitled
-<a href="pcrepattern.html#newlines">"Newline conventions"</a>
+<a href="pcre2pattern.html#newlines">"Newline conventions"</a>
 in the <b>pcre2pattern</b> documentation. A default is defined when PCRE2 is
 built.
 <pre>
@@ -1243,7 +1243,7 @@ This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W,
 \w, and some of the POSIX character classes. By default, only ASCII characters
 are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to
 classify characters. More details are given in the section on
-<a href="pcre2.html#genericchartypes">generic character types</a>
+<a href="pcre2pattern.html#genericchartypes">generic character types</a>
 in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 page. If you set PCRE2_UCP, matching one of the items it affects takes much
@@ -1924,11 +1924,8 @@ documentation.
 <P>
 When PCRE2 is built, a default newline convention is set; this is usually the
 standard convention for the operating system. The default can be overridden in
-either a
-<a href="#compilecontext">compile context</a>
-or a
-<a href="#matchcontext">match context.</a>
-However, changing the newline convention at match time disables JIT matching.
+a
+<a href="#compilecontext">compile context.</a>
 During matching, the newline choice affects the behaviour of the dot,
 circumflex, and dollar metacharacters. It may also alter the way the match
 position is advanced after a match failure for an unanchored pattern.
@@ -2290,7 +2287,7 @@ subpattern <i>n</i> has not been used at all, it returns an empty string. This
 can be distinguished from a genuine zero-length substring by inspecting the
 appropriate offset in the ovector, which contains PCRE2_UNSET for unset
 substrings.
-<a name="extractbynname"></a></P>
+<a name="extractbyname"></a></P>
 <br><a name="SEC27" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
 <P>
 <b>int pcre2_substring_number_from_name(const pcre2_code *<i>code</i>,</b>
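The unset-versus-empty distinction described above can be illustrated with a small, self-contained C sketch. It does not call PCRE2: `SKETCH_UNSET` and `classify_group()` are hypothetical names, and the sketch assumes the usual ovector layout (group n at elements 2n and 2n+1), with PCRE2_UNSET being the all-ones value of the size type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PCRE2_UNSET, assumed here to be the
   all-ones value of the size type (size_t). */
#define SKETCH_UNSET SIZE_MAX

enum group_state { GROUP_UNSET, GROUP_EMPTY, GROUP_NONEMPTY };

/* Classify capturing group n from an ovector of start/end pairs:
   group n occupies ovector[2*n] (start) and ovector[2*n+1] (end). */
static enum group_state classify_group(const size_t *ovector, size_t pairs,
                                       size_t n)
{
  assert(n < pairs);
  if (ovector[2*n] == SKETCH_UNSET)
    return GROUP_UNSET;            /* group did not take part in the match */
  return (ovector[2*n + 1] == ovector[2*n]) ? GROUP_EMPTY : GROUP_NONEMPTY;
}
```

Checking the start offset against the unset marker first is what separates "group not used" from "group matched an empty string", since both yield an empty extracted string.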
@@ -2358,7 +2355,8 @@ string in <i>outputbuffer</i>, replacing the part that was matched with the
 be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
 </P>
 <P>
-In the replacement string, which is interpreted as a UTF string in UTF mode, a
+In the replacement string, which is interpreted as a UTF string in UTF mode,
+and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set, a
 dollar character is an escape character that can specify the insertion of
 characters from capturing groups in the pattern. The following forms are
 recognized:

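As a rough illustration of the dollar escapes mentioned in the pcre2_substitute() text above, the following C sketch expands a replacement template for a single match. It is not PCRE2's implementation: `expand_replacement()` is a hypothetical helper, the forms it handles (`$$`, `$n` and `${n}` with numeric group references only) are an assumption based on the usual PCRE2 replacement syntax, group names and UTF checking are omitted, and inserting nothing for an unset group is this sketch's own choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PCRE2_UNSET (all bits set in the size type). */
#define SKETCH_UNSET SIZE_MAX

/* Expand tmpl for one match of subject. ovector holds `pairs` start/end
   pairs; group n uses ovector[2n] and ovector[2n+1]. Handles "$$", "$n"
   and "${n}". Returns the NUL-terminated output length, or -1 for a bad
   escape, a group number >= pairs, or insufficient output space. */
static long expand_replacement(const char *subject, const size_t *ovector,
                               size_t pairs, const char *tmpl,
                               char *out, size_t outsize)
{
  size_t o = 0;
  const char *p = tmpl;
  assert(outsize > 0);
  while (*p != '\0') {
    const char *ins = p;            /* default: copy one literal character */
    size_t len = 1;
    if (*p++ == '$') {
      if (*p == '$') {              /* "$$" inserts a single dollar */
        ins = p++;
      } else {
        int braced = (*p == '{');
        size_t n = 0;
        if (braced) p++;
        if (!isdigit((unsigned char)*p)) return -1;
        while (isdigit((unsigned char)*p)) {
          n = n * 10 + (size_t)(*p++ - '0');
          if (n >= pairs) return -1;        /* no such group */
        }
        if (braced && *p++ != '}') return -1;
        if (ovector[2*n] == SKETCH_UNSET) {
          len = 0;                          /* sketch choice: unset -> nothing */
        } else {
          ins = subject + ovector[2*n];
          len = ovector[2*n + 1] - ovector[2*n];
        }
      }
    }
    if (o + len >= outsize) return -1;      /* keep room for the NUL */
    memcpy(out + o, ins, len);
    o += len;
  }
  out[o] = '\0';
  return (long)o;
}
```

For example, with subject "abc123def", group 0 at 3..6 and group 1 at 3..4, the template "[$0]" expands to "[123]" and "${1}x$$" to "1x$". The real function also copies the unmatched parts of the subject around the expansion, which this sketch leaves out.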
@@ -51,11 +51,12 @@ JIT support is an optional feature of PCRE2. The "configure" option
 you want to use JIT. The support is limited to the following hardware
 platforms:
 <pre>
-ARM v5, v7, and Thumb2
+ARM 32-bit (v5, v7, and Thumb2)
+ARM 64-bit
 Intel x86 32-bit and 64-bit
-MIPS 32-bit
+MIPS 32-bit and 64-bit
 Power PC 32-bit and 64-bit
-SPARC 32-bit (experimental)
+SPARC 32-bit
 </pre>
 If --enable-jit is set on an unsupported platform, compilation fails.
 </P>
@@ -73,11 +74,11 @@ To make use of the JIT support in the simplest way, all you have to do is to
 call <b>pcre2_jit_compile()</b> after successfully compiling a pattern with
 <b>pcre2_compile()</b>. This function has two arguments: the first is the
 compiled pattern pointer that was returned by <b>pcre2_compile()</b>, and the
-second is a set of option bits, which must include at least one of
-PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.
+second is zero or more of the following option bits: PCRE2_JIT_COMPLETE,
+PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.
 </P>
 <P>
-If JIT support is not available, a call to <b>pcre2_jit_comple()</b> does
+If JIT support is not available, a call to <b>pcre2_jit_compile()</b> does
 nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled pattern
 is passed to the JIT compiler, which turns it into machine code that executes
 much faster than the normal interpretive code, but yields exactly the same
@@ -95,6 +96,20 @@ appropriate code is run if it is available. Otherwise, the pattern is matched
 using interpretive code.
 </P>
 <P>
+You can call <b>pcre2_jit_compile()</b> multiple times for the same compiled
+pattern. It does nothing if it has previously compiled code for any of the
+option bits. For example, you can call it once with PCRE2_JIT_COMPLETE and
+(perhaps later, when you find you need partial matching) again with
+PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it will ignore
+PCRE2_JIT_COMPLETE and just compile code for partial matching. If
+<b>pcre2_jit_compile()</b> is called with no option bits set, it immediately
+returns zero. This is an alternative way of testing if JIT is available.
+</P>
+<P>
+At present, it is not possible to free JIT compiled code except when the entire
+compiled pattern is freed by calling <b>pcre2_free_code()</b>.
+</P>
+<P>
 In some circumstances you may need to call additional functions. These are
 described in the section entitled
 <a href="#stackcontrol">"Controlling the JIT stack"</a>
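The behaviour described above for repeated pcre2_jit_compile() calls amounts to simple bit-mask bookkeeping: only modes that have not been compiled before are compiled. A minimal C sketch of that bookkeeping follows; the `MODE_*` flags, `struct jit_state` and `modes_to_compile()` are hypothetical stand-ins, not PCRE2 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local flags standing in for the three JIT modes. */
#define MODE_COMPLETE     0x1u
#define MODE_PARTIAL_SOFT 0x2u
#define MODE_PARTIAL_HARD 0x4u
#define MODE_ALL (MODE_COMPLETE | MODE_PARTIAL_SOFT | MODE_PARTIAL_HARD)

/* Per-pattern record of which modes already have compiled code. */
struct jit_state { uint32_t compiled; };

/* Return the modes that actually need compiling for this request and
   record them as compiled. Modes compiled by earlier calls are skipped,
   so a request containing no new bits does no work. */
static uint32_t modes_to_compile(struct jit_state *st, uint32_t requested)
{
  uint32_t todo = (requested & MODE_ALL) & ~st->compiled;
  st->compiled |= todo;
  return todo;
}
```

Calling it first with MODE_COMPLETE and then with MODE_COMPLETE | MODE_PARTIAL_HARD yields MODE_PARTIAL_HARD on the second call, mirroring the example in the text.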
@@ -167,7 +182,7 @@ memory allocation), a starting size and a maximum size, and it returns a
 pointer to an opaque structure of type <b>pcre2_jit_stack</b>, or NULL if there
 is an error. The <b>pcre2_jit_stack_free()</b> function is used to free a stack
 that is no longer needed. (For the technically minded: the address space is
-allocated by mmap or VirtualAlloc.) FIXME Is this right?
+allocated by mmap or VirtualAlloc.)
 </P>
 <P>
 JIT uses far less memory for recursion than the interpretive code,
@@ -187,7 +202,8 @@ passed to a matching function, its information determines which JIT stack is
 used. There are three cases for the values of the other two options:
 <pre>
 (1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
-on the machine stack is used.
+on the machine stack is used. This is the default when a match
+context is created.
 
 (2) If <i>callback</i> is NULL and <i>data</i> is not NULL, <i>data</i> must be
 a pointer to a valid JIT stack, the result of calling
@@ -402,7 +418,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 November 2014
+Last updated: 12 November 2014
 <br>
 Copyright © 1997-2014 University of Cambridge.
 <br>

@@ -100,8 +100,8 @@ page.
 <P>
 Some applications that allow their users to supply patterns may wish to
 restrict them to non-UTF data for security reasons. If the PCRE2_NEVER_UTF
-option is set at compile time, (*UTF) is not allowed, and its appearance causes
-an error.
+option is passed to <b>pcre2_compile()</b>, (*UTF) is not allowed, and its
+appearance in a pattern causes an error.
 </P>
 <br><b>
 Unicode property support
@@ -113,6 +113,22 @@ such as \d and \w to use Unicode properties to determine character types,
 instead of recognizing only characters with codes less than 128 via a lookup
 table.
 </P>
+<P>
+Some applications that allow their users to supply patterns may wish to
+restrict them for security reasons. If the PCRE2_NEVER_UCP option is passed to
+<b>pcre2_compile()</b>, (*UCP) is not allowed, and its appearance in a pattern
+causes an error.
+</P>
+<br><b>
+Locking out empty string matching
+</b><br>
+<P>
+Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same effect
+as passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option to whichever
+matching function is subsequently called to match the pattern. These options
+lock out the matching of empty strings, either entirely, or only at the start
+of the subject.
+</P>
 <br><b>
 Disabling auto-possessification
 </b><br>
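The two empty-string options differ only in where an empty match is rejected. The hypothetical C check below expresses that difference as an acceptance test on a candidate match; `match_allowed()` is not PCRE2 code, and it assumes that "at the start of the subject" means the offset at which matching began.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a candidate match [match_start, match_end) is acceptable.
   notempty:         reject every empty match (like (*NOTEMPTY)).
   notempty_atstart: reject an empty match only when it begins at the
                     starting offset (like (*NOTEMPTY_ATSTART)). */
static bool match_allowed(size_t start_offset, size_t match_start,
                          size_t match_end, bool notempty,
                          bool notempty_atstart)
{
  bool empty = (match_end == match_start);
  if (!empty) return true;                 /* non-empty matches always pass */
  if (notempty) return false;
  if (notempty_atstart && match_start == start_offset) return false;
  return true;
}
```

With notempty_atstart set and a starting offset of 4, an empty match at 4 is rejected while an empty match at 5 is still accepted.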
@@ -133,6 +149,28 @@ PCRE2_NO_START_OPTIMIZE option. This disables several optimizations for quickly
 reaching "no match" results. For more details, see the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation.
+</P>
+<br><b>
+Setting match and recursion limits
+</b><br>
+<P>
+The caller of <b>pcre2_match()</b> can set a limit on the number of times the
+internal <b>match()</b> function is called and on the maximum depth of
+recursive calls. These facilities are provided to catch runaway matches that
+are provoked by patterns with huge matching trees (a typical example is a
+pattern with nested unlimited repeats) and to avoid running out of system stack
+by too much recursion. When one of these limits is reached, <b>pcre2_match()</b>
+gives an error return. The limits can also be set by items at the start of the
+pattern of the form
+<pre>
+(*LIMIT_MATCH=d)
+(*LIMIT_RECURSION=d)
+</pre>
+where d is any number of decimal digits. However, the value of the setting must
+be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
+for it to have any effect. In other words, the pattern writer can lower the
+limits set by the programmer, but not raise them. If there is more than one
+setting of one of these limits, the lower value is used.
 <a name="newlines"></a></P>
 <br><b>
 Newline conventions
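The rule for (*LIMIT_MATCH=d) and (*LIMIT_RECURSION=d) amounts to taking a minimum: a pattern setting can lower the caller's limit but never raise it, and the lowest of several settings wins. A hypothetical C helper expressing that rule (using `uint32_t` for limits is an assumption of the sketch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Effective limit from the caller's value and any limit items found at
   the start of the pattern. Settings above the caller's value have no
   effect; with several settings the lowest one is used. */
static uint32_t effective_limit(uint32_t caller_limit,
                                const uint32_t *pattern_settings,
                                size_t count)
{
  uint32_t limit = caller_limit;
  for (size_t i = 0; i < count; i++)
    if (pattern_settings[i] < limit)
      limit = pattern_settings[i];
  return limit;
}
```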
@@ -179,26 +217,14 @@ below. A change of \R setting can be combined with a change of newline
 convention.
 </P>
 <br><b>
-Setting match and recursion limits
+Specifying what \R matches
 </b><br>
 <P>
-The caller of <b>pcre2_match()</b> can set a limit on the number of times the
-internal <b>match()</b> function is called and on the maximum depth of
-recursive calls. These facilities are provided to catch runaway matches that
-are provoked by patterns with huge matching trees (a typical example is a
-pattern with nested unlimited repeats) and to avoid running out of system stack
-by too much recursion. When one of these limits is reached, <b>pcre2_match()</b>
-gives an error return. The limits can also be set by items at the start of the
-pattern of the form
-<pre>
-(*LIMIT_MATCH=d)
-(*LIMIT_RECURSION=d)
-</pre>
-where d is any number of decimal digits. However, the value of the setting must
-be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
-for it to have any effect. In other words, the pattern writer can lower the
-limits set by the programmer, but not raise them. If there is more than one
-setting of one of these limits, the lower value is used.
+It is possible to restrict \R to match only CR, LF, or CRLF (instead of the
+complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF
+at compile time. This effect can also be achieved by starting a pattern with
+(*BSR_ANYCRLF). For completeness, (*BSR_UNICODE) is also recognized,
+corresponding to PCRE2_BSR_UNICODE.
 </P>
 <br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br>
 <P>
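A sketch of what \R matches under each setting, written in C over an array of code points. `bsr_match_len()` is hypothetical; the Unicode list it uses (CR LF, LF, VT, FF, CR, NEL, U+2028 and U+2029) is the usual set of Unicode line endings, and in non-UTF 8-bit mode only values up to 0xFF can occur. CR LF is tested first so that it is consumed as a single unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length (in code points) of a \R match starting at s[i], or 0 if none.
   anycrlf = true models PCRE2_BSR_ANYCRLF: only CR, LF, or CRLF.
   anycrlf = false models the full Unicode set of line endings. */
static size_t bsr_match_len(const uint32_t *s, size_t len, size_t i,
                            bool anycrlf)
{
  if (i >= len) return 0;
  uint32_t c = s[i];
  if (c == 0x0d)                       /* CR, possibly followed by LF */
    return (i + 1 < len && s[i + 1] == 0x0a) ? 2 : 1;
  if (c == 0x0a) return 1;             /* LF */
  if (anycrlf) return 0;
  switch (c) {
    case 0x0b:                         /* VT */
    case 0x0c:                         /* FF */
    case 0x85:                         /* NEL */
    case 0x2028:                       /* LINE SEPARATOR */
    case 0x2029:                       /* PARAGRAPH SEPARATOR */
      return 1;
    default:
      return 0;
  }
}
```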
@@ -2280,8 +2306,8 @@ complex:
 </PRE>
 </P>
 <P>
-There are four kinds of condition: references to subpatterns, references to
-recursion, a pseudo-condition called DEFINE, and assertions.
+There are five kinds of condition: references to subpatterns, references to
+recursion, two pseudo-conditions called DEFINE and VERSION, and assertions.
 </P>
 <br><b>
 Checking for a used subpattern by number
@@ -2389,6 +2415,23 @@ pattern uses references to the named group to match the four dot-separated
 components of an IPv4 address, insisting on a word boundary at each end.
 </P>
 <br><b>
+Checking the PCRE2 version
+</b><br>
+<P>
+Programs that link with a PCRE2 library can check the version by calling
+<b>pcre2_config()</b> with appropriate arguments. Users of applications that do
+not have access to the underlying code cannot do this. A special "condition"
+called VERSION exists to allow such users to discover which version of PCRE2
+they are dealing with by using this condition to match a string such as
+"yesno". VERSION must be followed either by "=" or ">=" and a version number.
+For example:
+<pre>
+(?(VERSION>=10.4)yes|no)
+</pre>
+This pattern matches "yes" if the PCRE2 version is greater or equal to 10.4, or
+"no" otherwise.
+</P>
+<br><b>
 Assertion conditions
 </b><br>
 <P>
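The VERSION condition compares the running library's version with the one written in the pattern. The C sketch below models that comparison; `parse_version()` and `version_condition()` are hypothetical, and the rule that a one-digit fraction is scaled (so `10.4` is read as 10.40, not 10.04) is an assumption of the sketch that should be checked against the pcre2pattern documentation for the version in use.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Parse "n" or "n.m" (m = one or two digits) into major and minor.
   Assumption: the fraction is a decimal fraction, so "10.4" -> 10.40.
   Returns false on malformed input. */
static bool parse_version(const char *s, int *major, int *minor)
{
  if (!isdigit((unsigned char)*s)) return false;
  int maj = 0;
  while (isdigit((unsigned char)*s)) {
    maj = maj * 10 + (*s++ - '0');
    if (maj > 1000) return false;        /* arbitrary cap for the sketch */
  }
  int min = 0;
  if (*s == '.') {
    s++;
    if (!isdigit((unsigned char)*s)) return false;
    min = (*s++ - '0') * 10;             /* first fractional digit: tenths */
    if (isdigit((unsigned char)*s)) min += *s++ - '0';
  }
  if (*s != '\0') return false;          /* at most two fractional digits */
  *major = maj;
  *minor = min;
  return true;
}

/* Evaluate VERSION=n.m (ge == false) or VERSION>=n.m (ge == true)
   against the running version cur_major.cur_minor. */
static bool version_condition(int cur_major, int cur_minor, bool ge,
                              int want_major, int want_minor)
{
  if (!ge) return cur_major == want_major && cur_minor == want_minor;
  return cur_major > want_major ||
         (cur_major == want_major && cur_minor >= want_minor);
}
```

Under these assumptions, a 10.00 library evaluates (?(VERSION>=10.4)yes|no) to the "no" branch.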
@@ -3180,7 +3223,7 @@ subpattern, (*THEN) causes the subroutine match to fail.
 <br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2callout</b>(3), <b>pcre2matching</b>(3),
-<b>pcre2syntax</b>(3), <b>pcre2</b>(3), <b>pcre216(3)</b>, <b>pcre232(3)</b>.
+<b>pcre2syntax</b>(3), <b>pcre2</b>(3).
 </P>
 <br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
 <P>
@@ -3193,7 +3236,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 November 2014
+Last updated: 14 November 2014
 <br>
 Copyright © 1997-2014 University of Cambridge.
 <br>

@@ -493,17 +493,18 @@ Each top-level branch of a look behind must be of a fixed length.
 (?(condition)yes-pattern)
 (?(condition)yes-pattern|no-pattern)
 
-(?(n)... absolute reference condition
-(?(+n)... relative reference condition
-(?(-n)... relative reference condition
-(?(<name>)... named reference condition (Perl)
-(?('name')... named reference condition (Perl)
-(?(name)... named reference condition (PCRE2)
-(?(R)... overall recursion condition
-(?(Rn)... specific group recursion condition
-(?(R&name)... specific recursion condition
-(?(DEFINE)... define subpattern for reference
-(?(assert)... assertion condition
+(?(n) absolute reference condition
+(?(+n) relative reference condition
+(?(-n) relative reference condition
+(?(<name>) named reference condition (Perl)
+(?('name') named reference condition (Perl)
+(?(name) named reference condition (PCRE2)
+(?(R) overall recursion condition
+(?(Rn) specific group recursion condition
+(?(R&name) specific recursion condition
+(?(DEFINE) define subpattern for reference
+(?(VERSION[>]=n.m) test PCRE2 version
+(?(assert) assertion condition
 </PRE>
 </P>
 <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
@@ -552,7 +553,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 October 2014
+Last updated: 14 November 2014
 <br>
 Copyright © 1997-2014 University of Cambridge.
 <br>

@@ -201,10 +201,11 @@ Behave as if each subject line contains the given modifiers.
 <P>
 <b>-t</b>
 Run each compile and match many times with a timer, and output the resulting
-times per compile or match. You can control the number of iterations that are
-used for timing by following <b>-t</b> with a number (as a separate item on the
-command line). For example, "-t 1000" iterates 1000 times. The default is to
-iterate 500,000 times.
+times per compile or match. When JIT is used, separate times are given for the
+initial compile and the JIT compile. You can control the number of iterations
+that are used for timing by following <b>-t</b> with a number (as a separate
+item on the command line). For example, "-t 1000" iterates 1000 times. The
+default is to iterate 500,000 times.
 </P>
 <P>
 <b>-tm</b>
@@ -490,7 +491,6 @@ about the pattern:
 tables=[0|1|2] select internal tables
 </pre>
 The effects of these modifiers are described in the following sections.
-FIXME: Give more examples.
 </P>
 <br><b>
 Newline and \R handling
@@ -528,7 +528,31 @@ one-off tests.
 <P>
 The <b>info</b> modifier requests information about the compiled pattern
 (whether it is anchored, has a fixed first character, and so on). The
-information is obtained from the <b>pcre2_pattern_info()</b> function.
+information is obtained from the <b>pcre2_pattern_info()</b> function. Here are
+some typical examples:
+<pre>
+re> /(?i)(^a|^b)/m,info
+Capturing subpattern count = 1
+Compile options: multiline
+Overall options: caseless multiline
+First code unit at start or follows newline
+Subject length lower bound = 1
+
+re> /(?i)abc/info
+Capturing subpattern count = 0
+Compile options: <none>
+Overall options: caseless
+First code unit = 'a' (caseless)
+Last code unit = 'c' (caseless)
+Subject length lower bound = 3
+</pre>
+"Compile options" are those specified to the compile function; "overall
+options" have added options that are taken or deduced from the pattern. If both
+sets of options are the same, just a single "options" line is output. "First
+code unit" is where any match must start; if there is more than one they are
+listed as "starting code units". "Last code unit" is the last literal code unit
+that must be present in any match. This is not necessarily the last character.
+These lines are omitted if no starting or ending code units are recorded.
 </P>
 <br><b>
 Specifying a pattern in hex
@@ -543,8 +567,8 @@ pairs. For example:
 This feature is provided as a way of creating patterns that contain binary zero
 characters. By default, <b>pcre2test</b> passes patterns as zero-terminated
 strings to <b>pcre2_compile()</b>, giving the length as PCRE2_ZERO_TERMINATED.
-However, for patterns specified in hexadecimal, the length of the pattern is
-passed.
+However, for patterns specified in hexadecimal, the actual length of the
+pattern is passed.
 </P>
 <br><b>
 JIT compilation
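To see why the actual length matters for hex patterns, the C sketch below decodes pairs of hex digits into bytes and compares the result with the length a zero-terminated interface would see. `decode_hex_pattern()` and `zero_terminated_length()` are hypothetical, and the input format they accept (pairs of hex digits, optionally separated by white space) is an assumption, not pcre2test's exact syntax.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  c = tolower(c);
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Decode pairs of hex digits into out. Returns the byte count, or -1 for
   an odd number of digits, a non-hex character, or output overflow. */
static long decode_hex_pattern(const char *hex, unsigned char *out,
                               size_t outsize)
{
  size_t n = 0;
  const char *p = hex;
  while (*p != '\0') {
    if (isspace((unsigned char)*p)) { p++; continue; }
    int hi = hexval((unsigned char)p[0]);
    if (hi < 0 || p[1] == '\0') return -1;
    int lo = hexval((unsigned char)p[1]);
    if (lo < 0 || n >= outsize) return -1;
    out[n++] = (unsigned char)(hi * 16 + lo);
    p += 2;
  }
  return (long)n;
}

/* The length a zero-terminated interface would use: everything up to the
   first binary zero (buf must contain a zero byte). */
static size_t zero_terminated_length(const unsigned char *buf)
{
  return strlen((const char *)buf);
}
```

Decoding "61 00 62" gives three bytes, but the zero-terminated view stops after the first one, which is why the explicit length has to be passed.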
@@ -571,7 +595,7 @@ setting the size of the JIT stack.
 </P>
 <P>
 If the <b>jitfast</b> modifier is specified, matching is done using the JIT
-"fast path" interface (\fBpcre2_jit_match()), which skips some of the sanity
+"fast path" interface, \fBpcre2_jit_match(), which skips some of the sanity
 checks that are done by <b>pcre2_match()</b>, and of course does not work when
 JIT is not supported. If <b>jitfast</b> is specified without <b>jit</b>, jit=7 is
 assumed.
@@ -604,11 +628,17 @@ character tables are mutually exclusive.
 Showing pattern memory
 </b><br>
 <P>
-The <b>/memory</b> modifier causes the size in bytes of the memory block used to
-hold the compiled pattern to be output. This does not include the size of the
+The <b>/memory</b> modifier causes the size in bytes of the memory used to hold
+the compiled pattern to be output. This does not include the size of the
 <b>pcre2_code</b> block; it is just the actual compiled data. If the pattern is
 subsequently passed to the JIT compiler, the size of the JIT compiled code is
-also output.
+also output. Here is an example:
+<pre>
+re> /a(b)c/jit,memory
+Memory allocation (code space): 21
+Memory allocation (JIT code): 1910
+
+</PRE>
 </P>
 <br><b>
 Limiting nested parentheses
@@ -650,8 +680,8 @@ enable stack availability to be checked during compilation (see the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation for details). If the number specified by the modifier is greater
 than zero, <b>pcre2_set_compile_recursion_guard()</b> is called to set up
-callback from <b>pcre2_compile()</b> to a local function. The argument it is
-passed is the current nesting parenthesis depth; if this is greater than the
+callback from <b>pcre2_compile()</b> to a local function. The argument it
+receives is the current nesting parenthesis depth; if this is greater than the
 value given by the modifier, non-zero is returned, causing the compilation to
 be aborted.
 </P>
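The guard mechanism described above can be modelled in a few lines of C. Neither function below is part of PCRE2 or pcre2test: `nesting_guard()` plays the role of the local callback, `max_paren_depth` stands in for the modifier's value, and `toy_compile()` is a toy driver that only counts parentheses (ignoring escapes and character classes) so that the callback can be exercised. The exact callback prototype expected by pcre2_set_compile_recursion_guard() is not reproduced here and should be taken from pcre2api.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit playing the role of the modifier's number. */
static uint32_t max_paren_depth = 0;

/* Guard callback: receives the current parenthesis nesting depth and
   returns non-zero (abort) once that depth exceeds the limit. */
static int nesting_guard(uint32_t depth)
{
  return depth > max_paren_depth;
}

/* Toy driver, not pcre2_compile(): walks the pattern, tracks nesting
   depth, and consults the guard at each opening parenthesis.
   Returns 0 if the guard never objects, -1 if it aborts. */
static int toy_compile(const char *pattern, int (*guard)(uint32_t))
{
  uint32_t depth = 0;
  for (const char *p = pattern; *p != '\0'; p++) {
    if (*p == '(') {
      depth++;
      if (guard(depth) != 0) return -1;
    } else if (*p == ')' && depth > 0) {
      depth--;
    }
  }
  return 0;
}
```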
@@ -688,6 +718,7 @@ not affect the compilation process.
 allusedtext show all consulted text
 /g global global matching
 mark show mark values
+replace=<string> specify a replacement string
 startchar show starting character when relevant
 </pre>
 These modifiers may not appear in a <b>#pattern</b> command. If you want them as
@@ -759,11 +790,11 @@ pattern.
 offset=<n> set starting offset
 ovector=<n> set size of output vector
 recursion_limit=<n> set a recursion limit
+replace=<string> specify a replacement string
 startchar show startchar when relevant
 zero_terminate pass the subject as zero-terminated
 </pre>
 The effects of these modifiers are described in the following sections.
-FIXME: Give more examples.
 </P>
 <br><b>
 Showing more text
@ -841,6 +872,30 @@ Any value other than zero is used as a return from <b>pcre2test</b>'s callout
|
||||||
function.
|
function.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
|
Finding all matches in a string
|
||||||
|
</b><br>
|
||||||
|
<P>
|
||||||
|
Searching for all possible matches within a subject can be requested by the
|
||||||
|
<b>global</b> or <b>altglobal</b> modifier. After finding a match, the matching
|
||||||
|
function is called again to search the remainder of the subject. The difference
|
||||||
|
between <b>global</b> and <b>altglobal</b> is that the former uses the
|
||||||
|
<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
|
||||||
|
to start searching at a new point within the entire string (which is what Perl
|
||||||
|
does), whereas the latter passes a shortened subject string. This makes a
|
||||||
|
difference to the matching process if the pattern begins with a lookbehind
|
||||||
|
assertion (including \b or \B).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
If an empty string is matched, the next match is done with the
|
||||||
|
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
|
||||||
|
another, non-empty, match at the same point in the subject. If this match
|
||||||
|
fails, the start offset is advanced, and the normal match is retried. This
|
||||||
|
imitates the way Perl handles such cases when using the <b>/g</b> modifier or
|
||||||
|
the <b>split()</b> function. Normally, the start offset is advanced by one
|
||||||
|
character, but if the newline convention recognizes CRLF as a newline, and the
|
||||||
|
current character is CR followed by LF, an advance of two is used.
|
||||||
|
</P>
|
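The retry rule in the paragraph above can be sketched in C. This is a hypothetical helper, not code taken from pcre2test: `advance_after_empty_match()` assumes 8-bit code units, a `utf` flag for UTF-8 mode, and a `crlf_is_newline` flag that the caller sets when the newline convention recognizes CRLF. It covers only the last step: computing where the normal match is retried once the anchored, non-empty attempt at the same offset (PCRE2_NOTEMPTY_ATSTART plus PCRE2_ANCHORED) has failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: return the offset at which the normal, unanchored
 * match is retried after an empty match at `offset` could not be turned
 * into a non-empty match at the same point. Code units are bytes; `utf`
 * selects UTF-8 mode; `crlf_is_newline` is nonzero when the newline
 * convention recognizes CRLF as a newline. */
static size_t advance_after_empty_match(const unsigned char *subject,
                                        size_t length, size_t offset,
                                        int utf, int crlf_is_newline)
{
  assert(offset < length);  /* at the end of the subject, iteration stops */

  /* CR followed by LF is treated as one newline: advance by two. */
  if (crlf_is_newline && subject[offset] == '\r' &&
      offset + 1 < length && subject[offset + 1] == '\n')
    return offset + 2;

  offset++;  /* advance by one code unit */

  /* In UTF-8 mode, skip continuation bytes (10xxxxxx) so the new offset
   * is at the start of the next character. */
  if (utf)
    while (offset < length && (subject[offset] & 0xC0) == 0x80)
      offset++;

  return offset;
}
```

For the subject "a\r\nb" with an empty match at offset 1 and a CRLF-aware newline convention, the helper returns 3; with any other convention it returns 2. In UTF-8 mode, an empty match just before the two-byte character "é" advances by two code units, which is what advancing "by one character" means for a multi-unit character.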
||||||
|
<br><b>
|
||||||
Testing substring extraction functions
|
Testing substring extraction functions
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -867,28 +922,46 @@ length (that is, the return from the extraction function) is given in
|
||||||
parentheses after each substring.
|
parentheses after each substring.
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Finding all matches in a string
|
Testing the substitution function
|
||||||
</b><br>
|
</b><br>
|
||||||
<P>
|
<P>
|
||||||
Searching for all possible matches within a subject can be requested by the
|
If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
|
||||||
<b>global</b> or <b>/altglobal</b> modifier. After finding a match, the matching
|
called instead of one of the matching functions. Unlike subject strings,
|
||||||
function is called again to search the remainder of the subject. The difference
|
<b>pcre2test</b> does not process replacement strings for escape sequences. In
|
||||||
between <b>global</b> and <b>altglobal</b> is that the former uses the
|
UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
|
||||||
<i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
|
If so, it is correctly converted to a UTF string of the appropriate code unit
|
||||||
to start searching at a new point within the entire string (which is what Perl
|
width. If it is not a valid UTF-8 string, the individual code units are copied
|
||||||
does), whereas the latter passes over a shortened substring. This makes a
|
directly. This provides a means of passing an invalid UTF-8 string for testing
|
||||||
difference to the matching process if the pattern begins with a lookbehind
|
purposes.
|
||||||
assertion (including \b or \B).
|
|
||||||
</P>
|
</P>
|
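The handling of replacement strings in UTF mode described above can be illustrated with a small C sketch. It is not pcre2test's own code: `decode_utf8()` and `replacement_to_utf32()` are hypothetical names, and only the 32-bit code unit width is shown (16-bit output would additionally need surrogate pairs). A valid UTF-8 replacement is decoded; anything else is copied one byte per code unit, which is how an invalid string can be passed through for testing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 into 32-bit code points. Returns 1 and sets *count on
 * success, or 0 for a bad lead byte, a truncated sequence, a bad
 * continuation byte, an overlong form, a surrogate, or a value above
 * 0x10FFFF. */
static int decode_utf8(const unsigned char *s, size_t len,
                       uint32_t *out, size_t *count)
{
  size_t i = 0, n = 0;
  while (i < len) {
    unsigned char c = s[i];
    uint32_t cp, min;
    size_t extra;
    if (c < 0x80)                { cp = c;        extra = 0; min = 0; }
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; min = 0x10000; }
    else return 0;                         /* invalid lead byte */
    if (extra > len - i - 1) return 0;     /* sequence runs off the end */
    for (size_t k = 1; k <= extra; k++) {
      if ((s[i + k] & 0xC0) != 0x80) return 0;
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;
    out[n++] = cp;
    i += extra + 1;
  }
  *count = n;
  return 1;
}

/* Hypothetical helper: convert a valid UTF-8 replacement string, or copy
 * each byte as one code unit if it is not valid. `out` must have room for
 * `len` elements. Returns the number of code units written. */
static size_t replacement_to_utf32(const unsigned char *s, size_t len,
                                   uint32_t *out)
{
  size_t count;
  assert(out != NULL || len == 0);
  if (decode_utf8(s, len, out, &count))
    return count;
  for (size_t i = 0; i < len; i++)   /* invalid: copy code units directly */
    out[i] = s[i];
  return len;
}
```

For example, "A\xC3\xA9" becomes the two code points U+0041 and U+00E9, while the invalid input "\xFF" followed by "A" is passed through unchanged as the code units 0xFF and 0x41.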
||||||
<P>
|
<P>
|
||||||
If an empty string is matched, the next match is done with the
|
If the <b>global</b> modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
|
<b>pcre2_substitute()</b>. After a successful substitution, the modified string
|
||||||
another, non-empty, match at the same point in the subject. If this match
|
is output, preceded by the number of replacements. This may be zero if there
|
||||||
fails, the start offset is advanced, and the normal match is retried. This
|
were no matches. Here is a simple example of a substitution test:
|
||||||
imitates the way Perl handles such cases when using the <b>/g</b> modifier or
|
<pre>
|
||||||
the <b>split()</b> function. Normally, the start offset is advanced by one
|
/abc/replace=xxx
|
||||||
character, but if the newline convention recognizes CRLF as a newline, and the
|
=abc=abc=
|
||||||
current character is CR followed by LF, an advance of two is used.
|
1: =xxx=abc=
|
||||||
|
=abc=abc=\=global
|
||||||
|
2: =xxx=xxx=
|
||||||
|
</pre>
|
||||||
|
Subject and replacement strings should be kept relatively short for
|
||||||
|
substitution tests, as fixed-size buffers are used. To make it easy to test for
|
||||||
|
buffer overflow, if the replacement string starts with a number in square
|
||||||
|
brackets, that number is passed to <b>pcre2_substitute()</b> as the size of the
|
||||||
|
output buffer, with the replacement string starting at the next character. Here
|
||||||
|
is an example that tests the edge case:
|
||||||
|
<pre>
|
||||||
|
/abc/
|
||||||
|
123abc123\=replace=[10]XYZ
|
||||||
|
1: 123XYZ123
|
||||||
|
123abc123\=replace=[9]XYZ
|
||||||
|
Failed: error -47: no more memory
|
||||||
|
</pre>
|
||||||
|
A replacement string is ignored with POSIX and DFA matching. Specifying partial
|
||||||
|
matching provokes an error return ("bad option value") from
|
||||||
|
<b>pcre2_substitute()</b>.
|
||||||
</P>
|
</P>
|
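The arithmetic behind the [10]/[9] edge case can be checked with a short C sketch. Both functions are hypothetical illustrations rather than pcre2test source: `parse_size_prefix()` models the bracketed buffer-size prefix on a replacement string, and `needed_size()` counts the code units a single literal replacement needs, including the trailing zero that pcre2_substitute() always adds.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for a "[n]" prefix: if `repl` starts with a
 * bracketed decimal number, store it in *bufsize and return a pointer to
 * the text after ']'. Otherwise leave *bufsize alone and return `repl`. */
static const char *parse_size_prefix(const char *repl, size_t *bufsize)
{
  if (repl[0] != '[' || !isdigit((unsigned char)repl[1]))
    return repl;
  size_t n = 0;
  const char *p = repl + 1;
  while (isdigit((unsigned char)*p))
    n = n * 10 + (size_t)(*p++ - '0');
  if (*p != ']')
    return repl;                 /* not a well-formed prefix */
  *bufsize = n;
  return p + 1;
}

/* Code units needed to replace the match [start, end) in a subject of
 * `subject_len` units by `repl_len` units, plus one for the trailing zero. */
static size_t needed_size(size_t subject_len, size_t start, size_t end,
                          size_t repl_len)
{
  assert(start <= end && end <= subject_len);
  return subject_len - (end - start) + repl_len + 1;
}
```

For 123abc123, the result 123XYZ123 is 9 code units long, so `needed_size(9, 3, 6, 3)` is 10: a 10-unit buffer is just large enough, while a 9-unit buffer leaves no room for the terminating zero, which is why the second test line fails with the "no more memory" error.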
||||||
<br><b>
|
<br><b>
|
||||||
Setting the JIT stack size
|
Setting the JIT stack size
|
||||||
|
@ -969,10 +1042,10 @@ available for storing matching information. The default is 15.
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
<b>regexec()</b> to be called with a NULL capture vector. When not testing the
|
<b>regexec()</b> to be called with a NULL capture vector. When not testing the
|
||||||
POSIX API, a value of zero is used to cause
|
POSIX API, a value of zero is used to cause
|
||||||
<b>pcre2_match_data_create_from_pattern</b> to be called, in order to create a
|
<b>pcre2_match_data_create_from_pattern()</b> to be called, in order to create a
|
||||||
match block of exactly the right size for the pattern. (It is not possible to
|
match block of exactly the right size for the pattern. (It is not possible to
|
||||||
create a match block with a zero-length ovector; there is always one pair of
|
create a match block with a zero-length ovector; there is always at least one
|
||||||
offsets.)
|
pair of offsets.)
|
||||||
</P>
|
</P>
|
||||||
<br><b>
|
<br><b>
|
||||||
Passing the subject as zero-terminated
|
Passing the subject as zero-terminated
|
||||||
|
@ -985,7 +1058,7 @@ be passed as PCRE2_ZERO_TERMINATED. (When matching via the POSIX interface,
|
||||||
this modifier has no effect, as there is no facility for passing a length.)
|
this modifier has no effect, as there is no facility for passing a length.)
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
When testing <b>pcre2_substitute</b>, this modifier also has the effect of
|
When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
<br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||||
|
@ -1233,7 +1306,7 @@ Cambridge CB2 3QH, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC20" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC20" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 09 November 2014
|
Last updated: 14 November 2014
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2014 University of Cambridge.
|
Copyright © 1997-2014 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -132,7 +132,7 @@ remaining sections, except for the \fBpcre2demo\fP section (which is a program
|
||||||
listing), and the short pages for individual functions, are concatenated in
|
listing), and the short pages for individual functions, are concatenated in
|
||||||
\fBpcre2.txt\fP, for ease of searching. The sections are as follows:
|
\fBpcre2.txt\fP, for ease of searching. The sections are as follows:
|
||||||
.sp
|
.sp
|
||||||
pcre2 this document FIXME CHECK THIS LIST
|
pcre2 this document
|
||||||
pcre2-config show PCRE2 installation configuration information
|
pcre2-config show PCRE2 installation configuration information
|
||||||
pcre2api details of PCRE2's native C API
|
pcre2api details of PCRE2's native C API
|
||||||
pcre2build building PCRE2
|
pcre2build building PCRE2
|
||||||
|
|
432
doc/pcre2.txt
|
@ -116,7 +116,7 @@ USER DOCUMENTATION
|
||||||
tions, are concatenated in pcre2.txt, for ease of searching. The sec-
|
tions, are concatenated in pcre2.txt, for ease of searching. The sec-
|
||||||
tions are as follows:
|
tions are as follows:
|
||||||
|
|
||||||
pcre2 this document FIXME CHECK THIS LIST
|
pcre2 this document
|
||||||
pcre2-config show PCRE2 installation configuration information
|
pcre2-config show PCRE2 installation configuration information
|
||||||
pcre2api details of PCRE2's native C API
|
pcre2api details of PCRE2's native C API
|
||||||
pcre2build building PCRE2
|
pcre2build building PCRE2
|
||||||
|
@ -1928,12 +1928,10 @@ NEWLINE HANDLING WHEN MATCHING
|
||||||
|
|
||||||
When PCRE2 is built, a default newline convention is set; this is usu-
|
When PCRE2 is built, a default newline convention is set; this is usu-
|
||||||
ally the standard convention for the operating system. The default can
|
ally the standard convention for the operating system. The default can
|
||||||
be overridden in either a compile context or a match context. However,
|
be overridden in a compile context. During matching, the newline
|
||||||
changing the newline convention at match time disables JIT matching.
|
choice affects the behaviour of the dot, circumflex, and dollar
|
||||||
During matching, the newline choice affects the behaviour of the dot,
|
metacharacters. It may also alter the way the match position is
|
||||||
circumflex, and dollar metacharacters. It may also alter the way the
|
advanced after a match failure for an unanchored pattern.
|
||||||
match position is advanced after a match failure for an unanchored pat-
|
|
||||||
tern.
|
|
||||||
|
|
||||||
When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is
|
When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is
|
||||||
set, and a match attempt for an unanchored pattern fails when the cur-
|
set, and a match attempt for an unanchored pattern fails when the cur-
|
||||||
|
@ -2320,46 +2318,47 @@ CREATING A NEW STRING WITH SUBSTITUTIONS
|
||||||
given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
given as PCRE2_ZERO_TERMINATED for a zero-terminated string.
|
||||||
|
|
||||||
In the replacement string, which is interpreted as a UTF string in UTF
|
In the replacement string, which is interpreted as a UTF string in UTF
|
||||||
mode, a dollar character is an escape character that can specify the
|
mode, and is checked for UTF validity unless the PCRE2_NO_UTF_CHECK
|
||||||
insertion of characters from capturing groups in the pattern. The fol-
|
option is set, a dollar character is an escape character that can spec-
|
||||||
lowing forms are recognized:
|
ify the insertion of characters from capturing groups in the pattern.
|
||||||
|
The following forms are recognized:
|
||||||
|
|
||||||
$$ insert a dollar character
|
$$ insert a dollar character
|
||||||
$<n> insert the contents of group <n>
|
$<n> insert the contents of group <n>
|
||||||
${<n>} insert the contents of group <n>
|
${<n>} insert the contents of group <n>
|
||||||
|
|
||||||
Either a group number or a group name can be given for <n>. Curly
|
Either a group number or a group name can be given for <n>. Curly
|
||||||
brackets are required only if the following character would be inter-
|
brackets are required only if the following character would be inter-
|
||||||
preted as part of the number or name. The number may be zero to include
|
preted as part of the number or name. The number may be zero to include
|
||||||
the entire matched string. For example, if the pattern a(b)c is
|
the entire matched string. For example, if the pattern a(b)c is
|
||||||
matched with "[abc]" and the replacement string "+$1$0$1+", the result
|
matched with "[abc]" and the replacement string "+$1$0$1+", the result
|
||||||
is "[+babcb+]". Group insertion is done by calling pcre2_substring_copy_byname()
|
is "[+babcb+]". Group insertion is done by calling pcre2_substring_copy_byname()
|
||||||
or pcre2_substring_copy_bynumber() as appropriate.
|
or pcre2_substring_copy_bynumber() as appropriate.
|
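The dollar-escape rules above can be made concrete with a C sketch of a single substitution. This is not PCRE2's implementation: `substitute_once()` is a hypothetical function that takes an ovector-style array of offset pairs (as size_t rather than PCRE2_SIZE), handles $$, $n and ${n} for group numbers only (named groups are left out), and copies bytes directly instead of calling the substring copy functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: replace the match described by `ovector` (pairs of
 * start/end offsets, pair 0 being the whole match) using `repl`, where "$$"
 * inserts a dollar and "$n" or "${n}" inserts group n. Returns the output
 * length excluding the trailing zero, or -1 for an unsupported or invalid
 * escape, an unknown group, or a buffer that is too small. */
static long substitute_once(const char *subject, size_t subject_len,
                            const size_t *ovector, size_t pairs,
                            const char *repl, char *out, size_t outsize)
{
  size_t o = 0;

  /* Append `len` bytes, always keeping room for the trailing zero. */
#define APPEND(src, len) do { \
    if (o + (len) + 1 > outsize) return -1; \
    memcpy(out + o, (src), (len)); o += (len); } while (0)

  assert(pairs > 0 && ovector[1] <= subject_len);
  APPEND(subject, ovector[0]);                  /* text before the match */

  for (const char *p = repl; *p != '\0'; ) {
    if (*p != '$') { APPEND(p, 1); p++; continue; }
    p++;                                        /* skip the dollar */
    if (*p == '$') { APPEND(p, 1); p++; continue; }
    int braced = (*p == '{');
    if (braced) p++;
    if (!isdigit((unsigned char)*p)) return -1; /* names not handled here */
    size_t n = 0;
    while (isdigit((unsigned char)*p)) n = n * 10 + (size_t)(*p++ - '0');
    if (braced && *p++ != '}') return -1;       /* missing closing brace */
    if (n >= pairs) return -1;                  /* no such group */
    APPEND(subject + ovector[2 * n], ovector[2 * n + 1] - ovector[2 * n]);
  }

  APPEND(subject + ovector[1], subject_len - ovector[1]); /* after match */
  out[o] = '\0';
#undef APPEND
  return (long)o;
}
```

With the offsets for a(b)c matched against "[abc]" (whole match from 1 to 4, group 1 from 2 to 3), the replacement "+$1$0$1+" yields "[+babcb+]", the result given in the text.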
||||||
|
|
||||||
The first seven arguments of pcre2_substitute() are the same as for
|
The first seven arguments of pcre2_substitute() are the same as for
|
||||||
pcre2_match(), except that the partial matching options are not permit-
|
pcre2_match(), except that the partial matching options are not permit-
|
||||||
ted, and match_data may be passed as NULL, in which case a match data
|
ted, and match_data may be passed as NULL, in which case a match data
|
||||||
block is obtained and freed within this function, using memory manage-
|
block is obtained and freed within this function, using memory manage-
|
||||||
ment functions from the match context, if provided, or else those that
|
ment functions from the match context, if provided, or else those that
|
||||||
were used to allocate memory for the compiled code.
|
were used to allocate memory for the compiled code.
|
||||||
|
|
||||||
There is one additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes
|
There is one additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes
|
||||||
the function to iterate over the subject string, replacing every match-
|
the function to iterate over the subject string, replacing every match-
|
||||||
ing substring. If this is not set, only the first matching substring is
|
ing substring. If this is not set, only the first matching substring is
|
||||||
replaced.
|
replaced.
|
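The effect of PCRE2_SUBSTITUTE_GLOBAL on the return value and on the output length can be illustrated without the library. In this sketch the hypothetical `replace_literal()` searches for a fixed string with strstr() instead of running a compiled pattern, and returns -1 where pcre2_substitute() would return PCRE2_ERROR_NOMEMORY. Empty matches are deliberately excluded, because they need the extra retry logic used for global matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: replace the first occurrence of `lit` (every
 * occurrence if `global` is nonzero) with `repl`. Returns the number of
 * replacements (0 if none, never more than 1 unless `global`), or -1 if
 * `out` is too small. *outlen receives the new length, excluding the
 * trailing zero. */
static int replace_literal(const char *subject, const char *lit,
                           const char *repl, int global,
                           char *out, size_t outsize, size_t *outlen)
{
  size_t litlen = strlen(lit), repllen = strlen(repl), o = 0;
  int count = 0;
  const char *p = subject, *hit;

  assert(litlen > 0);  /* an empty literal would need the empty-match rules */
  while ((count == 0 || global) && (hit = strstr(p, lit)) != NULL) {
    size_t before = (size_t)(hit - p);
    if (o + before + repllen + 1 > outsize) return -1;
    memcpy(out + o, p, before);            /* unmatched text */
    memcpy(out + o + before, repl, repllen);
    o += before + repllen;
    p = hit + litlen;                      /* continue after this match */
    count++;
  }
  size_t rest = strlen(p);
  if (o + rest + 1 > outsize) return -1;
  memcpy(out + o, p, rest + 1);            /* tail plus trailing zero */
  *outlen = o + rest;
  return count;
}
```

Replacing "abc" with "xxx" in "=abc=abc=" gives one replacement ("=xxx=abc=") without the global flag and two ("=xxx=xxx=") with it; a subject with no occurrence returns 0 and is copied unchanged.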
||||||
|
|
||||||
The outlengthptr argument must point to a variable that contains the
|
The outlengthptr argument must point to a variable that contains the
|
||||||
length, in code units, of the output buffer. It is updated to contain
|
length, in code units, of the output buffer. It is updated to contain
|
||||||
the length of the new string, excluding the trailing zero that is auto-
|
the length of the new string, excluding the trailing zero that is auto-
|
||||||
matically added.
|
matically added.
|
||||||
|
|
||||||
The function returns the number of replacements that were made. This
|
The function returns the number of replacements that were made. This
|
||||||
may be zero if no matches were found, and is never greater than 1
|
may be zero if no matches were found, and is never greater than 1
|
||||||
unless PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a neg-
|
unless PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a neg-
|
||||||
ative error code is returned. Except for PCRE2_ERROR_NOMATCH (which is
|
ative error code is returned. Except for PCRE2_ERROR_NOMATCH (which is
|
||||||
never returned), any errors from pcre2_match() or the substring copying
|
never returned), any errors from pcre2_match() or the substring copying
|
||||||
functions are passed straight back. PCRE2_ERROR_BADREPLACEMENT is
|
functions are passed straight back. PCRE2_ERROR_BADREPLACEMENT is
|
||||||
returned for an invalid replacement string (unrecognized sequence fol-
|
returned for an invalid replacement string (unrecognized sequence fol-
|
||||||
lowing a dollar sign), and PCRE2_ERROR_NOMEMORY is returned if the out-
|
lowing a dollar sign), and PCRE2_ERROR_NOMEMORY is returned if the out-
|
||||||
put buffer is not big enough.
|
put buffer is not big enough.
|
||||||
|
|
||||||
|
@ -2369,54 +2368,54 @@ DUPLICATE SUBPATTERN NAMES
|
||||||
int pcre2_substring_nametable_scan(const pcre2_code *code,
|
int pcre2_substring_nametable_scan(const pcre2_code *code,
|
||||||
PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
|
PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
|
||||||
|
|
||||||
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
When a pattern is compiled with the PCRE2_DUPNAMES option, names for
|
||||||
subpatterns are not required to be unique. Duplicate names are always
|
subpatterns are not required to be unique. Duplicate names are always
|
||||||
allowed for subpatterns with the same number, created by using the (?|
|
allowed for subpatterns with the same number, created by using the (?|
|
||||||
feature. Indeed, if such subpatterns are named, they are required to
|
feature. Indeed, if such subpatterns are named, they are required to
|
||||||
use the same names.
|
use the same names.
|
||||||
|
|
||||||
Normally, patterns with duplicate names are such that in any one match,
|
Normally, patterns with duplicate names are such that in any one match,
|
||||||
only one of the named subpatterns participates. An example is shown in
|
only one of the named subpatterns participates. An example is shown in
|
||||||
the pcre2pattern documentation.
|
the pcre2pattern documentation.
|
||||||
|
|
||||||
When duplicates are present, pcre2_substring_copy_byname() and
|
When duplicates are present, pcre2_substring_copy_byname() and
|
||||||
pcre2_substring_get_byname() return the first substring corresponding
|
pcre2_substring_get_byname() return the first substring corresponding
|
||||||
to the given name that is set. If none are set, PCRE2_ERROR_NOSUBSTRING
|
to the given name that is set. If none are set, PCRE2_ERROR_NOSUBSTRING
|
||||||
is returned. The pcre2_substring_number_from_name() function returns
|
is returned. The pcre2_substring_number_from_name() function returns
|
||||||
one of the numbers that are associated with the name, but it is not
|
one of the numbers that are associated with the name, but it is not
|
||||||
defined which it is.
|
defined which it is.
|
||||||
|
|
||||||
If you want to get full details of all captured substrings for a given
|
If you want to get full details of all captured substrings for a given
|
||||||
name, you must use the pcre2_substring_nametable_scan() function. The
|
name, you must use the pcre2_substring_nametable_scan() function. The
|
||||||
first argument is the compiled pattern, and the second is the name. If
|
first argument is the compiled pattern, and the second is the name. If
|
||||||
the third and fourth arguments are NULL, the function returns a group
|
the third and fourth arguments are NULL, the function returns a group
|
||||||
number (it is not defined which). Otherwise, the third and fourth argu-
|
number (it is not defined which). Otherwise, the third and fourth argu-
|
||||||
ments must be pointers to variables that are updated by the function.
|
ments must be pointers to variables that are updated by the function.
|
||||||
After it has run, they point to the first and last entries in the name-
|
After it has run, they point to the first and last entries in the name-
|
||||||
to-number table for the given name, and the function returns the length
|
to-number table for the given name, and the function returns the length
|
||||||
of each entry. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if
|
of each entry. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if
|
||||||
there are no entries for the given name.
|
there are no entries for the given name.
|
||||||
|
|
||||||
The format of the name table is described above in the section entitled
|
The format of the name table is described above in the section entitled
|
||||||
Information about a pattern. Given all the relevant entries for
|
Information about a pattern. Given all the relevant entries for
|
||||||
the name, you can extract each of their numbers, and hence the captured
|
the name, you can extract each of their numbers, and hence the captured
|
||||||
data.
|
data.
|
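Once pcre2_substring_nametable_scan() has returned an entry length and set first and last, the group numbers can be read straight from the table. The sketch below is a hypothetical helper that assumes the 8-bit library's name table layout (two bytes of group number, most significant first, followed by the zero-terminated name, each entry padded to the same size); the table in the test is hand-built rather than taken from a compiled pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walk the entries from `first` to `last` (inclusive),
 * stepping by `entrysize`, and collect up to `max` group numbers. Assumes
 * the 8-bit layout: number in the first two bytes, high byte first. */
static size_t collect_group_numbers(const unsigned char *first,
                                    const unsigned char *last,
                                    size_t entrysize,
                                    unsigned *numbers, size_t max)
{
  size_t n = 0;
  assert(entrysize >= 3 && first <= last);
  for (const unsigned char *e = first; e <= last && n < max; e += entrysize)
    numbers[n++] = ((unsigned)e[0] << 8) | e[1];
  return n;
}
```

In real code, first, last, and the entry size would come from a call such as `rc = pcre2_substring_nametable_scan(code, name, &first, &last)`, using `rc` as the entry size when it is positive; each collected number can then be passed to pcre2_substring_copy_bynumber() to fetch whichever of the groups were set.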
||||||
|
|
||||||
|
|
||||||
FINDING ALL POSSIBLE MATCHES
|
FINDING ALL POSSIBLE MATCHES
|
||||||
|
|
||||||
The traditional matching function uses a similar algorithm to Perl,
|
The traditional matching function uses a similar algorithm to Perl,
|
||||||
which stops when it finds the first match, starting at a given point in
|
which stops when it finds the first match, starting at a given point in
|
||||||
the subject. If you want to find all possible matches, or the longest
|
the subject. If you want to find all possible matches, or the longest
|
||||||
possible match at a given position, consider using the alternative
|
possible match at a given position, consider using the alternative
|
||||||
matching function (see below) instead. If you cannot use the alterna-
|
matching function (see below) instead. If you cannot use the alterna-
|
||||||
tive function, you can kludge it up by making use of the callout facil-
|
tive function, you can kludge it up by making use of the callout facil-
|
||||||
ity, which is described in the pcre2callout documentation.
|
ity, which is described in the pcre2callout documentation.
|
||||||
|
|
||||||
What you have to do is to insert a callout right at the end of the pat-
|
What you have to do is to insert a callout right at the end of the pat-
|
||||||
tern. When your callout function is called, extract and save the cur-
|
tern. When your callout function is called, extract and save the cur-
|
||||||
rent matched substring. Then return 1, which forces pcre2_match() to
|
rent matched substring. Then return 1, which forces pcre2_match() to
|
||||||
backtrack and try other alternatives. Ultimately, when it runs out of
|
backtrack and try other alternatives. Ultimately, when it runs out of
|
||||||
matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
|
matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
|
||||||
|
|
||||||
|
|
||||||
|
@ -2428,26 +2427,26 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
pcre2_match_context *mcontext,
|
pcre2_match_context *mcontext,
|
||||||
int *workspace, PCRE2_SIZE wscount);
|
int *workspace, PCRE2_SIZE wscount);
|
||||||
|
|
||||||
The function pcre2_dfa_match() is called to match a subject string
|
The function pcre2_dfa_match() is called to match a subject string
|
||||||
against a compiled pattern, using a matching algorithm that scans the
|
against a compiled pattern, using a matching algorithm that scans the
|
||||||
subject string just once, and does not backtrack. This has different
|
subject string just once, and does not backtrack. This has different
|
||||||
characteristics to the normal algorithm, and is not compatible with
|
characteristics to the normal algorithm, and is not compatible with
|
||||||
Perl. Some of the features of PCRE2 patterns are not supported. Never-
|
Perl. Some of the features of PCRE2 patterns are not supported. Never-
|
||||||
theless, there are times when this kind of matching can be useful. For
|
theless, there are times when this kind of matching can be useful. For
|
||||||
a discussion of the two matching algorithms, and a list of features
|
a discussion of the two matching algorithms, and a list of features
|
||||||
that pcre2_dfa_match() does not support, see the pcre2matching documen-
|
that pcre2_dfa_match() does not support, see the pcre2matching documen-
|
||||||
tation.
|
tation.
|
||||||
|
|
||||||
The arguments for the pcre2_dfa_match() function are the same as for
|
The arguments for the pcre2_dfa_match() function are the same as for
|
||||||
pcre2_match(), plus two extras. The ovector within the match data block
|
pcre2_match(), plus two extras. The ovector within the match data block
|
||||||
is used in a different way, and this is described below. The other com-
|
is used in a different way, and this is described below. The other com-
|
||||||
mon arguments are used in the same way as for pcre2_match(), so their
|
mon arguments are used in the same way as for pcre2_match(), so their
|
||||||
description is not repeated here.
|
description is not repeated here.
|
||||||
|
|
||||||
The two additional arguments provide workspace for the function. The
|
The two additional arguments provide workspace for the function. The
|
||||||
workspace vector should contain at least 20 elements. It is used for
|
workspace vector should contain at least 20 elements. It is used for
|
||||||
keeping track of multiple paths through the pattern tree. More
|
keeping track of multiple paths through the pattern tree. More
|
||||||
workspace is needed for patterns and subjects where there are a lot of
|
workspace is needed for patterns and subjects where there are a lot of
|
||||||
potential matches.
|
potential matches.
|
||||||
|
|
||||||
Here is an example of a simple call to pcre2_dfa_match():
|
Here is an example of a simple call to pcre2_dfa_match():
|
||||||
|
@ -2467,45 +2466,45 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
|
|
||||||
Option bits for pcre2_dfa_match()
|
Option bits for pcre2_dfa_match()
|
||||||
|
|
||||||
The unused bits of the options argument for pcre2_dfa_match() must be
|
The unused bits of the options argument for pcre2_dfa_match() must be
|
||||||
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
|
zero. The only bits that may be set are PCRE2_ANCHORED, PCRE2_NOTBOL,
|
||||||
PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
|
||||||
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT,
|
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT,
|
||||||
PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last four of
|
PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last four of
|
||||||
these are exactly the same as for pcre2_match(), so their description
|
these are exactly the same as for pcre2_match(), so their description
|
||||||
is not repeated here.
|
is not repeated here.
|
||||||
|
|
||||||
PCRE2_PARTIAL_HARD
|
PCRE2_PARTIAL_HARD
|
||||||
PCRE2_PARTIAL_SOFT
|
PCRE2_PARTIAL_SOFT
|
||||||
|
|
||||||
These have the same general effect as they do for pcre2_match(), but
|
These have the same general effect as they do for pcre2_match(), but
|
||||||
the details are slightly different. When PCRE2_PARTIAL_HARD is set for
|
the details are slightly different. When PCRE2_PARTIAL_HARD is set for
|
||||||
pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the
|
pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the
|
||||||
subject is reached and there is still at least one matching possibility
|
subject is reached and there is still at least one matching possibility
|
||||||
that requires additional characters. This happens even if some complete
|
that requires additional characters. This happens even if some complete
|
||||||
matches have already been found. When PCRE2_PARTIAL_SOFT is set, the
|
matches have already been found. When PCRE2_PARTIAL_SOFT is set, the
|
||||||
return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
|
return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
|
||||||
if the end of the subject is reached, there have been no complete
|
if the end of the subject is reached, there have been no complete
|
||||||
matches, but there is still at least one matching possibility. The por-
|
matches, but there is still at least one matching possibility. The por-
|
||||||
tion of the string that was inspected when the longest partial match
|
tion of the string that was inspected when the longest partial match
|
||||||
was found is set as the first matching string in both cases. There is a
|
was found is set as the first matching string in both cases. There is a
|
||||||
more detailed discussion of partial and multi-segment matching, with
|
more detailed discussion of partial and multi-segment matching, with
|
||||||
examples, in the pcre2partial documentation.
|
examples, in the pcre2partial documentation.
|
||||||
|
|
||||||
PCRE2_DFA_SHORTEST
|
PCRE2_DFA_SHORTEST
|
||||||
|
|
||||||
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to
|
Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to
|
||||||
stop as soon as it has found one match. Because of the way the alterna-
|
stop as soon as it has found one match. Because of the way the alterna-
|
||||||
tive algorithm works, this is necessarily the shortest possible match
|
tive algorithm works, this is necessarily the shortest possible match
|
||||||
at the first possible matching point in the subject string.
|
at the first possible matching point in the subject string.
|
||||||
|
|
||||||
PCRE2_DFA_RESTART
|
PCRE2_DFA_RESTART
|
||||||
|
|
||||||
When pcre2_dfa_match() returns a partial match, it is possible to call
|
When pcre2_dfa_match() returns a partial match, it is possible to call
|
||||||
it again, with additional subject characters, and have it continue with
|
it again, with additional subject characters, and have it continue with
|
||||||
the same match. The PCRE2_DFA_RESTART option requests this action; when
|
the same match. The PCRE2_DFA_RESTART option requests this action; when
|
||||||
it is set, the workspace and wscount arguments must reference the same
|
it is set, the workspace and wscount arguments must reference the same
|
||||||
vector as before because data about the match so far is left in them
|
vector as before because data about the match so far is left in them
|
||||||
after a partial match. There is more discussion of this facility in the
|
after a partial match. There is more discussion of this facility in the
|
||||||
pcre2partial documentation.
|
pcre2partial documentation.
|
||||||
|
|
||||||
|
@ -2513,8 +2512,8 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
|
|
||||||
When pcre2_dfa_match() succeeds, it may have matched more than one sub-
|
When pcre2_dfa_match() succeeds, it may have matched more than one sub-
|
||||||
string in the subject. Note, however, that all the matches from one run
|
string in the subject. Note, however, that all the matches from one run
|
||||||
of the function start at the same point in the subject. The shorter
|
of the function start at the same point in the subject. The shorter
|
||||||
matches are all initial substrings of the longer matches. For example,
|
matches are all initial substrings of the longer matches. For example,
|
||||||
if the pattern
|
if the pattern
|
||||||
|
|
||||||
<.*>
|
<.*>
|
||||||
|
@ -2529,66 +2528,66 @@ MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
|
||||||
<something> <something else>
|
<something> <something else>
|
||||||
<something> <something else> <something further>
|
<something> <something else> <something further>
|
||||||
|
|
||||||
On success, the yield of the function is a number greater than zero,
|
On success, the yield of the function is a number greater than zero,
|
||||||
which is the number of matched substrings. The offsets of the sub-
|
which is the number of matched substrings. The offsets of the sub-
|
||||||
strings are returned in the ovector, and can be extracted in the same
|
strings are returned in the ovector, and can be extracted in the same
|
||||||
way as for pcre2_match(). They are returned in reverse order of
|
way as for pcre2_match(). They are returned in reverse order of
|
||||||
length; that is, the longest matching string is given first. If there
|
length; that is, the longest matching string is given first. If there
|
||||||
were too many matches to fit into the ovector, the yield of the func-
|
were too many matches to fit into the ovector, the yield of the func-
|
||||||
tion is zero, and the vector is filled with the longest matches.
|
tion is zero, and the vector is filled with the longest matches.

NOTE: PCRE2's "auto-possessification" optimization usually applies to
character repeats at the end of a pattern (as well as internally). For
example, the pattern "a\d+" is compiled as if it were "a\d++" because
there is no point in backtracking into the repeated digits. For DFA
matching, this means that only one possible match is found. If you
really do want multiple matches in such cases, either use an ungreedy
repeat ("a\d+?") or set the PCRE2_NO_AUTO_POSSESS option when compil-
ing.

Error returns from pcre2_dfa_match()

The pcre2_dfa_match() function returns a negative number when it fails.
Many of the errors are the same as for pcre2_match(), as described
above. There are in addition the following errors that are specific to
pcre2_dfa_match():

PCRE2_ERROR_DFA_UITEM

This return is given if pcre2_dfa_match() encounters an item in the
pattern that it does not support, for instance, the use of \C or a back
reference.

PCRE2_ERROR_DFA_UCOND

This return is given if pcre2_dfa_match() encounters a condition item
that uses a back reference for the condition, or a test for recursion
in a specific group. These are not supported.

PCRE2_ERROR_DFA_WSSIZE

This return is given if pcre2_dfa_match() runs out of space in the
workspace vector.

PCRE2_ERROR_DFA_RECURSE

When a recursive subpattern is processed, the matching function calls
itself recursively, using private memory for the ovector and workspace.
This error is given if the internal ovector is not large enough. This
should be extremely rare, as a vector of size 1000 is used.

PCRE2_ERROR_DFA_BADRESTART

When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option,
some plausibility checks are made on the contents of the workspace,
which should contain data about the previous partial match. If any of
these checks fail, this error is given.


SEE ALSO

pcre2build(3), pcre2libs(3), pcre2callout(3), pcre2matching(3),
pcre2partial(3), pcre2posix(3), pcre2demo(3), pcre2sample(3),
pcre2stack(3).


@@ -3508,11 +3507,12 @@ AVAILABILITY OF JIT SUPPORT
built if you want to use JIT. The support is limited to the following
hardware platforms:

-ARM v5, v7, and Thumb2
+ARM 32-bit (v5, v7, and Thumb2)
+ARM 64-bit
Intel x86 32-bit and 64-bit
-MIPS 32-bit
+MIPS 32-bit and 64-bit
Power PC 32-bit and 64-bit
-SPARC 32-bit (experimental)
+SPARC 32-bit

If --enable-jit is set on an unsupported platform, compilation fails.

@@ -3531,10 +3531,10 @@ SIMPLE USE OF JIT
is to call pcre2_jit_compile() after successfully compiling a pattern
with pcre2_compile(). This function has two arguments: the first is the
compiled pattern pointer that was returned by pcre2_compile(), and the
-second is a set of option bits, which must include at least one of
-PCRE2_JIT_COMPLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.
+second is zero or more of the following option bits: PCRE2_JIT_COM-
+PLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT.

-If JIT support is not available, a call to pcre2_jit_comple() does
+If JIT support is not available, a call to pcre2_jit_compile() does
nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled
pattern is passed to the JIT compiler, which turns it into machine code
that executes much faster than the normal interpretive code, but yields
@@ -3550,81 +3550,94 @@ SIMPLE USE OF JIT
pcre2_match() is called, the appropriate code is run if it is avail-
able. Otherwise, the pattern is matched using interpretive code.

+You can call pcre2_jit_compile() multiple times for the same compiled
+pattern. It does nothing if it has previously compiled code for any of
+the option bits. For example, you can call it once with PCRE2_JIT_COM-
+PLETE and (perhaps later, when you find you need partial matching)
+again with PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it
+will ignore PCRE2_JIT_COMPLETE and just compile code for partial match-
+ing. If pcre2_jit_compile() is called with no option bits set, it imme-
+diately returns zero. This is an alternative way of testing if JIT is
+available.
+
+At present, it is not possible to free JIT compiled code except when
+the entire compiled pattern is freed by calling pcre2_code_free().
+
In some circumstances you may need to call additional functions. These
are described in the section entitled "Controlling the JIT stack"
below.

There are some pcre2_match() options that are not supported by JIT, and
there are also some pattern items that JIT cannot handle. Details are
given below. In both cases, matching automatically falls back to the
interpretive code. If you want to know whether JIT was actually used
for a particular match, you should arrange for a JIT callback function
to be set up as described in the section entitled "Controlling the JIT
stack" below, even if you do not need to supply a non-default JIT
stack. Such a callback function is called whenever JIT code is about to
be obeyed. If the match-time options are not right for JIT execution,
the callback function is not obeyed.

If the JIT compiler finds an unsupported item, no JIT data is gener-
ated. You can find out if JIT matching is available after compiling a
pattern by calling pcre2_pattern_info() with the PCRE2_INFO_JIT option.
A result of 1 means that JIT compilation was successful. A result of 0
means that JIT support is not available, or the pattern was not pro-
cessed by pcre2_jit_compile(), or the JIT compiler was not able to han-
dle the pattern.


UNSUPPORTED OPTIONS AND PATTERN ITEMS

The pcre2_match() options that are supported for JIT matching are
PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
PCRE2_ANCHORED option is not supported at match time.

The only unsupported pattern items are \C (match a single data unit)
when running in a UTF mode, and a callout immediately before an asser-
tion condition in a conditional group.


RETURN VALUES FROM JIT MATCHING

When a pattern is matched using JIT matching, the return values are the
same as those given by the interpretive pcre2_match() code, with the
addition of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means
that the memory used for the JIT stack was insufficient. See "Control-
ling the JIT stack" below for a discussion of JIT stack usage.

The error code PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if
searching a very large pattern tree goes on for too long, as it is in
the same circumstances when JIT is not used, but the details of exactly
what is counted are not the same. The PCRE2_ERROR_RECURSIONLIMIT error
code is never returned when JIT matching is used.


CONTROLLING THE JIT STACK

When the compiled JIT code runs, it needs a block of memory to use as a
stack. By default, it uses 32K on the machine stack. However, some
large or complicated patterns need more than this. The error
PCRE2_ERROR_JIT_STACKLIMIT is given when there is not enough stack.
Three functions are provided for managing blocks of memory for use as
JIT stacks. There is further discussion about the use of JIT stacks in
the section entitled "JIT stack FAQ" below.

The pcre2_jit_stack_create() function creates a JIT stack. Its argu-
ments are a general context (for memory allocation functions, or NULL
for standard memory allocation), a starting size and a maximum size,
and it returns a pointer to an opaque structure of type
pcre2_jit_stack, or NULL if there is an error. The
pcre2_jit_stack_free() function is used to free a stack that is no
longer needed. (For the technically minded: the address space is allo-
-cated by mmap or VirtualAlloc.) FIXME Is this right?
+cated by mmap or VirtualAlloc.)

JIT uses far less memory for recursion than the interpretive code, and
a maximum stack size of 512K to 1M should be more than enough for any
pattern.

The pcre2_jit_stack_assign() function specifies which stack JIT code
should use. Its arguments are as follows:

pcre2_match_context *mcontext
@@ -3633,11 +3646,12 @@ CONTROLLING THE JIT STACK

The first argument is a pointer to a match context. When this is subse-
quently passed to a matching function, its information determines which
JIT stack is used. There are three cases for the values of the other
two arguments:

(1) If callback is NULL and data is NULL, an internal 32K block
-on the machine stack is used.
+on the machine stack is used. This is the default when a match
+context is created.

(2) If callback is NULL and data is not NULL, data must be
a pointer to a valid JIT stack, the result of calling
@@ -3650,30 +3664,30 @@ CONTROLLING THE JIT STACK
return value must be a valid JIT stack, the result of calling
pcre2_jit_stack_create().

A callback function is obeyed whenever JIT code is about to be run; it
is not obeyed when pcre2_match() is called with options that are incom-
patible with JIT matching. A callback function can therefore be used to
determine whether a match operation was executed by JIT or by the
interpreter.

You may safely use the same JIT stack for more than one pattern (either
by assigning directly or by callback), as long as the patterns are all
matched sequentially in the same thread. In a multithread application,
if you do not specify a JIT stack, or if you assign or pass back NULL
from a callback, that is thread-safe, because each thread has its own
machine stack. However, if you assign or pass back a non-NULL JIT
stack, this must be a different stack for each thread so that the
application is thread-safe.

Strictly speaking, even more is allowed. You can assign the same non-
NULL stack to a match context that is used by any number of patterns,
as long as they are not used for matching by multiple threads at the
same time. For example, you could use the same stack in all compiled
patterns, with a global mutex in the callback to wait until the stack
is available for use. However, this is an inefficient solution, and not
recommended.

This is a suggestion for how a multithreaded program that needs to set
up non-default JIT stacks might operate:

During thread initialization
@@ -3685,7 +3699,7 @@ CONTROLLING THE JIT STACK
Use a one-line callback function
return thread_local_var

All the functions described in this section do nothing if JIT is not
available.


@@ -3694,20 +3708,20 @@ JIT STACK FAQ
(1) Why do we need JIT stacks?

PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack
where the local data of the current node is pushed before checking its
child nodes. Allocating real machine stack on some platforms is diffi-
cult. For example, the stack chain needs to be updated every time we
extend the stack on PowerPC. Although it is possible, its updating
time overhead decreases performance. So we do the recursion in memory.

(2) Why don't we simply allocate blocks of memory with malloc()?

Modern operating systems have a nice feature: they can reserve an
address space instead of allocating memory. We can safely allocate mem-
ory pages inside this address space, so the stack could grow without
moving memory data (this is important because of pointers). Thus we can
allocate 1M address space, and use only a single memory page (usually
4K) if that is enough. However, we can still grow up to 1M anytime if
needed.

(3) Who "owns" a JIT stack?
@@ -3715,8 +3729,8 @@ JIT STACK FAQ
The owner of the stack is the user program, not the JIT-compiled pattern
or anything else. The user program must ensure that if a stack is being
used by pcre2_match() (that is, it is assigned to a match context that
is passed to the pattern currently running), that stack must not be
used by any other threads (to avoid overwriting the same memory area).
The best practice for multithreaded programs is to allocate a stack for
each thread, and return this stack through the JIT callback function.

@@ -3724,36 +3738,36 @@ JIT STACK FAQ

You can free a JIT stack at any time, as long as it will not be used by
pcre2_match() again. When you assign the stack to a match context, only
a pointer is set. There is no reference counting or any other magic.
You can free compiled patterns, contexts, and stacks in any order, any-
time. Just do not call pcre2_match() with a match context pointing to
an already freed stack, as that will cause a segfault. (Also, do not free
a stack currently used by pcre2_match() in another thread.) You can
also replace the stack in a context at any time when it is not in use.
You can also free the previous stack before assigning a replacement.

(5) Should I allocate/free a stack every time before/after calling
pcre2_match()?

No, because this is too costly in terms of resources. However, you
could implement some clever idea which releases the stack if it is not
used for, say, two minutes. The JIT callback can help to achieve
this without keeping a list of patterns.

(6) OK, the stack is for long term memory allocation. But what happens
if a pattern causes stack overflow with a stack of 1M? Is that 1M kept
until the stack is freed?

Especially on embedded systems, it might be a good idea to release mem-
ory sometimes without freeing the stack. There is no API for this at
the moment. Probably a function call which returns with the currently
allocated memory for any stack and another which allows releasing mem-
ory (shrinking the stack) would be a good idea if someone needs this.

(7) This is too much of a headache. Isn't there any better solution for
JIT stack handling?

No, thanks to Windows. If POSIX threads were used everywhere, we could
throw out this complicated API.


@@ -3762,18 +3776,18 @@ FREEING JIT SPECULATIVE MEMORY
void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);

The JIT executable allocator does not free all memory when it is possi-
ble. It expects new allocations, and keeps some free memory around to
improve allocation speed. However, in low memory conditions, it might
be better to free all possible memory. You can cause this to happen by
calling pcre2_jit_free_unused_memory(). Its argument is a general con-
text, for custom memory management, or NULL for standard memory manage-
ment.


EXAMPLE CODE

This is a single-threaded example that specifies a JIT stack without
using a callback. A real program should include error checking after
all the function calls.

int rc;
@@ -3801,28 +3815,28 @@ EXAMPLE CODE
JIT FAST PATH API

Because the API described above falls back to interpreted matching when
JIT is not available, it is convenient for programs that are written
for general use in many environments. However, calling JIT via
pcre2_match() does have a performance impact. Programs that are written
for use where JIT is known to be available, and which need the best
possible performance, can instead use a "fast path" API to call JIT
matching directly instead of calling pcre2_match() (obviously only for
patterns that have been successfully processed by pcre2_jit_compile()).

The fast path function is called pcre2_jit_match(), and it takes
exactly the same arguments as pcre2_match(). The return values are also
the same, plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or
complete) is requested that was not compiled. Unsupported option bits
(for example, PCRE2_ANCHORED) are ignored.

When you call pcre2_match(), as well as testing for invalid options, a
number of other sanity checks are performed on the arguments. For exam-
ple, if the subject pointer is NULL, an immediate error is given. Also,
unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for
validity. In the interests of speed, these checks do not happen on the
JIT fast path, and if invalid data is passed, the result is undefined.

Bypassing the sanity checks and the pcre2_match() wrapping can give
speedups of more than 10%.
|
||||||
|
|
||||||
|
|
||||||
|
@ -3840,7 +3854,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 08 November 2014
|
Last updated: 12 November 2014
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2014 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -1063,7 +1063,7 @@ equivalent to Perl's /x option, and it can be changed within a pattern by a
|
||||||
Which characters are interpreted as newlines can be specified by a setting in
|
Which characters are interpreted as newlines can be specified by a setting in
|
||||||
the compile context that is passed to \fBpcre2_compile()\fP or by a special
|
the compile context that is passed to \fBpcre2_compile()\fP or by a special
|
||||||
sequence at the start of the pattern, as described in the section entitled
|
sequence at the start of the pattern, as described in the section entitled
|
||||||
.\" HTML <a href="pcrepattern.html#newlines">
|
.\" HTML <a href="pcre2pattern.html#newlines">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
"Newline conventions"
|
"Newline conventions"
|
||||||
.\"
|
.\"
|
||||||
|
@ -1226,7 +1226,7 @@ This option changes the way PCRE2 processes \eB, \eb, \eD, \ed, \eS, \es, \eW,
|
||||||
\ew, and some of the POSIX character classes. By default, only ASCII characters
|
\ew, and some of the POSIX character classes. By default, only ASCII characters
|
||||||
are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to
|
are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to
|
||||||
classify characters. More details are given in the section on
|
classify characters. More details are given in the section on
|
||||||
.\" HTML <a href="pcre2.html#genericchartypes">
|
.\" HTML <a href="pcre2pattern.html#genericchartypes">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
generic character types
|
generic character types
|
||||||
.\"
|
.\"
|
||||||
|
@ -1939,17 +1939,11 @@ documentation.
|
||||||
.sp
|
.sp
|
||||||
When PCRE2 is built, a default newline convention is set; this is usually the
|
When PCRE2 is built, a default newline convention is set; this is usually the
|
||||||
standard convention for the operating system. The default can be overridden in
|
standard convention for the operating system. The default can be overridden in
|
||||||
either a
|
a
|
||||||
.\" HTML <a href="#compilecontext">
|
.\" HTML <a href="#compilecontext">
|
||||||
.\" </a>
|
.\" </a>
|
||||||
compile context
|
compile context.
|
||||||
.\"
|
.\"
|
||||||
or a
|
|
||||||
.\" HTML <a href="#matchcontext">
|
|
||||||
.\" </a>
|
|
||||||
match context.
|
|
||||||
.\"
|
|
||||||
However, changing the newline convention at match time disables JIT matching.
|
|
||||||
During matching, the newline choice affects the behaviour of the dot,
|
During matching, the newline choice affects the behaviour of the dot,
|
||||||
circumflex, and dollar metacharacters. It may also alter the way the match
|
circumflex, and dollar metacharacters. It may also alter the way the match
|
||||||
position is advanced after a match failure for an unanchored pattern.
|
position is advanced after a match failure for an unanchored pattern.
|
||||||
|
@ -2322,7 +2316,7 @@ appropriate offset in the ovector, which contains PCRE2_UNSET for unset
|
||||||
substrings.
|
substrings.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.\" HTML <a name="extractbynname"></a>
|
.\" HTML <a name="extractbyname"></a>
|
||||||
.SH "EXTRACTING CAPTURED SUBSTRINGS BY NAME"
|
.SH "EXTRACTING CAPTURED SUBSTRINGS BY NAME"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
|
|
@ -28,7 +28,7 @@ you want to use JIT. The support is limited to the following hardware
|
||||||
platforms:
|
platforms:
|
||||||
.sp
|
.sp
|
||||||
ARM 32-bit (v5, v7, and Thumb2)
|
ARM 32-bit (v5, v7, and Thumb2)
|
||||||
ARM 64-bit
|
ARM 64-bit
|
||||||
Intel x86 32-bit and 64-bit
|
Intel x86 32-bit and 64-bit
|
||||||
MIPS 32-bit and 64-bit
|
MIPS 32-bit and 64-bit
|
||||||
Power PC 32-bit and 64-bit
|
Power PC 32-bit and 64-bit
|
||||||
|
@ -79,7 +79,7 @@ PCRE2_JIT_COMPLETE and just compile code for partial matching. If
|
||||||
\fBpcre2_jit_compile()\fP is called with no option bits set, it immediately
|
\fBpcre2_jit_compile()\fP is called with no option bits set, it immediately
|
||||||
returns zero. This is an alternative way of testing if JIT is available.
|
returns zero. This is an alternative way of testing if JIT is available.
|
||||||
.P
|
.P
|
||||||
At present, it is not possible to free JIT compiled code except when the entire
|
At present, it is not possible to free JIT compiled code except when the entire
|
||||||
compiled pattern is freed by calling \fBpcre2_free_code()\fP.
|
compiled pattern is freed by calling \fBpcre2_free_code()\fP.
|
||||||
.P
|
.P
|
||||||
In some circumstances you may need to call additional functions. These are
|
In some circumstances you may need to call additional functions. These are
|
||||||
|
@ -186,8 +186,8 @@ passed to a matching function, its information determines which JIT stack is
|
||||||
used. There are three cases for the values of the other two options:
|
used. There are three cases for the values of the other two options:
|
||||||
.sp
|
.sp
|
||||||
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
|
(1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
|
||||||
on the machine stack is used. This is the default when a match
|
on the machine stack is used. This is the default when a match
|
||||||
context is created.
|
context is created.
|
||||||
.sp
|
.sp
|
||||||
(2) If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must be
|
(2) If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must be
|
||||||
a pointer to a valid JIT stack, the result of calling
|
a pointer to a valid JIT stack, the result of calling
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2PATTERN 3 "03 November 2014" "PCRE2 10.00"
|
.TH PCRE2PATTERN 3 "14 November 2014" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
|
||||||
|
@ -63,8 +63,8 @@ page.
|
||||||
.P
|
.P
|
||||||
Some applications that allow their users to supply patterns may wish to
|
Some applications that allow their users to supply patterns may wish to
|
||||||
restrict them to non-UTF data for security reasons. If the PCRE2_NEVER_UTF
|
restrict them to non-UTF data for security reasons. If the PCRE2_NEVER_UTF
|
||||||
option is set at compile time, (*UTF) is not allowed, and its appearance causes
|
option is passed to \fBpcre2_compile()\fP, (*UTF) is not allowed, and its
|
||||||
an error.
|
appearance in a pattern causes an error.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Unicode property support"
|
.SS "Unicode property support"
|
||||||
|
@ -75,6 +75,21 @@ This has the same effect as setting the PCRE2_UCP option: it causes sequences
|
||||||
such as \ed and \ew to use Unicode properties to determine character types,
|
such as \ed and \ew to use Unicode properties to determine character types,
|
||||||
instead of recognizing only characters with codes less than 128 via a lookup
|
instead of recognizing only characters with codes less than 128 via a lookup
|
||||||
table.
|
table.
|
||||||
|
.P
|
||||||
|
Some applications that allow their users to supply patterns may wish to
|
||||||
|
restrict them for security reasons. If the PCRE2_NEVER_UCP option is passed to
|
||||||
|
\fBpcre2_compile()\fP, (*UCP) is not allowed, and its appearance in a pattern
|
||||||
|
causes an error.
|
||||||
|
.
|
||||||
|
.
|
||||||
|
.SS "Locking out empty string matching"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same effect
|
||||||
|
as passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option to whichever
|
||||||
|
matching function is subsequently called to match the pattern. These options
|
||||||
|
lock out the matching of empty strings, either entirely, or only at the start
|
||||||
|
of the subject.
|
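As a rough illustration of what these two options lock out, the following C sketch (not PCRE2 source) decides whether a candidate match may be accepted. The flag constants and empty_match_allowed() are hypothetical names used only here. The sketch also assumes that "the start of the subject" means the position given by the starting offset, which is how the PCRE2 API documentation describes PCRE2_NOTEMPTY_ATSTART.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; the real PCRE2_NOTEMPTY and
   PCRE2_NOTEMPTY_ATSTART values come from pcre2.h. */
#define SKETCH_NOTEMPTY          0x1u
#define SKETCH_NOTEMPTY_ATSTART  0x2u

/* Return true if a candidate match covering [match_start, match_end) is
   acceptable. start_offset is the position at which matching began. */
static bool empty_match_allowed(unsigned flags, size_t start_offset,
                                size_t match_start, size_t match_end)
{
    assert(match_start <= match_end);
    if (match_end != match_start)
        return true;            /* non-empty matches are never locked out */
    if (flags & SKETCH_NOTEMPTY)
        return false;           /* empty matches are locked out entirely */
    if ((flags & SKETCH_NOTEMPTY_ATSTART) && match_start == start_offset)
        return false;           /* or only at the starting position */
    return true;
}
```

With NOTEMPTY_ATSTART, an empty match later in the subject is still accepted; only the one at the starting position is rejected.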
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Disabling auto-possessification"
|
.SS "Disabling auto-possessification"
|
||||||
|
@ -102,6 +117,28 @@ reaching "no match" results. For more details, see the
|
||||||
documentation.
|
documentation.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SS "Setting match and recursion limits"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
The caller of \fBpcre2_match()\fP can set a limit on the number of times the
|
||||||
|
internal \fBmatch()\fP function is called and on the maximum depth of
|
||||||
|
recursive calls. These facilities are provided to catch runaway matches that
|
||||||
|
are provoked by patterns with huge matching trees (a typical example is a
|
||||||
|
pattern with nested unlimited repeats) and to avoid running out of system stack
|
||||||
|
by too much recursion. When one of these limits is reached, \fBpcre2_match()\fP
|
||||||
|
gives an error return. The limits can also be set by items at the start of the
|
||||||
|
pattern of the form
|
||||||
|
.sp
|
||||||
|
(*LIMIT_MATCH=d)
|
||||||
|
(*LIMIT_RECURSION=d)
|
||||||
|
.sp
|
||||||
|
where d is any number of decimal digits. However, the value of the setting must
|
||||||
|
be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
|
||||||
|
for it to have any effect. In other words, the pattern writer can lower the
|
||||||
|
limits set by the programmer, but not raise them. If there is more than one
|
||||||
|
setting of one of these limits, the lower value is used.
|
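The rule for combining a caller's limit with in-pattern settings fits in a few lines. This is a sketch of the documented behaviour only, not how pcre2_match() is implemented. effective_limit() is a hypothetical name, and the (*LIMIT_MATCH=d) or (*LIMIT_RECURSION=d) values are assumed to have been parsed into an array already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the limit actually in force, given the caller's limit
   (set or defaulted) and any values found at the start of the pattern. */
static uint32_t effective_limit(uint32_t caller_limit,
                                const uint32_t *pattern_limits, size_t count)
{
    assert(pattern_limits != NULL || count == 0);
    uint32_t limit = caller_limit;
    for (size_t i = 0; i < count; i++) {
        /* A pattern setting can only lower the limit, never raise it, and
           when several settings appear, the lowest one wins. */
        if (pattern_limits[i] < limit)
            limit = pattern_limits[i];
    }
    return limit;
}
```

For example, a caller limit of 1000 combined with pattern settings of 500 and 200 yields 200, while a pattern setting of 5000 leaves the limit at 1000.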
||||||
|
.
|
||||||
|
.
|
||||||
.\" HTML <a name="newlines"></a>
|
.\" HTML <a name="newlines"></a>
|
||||||
.SS "Newline conventions"
|
.SS "Newline conventions"
|
||||||
.rs
|
.rs
|
||||||
|
@ -153,26 +190,14 @@ below. A change of \eR setting can be combined with a change of newline
|
||||||
convention.
|
convention.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Setting match and recursion limits"
|
.SS "Specifying what \eR matches"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The caller of \fBpcre2_match()\fP can set a limit on the number of times the
|
It is possible to restrict \eR to match only CR, LF, or CRLF (instead of the
|
||||||
internal \fBmatch()\fP function is called and on the maximum depth of
|
complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF
|
||||||
recursive calls. These facilities are provided to catch runaway matches that
|
at compile time. This effect can also be achieved by starting a pattern with
|
||||||
are provoked by patterns with huge matching trees (a typical example is a
|
(*BSR_ANYCRLF). For completeness, (*BSR_UNICODE) is also recognized,
|
||||||
pattern with nested unlimited repeats) and to avoid running out of system stack
|
corresponding to PCRE2_BSR_UNICODE.
|
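To make the difference between the two \R settings concrete, here is a C sketch for 8-bit, non-UTF subjects, where the Unicode line-ending set reduces to LF, VT, FF, CR, CRLF, and NEL (0x85). The function name is hypothetical and this is not the PCRE2 matcher; it only reports how many bytes \R would consume at a given position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes of line ending at s[i] (0 if none).
   anycrlf != 0 models PCRE2_BSR_ANYCRLF or (*BSR_ANYCRLF): only CR, LF and
   CRLF count. Otherwise the single-byte part of the Unicode set is used. */
static size_t bsr_match_length(const unsigned char *s, size_t len, size_t i,
                               int anycrlf)
{
    assert(s != NULL || len == 0);
    if (i >= len)
        return 0;
    switch (s[i]) {
    case '\r':
        /* CRLF is treated as one two-byte line ending. */
        return (i + 1 < len && s[i + 1] == '\n') ? 2 : 1;
    case '\n':
        return 1;
    case 0x0b:  /* VT */
    case 0x0c:  /* FF */
    case 0x85:  /* NEL */
        return anycrlf ? 0 : 1;
    default:
        return 0;
    }
}
```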
||||||
by too much recursion. When one of these limits is reached, \fBpcre2_match()\fP
|
|
||||||
gives an error return. The limits can also be set by items at the start of the
|
|
||||||
pattern of the form
|
|
||||||
.sp
|
|
||||||
(*LIMIT_MATCH=d)
|
|
||||||
(*LIMIT_RECURSION=d)
|
|
||||||
.sp
|
|
||||||
where d is any number of decimal digits. However, the value of the setting must
|
|
||||||
be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
|
|
||||||
for it to have any effect. In other words, the pattern writer can lower the
|
|
||||||
limits set by the programmer, but not raise them. If there is more than one
|
|
||||||
setting of one of these limits, the lower value is used.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "EBCDIC CHARACTER CODES"
|
.SH "EBCDIC CHARACTER CODES"
|
||||||
|
@ -2302,8 +2327,8 @@ complex:
|
||||||
(?(1) (A|B|C) | (D | (?(2)E|F) | E) )
|
(?(1) (A|B|C) | (D | (?(2)E|F) | E) )
|
||||||
.sp
|
.sp
|
||||||
.P
|
.P
|
||||||
There are four kinds of condition: references to subpatterns, references to
|
There are five kinds of condition: references to subpatterns, references to
|
||||||
recursion, a pseudo-condition called DEFINE, and assertions.
|
recursion, two pseudo-conditions called DEFINE and VERSION, and assertions.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Checking for a used subpattern by number"
|
.SS "Checking for a used subpattern by number"
|
||||||
|
@ -2418,6 +2443,23 @@ pattern uses references to the named group to match the four dot-separated
|
||||||
components of an IPv4 address, insisting on a word boundary at each end.
|
components of an IPv4 address, insisting on a word boundary at each end.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
.SS "Checking the PCRE2 version"
|
||||||
|
.rs
|
||||||
|
.sp
|
||||||
|
Programs that link with a PCRE2 library can check the version by calling
|
||||||
|
\fBpcre2_config()\fP with appropriate arguments. Users of applications that do
|
||||||
|
not have access to the underlying code cannot do this. A special "condition"
|
||||||
|
called VERSION exists to allow such users to discover which version of PCRE2
|
||||||
|
they are dealing with by using this condition to match a string such as
|
||||||
|
"yesno". VERSION must be followed either by "=" or ">=" and a version number.
|
||||||
|
For example:
|
||||||
|
.sp
|
||||||
|
(?(VERSION>=10.4)yes|no)
|
||||||
|
.sp
|
||||||
|
This pattern matches "yes" if the PCRE2 version is greater than or equal to 10.4, or
|
||||||
|
"no" otherwise.
|
||||||
|
.
|
||||||
|
.
|
||||||
.SS "Assertion conditions"
|
.SS "Assertion conditions"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
|
@ -3219,7 +3261,7 @@ subpattern, (*THEN) causes the subroutine match to fail.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
\fBpcre2api\fP(3), \fBpcre2callout\fP(3), \fBpcre2matching\fP(3),
|
\fBpcre2api\fP(3), \fBpcre2callout\fP(3), \fBpcre2matching\fP(3),
|
||||||
\fBpcre2syntax\fP(3), \fBpcre2\fP(3), \fBpcre216(3)\fP, \fBpcre232(3)\fP.
|
\fBpcre2syntax\fP(3), \fBpcre2\fP(3).
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH AUTHOR
|
.SH AUTHOR
|
||||||
|
@ -3236,6 +3278,6 @@ Cambridge CB2 3QH, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 03 November 2014
|
Last updated: 14 November 2014
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2014 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2SYNTAX 3 "20 October 2014" "PCRE2 10.00"
|
.TH PCRE2SYNTAX 3 "14 November 2014" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
|
||||||
|
@ -470,17 +470,18 @@ Each top-level branch of a look behind must be of a fixed length.
|
||||||
(?(condition)yes-pattern)
|
(?(condition)yes-pattern)
|
||||||
(?(condition)yes-pattern|no-pattern)
|
(?(condition)yes-pattern|no-pattern)
|
||||||
.sp
|
.sp
|
||||||
(?(n)... absolute reference condition
|
(?(n) absolute reference condition
|
||||||
(?(+n)... relative reference condition
|
(?(+n) relative reference condition
|
||||||
(?(-n)... relative reference condition
|
(?(-n) relative reference condition
|
||||||
(?(<name>)... named reference condition (Perl)
|
(?(<name>) named reference condition (Perl)
|
||||||
(?('name')... named reference condition (Perl)
|
(?('name') named reference condition (Perl)
|
||||||
(?(name)... named reference condition (PCRE2)
|
(?(name) named reference condition (PCRE2)
|
||||||
(?(R)... overall recursion condition
|
(?(R) overall recursion condition
|
||||||
(?(Rn)... specific group recursion condition
|
(?(Rn) specific group recursion condition
|
||||||
(?(R&name)... specific recursion condition
|
(?(R&name) specific recursion condition
|
||||||
(?(DEFINE)... define subpattern for reference
|
(?(DEFINE) define subpattern for reference
|
||||||
(?(assert)... assertion condition
|
(?(VERSION[>]=n.m) test PCRE2 version
|
||||||
|
(?(assert) assertion condition
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "BACKTRACKING CONTROL"
|
.SH "BACKTRACKING CONTROL"
|
||||||
|
@ -535,6 +536,6 @@ Cambridge CB2 3QH, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 20 October 2014
|
Last updated: 14 November 2014
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2014 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
.TH PCRE2TEST 1 "12 November 2014" "PCRE 10.00"
|
.TH PCRE2TEST 1 "14 November 2014" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2test - a program for testing Perl-compatible regular expressions.
|
pcre2test - a program for testing Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@ -450,7 +450,6 @@ about the pattern:
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
.sp
|
.sp
|
||||||
The effects of these modifiers are described in the following sections.
|
The effects of these modifiers are described in the following sections.
|
||||||
FIXME: Give more examples.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Newline and \eR handling"
|
.SS "Newline and \eR handling"
|
||||||
|
@ -484,7 +483,31 @@ one-off tests.
|
||||||
.P
|
.P
|
||||||
The \fBinfo\fP modifier requests information about the compiled pattern
|
The \fBinfo\fP modifier requests information about the compiled pattern
|
||||||
(whether it is anchored, has a fixed first character, and so on). The
|
(whether it is anchored, has a fixed first character, and so on). The
|
||||||
information is obtained from the \fBpcre2_pattern_info()\fP function.
|
information is obtained from the \fBpcre2_pattern_info()\fP function. Here are
|
||||||
|
some typical examples:
|
||||||
|
.sp
|
||||||
|
re> /(?i)(^a|^b)/m,info
|
||||||
|
Capturing subpattern count = 1
|
||||||
|
Compile options: multiline
|
||||||
|
Overall options: caseless multiline
|
||||||
|
First code unit at start or follows newline
|
||||||
|
Subject length lower bound = 1
|
||||||
|
.sp
|
||||||
|
re> /(?i)abc/info
|
||||||
|
Capturing subpattern count = 0
|
||||||
|
Compile options: <none>
|
||||||
|
Overall options: caseless
|
||||||
|
First code unit = 'a' (caseless)
|
||||||
|
Last code unit = 'c' (caseless)
|
||||||
|
Subject length lower bound = 3
|
||||||
|
.sp
|
||||||
|
"Compile options" are those specified to the compile function; "overall
|
||||||
|
options" have added options that are taken or deduced from the pattern. If both
|
||||||
|
sets of options are the same, just a single "options" line is output. "First
|
||||||
|
code unit" is where any match must start; if there is more than one they are
|
||||||
|
listed as "starting code units". "Last code unit" is the last literal code unit
|
||||||
|
that must be present in any match. This is not necessarily the last character.
|
||||||
|
These lines are omitted if no starting or ending code units are recorded.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Specifying a pattern in hex"
|
.SS "Specifying a pattern in hex"
|
||||||
|
@ -499,8 +522,8 @@ pairs. For example:
|
||||||
This feature is provided as a way of creating patterns that contain binary zero
|
This feature is provided as a way of creating patterns that contain binary zero
|
||||||
characters. By default, \fBpcre2test\fP passes patterns as zero-terminated
|
characters. By default, \fBpcre2test\fP passes patterns as zero-terminated
|
||||||
strings to \fBpcre2_compile()\fP, giving the length as PCRE2_ZERO_TERMINATED.
|
strings to \fBpcre2_compile()\fP, giving the length as PCRE2_ZERO_TERMINATED.
|
||||||
However, for patterns specified in hexadecimal, the length of the pattern is
|
However, for patterns specified in hexadecimal, the actual length of the
|
||||||
passed.
|
pattern is passed.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "JIT compilation"
|
.SS "JIT compilation"
|
||||||
|
@ -528,7 +551,7 @@ documentation. See also the \fBjitstack\fP modifier below for a way of
|
||||||
setting the size of the JIT stack.
|
setting the size of the JIT stack.
|
||||||
.P
|
.P
|
||||||
If the \fBjitfast\fP modifier is specified, matching is done using the JIT
|
If the \fBjitfast\fP modifier is specified, matching is done using the JIT
|
||||||
"fast path" interface (\fBpcre2_jit_match()), which skips some of the sanity
|
"fast path" interface, \fBpcre2_jit_match()\fP, which skips some of the sanity
|
||||||
checks that are done by \fBpcre2_match()\fP, and of course does not work when
|
checks that are done by \fBpcre2_match()\fP, and of course does not work when
|
||||||
JIT is not supported. If \fBjitfast\fP is specified without \fBjit\fP, jit=7 is
|
JIT is not supported. If \fBjitfast\fP is specified without \fBjit\fP, jit=7 is
|
||||||
assumed.
|
assumed.
|
||||||
|
@ -560,11 +583,16 @@ character tables are mutually exclusive.
|
||||||
.SS "Showing pattern memory"
|
.SS "Showing pattern memory"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
The \fB/memory\fP modifier causes the size in bytes of the memory block used to
|
The \fBmemory\fP modifier causes the size in bytes of the memory used to hold
|
||||||
hold the compiled pattern to be output. This does not include the size of the
|
the compiled pattern to be output. This does not include the size of the
|
||||||
\fBpcre2_code\fP block; it is just the actual compiled data. If the pattern is
|
\fBpcre2_code\fP block; it is just the actual compiled data. If the pattern is
|
||||||
subsequently passed to the JIT compiler, the size of the JIT compiled code is
|
subsequently passed to the JIT compiler, the size of the JIT compiled code is
|
||||||
also output.
|
also output. Here is an example:
|
||||||
|
.sp
|
||||||
|
re> /a(b)c/jit,memory
|
||||||
|
Memory allocation (code space): 21
|
||||||
|
Memory allocation (JIT code): 1910
|
||||||
|
.sp
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Limiting nested parentheses"
|
.SS "Limiting nested parentheses"
|
||||||
|
@ -608,8 +636,8 @@ enable stack availability to be checked during compilation (see the
|
||||||
.\"
|
.\"
|
||||||
documentation for details). If the number specified by the modifier is greater
|
documentation for details). If the number specified by the modifier is greater
|
||||||
than zero, \fBpcre2_set_compile_recursion_guard()\fP is called to set up
|
than zero, \fBpcre2_set_compile_recursion_guard()\fP is called to set up
|
||||||
callback from \fBpcre2_compile()\fP to a local function. The argument it is
|
callback from \fBpcre2_compile()\fP to a local function. The argument it
|
||||||
passed is the current nesting parenthesis depth; if this is greater than the
|
receives is the current nesting parenthesis depth; if this is greater than the
|
||||||
value given by the modifier, non-zero is returned, causing the compilation to
|
value given by the modifier, non-zero is returned, causing the compilation to
|
||||||
be aborted.
|
be aborted.
|
||||||
.
|
.
|
||||||
|
@ -646,7 +674,7 @@ not affect the compilation process.
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
/g global global matching
|
/g global global matching
|
||||||
mark show mark values
|
mark show mark values
|
||||||
replace=<string> specify a replacement string
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
.sp
|
.sp
|
||||||
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
|
||||||
|
@ -721,12 +749,11 @@ pattern.
|
||||||
offset=<n> set starting offset
|
offset=<n> set starting offset
|
||||||
ovector=<n> set size of output vector
|
ovector=<n> set size of output vector
|
||||||
recursion_limit=<n> set a recursion limit
|
recursion_limit=<n> set a recursion limit
|
||||||
replace=<string> specify a replacement string
|
replace=<string> specify a replacement string
|
||||||
startchar show startchar when relevant
|
startchar show startchar when relevant
|
||||||
zero_terminate pass the subject as zero-terminated
|
zero_terminate pass the subject as zero-terminated
|
||||||
.sp
|
.sp
|
||||||
The effects of these modifiers are described in the following sections.
|
The effects of these modifiers are described in the following sections.
|
||||||
FIXME: Give more examples.
|
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Showing more text"
|
.SS "Showing more text"
|
||||||
|
@ -850,14 +877,14 @@ parentheses after each substring.
|
||||||
.SS "Testing the substitution function"
|
.SS "Testing the substitution function"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
|
If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
|
||||||
called instead of one of the matching functions. Unlike subject strings,
|
called instead of one of the matching functions. Unlike subject strings,
|
||||||
\fBpcre2test\fP does not process replacement strings for escape sequences. In
|
\fBpcre2test\fP does not process replacement strings for escape sequences. In
|
||||||
UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
|
UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
|
||||||
If so, it is correctly converted to a UTF string of the appropriate code unit
|
If so, it is correctly converted to a UTF string of the appropriate code unit
|
||||||
width. If it is not a valid UTF-8 string, the individual code units are copied
|
width. If it is not a valid UTF-8 string, the individual code units are copied
|
||||||
directly. This provides a means of passing an invalid UTF-8 string for testing
|
directly. This provides a means of passing an invalid UTF-8 string for testing
|
||||||
purposes.
|
purposes.
|
||||||
.P
|
.P
|
||||||
If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||||
\fBpcre2_substitute()\fP. After a successful substitution, the modified string
|
\fBpcre2_substitute()\fP. After a successful substitution, the modified string
|
||||||
|
@ -867,16 +894,23 @@ were no matches. Here is a simple example of a substitution test:
|
||||||
/abc/replace=xxx
|
/abc/replace=xxx
|
||||||
=abc=abc=
|
=abc=abc=
|
||||||
1: =xxx=abc=
|
1: =xxx=abc=
|
||||||
=abc=abc=\=global
|
=abc=abc=\e=global
|
||||||
2: =xxx=xxx=
|
2: =xxx=xxx=
|
||||||
.sp
|
.sp
|
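The difference that PCRE2_SUBSTITUTE_GLOBAL makes in the example above can be mimicked with a plain C function. This sketch swaps regex matching for a literal substring search with strstr(), so it only models replace-first versus replace-all and a fixed-size output buffer whose size is assumed to include the terminating zero. literal_substitute() is a hypothetical name, not part of PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch only: a literal substring stands in for a regex match.
   Copies subject to out, replacing the first occurrence of find with repl,
   or every occurrence when global is non-zero. outsize counts the
   terminating zero. Returns the number of replacements, or -1 if out is
   too small (or find is empty, which this sketch does not handle). */
static int literal_substitute(const char *subject, const char *find,
                              const char *repl, int global,
                              char *out, size_t outsize)
{
    assert(subject && find && repl && out);
    size_t flen = strlen(find), rlen = strlen(repl), used = 0;
    const char *p = subject;
    int count = 0;

    if (flen == 0 || outsize == 0)
        return -1;
    while (*p != '\0') {
        /* Search again only when global, or when nothing was replaced yet. */
        const char *hit = (global || count == 0) ? strstr(p, find) : NULL;
        size_t chunk = hit ? (size_t)(hit - p) : strlen(p);
        size_t add = chunk + (hit ? rlen : 0);

        if (used + add >= outsize)      /* no room left for the zero */
            return -1;
        memcpy(out + used, p, chunk);   /* unchanged text before the hit */
        used += chunk;
        if (hit == NULL)
            break;                      /* rest of the subject copied as-is */
        memcpy(out + used, repl, rlen); /* the replacement itself */
        used += rlen;
        p = hit + flen;
        count++;
    }
    out[used] = '\0';
    return count;
}
```

Called on "=abc=abc=" with "abc" and "xxx", it produces "=xxx=abc=" (one replacement) without global and "=xxx=xxx=" (two replacements) with it, mirroring the pcre2test output.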
||||||
Subject and replacement strings should be kept relatively short for
|
Subject and replacement strings should be kept relatively short for
|
||||||
substitution tests, as fixed-size buffers are used. To make it easy to test for
|
substitution tests, as fixed-size buffers are used. To make it easy to test for
|
||||||
buffer overflow, if the replacement string starts with a number in square
|
buffer overflow, if the replacement string starts with a number in square
|
||||||
brackets, that number is passed to \fBpcre2_substitute()\fP as the size of the
|
brackets, that number is passed to \fBpcre2_substitute()\fP as the size of the
|
||||||
output buffer, with the replacement string starting at the next character.
|
output buffer, with the replacement string starting at the next character. Here
|
||||||
.P
|
is an example that tests the edge case:
|
||||||
A replacement string is ignored with POSIX and DFA matching. Specifying partial
|
.sp
|
||||||
|
/abc/
|
||||||
|
123abc123\e=replace=[10]XYZ
|
||||||
|
1: 123XYZ123
|
||||||
|
123abc123\e=replace=[9]XYZ
|
||||||
|
Failed: error -47: no more memory
|
||||||
|
.sp
|
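The edge case in this example comes down to simple arithmetic. The subject "123abc123" is 9 code units, the match "abc" is 3, and the replacement "XYZ" is 3, so the result "123XYZ123" is 9 units long; the failure with [9] indicates that the buffer must also hold a terminating zero. A hypothetical helper expressing that assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: output buffer size, in code units, needed when one
   match of match_len units in a subject of subject_len units is replaced by
   repl_len units. The extra unit is for the terminating zero. */
static size_t single_substitution_size(size_t subject_len, size_t match_len,
                                       size_t repl_len)
{
    assert(match_len <= subject_len);
    return subject_len - match_len + repl_len + 1;
}
```

With the values from the example, the helper gives 10, which is why [10] succeeds and [9] fails.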
||||||
|
A replacement string is ignored with POSIX and DFA matching. Specifying partial
|
||||||
matching provokes an error return ("bad option value") from
|
matching provokes an error return ("bad option value") from
|
||||||
\fBpcre2_substitute()\fP.
|
\fBpcre2_substitute()\fP.
|
||||||
.
|
.
|
||||||
|
@ -957,10 +991,10 @@ available for storing matching information. The default is 15.
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
\fBregexec()\fP to be called with a NULL capture vector. When not testing the
|
\fBregexec()\fP to be called with a NULL capture vector. When not testing the
|
||||||
POSIX API, a value of zero is used to cause
|
POSIX API, a value of zero is used to cause
|
||||||
\fBpcre2_match_data_create_from_pattern\fP to be called, in order to create a
|
\fBpcre2_match_data_create_from_pattern()\fP to be called, in order to create a
|
||||||
match block of exactly the right size for the pattern. (It is not possible to
|
match block of exactly the right size for the pattern. (It is not possible to
|
||||||
create a match block with a zero-length ovector; there is always one pair of
|
create a match block with a zero-length ovector; there is always at least one
|
||||||
offsets.)
|
pair of offsets.)
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SS "Passing the subject as zero-terminated"
|
.SS "Passing the subject as zero-terminated"
|
||||||
|
@ -972,7 +1006,7 @@ string, the \fBzero_terminate\fP modifier is provided. It causes the length to
|
||||||
be passed as PCRE2_ZERO_TERMINATED. (When matching via the POSIX interface,
|
be passed as PCRE2_ZERO_TERMINATED. (When matching via the POSIX interface,
|
||||||
this modifier has no effect, as there is no facility for passing a length.)
|
this modifier has no effect, as there is no facility for passing a length.)
|
||||||
.P
|
.P
|
||||||
When testing \fBpcre2_substitute\fP, this modifier also has the effect of
|
When testing \fBpcre2_substitute()\fP, this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -1237,6 +1271,6 @@ Cambridge CB2 3QH, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 12 November 2014
|
Last updated: 14 November 2014
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2014 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -150,17 +150,18 @@ COMMAND LINE OPTIONS
|
||||||
Behave as if each subject line contains the given modifiers.
|
Behave as if each subject line contains the given modifiers.
|
||||||
|
|
||||||
-t Run each compile and match many times with a timer, and out-
|
-t Run each compile and match many times with a timer, and out-
|
||||||
put the resulting times per compile or match. You can control
|
put the resulting times per compile or match. When JIT is
|
||||||
the number of iterations that are used for timing by follow-
|
used, separate times are given for the initial compile and
|
||||||
ing -t with a number (as a separate item on the command
|
the JIT compile. You can control the number of iterations
|
||||||
line). For example, "-t 1000" iterates 1000 times. The
|
that are used for timing by following -t with a number (as a
|
||||||
default is to iterate 500,000 times.
|
separate item on the command line). For example, "-t 1000"
|
||||||
|
iterates 1000 times. The default is to iterate 500,000 times.
|
||||||
|
|
||||||
-tm This is like -t except that it times only the matching phase,
|
-tm This is like -t except that it times only the matching phase,
|
||||||
not the compile phase.
|
not the compile phase.
|
||||||
|
|
||||||
-T -TM These behave like -t and -tm, but in addition, at the end of
|
-T -TM These behave like -t and -tm, but in addition, at the end of
|
||||||
a run, the total times for all compiles and matches are out-
|
a run, the total times for all compiles and matches are out-
|
||||||
put.
|
put.
|
||||||
|
|
||||||
-version Output the PCRE2 version number and then exit.
|
-version Output the PCRE2 version number and then exit.
|
||||||
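The -t option reports an average: the time taken by a fixed number of iterations divided by that number. The C fragment below is a minimal sketch of that arithmetic, not pcre2test's own timing code. `dummy_work()` and `time_per_iteration()` are hypothetical names, `dummy_work()` stands in for one compile or match, and the standard `clock()` function is assumed as the timer, so the figure is processor time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for one compile or match call. The volatile sink
   stops the compiler from removing the loop. */
static volatile unsigned long sink;
static void dummy_work(void)
{
  for (int i = 0; i < 100; i++) sink += (unsigned long)i;
}

/* Call `work` `iterations` times and return the average processor time per
   call in milliseconds, or 0.0 when iterations is not positive. */
static double time_per_iteration(void (*work)(void), long iterations)
{
  clock_t start, end;

  if (iterations <= 0) return 0.0;
  start = clock();
  for (long i = 0; i < iterations; i++) work();
  end = clock();
  return ((double)(end - start) * 1000.0 / CLOCKS_PER_SEC) / (double)iterations;
}
```

With "-t 1000" the loop count in this sketch would be 1000; the documented default corresponds to 500000.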
|
@@ -168,139 +169,139 @@ COMMAND LINE OPTIONS
|
||||||
|
|
||||||
DESCRIPTION
|
DESCRIPTION
|
||||||
|
|
||||||
If pcre2test is given two filename arguments, it reads from the first
|
If pcre2test is given two filename arguments, it reads from the first
|
||||||
and writes to the second. If the first name is "-", input is taken from
|
and writes to the second. If the first name is "-", input is taken from
|
||||||
the standard input. If pcre2test is given only one argument, it reads
|
the standard input. If pcre2test is given only one argument, it reads
|
||||||
from that file and writes to stdout. Otherwise, it reads from stdin and
|
from that file and writes to stdout. Otherwise, it reads from stdin and
|
||||||
writes to stdout. When the input is a terminal, it prompts for each
|
writes to stdout. When the input is a terminal, it prompts for each
|
||||||
line of input, using "re>" to prompt for regular expression patterns,
|
line of input, using "re>" to prompt for regular expression patterns,
|
||||||
and "data>" to prompt for subject lines.
|
and "data>" to prompt for subject lines.
|
||||||
|
|
||||||
When pcre2test is built, a configuration option can specify that it
|
When pcre2test is built, a configuration option can specify that it
|
||||||
should be linked with the libreadline or libedit library. When this is
|
should be linked with the libreadline or libedit library. When this is
|
||||||
done, if the input is from a terminal, it is read using the readline()
|
done, if the input is from a terminal, it is read using the readline()
|
||||||
function. This provides line-editing and history facilities. The output
|
function. This provides line-editing and history facilities. The output
|
||||||
from the -help option states whether or not readline() will be used.
|
from the -help option states whether or not readline() will be used.
|
||||||
|
|
||||||
The program handles any number of tests, each of which consists of a
|
The program handles any number of tests, each of which consists of a
|
||||||
set of input lines. Each set starts with a regular expression pattern,
|
set of input lines. Each set starts with a regular expression pattern,
|
||||||
followed by any number of subject lines to be matched against that pat-
|
followed by any number of subject lines to be matched against that pat-
|
||||||
tern. In between sets of test data, command lines that begin with a
|
tern. In between sets of test data, command lines that begin with a
|
||||||
hash (#) character may appear. This file format, with some restric-
|
hash (#) character may appear. This file format, with some restric-
|
||||||
tions, can also be processed by the perltest.pl script that is distrib-
|
tions, can also be processed by the perltest.pl script that is distrib-
|
||||||
uted with PCRE2 as a means of checking that the behaviour of PCRE2 and
|
uted with PCRE2 as a means of checking that the behaviour of PCRE2 and
|
||||||
Perl is the same.
|
Perl is the same.
|
||||||
|
|
||||||
Each subject line is matched separately and independently. If you want
|
Each subject line is matched separately and independently. If you want
|
||||||
to do multi-line matches, you have to use the \n escape sequence (or \r
|
to do multi-line matches, you have to use the \n escape sequence (or \r
|
||||||
or \r\n, etc., depending on the newline setting) in a single line of
|
or \r\n, etc., depending on the newline setting) in a single line of
|
||||||
input to encode the newline sequences. There is no limit on the length
|
input to encode the newline sequences. There is no limit on the length
|
||||||
of subject lines; the input buffer is automatically extended if it is
|
of subject lines; the input buffer is automatically extended if it is
|
||||||
too small. There is a replication feature that makes it possible to
|
too small. There is a replication feature that makes it possible to
|
||||||
generate long subject lines without having to supply them explicitly.
|
generate long subject lines without having to supply them explicitly.
|
||||||
|
|
||||||
An empty line or the end of the file signals the end of the subject
|
An empty line or the end of the file signals the end of the subject
|
||||||
lines for a test, at which point a new pattern or command line is
|
lines for a test, at which point a new pattern or command line is
|
||||||
expected if there is still input to be read.
|
expected if there is still input to be read.
|
||||||
|
|
||||||
|
|
||||||
COMMAND LINES
|
COMMAND LINES
|
||||||
|
|
||||||
In between sets of test data, a line that begins with a hash (#) char-
|
In between sets of test data, a line that begins with a hash (#) char-
|
||||||
acter is interpreted as a command line. If the first character is fol-
|
acter is interpreted as a command line. If the first character is fol-
|
||||||
lowed by white space or an exclamation mark, the line is treated as a
|
lowed by white space or an exclamation mark, the line is treated as a
|
||||||
comment, and ignored. Otherwise, the following commands are recog-
|
comment, and ignored. Otherwise, the following commands are recog-
|
||||||
nized:
|
nized:
|
||||||
|
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
|
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
||||||
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
||||||
property features. This is a trigger guard that is used in test files
|
property features. This is a trigger guard that is used in test files
|
||||||
to ensure that UTF/Unicode tests are not accidentally added to files
|
to ensure that UTF/Unicode tests are not accidentally added to files
|
||||||
that are used when UTF support is not included in the library. This
|
that are used when UTF support is not included in the library. This
|
||||||
effect can also be obtained by the use of #pattern; the difference is
|
effect can also be obtained by the use of #pattern; the difference is
|
||||||
that #forbid_utf cannot be unset, and the automatic options are not
|
that #forbid_utf cannot be unset, and the automatic options are not
|
||||||
displayed in pattern information, to avoid cluttering up test output.
|
displayed in pattern information, to avoid cluttering up test output.
|
||||||
|
|
||||||
#pattern <modifier-list>
|
#pattern <modifier-list>
|
||||||
|
|
||||||
This command sets a default modifier list that applies to all subse-
|
This command sets a default modifier list that applies to all subse-
|
||||||
quent patterns. Modifiers on a pattern can change these settings.
|
quent patterns. Modifiers on a pattern can change these settings.
|
||||||
|
|
||||||
#perltest
|
#perltest
|
||||||
|
|
||||||
The appearance of this line causes all subsequent modifier settings to
|
The appearance of this line causes all subsequent modifier settings to
|
||||||
be checked for compatibility with the perltest.pl script, which is used
|
be checked for compatibility with the perltest.pl script, which is used
|
||||||
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
to confirm that Perl gives the same results as PCRE2. Also, apart from
|
||||||
comment lines, none of the other command lines are permitted, because
|
comment lines, none of the other command lines are permitted, because
|
||||||
they and many of the modifiers are specific to pcre2test, and should
|
they and many of the modifiers are specific to pcre2test, and should
|
||||||
not be used in test files that are also processed by perltest.pl. The
|
not be used in test files that are also processed by perltest.pl. The
|
||||||
#perltest command helps detect tests that are accidentally put in the
|
#perltest command helps detect tests that are accidentally put in the
|
||||||
wrong file.
|
wrong file.
|
||||||
|
|
||||||
#subject <modifier-list>
|
#subject <modifier-list>
|
||||||
|
|
||||||
This command sets a default modifier list that applies to all subse-
|
This command sets a default modifier list that applies to all subse-
|
||||||
quent subject lines. Modifiers on a subject line can change these set-
|
quent subject lines. Modifiers on a subject line can change these set-
|
||||||
tings.
|
tings.
|
||||||
|
|
||||||
|
|
||||||
MODIFIER SYNTAX
|
MODIFIER SYNTAX
|
||||||
|
|
||||||
Modifier lists are used with both pattern and subject lines. Items in a
|
Modifier lists are used with both pattern and subject lines. Items in a
|
||||||
list are separated by commas and optional white space. Some modifiers
|
list are separated by commas and optional white space. Some modifiers
|
||||||
may be given for both patterns and subject lines, whereas others are
|
may be given for both patterns and subject lines, whereas others are
|
||||||
valid for one or the other only. Each modifier has a long name, for
|
valid for one or the other only. Each modifier has a long name, for
|
||||||
example "anchored", and some of them must be followed by an equals sign
|
example "anchored", and some of them must be followed by an equals sign
|
||||||
and a value, for example, "offset=12". Modifiers that do not take val-
|
and a value, for example, "offset=12". Modifiers that do not take val-
|
||||||
ues may be preceded by a minus sign to turn off a previous default set-
|
ues may be preceded by a minus sign to turn off a previous default set-
|
||||||
ting.
|
ting.
|
||||||
|
|
||||||
A few of the more common modifiers can also be specified as single let-
|
A few of the more common modifiers can also be specified as single let-
|
||||||
ters, for example "i" for "caseless". In documentation, following the
|
ters, for example "i" for "caseless". In documentation, following the
|
||||||
Perl convention, these are written with a slash ("the /i modifier") for
|
Perl convention, these are written with a slash ("the /i modifier") for
|
||||||
clarity. Abbreviated modifiers must all be concatenated in the first
|
clarity. Abbreviated modifiers must all be concatenated in the first
|
||||||
item of a modifier list. If the first item is not recognized as a long
|
item of a modifier list. If the first item is not recognized as a long
|
||||||
modifier name, it is interpreted as a sequence of these abbreviations.
|
modifier name, it is interpreted as a sequence of these abbreviations.
|
||||||
For example:
|
For example:
|
||||||
|
|
||||||
/abc/ig,newline=cr,jit=3
|
/abc/ig,newline=cr,jit=3
|
||||||
|
|
||||||
This is a pattern line whose modifier list starts with two one-letter
|
This is a pattern line whose modifier list starts with two one-letter
|
||||||
modifiers (/i and /g). The lower-case abbreviated modifiers are the
|
modifiers (/i and /g). The lower-case abbreviated modifiers are the
|
||||||
same as used in Perl.
|
same as used in Perl.
|
||||||
|
|
||||||
|
|
||||||
PATTERN SYNTAX
|
PATTERN SYNTAX
|
||||||
|
|
||||||
A pattern line must start with one of the following characters (common
|
A pattern line must start with one of the following characters (common
|
||||||
symbols, excluding pattern meta-characters):
|
symbols, excluding pattern meta-characters):
|
||||||
|
|
||||||
/ ! " ' ` - = _ : ; , % & @ ~
|
/ ! " ' ` - = _ : ; , % & @ ~
|
||||||
|
|
||||||
This is interpreted as the pattern's delimiter. A regular expression
|
This is interpreted as the pattern's delimiter. A regular expression
|
||||||
may be continued over several input lines, in which case the newline
|
may be continued over several input lines, in which case the newline
|
||||||
characters are included within it. It is possible to include the delim-
|
characters are included within it. It is possible to include the delim-
|
||||||
iter within the pattern by escaping it with a backslash, for example
|
iter within the pattern by escaping it with a backslash, for example
|
||||||
|
|
||||||
/abc\/def/
|
/abc\/def/
|
||||||
|
|
||||||
If you do this, the escape and the delimiter form part of the pattern,
|
If you do this, the escape and the delimiter form part of the pattern,
|
||||||
but since the delimiters are all non-alphanumeric, this does not affect
|
but since the delimiters are all non-alphanumeric, this does not affect
|
||||||
its interpretation. If the terminating delimiter is immediately fol-
|
its interpretation. If the terminating delimiter is immediately fol-
|
||||||
lowed by a backslash, for example,
|
lowed by a backslash, for example,
|
||||||
|
|
||||||
/abc/\
|
/abc/\
|
||||||
|
|
||||||
then a backslash is added to the end of the pattern. This is done to
|
then a backslash is added to the end of the pattern. This is done to
|
||||||
provide a way of testing the error condition that arises if a pattern
|
provide a way of testing the error condition that arises if a pattern
|
||||||
finishes with a backslash, because
|
finishes with a backslash, because
|
||||||
|
|
||||||
/abc\/
|
/abc\/
|
||||||
|
|
||||||
is interpreted as the first line of a pattern that starts with "abc/",
|
is interpreted as the first line of a pattern that starts with "abc/",
|
||||||
causing pcre2test to read the next line as a continuation of the regu-
|
causing pcre2test to read the next line as a continuation of the regu-
|
||||||
lar expression.
|
lar expression.
|
||||||
|
|
||||||
A pattern can be followed by a modifier list (details below).
|
A pattern can be followed by a modifier list (details below).
|
||||||
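The modifier-list rules described above (comma-separated items, optional white space, a leading minus sign to turn off a valueless modifier, and name=value items) can be illustrated with a small parser. This is a sketch of those stated rules only, not pcre2test's parser: the `struct modifier` type and `parse_modifier_list()` function are hypothetical, and single-letter abbreviations are not expanded, so an item such as "ig" would come back as one item named "ig".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of one parsed modifier item. */
struct modifier {
  char name[32];
  char value[64];   /* empty when the item has no "=value" part */
  bool negated;     /* true when the item started with '-' */
};

/* Split a list such as "notbol, -notempty,offset=12" into at most `max`
   items. Returns the number of items, or -1 for a malformed item: an empty
   name, a negated item that also has a value, too many items, or a name or
   value too long for the buffers. */
static int parse_modifier_list(const char *list, struct modifier *out, int max)
{
  int n = 0;
  const char *p = list;

  while (*p != '\0') {
    struct modifier m = {{0}, {0}, false};
    const char *end, *stop, *eq;
    size_t len;

    while (isspace((unsigned char)*p)) p++;      /* optional leading space */
    if (*p == '\0') break;
    if (*p == '-') { m.negated = true; p++; }    /* turn off a default */

    end = strchr(p, ',');
    if (end == NULL) end = p + strlen(p);
    stop = end;                                   /* trim trailing space */
    while (stop > p && isspace((unsigned char)stop[-1])) stop--;

    eq = memchr(p, '=', (size_t)(stop - p));
    len = (size_t)((eq != NULL ? eq : stop) - p);
    if (len == 0 || len >= sizeof m.name) return -1;
    memcpy(m.name, p, len);

    if (eq != NULL) {                             /* name=value item */
      size_t vlen = (size_t)(stop - (eq + 1));
      if (m.negated || vlen >= sizeof m.value) return -1;
      memcpy(m.value, eq + 1, vlen);
    }

    if (n >= max) return -1;
    out[n++] = m;
    p = (*end == ',') ? end + 1 : end;
  }
  return n;
}
```

Rejecting "-offset=3" reflects the rule that only modifiers without values may be preceded by a minus sign.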
|
@@ -308,7 +309,7 @@ PATTERN SYNTAX
|
||||||
|
|
||||||
SUBJECT LINE SYNTAX
|
SUBJECT LINE SYNTAX
|
||||||
|
|
||||||
Before each subject line is passed to pcre2_match() or
|
Before each subject line is passed to pcre2_match() or
|
||||||
pcre2_dfa_match(), leading and trailing white space is removed, and the
|
pcre2_dfa_match(), leading and trailing white space is removed, and the
|
||||||
line is scanned for backslash escapes. The following provide a means of
|
line is scanned for backslash escapes. The following provide a means of
|
||||||
encoding non-printing characters in a visible way:
|
encoding non-printing characters in a visible way:
|
||||||
|
@@ -328,23 +329,23 @@ SUBJECT LINE SYNTAX
|
||||||
\x{hh...} hexadecimal character (any number of hex digits)
|
\x{hh...} hexadecimal character (any number of hex digits)
|
||||||
|
|
||||||
The use of \x{hh...} is not dependent on the use of the utf modifier on
|
The use of \x{hh...} is not dependent on the use of the utf modifier on
|
||||||
the pattern. It is recognized always. There may be any number of hexa-
|
the pattern. It is recognized always. There may be any number of hexa-
|
||||||
decimal digits inside the braces; invalid values provoke error mes-
|
decimal digits inside the braces; invalid values provoke error mes-
|
||||||
sages.
|
sages.
|
||||||
|
|
||||||
Note that \xhh specifies one byte rather than one character in UTF-8
|
Note that \xhh specifies one byte rather than one character in UTF-8
|
||||||
mode; this makes it possible to construct invalid UTF-8 sequences for
|
mode; this makes it possible to construct invalid UTF-8 sequences for
|
||||||
testing purposes. On the other hand, \x{hh} is interpreted as a UTF-8
|
testing purposes. On the other hand, \x{hh} is interpreted as a UTF-8
|
||||||
character in UTF-8 mode, generating more than one byte if the value is
|
character in UTF-8 mode, generating more than one byte if the value is
|
||||||
greater than 127. When testing the 8-bit library not in UTF-8 mode,
|
greater than 127. When testing the 8-bit library not in UTF-8 mode,
|
||||||
\x{hh} generates one byte for values less than 256, and causes an error
|
\x{hh} generates one byte for values less than 256, and causes an error
|
||||||
for greater values.
|
for greater values.
|
||||||
|
|
||||||
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
|
In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
|
||||||
possible to construct invalid UTF-16 sequences for testing purposes.
|
possible to construct invalid UTF-16 sequences for testing purposes.
|
||||||
|
|
||||||
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This
|
In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This
|
||||||
makes it possible to construct invalid UTF-32 sequences for testing
|
makes it possible to construct invalid UTF-32 sequences for testing
|
||||||
purposes.
|
purposes.
|
||||||
|
|
||||||
There is a special backslash sequence that specifies replication of one
|
There is a special backslash sequence that specifies replication of one
|
||||||
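The difference between \xhh (one byte) and \x{hh...} (one character) in UTF-8 mode comes down to how a code point is encoded. The sketch below, with hypothetical function names and not taken from pcre2test, shows the standard UTF-8 encoding alongside the non-UTF 8-bit rule described above (one byte for values below 256, an error otherwise). It rejects values above 0x10FFFF and does not check for surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point c as UTF-8 into buf (at least 4 bytes). Returns the
   number of bytes written, or 0 if c is above 0x10FFFF. */
static size_t utf8_encode(uint32_t c, unsigned char *buf)
{
  if (c < 0x80) { buf[0] = (unsigned char)c; return 1; }
  if (c < 0x800) {
    buf[0] = (unsigned char)(0xC0 | (c >> 6));
    buf[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000) {
    buf[0] = (unsigned char)(0xE0 | (c >> 12));
    buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
  }
  if (c <= 0x10FFFF) {
    buf[0] = (unsigned char)(0xF0 | (c >> 18));
    buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
  }
  return 0;
}

/* Non-UTF 8-bit mode: one byte for values below 256, otherwise an error. */
static int byte_encode(uint32_t c, unsigned char *out)
{
  if (c > 0xFF) return -1;
  *out = (unsigned char)c;
  return 1;
}
```

For example, \x{e9} becomes the two bytes C3 A9 in UTF-8 mode but the single byte E9 when UTF-8 mode is off.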
|
@@ -352,38 +353,38 @@ SUBJECT LINE SYNTAX
|
||||||
|
|
||||||
\[<characters>]{<count>}
|
\[<characters>]{<count>}
|
||||||
|
|
||||||
This makes it possible to test long strings without having to provide
|
This makes it possible to test long strings without having to provide
|
||||||
them as part of the file. For example:
|
them as part of the file. For example:
|
||||||
|
|
||||||
\[abc]{4}
|
\[abc]{4}
|
||||||
|
|
||||||
is converted to "abcabcabcabc". This feature does not support nesting.
|
is converted to "abcabcabcabc". This feature does not support nesting.
|
||||||
To include a closing square bracket in the characters, code it as \x5D.
|
To include a closing square bracket in the characters, code it as \x5D.
|
||||||
|
|
||||||
A backslash followed by an equals sign marke the end of the subject
|
A backslash followed by an equals sign marke the end of the subject
|
||||||
string and the start of a modifier list. For example:
|
string and the start of a modifier list. For example:
|
||||||
|
|
||||||
abc\=notbol,notempty
|
abc\=notbol,notempty
|
||||||
|
|
||||||
A backslash followed by any other non-alphanumeric character just
|
A backslash followed by any other non-alphanumeric character just
|
||||||
escapes that character. A backslash followed by anything else causes an
|
escapes that character. A backslash followed by anything else causes an
|
||||||
error. However, if the very last character in the line is a backslash
|
error. However, if the very last character in the line is a backslash
|
||||||
(and there is no modifier list), it is ignored. This gives a way of
|
(and there is no modifier list), it is ignored. This gives a way of
|
||||||
passing an empty line as data, since a real empty line terminates the
|
passing an empty line as data, since a real empty line terminates the
|
||||||
data input.
|
data input.
|
||||||
|
|
||||||
|
|
||||||
PATTERN MODIFIERS
|
PATTERN MODIFIERS
|
||||||
|
|
||||||
There are three types of modifier that can appear in pattern lines, two
|
There are three types of modifier that can appear in pattern lines, two
|
||||||
of which may also be used in a #pattern command. A pattern's modifier
|
of which may also be used in a #pattern command. A pattern's modifier
|
||||||
list can add to or override default modifiers that were set by a previ-
|
list can add to or override default modifiers that were set by a previ-
|
||||||
ous #pattern command.
|
ous #pattern command.
|
||||||
|
|
||||||
Setting compilation options
|
Setting compilation options
|
||||||
|
|
||||||
The following modifiers set options for pcre2_compile(). The most com-
|
The following modifiers set options for pcre2_compile(). The most com-
|
||||||
mon ones have single-letter abbreviations. See pcreapi for a descrip-
|
mon ones have single-letter abbreviations. See pcreapi for a descrip-
|
||||||
tion of their effects.
|
tion of their effects.
|
||||||
|
|
||||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||||
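The replication escape \[<characters>]{<count>} can be sketched as follows. The function name is hypothetical and this is not pcre2test's implementation; in particular it does not process escapes inside the brackets and simply stops at the first ']', which is the situation that makes coding a literal closing bracket as \x5D necessary.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Expand text of the form "\[<characters>]{<count>}" into a newly allocated
   string holding <characters> repeated <count> times. Returns NULL if the
   text is not of that form, on size overflow, or on allocation failure.
   The caller frees the result. Nesting is not supported. */
static char *expand_replication(const char *text)
{
  const char *chars, *close;
  char *endp, *out;
  unsigned long count;
  size_t len;

  if (strncmp(text, "\\[", 2) != 0) return NULL;
  chars = text + 2;
  close = strchr(chars, ']');              /* the first ']' ends the characters */
  if (close == NULL || close[1] != '{' || !isdigit((unsigned char)close[2]))
    return NULL;
  count = strtoul(close + 2, &endp, 10);
  if (*endp != '}') return NULL;

  len = (size_t)(close - chars);
  if (len == 0) count = 0;                 /* nothing to repeat */
  if (len != 0 && count > (SIZE_MAX - 1) / len) return NULL;
  out = malloc(len * count + 1);
  if (out == NULL) return NULL;
  for (size_t i = 0; i < count; i++)
    memcpy(out + i * len, chars, len);
  out[len * count] = '\0';
  return out;
}
```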
|
@@ -409,13 +410,13 @@ PATTERN MODIFIERS
|
||||||
utf set PCRE2_UTF
|
utf set PCRE2_UTF
|
||||||
|
|
||||||
As well as turning on the PCRE2_UTF option, the utf modifier causes all
|
As well as turning on the PCRE2_UTF option, the utf modifier causes all
|
||||||
non-printing characters in output strings to be printed using the
|
non-printing characters in output strings to be printed using the
|
||||||
\x{hh...} notation. Otherwise, those less than 0x100 are output in hex
|
\x{hh...} notation. Otherwise, those less than 0x100 are output in hex
|
||||||
without the curly brackets.
|
without the curly brackets.
|
||||||
|
|
||||||
Setting compilation controls
|
Setting compilation controls
|
||||||
|
|
||||||
The following modifiers affect the compilation process or request
|
The following modifiers affect the compilation process or request
|
||||||
information about the pattern:
|
information about the pattern:
|
||||||
|
|
||||||
bsr=[anycrlf|unicode] specify \R handling
|
bsr=[anycrlf|unicode] specify \R handling
|
||||||
|
@@ -437,7 +438,6 @@ PATTERN MODIFIERS
|
||||||
tables=[0|1|2] select internal tables
|
tables=[0|1|2] select internal tables
|
||||||
|
|
||||||
The effects of these modifiers are described in the following sections.
|
The effects of these modifiers are described in the following sections.
|
||||||
FIXME: Give more examples.
|
|
||||||
|
|
||||||
Newline and \R handling
|
Newline and \R handling
|
||||||
|
|
||||||
|
@@ -468,7 +468,32 @@ PATTERN MODIFIERS
|
||||||
|
|
||||||
The info modifier requests information about the compiled pattern
|
The info modifier requests information about the compiled pattern
|
||||||
(whether it is anchored, has a fixed first character, and so on). The
|
(whether it is anchored, has a fixed first character, and so on). The
|
||||||
information is obtained from the pcre2_pattern_info() function.
|
information is obtained from the pcre2_pattern_info() function. Here
|
||||||
|
are some typical examples:
|
||||||
|
|
||||||
|
re> /(?i)(^a|^b)/m,info
|
||||||
|
Capturing subpattern count = 1
|
||||||
|
Compile options: multiline
|
||||||
|
Overall options: caseless multiline
|
||||||
|
First code unit at start or follows newline
|
||||||
|
Subject length lower bound = 1
|
||||||
|
|
||||||
|
re> /(?i)abc/info
|
||||||
|
Capturing subpattern count = 0
|
||||||
|
Compile options: <none>
|
||||||
|
Overall options: caseless
|
||||||
|
First code unit = 'a' (caseless)
|
||||||
|
Last code unit = 'c' (caseless)
|
||||||
|
Subject length lower bound = 3
|
||||||
|
|
||||||
|
"Compile options" are those specified to the compile function; "overall
|
||||||
|
options" have added options that are taken or deduced from the pattern.
|
||||||
|
If both sets of options are the same, just a single "options" line is
|
||||||
|
output. "First code unit" is where any match must start; if there is
|
||||||
|
more than one they are listed as "starting code units". "Last code
|
||||||
|
unit" is the last literal code unit that must be present in any match.
|
||||||
|
This is not necessarily the last character. These lines are omitted if
|
||||||
|
no starting or ending code units are recorded.
|
||||||
|
|
||||||
Specifying a pattern in hex
|
Specifying a pattern in hex
|
||||||
|
|
||||||
|
@@ -482,7 +507,7 @@ PATTERN MODIFIERS
|
||||||
binary zero characters. By default, pcre2test passes patterns as zero-
|
binary zero characters. By default, pcre2test passes patterns as zero-
|
||||||
terminated strings to pcre2_compile(), giving the length as
|
terminated strings to pcre2_compile(), giving the length as
|
||||||
PCRE2_ZERO_TERMINATED. However, for patterns specified in hexadecimal,
|
PCRE2_ZERO_TERMINATED. However, for patterns specified in hexadecimal,
|
||||||
the length of the pattern is passed.
|
the actual length of the pattern is passed.
|
||||||
|
|
||||||
JIT compilation
|
JIT compilation
|
||||||
|
|
||||||
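Because a pattern given in hex may contain binary zeros, its length has to be passed explicitly; strlen() would stop at the first zero. The sketch below decodes pairs of hex digits into bytes and returns the length that would be passed. It assumes the input consists only of hex-digit pairs optionally separated by white space; pcre2test's actual hex syntax may accept more, and `decode_hex_pattern()` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int hex_value(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  c = tolower(c);
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

/* Decode pairs of hex digits (white space allowed between pairs) into out.
   Returns the number of bytes produced, or -1 on bad input or if out is
   too small. The result may contain binary zeros, so its length must be
   passed explicitly rather than found with strlen(). */
static long decode_hex_pattern(const char *hex, unsigned char *out, size_t outsize)
{
  size_t n = 0;

  while (*hex != '\0') {
    int hi, lo;
    if (isspace((unsigned char)*hex)) { hex++; continue; }
    hi = hex_value((unsigned char)hex[0]);
    if (hi < 0 || hex[1] == '\0') return -1;
    lo = hex_value((unsigned char)hex[1]);
    if (lo < 0 || n >= outsize) return -1;
    out[n++] = (unsigned char)(hi * 16 + lo);
    hex += 2;
  }
  return (long)n;
}
```

Decoding "61 00 62" yields three bytes, while strlen() on the result reports only one.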
|
@@ -505,7 +530,7 @@ PATTERN MODIFIERS
|
||||||
size of the JIT stack.
|
size of the JIT stack.
|
||||||
|
|
||||||
If the jitfast modifier is specified, matching is done using the JIT
|
If the jitfast modifier is specified, matching is done using the JIT
|
||||||
"fast path" interface (pcre2_jit_match()), which skips some of the san-
|
"fast path" interface, pcre2_jit_match(), which skips some of the san-
|
||||||
ity checks that are done by pcre2_match(), and of course does not work
|
ity checks that are done by pcre2_match(), and of course does not work
|
||||||
when JIT is not supported. If jitfast is specified without jit, jit=7
|
when JIT is not supported. If jitfast is specified without jit, jit=7
|
||||||
is assumed.
|
is assumed.
|
||||||
|
@@ -533,11 +558,16 @@ PATTERN MODIFIERS
|
||||||
|
|
||||||
Showing pattern memory
|
Showing pattern memory
|
||||||
|
|
||||||
The /memory modifier causes the size in bytes of the memory block used
|
The /memory modifier causes the size in bytes of the memory used to
|
||||||
to hold the compiled pattern to be output. This does not include the
|
hold the compiled pattern to be output. This does not include the size
|
||||||
size of the pcre2_code block; it is just the actual compiled data. If
|
of the pcre2_code block; it is just the actual compiled data. If the
|
||||||
the pattern is subsequently passed to the JIT compiler, the size of the
|
pattern is subsequently passed to the JIT compiler, the size of the JIT
|
||||||
JIT compiled code is also output.
|
compiled code is also output. Here is an example:
|
||||||
|
|
||||||
|
re> /a(b)c/jit,memory
|
||||||
|
Memory allocation (code space): 21
|
||||||
|
Memory allocation (JIT code): 1910
|
||||||
|
|
||||||
|
|
||||||
Limiting nested parentheses
|
Limiting nested parentheses
|
||||||
|
|
||||||
|
@@ -573,7 +603,7 @@ PATTERN MODIFIERS
|
||||||
mentation for details). If the number specified by the modifier is
|
mentation for details). If the number specified by the modifier is
|
||||||
greater than zero, pcre2_set_compile_recursion_guard() is called to set
|
greater than zero, pcre2_set_compile_recursion_guard() is called to set
|
||||||
up callback from pcre2_compile() to a local function. The argument it
|
up callback from pcre2_compile() to a local function. The argument it
|
||||||
is passed is the current nesting parenthesis depth; if this is greater
|
receives is the current nesting parenthesis depth; if this is greater
|
||||||
than the value given by the modifier, non-zero is returned, causing the
|
than the value given by the modifier, non-zero is returned, causing the
|
||||||
compilation to be aborted.
|
compilation to be aborted.
|
||||||
|
|
||||||
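The callback described above only has to compare the nesting depth it receives with the limit given by the modifier. A hypothetical sketch follows; the `(uint32_t, void *)` parameter list and the `struct guard_limit` type are assumptions for illustration, so check the pcre2api page for the exact prototype that pcre2_set_compile_recursion_guard() expects in the release being used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the limit given by the modifier. */
struct guard_limit {
  uint32_t max_depth;
};

/* Receives the current parenthesis nesting depth and returns non-zero,
   asking for compilation to be aborted, once that depth exceeds the limit.
   The parameter list is an assumption, not a confirmed library prototype. */
static int depth_guard(uint32_t depth, void *user_data)
{
  const struct guard_limit *limit = user_data;
  return depth > limit->max_depth ? 1 : 0;
}
```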
|
@@ -606,6 +636,7 @@ PATTERN MODIFIERS
|
||||||
allusedtext show all consulted text
|
allusedtext show all consulted text
|
||||||
/g global global matching
|
/g global global matching
|
||||||
mark show mark values
|
mark show mark values
|
||||||
|
replace=<string> specify a replacement string
|
||||||
startchar show starting character when relevant
|
startchar show starting character when relevant
|
||||||
|
|
||||||
These modifiers may not appear in a #pattern command. If you want them
|
These modifiers may not appear in a #pattern command. If you want them
|
||||||
|
@@ -671,31 +702,31 @@ SUBJECT MODIFIERS
|
||||||
offset=<n> set starting offset
|
offset=<n> set starting offset
|
||||||
ovector=<n> set size of output vector
|
ovector=<n> set size of output vector
|
||||||
recursion_limit=<n> set a recursion limit
|
recursion_limit=<n> set a recursion limit
|
||||||
|
replace=<string> specify a replacement string
|
||||||
startchar show startchar when relevant
|
startchar show startchar when relevant
|
||||||
zero_terminate pass the subject as zero-terminated
|
zero_terminate pass the subject as zero-terminated
|
||||||
|
|
||||||
The effects of these modifiers are described in the following sections.
|
The effects of these modifiers are described in the following sections.
|
||||||
FIXME: Give more examples.
|
|
||||||
|
|
||||||
Showing more text
|
Showing more text
|
||||||
|
|
||||||
The aftertext modifier requests that as well as outputting the sub-
|
The aftertext modifier requests that as well as outputting the sub-
|
||||||
string that matched the entire pattern, pcre2test should in addition
|
string that matched the entire pattern, pcre2test should in addition
|
||||||
output the remainder of the subject string. This is useful for tests
|
output the remainder of the subject string. This is useful for tests
|
||||||
where the subject contains multiple copies of the same substring. The
|
where the subject contains multiple copies of the same substring. The
|
||||||
allaftertext modifier requests the same action for captured substrings
|
allaftertext modifier requests the same action for captured substrings
|
||||||
as well as the main matched substring. In each case the remainder is
|
as well as the main matched substring. In each case the remainder is
|
||||||
output on the following line with a plus character following the cap-
|
output on the following line with a plus character following the cap-
|
||||||
ture number.
|
ture number.
|
||||||
|
|
||||||
The allusedtext modifier requests that all the text that was consulted
|
The allusedtext modifier requests that all the text that was consulted
|
||||||
during a successful pattern match by the interpreter should be shown.
|
during a successful pattern match by the interpreter should be shown.
|
||||||
This feature is not supported for JIT matching, and if requested with
|
This feature is not supported for JIT matching, and if requested with
|
||||||
JIT it is ignored (with a warning message). Setting this modifier
|
JIT it is ignored (with a warning message). Setting this modifier
|
||||||
affects the output if there is a lookbehind at the start of a match, or
|
affects the output if there is a lookbehind at the start of a match, or
|
||||||
a lookahead at the end, or if \K is used in the pattern. Characters
|
a lookahead at the end, or if \K is used in the pattern. Characters
|
||||||
that precede or follow the start and end of the actual match are indi-
|
that precede or follow the start and end of the actual match are indi-
|
||||||
cated in the output by '<' or '>' characters underneath them. Here is
|
cated in the output by '<' or '>' characters underneath them. Here is
|
||||||
an example:
|
an example:
|
||||||
|
|
||||||
re> /(?<=pqr)abc(?=xyz)/
|
re> /(?<=pqr)abc(?=xyz)/
|
||||||
|
@@ -703,15 +734,15 @@ SUBJECT MODIFIERS
|
||||||
0: pqrabcxyz
|
0: pqrabcxyz
|
||||||
<<< >>>
|
<<< >>>
|
||||||
|
|
||||||
This shows that the matched string is "abc", with the preceding and
|
This shows that the matched string is "abc", with the preceding and
|
||||||
following strings "pqr" and "xyz" also consulted during the match.
|
following strings "pqr" and "xyz" also consulted during the match.
|
||||||
|
|
||||||
The startchar modifier requests that the starting character for the
|
The startchar modifier requests that the starting character for the
|
||||||
match be indicated, if it is different to the start of the matched
|
match be indicated, if it is different to the start of the matched
|
||||||
string. The only time when this occurs is when \K has been processed as
|
string. The only time when this occurs is when \K has been processed as
|
||||||
part of the match. In this situation, the output for the matched string
|
part of the match. In this situation, the output for the matched string
|
||||||
is displayed from the starting character instead of from the match
|
is displayed from the starting character instead of from the match
|
||||||
point, with circumflex characters under the earlier characters. For
|
point, with circumflex characters under the earlier characters. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
re> /abc\Kxyz/
|
re> /abc\Kxyz/
|
||||||
|
@ -719,7 +750,7 @@ SUBJECT MODIFIERS
|
||||||
0: abcxyz
|
0: abcxyz
|
||||||
^^^
|
^^^
|
||||||
|
|
||||||
Unlike allusedtext, the startchar modifier can be used with JIT. How-
|
Unlike allusedtext, the startchar modifier can be used with JIT. How-
|
||||||
ever, these two modifiers are mutually exclusive.
|
ever, these two modifiers are mutually exclusive.
|
||||||
|
|
||||||
Showing the value of all capture groups
|
Showing the value of all capture groups
|
||||||
|
@ -727,183 +758,223 @@ SUBJECT MODIFIERS
|
||||||
The allcaptures modifier requests that the values of all potential cap-
|
The allcaptures modifier requests that the values of all potential cap-
|
||||||
tured parentheses be output after a match. By default, only those up to
|
tured parentheses be output after a match. By default, only those up to
|
||||||
the highest one actually used in the match are output (corresponding to
|
the highest one actually used in the match are output (corresponding to
|
||||||
the return code from pcre2_match()). Groups that did not take part in
|
the return code from pcre2_match()). Groups that did not take part in
|
||||||
the match are output as "<unset>".
|
the match are output as "<unset>".
|
||||||
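pcre2_match() returns one more than the highest-numbered pair of offsets that was set. That return code is why the default listing stops there, while allcaptures continues to the last group in the pattern. The sketch below shows the difference for the /(a)|(b)/ pattern. It assumes a stand-in SKETCH_UNSET constant in place of PCRE2_UNSET so that it compiles without pcre2.h, and its "n: text" layout only approximates pcre2test's output.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET (~(PCRE2_SIZE)0 in pcre2.h) so that this sketch
   compiles without the library. */
#define SKETCH_UNSET (~(size_t)0)

/* Render groups 0..count-1 from an ovector of start/end offset pairs, one
   per line. Pass the match return code as count for the default listing,
   or the total number of groups to mimic allcaptures. The caller must make
   out large enough for the whole listing. */
void show_groups(const char *subject, const size_t *ovector, int count,
                 char *out)
{
  static const char unset[] = "<unset>";
  size_t pos = 0;
  out[0] = '\0';
  for (int i = 0; i < count; i++) {
    size_t start = ovector[2 * i];
    size_t end = ovector[2 * i + 1];
    pos += (size_t)sprintf(out + pos, "%2d: ", i);
    if (start == SKETCH_UNSET) {
      memcpy(out + pos, unset, sizeof unset - 1);
      pos += sizeof unset - 1;
    } else {
      memcpy(out + pos, subject + start, end - start);
      pos += end - start;
    }
    out[pos++] = '\n';
    out[pos] = '\0';
  }
}
```

Matching "a" sets only groups 0 and 1 (return code 2), so only the allcaptures-style call with a count of 3 also lists group 2 as "<unset>".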
|
|
||||||
Testing callouts
|
Testing callouts
|
||||||
|
|
||||||
A callout function is supplied when pcre2test calls the library match-
|
A callout function is supplied when pcre2test calls the library match-
|
||||||
ing functions, unless callout_none is specified. If callout_capture is
|
ing functions, unless callout_none is specified. If callout_capture is
|
||||||
set, the current captured groups are output when a callout occurs.
|
set, the current captured groups are output when a callout occurs.
|
||||||
|
|
||||||
The callout_fail modifier can be given one or two numbers. If there is
|
The callout_fail modifier can be given one or two numbers. If there is
|
||||||
only one number, 1 is returned instead of 0 when a callout of that num-
|
only one number, 1 is returned instead of 0 when a callout of that num-
|
||||||
ber is reached. If two numbers, <n> and <m>, are given, 1 is returned when callout
|
ber is reached. If two numbers, <n> and <m>, are given, 1 is returned when callout
|
||||||
<n> is reached for the <m>th time.
|
<n> is reached for the <m>th time.
|
||||||
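The callout_fail rule can be modelled as a small piece of state. The state holds the callout number to watch, an optional occurrence count, and a running tally. The sketch below is a hypothetical model and not pcre2test's implementation. It reads "for the <m>th time" as that single arrival only, and pcre2test's own bookkeeping on later arrivals may differ.

```c
/* Hypothetical model of callout_fail (not pcre2test's code).
   number: the callout number to fail on.
   nth:    0 if only one number was given, else the occurrence to fail on.
   seen:   how many times that callout has been reached so far. */
typedef struct {
  int number;
  int nth;
  int seen;
} callout_fail_rule;

/* Value returned from the callout: 0 means carry on matching,
   1 means fail at this point and backtrack. */
int callout_result(callout_fail_rule *rule, int callout_number)
{
  if (callout_number != rule->number)
    return 0;                          /* some other callout: carry on */
  rule->seen++;
  if (rule->nth == 0)
    return 1;                          /* one number: fail on every arrival */
  return rule->seen == rule->nth;      /* two numbers: fail on the m-th only */
}
```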
|
|
||||||
The callout_data modifier can be given an unsigned or a negative num-
|
The callout_data modifier can be given an unsigned or a negative num-
|
||||||
ber. Any value other than zero is used as a return from pcre2test's
|
ber. Any value other than zero is used as a return from pcre2test's
|
||||||
callout function.
|
callout function.
|
||||||
|
|
||||||
Testing substring extraction functions
|
|
||||||
|
|
||||||
The copy and get modifiers can be used to test the pcre2_sub-
|
|
||||||
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
|
|
||||||
given more than once, and each can specify a group name or number, for
|
|
||||||
example:
|
|
||||||
|
|
||||||
abcd\=copy=1,copy=3,get=G1
|
|
||||||
|
|
||||||
If the #subject command is used to set default copy and get lists,
|
|
||||||
these can be unset by specifying a negative number for numbered groups
|
|
||||||
and an empty name for named groups.
|
|
||||||
|
|
||||||
The getall modifier tests pcre2_substring_list_get(), which extracts
|
|
||||||
all captured substrings.
|
|
||||||
|
|
||||||
If the subject line is successfully matched, the substrings extracted
|
|
||||||
by the convenience functions are output with C, G, or L after the
|
|
||||||
string number instead of a colon. This is in addition to the normal
|
|
||||||
full list. The string length (that is, the return from the extraction
|
|
||||||
function) is given in parentheses after each substring.
|
|
||||||
|
|
||||||
Finding all matches in a string
|
Finding all matches in a string
|
||||||
|
|
||||||
Searching for all possible matches within a subject can be requested by
|
Searching for all possible matches within a subject can be requested by
|
||||||
the global or altglobal modifier. After finding a match, the matching
|
the global or altglobal modifier. After finding a match, the matching
|
||||||
function is called again to search the remainder of the subject. The
|
function is called again to search the remainder of the subject. The
|
||||||
difference between global and altglobal is that the former uses the
|
difference between global and altglobal is that the former uses the
|
||||||
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
|
start_offset argument to pcre2_match() or pcre2_dfa_match() to start
|
||||||
searching at a new point within the entire string (which is what Perl
|
searching at a new point within the entire string (which is what Perl
|
||||||
does), whereas the latter passes over a shortened substring. This makes
|
does), whereas the latter passes over a shortened substring. This makes
|
||||||
a difference to the matching process if the pattern begins with a look-
|
a difference to the matching process if the pattern begins with a look-
|
||||||
behind assertion (including \b or \B).
|
behind assertion (including \b or \B).
|
||||||
|
|
||||||
If an empty string is matched, the next match is done with the
|
If an empty string is matched, the next match is done with the
|
||||||
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
|
||||||
for another, non-empty, match at the same point in the subject. If this
|
for another, non-empty, match at the same point in the subject. If this
|
||||||
match fails, the start offset is advanced, and the normal match is
|
match fails, the start offset is advanced, and the normal match is
|
||||||
retried. This imitates the way Perl handles such cases when using the
|
retried. This imitates the way Perl handles such cases when using the
|
||||||
/g modifier or the split() function. Normally, the start offset is
|
/g modifier or the split() function. Normally, the start offset is
|
||||||
advanced by one character, but if the newline convention recognizes
|
advanced by one character, but if the newline convention recognizes
|
||||||
CRLF as a newline, and the current character is CR followed by LF, an
|
CRLF as a newline, and the current character is CR followed by LF, an
|
||||||
advance of two is used.
|
advance of two is used.
|
||||||
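After an empty match, pcre2test first retries at the same offset with PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED. If that retry fails, it advances the start offset. The sketch below shows only the advance rule. It assumes 8-bit, non-UTF code units, so that one character is one code unit; in UTF mode a whole multi-unit character would be skipped instead. The crlf_is_newline flag is a hypothetical parameter standing for "the newline convention recognizes CRLF" (for example CRLF, ANYCRLF or ANY).

```c
#include <stddef.h>

/* Offset at which to resume after an empty match that could not be
   extended into a non-empty one at the same position. Assumes one code
   unit per character (8-bit, non-UTF). */
size_t advance_after_empty_match(const char *subject, size_t length,
                                 size_t offset, int crlf_is_newline)
{
  /* Never split a CRLF pair when CRLF counts as a single newline. */
  if (crlf_is_newline && offset + 1 < length &&
      subject[offset] == '\r' && subject[offset + 1] == '\n')
    return offset + 2;
  return offset + 1;                   /* normal case: one character */
}
```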
|
|
||||||
|
Testing substring extraction functions
|
||||||
|
|
||||||
|
The copy and get modifiers can be used to test the pcre2_sub-
|
||||||
|
string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be
|
||||||
|
given more than once, and each can specify a group name or number, for
|
||||||
|
example:
|
||||||
|
|
||||||
|
abcd\=copy=1,copy=3,get=G1
|
||||||
|
|
||||||
|
If the #subject command is used to set default copy and get lists,
|
||||||
|
these can be unset by specifying a negative number for numbered groups
|
||||||
|
and an empty name for named groups.
|
||||||
|
|
||||||
|
The getall modifier tests pcre2_substring_list_get(), which extracts
|
||||||
|
all captured substrings.
|
||||||
|
|
||||||
|
If the subject line is successfully matched, the substrings extracted
|
||||||
|
by the convenience functions are output with C, G, or L after the
|
||||||
|
string number instead of a colon. This is in addition to the normal
|
||||||
|
full list. The string length (that is, the return from the extraction
|
||||||
|
function) is given in parentheses after each substring.
|
||||||
|
|
||||||
|
Testing the substitution function
|
||||||
|
|
||||||
|
If the replace modifier is set, the pcre2_substitute() function is
|
||||||
|
called instead of one of the matching functions. Unlike subject
|
||||||
|
strings, pcre2test does not process replacement strings for escape
|
||||||
|
sequences. In UTF mode, a replacement string is checked to see if it is
|
||||||
|
a valid UTF-8 string. If so, it is correctly converted to a UTF string
|
||||||
|
of the appropriate code unit width. If it is not a valid UTF-8 string,
|
||||||
|
the individual code units are copied directly. This provides a means of
|
||||||
|
passing an invalid UTF-8 string for testing purposes.
|
||||||
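The decision between converting and copying depends on whether the replacement string is well-formed UTF-8. The following is a minimal validity check of that kind. It rejects truncated sequences, stray continuation bytes, overlong forms, UTF-16 surrogates and values above U+10FFFF. It is a sketch of the test described above, not pcre2test's own routine.

```c
#include <stddef.h>

/* Return 1 if s[0..len) is valid UTF-8, else 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    size_t n;                          /* number of continuation bytes */
    unsigned long cp;
    if (c < 0x80) { i++; continue; }   /* ASCII */
    else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
    else return 0;                     /* stray continuation or 0xF8..0xFF */
    if (i + n >= len) return 0;        /* sequence truncated */
    for (size_t k = 1; k <= n; k++) {
      if ((s[i + k] & 0xC0) != 0x80) return 0;
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    /* Reject overlong encodings, surrogates and out-of-range values. */
    if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
        (n == 3 && cp < 0x10000) || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;
    i += n + 1;
  }
  return 1;
}
```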
|
|
||||||
|
If the global modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
|
||||||
|
pcre2_substitute(). After a successful substitution, the modified
|
||||||
|
string is output, preceded by the number of replacements. This may be
|
||||||
|
zero if there were no matches. Here is a simple example of a substitu-
|
||||||
|
tion test:
|
||||||
|
|
||||||
|
/abc/replace=xxx
|
||||||
|
=abc=abc=
|
||||||
|
1: =xxx=abc=
|
||||||
|
=abc=abc=\=global
|
||||||
|
2: =xxx=xxx=
|
||||||
|
|
||||||
|
Subject and replacement strings should be kept relatively short for
|
||||||
|
substitution tests, as fixed-size buffers are used. To make it easy to
|
||||||
|
test for buffer overflow, if the replacement string starts with a num-
|
||||||
|
ber in square brackets, that number is passed to pcre2_substitute() as
|
||||||
|
the size of the output buffer, with the replacement string starting at
|
||||||
|
the next character. Here is an example that tests the edge case:
|
||||||
|
|
||||||
|
/abc/
|
||||||
|
123abc123\=replace=[10]XYZ
|
||||||
|
1: 123XYZ123
|
||||||
|
123abc123\=replace=[9]XYZ
|
||||||
|
Failed: error -47: no more memory
|
||||||
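The two results in this example are consistent with the output needing room for a terminating zero. "123XYZ123" is nine code units, so it fits in a buffer of 10 but not 9. The sketch below shows both that arithmetic for a single replacement and one way to split off a "[size]" prefix. The names split_size_prefix and needed_size are hypothetical, and pcre2test's own parsing may differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* If repl starts with "[digits]", store the number in *size and return a
   pointer to the text after ']'. Otherwise leave *size unchanged and
   return repl itself. */
const char *split_size_prefix(const char *repl, size_t *size)
{
  char *end;
  unsigned long n;
  if (repl[0] != '[' || repl[1] < '0' || repl[1] > '9')
    return repl;
  n = strtoul(repl + 1, &end, 10);
  if (*end != ']')
    return repl;                       /* not a well-formed prefix */
  *size = (size_t)n;
  return end + 1;
}

/* Code units needed for one replacement, counting the terminating zero. */
size_t needed_size(size_t subject_len, size_t match_len, size_t repl_len)
{
  return subject_len - match_len + repl_len + 1;
}
```

For the example, needed_size(9, 3, 3) gives 10, which matches the boundary between the successful and failing runs.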
|
|
||||||
|
A replacement string is ignored with POSIX and DFA matching. Specifying
|
||||||
|
partial matching provokes an error return ("bad option value") from
|
||||||
|
pcre2_substitute().
|
||||||
|
|
||||||
Setting the JIT stack size
|
Setting the JIT stack size
|
||||||
|
|
||||||
The jitstack modifier provides a way of setting the maximum stack size
|
The jitstack modifier provides a way of setting the maximum stack size
|
||||||
that is used by the just-in-time optimization code. It is ignored if
|
that is used by the just-in-time optimization code. It is ignored if
|
||||||
JIT optimization is not being used. The value is a number of kilobytes.
|
JIT optimization is not being used. The value is a number of kilobytes.
|
||||||
Providing a stack that is larger than the default 32K is necessary only
|
Providing a stack that is larger than the default 32K is necessary only
|
||||||
for very complicated patterns.
|
for very complicated patterns.
|
||||||
|
|
||||||
Setting match and recursion limits
|
Setting match and recursion limits
|
||||||
|
|
||||||
The match_limit and recursion_limit modifiers set the appropriate lim-
|
The match_limit and recursion_limit modifiers set the appropriate lim-
|
||||||
its in the match context. These values are ignored when the find_limits
|
its in the match context. These values are ignored when the find_limits
|
||||||
modifier is specified.
|
modifier is specified.
|
||||||
|
|
||||||
Finding minimum limits
|
Finding minimum limits
|
||||||
|
|
||||||
If the find_limits modifier is present, pcre2test calls pcre2_match()
|
If the find_limits modifier is present, pcre2test calls pcre2_match()
|
||||||
several times, setting different values in the match context via
|
several times, setting different values in the match context via
|
||||||
pcre2_set_match_limit() and pcre2_set_recursion_limit() until it finds
|
pcre2_set_match_limit() and pcre2_set_recursion_limit() until it finds
|
||||||
the minimum values for each parameter that allow pcre2_match() to com-
|
the minimum values for each parameter that allow pcre2_match() to com-
|
||||||
plete without error.
|
plete without error.
|
||||||
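If success is monotonic in the limit (a match that completes with limit L also completes with any larger limit), the minimum can be found by binary search. The sketch below is one such approach; we do not claim it is the exact search pcre2test performs. The limit_probe callback is a hypothetical stand-in for "set the limit in a match context, call pcre2_match(), and report whether it finished without a limit error". threshold_probe lets the search be tried without PCRE2.

```c
/* Returns nonzero if a match attempt completes with the given limit. */
typedef int (*limit_probe)(unsigned long limit, void *ctx);

/* Smallest limit in 1..max_limit for which probe succeeds, or 0 if even
   max_limit is not enough. max_limit must be at least 1. */
unsigned long find_min_limit(limit_probe probe, void *ctx,
                             unsigned long max_limit)
{
  unsigned long lo = 1, hi = max_limit;
  if (!probe(max_limit, ctx))
    return 0;
  while (lo < hi) {
    unsigned long mid = lo + (hi - lo) / 2;
    if (probe(mid, ctx))
      hi = mid;                        /* mid is enough: look lower */
    else
      lo = mid + 1;                    /* mid is too small: look higher */
  }
  return lo;
}

/* Stand-in probe: succeeds once the limit reaches *(unsigned long *)ctx. */
int threshold_probe(unsigned long limit, void *ctx)
{
  return limit >= *(const unsigned long *)ctx;
}
```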
|
|
||||||
If JIT is being used, only the match limit is relevant. If DFA matching
|
If JIT is being used, only the match limit is relevant. If DFA matching
|
||||||
is being used, neither limit is relevant, and this modifier is ignored
|
is being used, neither limit is relevant, and this modifier is ignored
|
||||||
(with a warning message).
|
(with a warning message).
|
||||||
|
|
||||||
The match_limit number is a measure of the amount of backtracking that
|
The match_limit number is a measure of the amount of backtracking that
|
||||||
takes place, and learning the minimum value can be instructive. For
|
takes place, and learning the minimum value can be instructive. For
|
||||||
most simple matches, the number is quite small, but for patterns with
|
most simple matches, the number is quite small, but for patterns with
|
||||||
very large numbers of matching possibilities, it can become large very
|
very large numbers of matching possibilities, it can become large very
|
||||||
quickly with increasing length of subject string. The
|
quickly with increasing length of subject string. The
|
||||||
recursion_limit number is a measure of how much stack (or, if
|
recursion_limit number is a measure of how much stack (or, if
|
||||||
PCRE2 is compiled with NO_RECURSE, how much heap) memory is needed to
|
PCRE2 is compiled with NO_RECURSE, how much heap) memory is needed to
|
||||||
complete the match attempt.
|
complete the match attempt.
|
||||||
|
|
||||||
Showing MARK names
|
Showing MARK names
|
||||||
|
|
||||||
|
|
||||||
The mark modifier causes the names from backtracking control verbs that
|
The mark modifier causes the names from backtracking control verbs that
|
||||||
are returned from calls to pcre2_match() to be displayed. If a mark is
|
are returned from calls to pcre2_match() to be displayed. If a mark is
|
||||||
returned for a match, non-match, or partial match, pcre2test shows it.
|
returned for a match, non-match, or partial match, pcre2test shows it.
|
||||||
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
For a match, it is on a line by itself, tagged with "MK:". Otherwise,
|
||||||
it is added to the non-match message.
|
it is added to the non-match message.
|
||||||
|
|
||||||
Showing memory usage
|
Showing memory usage
|
||||||
|
|
||||||
The memory modifier causes pcre2test to log all memory allocation and
|
The memory modifier causes pcre2test to log all memory allocation and
|
||||||
freeing calls that occur during a match operation.
|
freeing calls that occur during a match operation.
|
||||||
|
|
||||||
Setting a starting offset
|
Setting a starting offset
|
||||||
|
|
||||||
The offset modifier sets an offset in the subject string at which
|
The offset modifier sets an offset in the subject string at which
|
||||||
matching starts. Its value is a number of code units, not characters.
|
matching starts. Its value is a number of code units, not characters.
|
||||||
|
|
||||||
Setting the size of the output vector
|
Setting the size of the output vector
|
||||||
|
|
||||||
The ovector modifier applies only to the subject line in which it
|
The ovector modifier applies only to the subject line in which it
|
||||||
appears, though of course it can also be used to set a default in a
|
appears, though of course it can also be used to set a default in a
|
||||||
#subject command. It specifies the number of pairs of offsets that are
|
#subject command. It specifies the number of pairs of offsets that are
|
||||||
available for storing matching information. The default is 15.
|
available for storing matching information. The default is 15.
|
||||||
|
|
||||||
A value of zero is useful when testing the POSIX API because it causes
|
A value of zero is useful when testing the POSIX API because it causes
|
||||||
regexec() to be called with a NULL capture vector. When not testing the
|
regexec() to be called with a NULL capture vector. When not testing the
|
||||||
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
POSIX API, a value of zero is used to cause pcre2_match_data_cre-
|
||||||
ate_from_pattern to be called, in order to create a match block of
|
ate_from_pattern() to be called, in order to create a match block of
|
||||||
exactly the right size for the pattern. (It is not possible to create a
|
exactly the right size for the pattern. (It is not possible to create a
|
||||||
match block with a zero-length ovector; there is always one pair of
|
match block with a zero-length ovector; there is always at least one
|
||||||
offsets.)
|
pair of offsets.)
|
||||||
|
|
||||||
Passing the subject as zero-terminated
|
Passing the subject as zero-terminated
|
||||||
|
|
||||||
By default, the subject string is passed to a native API matching func-
|
By default, the subject string is passed to a native API matching func-
|
||||||
tion with its correct length. In order to test the facility for passing
|
tion with its correct length. In order to test the facility for passing
|
||||||
a zero-terminated string, the zero_terminate modifier is provided. It
|
a zero-terminated string, the zero_terminate modifier is provided. It
|
||||||
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
|
||||||
via the POSIX interface, this modifier has no effect, as there is no
|
via the POSIX interface, this modifier has no effect, as there is no
|
||||||
facility for passing a length.)
|
facility for passing a length.)
|
||||||
|
|
||||||
When testing pcre2_substitute, this modifier also has the effect of
|
When testing pcre2_substitute(), this modifier also has the effect of
|
||||||
passing the replacement string as zero-terminated.
|
passing the replacement string as zero-terminated.
|
||||||
|
|
||||||
|
|
||||||
THE ALTERNATIVE MATCHING FUNCTION
|
THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
By default, pcre2test uses the standard PCRE2 matching function,
|
By default, pcre2test uses the standard PCRE2 matching function,
|
||||||
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
pcre2_match() to match each subject line. PCRE2 also supports an alter-
|
||||||
native matching function, pcre2_dfa_match(), which operates in a dif-
|
native matching function, pcre2_dfa_match(), which operates in a dif-
|
||||||
ferent way, and has some restrictions. The differences between the two
|
ferent way, and has some restrictions. The differences between the two
|
||||||
functions are described in the pcre2matching documentation.
|
functions are described in the pcre2matching documentation.
|
||||||
|
|
||||||
If the dfa modifier is set, the alternative matching function is used.
|
If the dfa modifier is set, the alternative matching function is used.
|
||||||
This function finds all possible matches at a given point in the sub-
|
This function finds all possible matches at a given point in the sub-
|
||||||
ject. If, however, the dfa_shortest modifier is set, processing stops
|
ject. If, however, the dfa_shortest modifier is set, processing stops
|
||||||
after the first match is found. This is always the shortest possible
|
after the first match is found. This is always the shortest possible
|
||||||
match.
|
match.
|
||||||
|
|
||||||
|
|
||||||
DEFAULT OUTPUT FROM pcre2test
|
DEFAULT OUTPUT FROM pcre2test
|
||||||
|
|
||||||
This section describes the output when the normal matching function,
|
This section describes the output when the normal matching function,
|
||||||
pcre2_match(), is being used.
|
pcre2_match(), is being used.
|
||||||
|
|
||||||
When a match succeeds, pcre2test outputs the list of captured sub-
|
When a match succeeds, pcre2test outputs the list of captured sub-
|
||||||
strings, starting with number 0 for the string that matched the whole
|
strings, starting with number 0 for the string that matched the whole
|
||||||
pattern. Otherwise, it outputs "No match" when the return is
|
pattern. Otherwise, it outputs "No match" when the return is
|
||||||
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
PCRE2_ERROR_NOMATCH, or "Partial match:" followed by the partially
|
||||||
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
matching substring when the return is PCRE2_ERROR_PARTIAL. (Note that
|
||||||
this is the entire substring that was inspected during the partial
|
this is the entire substring that was inspected during the partial
|
||||||
match; it may include characters before the actual match start if a
|
match; it may include characters before the actual match start if a
|
||||||
lookbehind assertion, \K, \b, or \B was involved.)
|
lookbehind assertion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
For any other return, pcre2test outputs the PCRE2 negative error number
|
For any other return, pcre2test outputs the PCRE2 negative error number
|
||||||
and a short descriptive phrase. If the error is a failed UTF string
|
and a short descriptive phrase. If the error is a failed UTF string
|
||||||
check, the offset of the start of the failing character and the reason
|
check, the offset of the start of the failing character and the reason
|
||||||
code are also output. Here is an example of an interactive pcre2test
|
code are also output. Here is an example of an interactive pcre2test
|
||||||
run.
|
run.
|
||||||
|
|
||||||
$ pcre2test
|
$ pcre2test
|
||||||
|
@ -917,10 +988,10 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
No match
|
No match
|
||||||
|
|
||||||
Unset capturing substrings that are not followed by one that is set are
|
Unset capturing substrings that are not followed by one that is set are
|
||||||
not returned by pcre2_match(), and are not shown by pcre2test. In the
|
not returned by pcre2_match(), and are not shown by pcre2test. In the
|
||||||
following example, there are two capturing substrings, but when the
|
following example, there are two capturing substrings, but when the
|
||||||
first data line is matched, the second, unset substring is not shown.
|
first data line is matched, the second, unset substring is not shown.
|
||||||
An "internal" unset substring is shown as "<unset>", as for the second
|
An "internal" unset substring is shown as "<unset>", as for the second
|
||||||
data line.
|
data line.
|
||||||
|
|
||||||
re> /(a)|(b)/
|
re> /(a)|(b)/
|
||||||
|
@ -932,11 +1003,11 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
1: <unset>
|
1: <unset>
|
||||||
2: b
|
2: b
|
||||||
|
|
||||||
If the strings contain any non-printing characters, they are output as
|
If the strings contain any non-printing characters, they are output as
|
||||||
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
\xhh escapes if the value is less than 256 and UTF mode is not set.
|
||||||
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
Otherwise they are output as \x{hh...} escapes. See below for the defi-
|
||||||
nition of non-printing characters. If the aftertext modifier is set,
|
nition of non-printing characters. If the aftertext modifier is set,
|
||||||
the output for substring 0 is followed by the rest of the subject
|
the output for substring 0 is followed by the rest of the subject
|
||||||
string, identified by "0+" like this:
|
string, identified by "0+" like this:
|
||||||
|
|
||||||
re> /cat/aftertext
|
re> /cat/aftertext
|
||||||
|
@ -944,7 +1015,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: cat
|
0: cat
|
||||||
0+ aract
|
0+ aract
|
||||||
|
|
||||||
If global matching is requested, the results of successive matching
|
If global matching is requested, the results of successive matching
|
||||||
attempts are output in sequence, like this:
|
attempts are output in sequence, like this:
|
||||||
|
|
||||||
re> /\Bi(\w\w)/g
|
re> /\Bi(\w\w)/g
|
||||||
|
@ -956,8 +1027,8 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
0: ipp
|
0: ipp
|
||||||
1: pp
|
1: pp
|
||||||
|
|
||||||
"No match" is output only if the first match attempt fails. Here is an
|
"No match" is output only if the first match attempt fails. Here is an
|
||||||
example of a failure message (the offset 4 that is specified by \>4 is
|
example of a failure message (the offset 4 that is specified by \>4 is
|
||||||
past the end of the subject string):
|
past the end of the subject string):
|
||||||
|
|
||||||
re> /xyz/
|
re> /xyz/
|
||||||
|
@ -965,7 +1036,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
Error -24 (bad offset value)
|
Error -24 (bad offset value)
|
||||||
|
|
||||||
Note that whereas patterns can be continued over several lines (a plain
|
Note that whereas patterns can be continued over several lines (a plain
|
||||||
">" prompt is used for continuations), subject lines may not. However
|
">" prompt is used for continuations), subject lines may not. However
|
||||||
newlines can be included in a subject by means of the \n escape (or \r,
|
newlines can be included in a subject by means of the \n escape (or \r,
|
||||||
\r\n, etc., depending on the newline sequence setting).
|
\r\n, etc., depending on the newline sequence setting).
|
||||||
|
|
||||||
|
@ -973,7 +1044,7 @@ DEFAULT OUTPUT FROM pcre2test
|
||||||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
|
|
||||||
When the alternative matching function, pcre2_dfa_match(), is used, the
|
When the alternative matching function, pcre2_dfa_match(), is used, the
|
||||||
output consists of a list of all the matches that start at the first
|
output consists of a list of all the matches that start at the first
|
||||||
point in the subject where there is at least one match. For example:
|
point in the subject where there is at least one match. For example:
|
||||||
|
|
||||||
re> /(tang|tangerine|tan)/
|
re> /(tang|tangerine|tan)/
|
||||||
|
@ -982,11 +1053,11 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tang
|
1: tang
|
||||||
2: tan
|
2: tan
|
||||||
|
|
||||||
(Using the normal matching function on this data finds only "tang".)
|
(Using the normal matching function on this data finds only "tang".)
|
||||||
The longest matching string is always given first (and numbered zero).
|
The longest matching string is always given first (and numbered zero).
|
||||||
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
After a PCRE2_ERROR_PARTIAL return, the output is "Partial match:",
|
||||||
followed by the partially matching substring. (Note that this is the
|
followed by the partially matching substring. (Note that this is the
|
||||||
entire substring that was inspected during the partial match; it may
|
entire substring that was inspected during the partial match; it may
|
||||||
include characters before the actual match start if a lookbehind asser-
|
include characters before the actual match start if a lookbehind asser-
|
||||||
tion, \K, \b, or \B was involved.)
|
tion, \K, \b, or \B was involved.)
|
||||||
|
|
||||||
|
@ -1002,16 +1073,16 @@ OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||||
1: tan
|
1: tan
|
||||||
0: tan
|
0: tan
|
||||||
|
|
||||||
The alternative matching function does not support substring capture,
|
The alternative matching function does not support substring capture,
|
||||||
so the modifiers that are concerned with captured substrings are not
|
so the modifiers that are concerned with captured substrings are not
|
||||||
relevant.
|
relevant.
|
||||||
|
|
||||||
|
|
||||||
RESTARTING AFTER A PARTIAL MATCH
|
RESTARTING AFTER A PARTIAL MATCH
|
||||||
|
|
||||||
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
When the alternative matching function has given the PCRE2_ERROR_PAR-
|
||||||
TIAL return, indicating that the subject partially matched the pattern,
|
TIAL return, indicating that the subject partially matched the pattern,
|
||||||
you can restart the match with additional subject data by means of the
|
you can restart the match with additional subject data by means of the
|
||||||
dfa_restart modifier. For example:
|
dfa_restart modifier. For example:
|
||||||
|
|
||||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||||
|
@ -1020,29 +1091,29 @@ RESTARTING AFTER A PARTIAL MATCH
|
||||||
data> n05\=dfa,dfa_restart
|
data> n05\=dfa,dfa_restart
|
||||||
0: n05
|
0: n05
|
||||||
|
|
||||||
For further information about partial matching, see the pcre2partial
|
For further information about partial matching, see the pcre2partial
|
||||||
documentation.
|
documentation.
|
||||||
|
|
||||||
|
|
||||||
CALLOUTS
|
CALLOUTS
|
||||||
|
|
||||||
If the pattern contains any callout requests, pcre2test's callout func-
|
If the pattern contains any callout requests, pcre2test's callout func-
|
||||||
tion is called during matching. This works with both matching func-
|
tion is called during matching. This works with both matching func-
|
||||||
tions. By default, the called function displays the callout number, the
|
tions. By default, the called function displays the callout number, the
|
||||||
start and current positions in the text at the callout time, and the
|
start and current positions in the text at the callout time, and the
|
||||||
next pattern item to be tested. For example:
|
next pattern item to be tested. For example:
|
||||||
|
|
||||||
--->pqrabcdef
|
--->pqrabcdef
|
||||||
0 ^ ^ \d
|
0 ^ ^ \d
|
||||||
|
|
||||||
This output indicates that callout number 0 occurred for a match
|
This output indicates that callout number 0 occurred for a match
|
||||||
attempt starting at the fourth character of the subject string, when
|
attempt starting at the fourth character of the subject string, when
|
||||||
the pointer was at the seventh character, and when the next pattern
|
the pointer was at the seventh character, and when the next pattern
|
||||||
item was \d. Just one circumflex is output if the start and current
|
item was \d. Just one circumflex is output if the start and current
|
||||||
positions are the same.
|
positions are the same.
|
||||||
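In the pqrabcdef example, the match attempt starts at offset 3 (the fourth character) and the current position is offset 6 (the seventh character). The sketch below builds only the circumflex part of such a trace line; callout_markers is a hypothetical helper. It assumes the start position is not after the current position, and it does not reproduce the callout number or the pattern item that pcre2test also prints.

```c
#include <stddef.h>

/* Put '^' under start_pos and under current_pos (a single '^' when they
   are equal). Assumes start_pos <= current_pos; out must hold
   current_pos + 2 bytes. */
void callout_markers(size_t start_pos, size_t current_pos, char *out)
{
  size_t i;
  for (i = 0; i <= current_pos; i++)
    out[i] = (i == start_pos || i == current_pos) ? '^' : ' ';
  out[i] = '\0';
}
```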
|
|
||||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||||
a result of the auto_callout pattern modifier. In this case, instead
|
a result of the auto_callout pattern modifier. In this case, instead
|
||||||
of showing the callout number, the offset in the pattern, preceded by a
|
of showing the callout number, the offset in the pattern, preceded by a
|
||||||
plus, is output. For example:
|
plus, is output. For example:
|
||||||
|
|
||||||
|
@ -1056,7 +1127,7 @@ CALLOUTS
|
||||||
0: E*
|
0: E*
|
||||||
|
|
||||||
If a pattern contains (*MARK) items, an additional line is output when-
|
If a pattern contains (*MARK) items, an additional line is output when-
|
||||||
ever a change of latest mark is passed to the callout function. For
|
ever a change of latest mark is passed to the callout function. For
|
||||||
example:
|
example:
|
||||||
|
|
||||||
re> /a(*MARK:X)bc/auto_callout
|
re> /a(*MARK:X)bc/auto_callout
|
||||||
@@ -1070,30 +1141,30 @@ CALLOUTS
 +12 ^ ^
 0: abc

 The mark changes between matching "a" and "b", but stays the same for
 the rest of the match, so nothing more is output. If, as a result of
 backtracking, the mark reverts to being unset, the text "<unset>" is
 output.

 The callout function in pcre2test returns zero (carry on matching) by
 default, but you can use a callout_fail modifier in a subject line (as
 described above) to change this and other parameters of the callout.

 Inserting callouts can be helpful when using pcre2test to check compli-
 cated regular expressions. For further information about callouts, see
 the pcre2callout documentation.


 NON-PRINTING CHARACTERS

 When pcre2test is outputting text in the compiled version of a pattern,
 bytes other than 32-126 are always treated as non-printing characters
 and are therefore shown as hex escapes.

 When pcre2test is outputting text that is a matched part of a subject
 string, it behaves in the same way, unless a different locale has been
 set for the pattern (using the /locale modifier). In this case, the
 isprint() function is used to distinguish printing and non-printing
 characters.

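The non-printing rule described above can be sketched as a small escaping routine. This is an illustration, not pcre2test's own code. Bytes in the range 32-126 are copied as they are, and every other byte becomes a hex escape. The `\x{hh}` format is an assumption, chosen to match the escapes visible in the pcre2test output hunks of this commit. The locale-dependent `isprint()` path is not modelled, and `escape_nonprinting()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy len bytes from in to out, writing every byte
   outside 32-126 as a \x{hh} escape (format assumed, see above).
   Returns the output length, or 0 if out (of size outsize) is too small. */
static size_t escape_nonprinting(const unsigned char *in, size_t len,
                                 char *out, size_t outsize)
{
  size_t n = 0;
  assert(out != NULL && outsize > 0);

  for (size_t i = 0; i < len; i++)
    {
    unsigned int c = in[i];
    if (c >= 32 && c <= 126)
      {
      if (n + 1 >= outsize) return 0;   /* keep room for the NUL */
      out[n++] = (char)c;
      }
    else
      {
      int w = snprintf(out + n, outsize - n, "\\x{%x}", c);
      if (w < 0 || (size_t)w >= outsize - n) return 0;   /* truncated */
      n += (size_t)w;
      }
    }
  out[n] = '\0';
  return n;
}
```

With this sketch the bytes `a`, 0xe1, `b` come out as `a\x{e1}b`.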
@@ -1112,5 +1183,5 @@ AUTHOR

 REVISION

-Last updated: 09 November 2014
+Last updated: 14 November 2014
 Copyright (c) 1997-2014 University of Cambridge.
@@ -69,7 +69,7 @@ Arguments:

 Returns: >= 0 number of substitutions made
 < 0 an error code
 PCRE2_ERROR_BADREPLACEMENT means invalid use of $
 */

 PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION
@@ -84,14 +84,14 @@ uint32_t ovector_count;
 uint32_t goptions = 0;
 BOOL match_data_created = FALSE;
 BOOL global = FALSE;
-PCRE2_SIZE buff_offset, lengthleft, endlength;
+PCRE2_SIZE buff_offset, lengthleft, fraglength;
 PCRE2_SIZE *ovector;

 /* Partial matching is not valid. */

 if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0)
 return PCRE2_ERROR_BADOPTION;

 /* If no match data block is provided, create one. */

 if (match_data == NULL)
@@ -120,7 +120,7 @@ if ((code->overall_options & PCRE2_UTF) != 0 &&
 }
 }
 #endif /* SUPPORT_UNICODE */

 /* Notice the global option and remove it from the options that are passed to
 pcre2_match(). */

@@ -151,17 +151,20 @@ do

 rc = pcre2_match(code, subject, length, start_offset, options|goptions,
 match_data, mcontext);

 /* Any error other than no match returns the error code. No match when not
 doing the special after-empty-match global rematch, or when at the end of the
-subject, breaks the global loop. Otherwise, advance the starting point and
-try again. */
+subject, breaks the global loop. Otherwise, advance the starting point by one
+character, copying it to the output, and try again. */

 if (rc < 0)
 {
+PCRE2_SIZE save_start;
+
 if (rc != PCRE2_ERROR_NOMATCH) goto EXIT;
 if (goptions == 0 || start_offset >= length) break;
-start_offset++;
+
+save_start = start_offset++;
 if ((code->overall_options & PCRE2_UTF) != 0)
 {
 #if PCRE2_CODE_UNIT_WIDTH == 8
@@ -173,20 +176,28 @@ do
 start_offset++;
 #endif
 }
+
+fraglength = start_offset - save_start;
+if (lengthleft < fraglength) goto NOROOM;
+memcpy(buffer + buff_offset, subject + save_start,
+fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
+buff_offset += fraglength;
+lengthleft -= fraglength;
+
 goptions = 0;
 continue;
 }

 /* Handle a successful match. */

 subs++;
 if (rc == 0) rc = ovector_count;
-endlength = ovector[0] - start_offset;
-if (endlength >= lengthleft) goto NOROOM;
+fraglength = ovector[0] - start_offset;
+if (fraglength >= lengthleft) goto NOROOM;
 memcpy(buffer + buff_offset, subject + start_offset,
-endlength*(PCRE2_CODE_UNIT_WIDTH/8));
-buff_offset += endlength;
-lengthleft -= endlength;
+fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
+buff_offset += fraglength;
+lengthleft -= fraglength;

 for (i = 0; i < rlength; i++)
 {
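The reworked no-match branch in the two hunks above now copies the skipped character into the output, where before it only moved past it. Below is a minimal standalone sketch of that step for 8-bit code units. `advance_and_copy()` and its signature are hypothetical. The UTF-8 rule it uses, which skips following bytes of the form 10xxxxxx, is an assumption about the loop that falls between the two hunks and is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modelled on the no-match branch (8-bit code units
   only). Advances *start_offset past one character, treating UTF-8
   continuation bytes (10xxxxxx) as part of it when utf is true, then
   appends the skipped code units to buffer. Returns false, the NOROOM
   case, if they do not fit in *lengthleft. */
static bool advance_and_copy(const unsigned char *subject, size_t length,
                             size_t *start_offset, bool utf,
                             unsigned char *buffer, size_t *buff_offset,
                             size_t *lengthleft)
{
  size_t save_start, fraglength;

  assert(*start_offset < length);
  save_start = (*start_offset)++;
  if (utf)
    {
    while (*start_offset < length &&
           (subject[*start_offset] & 0xc0) == 0x80)
      (*start_offset)++;
    }

  fraglength = *start_offset - save_start;
  if (*lengthleft < fraglength) return false;
  memcpy(buffer + *buff_offset, subject + save_start, fraglength);
  *buff_offset += fraglength;
  *lengthleft -= fraglength;
  return true;
}
```

As far as the hunks show, omitting this copy would drop the character that follows an empty match from the result. The new `/(?<=abc)(|def)/g` tests exercise that path.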
@@ -196,11 +207,11 @@ do
 BOOL inparens;
 PCRE2_SIZE sublength;
 PCRE2_UCHAR next;
 PCRE2_UCHAR name[33];

 if (++i == rlength) goto BAD;
 if ((next = replacement[i]) == CHAR_DOLLAR_SIGN) goto LITERAL;

 group = -1;
 n = 0;
 inparens = FALSE;
@@ -232,7 +243,7 @@ do
 if (i == rlength) break;
 next = replacement[++i];
 }
 if (n == 0) goto BAD;
 name[n] = 0;
 }

@@ -241,7 +252,7 @@ do
 if (i == rlength || next != CHAR_RIGHT_CURLY_BRACKET) goto BAD;
 }
 else i--; /* Last code unit of name/number */

 /* Have found a syntactically correct group number or name. */

 sublength = lengthleft;
@@ -251,8 +262,8 @@ do
 else
 rc = pcre2_substring_copy_bynumber(match_data, group,
 buffer + buff_offset, &sublength);

 if (rc < 0) goto EXIT;
 buff_offset += sublength;
 lengthleft -= sublength;
 }
@@ -279,17 +290,17 @@ do
 /* Copy the rest of the subject and return the number of substitutions. */

 rc = subs;
-endlength = length - start_offset;
-if (endlength + 1 > lengthleft) goto NOROOM;
+fraglength = length - start_offset;
+if (fraglength + 1 > lengthleft) goto NOROOM;
 memcpy(buffer + buff_offset, subject + start_offset,
-endlength*(PCRE2_CODE_UNIT_WIDTH/8));
-buff_offset += endlength;
+fraglength*(PCRE2_CODE_UNIT_WIDTH/8));
+buff_offset += fraglength;
 buffer[buff_offset] = 0;
 *blength = buff_offset;

 EXIT:
 if (match_data_created) pcre2_match_data_free(match_data);
 else match_data->rc = rc;
 return rc;

 NOROOM:
@@ -164,11 +164,12 @@ void vms_setsymbol( char *, char *, int );
 #define DFA_WS_DIMENSION 1000 /* Size of DFA workspace */
 #define DEFAULT_OVECCOUNT 15 /* Default ovector count */
 #define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */
+#define LOCALESIZE 32 /* Size of locale name */
 #define LOOPREPEAT 500000 /* Default loop count for timing */
 #define REPLACE_MODSIZE 96 /* Field for reading 8-bit replacement */
 #define VERSION_SIZE 64 /* Size of buffer for the version strings */

 /* Make sure the buffer into which replacement strings are copied is big enough
 to hold them as 32-bit code units. */

 #define REPLACE_BUFFSIZE (4*REPLACE_MODSIZE)
@@ -263,9 +264,9 @@ these inclusions should not be changed. */

 #define PCRE2_SUFFIX(a) a

 /* We need to be able to check input text for UTF-8 validity, whatever code
 widths are actually available, because the input to pcre2test is always in
 8-bit code units. So we include the UTF validity checking function for 8-bit
 code units. */

 extern int valid_utf(PCRE2_SPTR8, PCRE2_SIZE, PCRE2_SIZE *);
@@ -388,10 +389,10 @@ data line. */
 CTL_MARK|\
 CTL_MEMORY|\
 CTL_STARTCHAR)

 /* Structures for holding modifier information for patterns and subject strings
 (data). Fields containing modifiers that can be set either for a pattern or a
 subject must be at the start and in the same order in both cases so that the
 same offset in the big table below works for both. */

 typedef struct patctl { /* Structure for pattern modifiers. */
@@ -401,7 +402,7 @@ typedef struct patctl { /* Structure for pattern modifiers. */
 uint32_t jit;
 uint32_t stackguard_test;
 uint32_t tables_id;
-uint8_t locale[32];
+uint8_t locale[LOCALESIZE];
 } patctl;

 #define MAXCPYGET 10
@@ -486,7 +487,7 @@ static modstruct modlist[] = {
 { "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
 { "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) },
 { "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
-{ "locale", MOD_PAT, MOD_STR, 0, PO(locale) },
+{ "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
 { "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
 { "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) },
 { "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
@@ -512,7 +513,7 @@ static modstruct modlist[] = {
 { "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
 { "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
 { "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
-{ "replace", MOD_PND, MOD_STR, 0, PO(replacement) },
+{ "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
 { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
 { "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
 { "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
@@ -3141,6 +3142,12 @@ for (;;)
 break;

 case MOD_STR:
+if (len + 1 > m->value)
+{
+fprintf(outfile, "** Overlong value for '%s' (max %d code units)\n",
+m->name, m->value - 1);
+return FALSE;
+}
 memcpy(field, pp, len);
 ((uint8_t *)field)[len] = 0;
 pp = ep;
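The modifier table now records each field's size: `LOCALESIZE` for `locale` and `REPLACE_MODSIZE` for `replace`. The MOD_STR case therefore rejects a value that would not fit together with its terminating zero. Here is a standalone sketch of the same check. `set_string_modifier()` is a hypothetical helper, and the message text is copied from the hunk. Unlike pcre2test, which writes to `outfile`, the sketch writes the message to stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOCALESIZE 32   /* field size, as defined in the pcre2test hunk */

/* Hypothetical helper: copy a len-byte modifier value into field, whose
   size including the terminating zero is fieldsize. Mirrors the check
   "len + 1 > m->value" added to the MOD_STR case. */
static bool set_string_modifier(const char *name, uint8_t *field,
                                size_t fieldsize, const char *value,
                                size_t len)
{
  assert(field != NULL && value != NULL);
  if (len + 1 > fieldsize)
    {
    fprintf(stderr, "** Overlong value for '%s' (max %d code units)\n",
      name, (int)fieldsize - 1);
    return false;
    }
  memcpy(field, value, len);
  field[len] = 0;
  return true;
}
```

With a 32-byte field, a 31-character value is accepted and a 32-character value is rejected. This matches the "max 31 code units" that the message would report.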
@@ -3974,8 +3981,8 @@ if (TEST(compiled_code, ==, NULL))
 if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0)
 return PR_ABEND;

 /* Call the JIT compiler if requested. When timing, we must free and recompile
 the pattern each time because that is the only way to free the JIT compiled
 code. We know that compilation will always succeed. */

 if (pat_patctl.jit != 0)
@@ -3992,7 +3999,7 @@ if (pat_patctl.jit != 0)
 pat_patctl.options|forbid_utf, &errorcode, &erroroffset, pat_context);
 start_time = clock();
 PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
 time_taken += clock() - start_time;
 }
 total_jit_compile_time += time_taken;
 fprintf(outfile, "JIT compile %.4f milliseconds\n",
@@ -4000,9 +4007,9 @@ if (pat_patctl.jit != 0)
 (double)CLOCKS_PER_SEC);
 }
 else
 {
 PCRE2_JIT_COMPILE(compiled_code, pat_patctl.jit);
 }
 }

 /* Output code size and other information if requested. */
@@ -4765,8 +4772,8 @@ else
 PCRE2_MATCH_DATA_FREE(match_data);
 PCRE2_MATCH_DATA_CREATE(match_data, max_oveccount, NULL);
 }

 /* Replacement processing is ignored for DFA matching. */

 if (dat_datctl.replacement[0] != 0 && (dat_datctl.control & CTL_DFA) != 0)
 {
@@ -4799,7 +4806,7 @@ if (dat_datctl.replacement[0] != 0)
 #endif

 if (timeitm)
 fprintf(outfile, "** Timing is not supported with replace: ignored\n");

 goption = ((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
 PCRE2_SUBSTITUTE_GLOBAL;
@@ -4828,21 +4835,21 @@ if (dat_datctl.replacement[0] != 0)
 nsize = n;
 }

 /* Now copy the replacement string to a buffer of the appropriate width. No
 escape processing is done for replacements. In UTF mode, check for an invalid
 UTF-8 input string, and if it is invalid, just copy its code units without
 UTF interpretation. This provides a means of checking that an invalid string
 is detected. Otherwise, UTF-8 can be used to include wide characters in a
 replacement. */

 if (utf) badutf = valid_utf(pr, strlen((const char *)pr), &erroroffset);

 /* Not UTF or invalid UTF-8: just copy the code units. */

 if (!utf || badutf)
 {
 while ((c = *pr++) != 0)
 {
 #ifdef SUPPORT_PCRE2_8
 if (test_mode == PCRE8_MODE) *r8++ = c;
 #endif
@@ -4854,9 +4861,9 @@ if (dat_datctl.replacement[0] != 0)
 #endif
 }
 }

 /* Valid UTF-8 replacement string */

 else while ((c = *pr++) != 0)
 {
 if (HASUTF8EXTRALEN(c)) { GETUTF8INC(c, pr); }
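The replacement-copy comment in these hunks describes two paths. When UTF is off, or the replacement is not valid UTF-8, its bytes are copied one per code unit. Otherwise each UTF-8 sequence becomes a single wide code unit. The sketch below shows the 32-bit case.

`copy_replacement32()` and its helpers are hypothetical. The validity test is a simplified structural check that looks at lead and continuation bytes only, with no overlong or surrogate checks. It stands in for pcre2test's `valid_utf()`, and the decoding loop stands in for the `HASUTF8EXTRALEN`/`GETUTF8INC` macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of continuation bytes implied by a UTF-8 lead byte, or -1 if
   the byte cannot start a sequence. */
static int utf8_extra(uint8_t c)
{
  if (c < 0x80) return 0;
  if ((c & 0xe0) == 0xc0) return 1;
  if ((c & 0xf0) == 0xe0) return 2;
  if ((c & 0xf8) == 0xf0) return 3;
  return -1;
}

/* Simplified structural check (not a full UTF-8 validator). */
static bool utf8_structurally_valid(const uint8_t *s)
{
  while (*s != 0)
    {
    int extra = utf8_extra(*s++);
    if (extra < 0) return false;
    for (int k = 0; k < extra; k++, s++)
      if ((*s & 0xc0) != 0x80) return false;   /* also catches the NUL */
    }
  return true;
}

/* Copy a NUL-terminated replacement into 32-bit code units, decoding
   UTF-8 only when utf is set and the input passes the check above.
   Returns the number of code units written; a terminating 0 follows. */
static size_t copy_replacement32(const uint8_t *pr, bool utf, uint32_t *r32)
{
  size_t n = 0;
  uint32_t c;
  assert(pr != NULL && r32 != NULL);

  if (!utf || !utf8_structurally_valid(pr))
    {
    while ((c = *pr++) != 0) r32[n++] = c;      /* raw code units */
    }
  else while ((c = *pr++) != 0)
    {
    int extra = utf8_extra((uint8_t)c);
    if (extra > 0)
      {
      c &= 0x3fu >> extra;                      /* payload bits of lead byte */
      while (extra-- > 0) c = (c << 6) | (*pr++ & 0x3f);
      }
    r32[n++] = c;
    }
  r32[n] = 0;
  return n;
}
```

With UTF on, the bytes of `XሴZ` (as in the `/ábc/utf,replace=XሴZ` test) decode to three code units: X, 0x1234 and Z. A lone 0xe1 byte fails the check, so it is copied through unchanged.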
@@ -6314,7 +6321,7 @@ if (INTERACTIVE(infile)) fprintf(outfile, "\n");

 if (showtotaltimes)
 {
 const char *pad = "";
 fprintf(outfile, "--------------------------------------\n");
 if (timeit > 0)
 {
@@ -6325,7 +6332,7 @@ if (showtotaltimes)
 fprintf(outfile, "Total JIT compile %.4f milliseconds\n",
 (((double)total_jit_compile_time * 1000.0) / (double)timeit) /
 (double)CLOCKS_PER_SEC);
 pad = " ";
 }
 fprintf(outfile, "Total match time %s%.4f milliseconds\n", pad,
 (((double)total_match_time * 1000.0) / (double)timeitm) /
@@ -4073,6 +4073,9 @@ a random value. /Ix
 123abc456abc789
 123abc456abc789\=g

+/(?<=abc)(|def)/g,replace=<$0>
+123abcxyzabcdef789abcpqr
+
 # End of substitute tests

 # End of testinput2
@@ -1633,4 +1633,7 @@
 /ábc/utf,replace=XሴZ
 123ábc123

+/(?<=abc)(|def)/g,utf,replace=<$0>
+123abcáyzabcdef789abcሴqr
+
 # End of testinput5
@@ -13699,6 +13699,10 @@ Failed: error -34: bad option value
 123abc456abc789\=g
 2: 123xyz456xyz789

+/(?<=abc)(|def)/g,replace=<$0>
+123abcxyzabcdef789abcpqr
+4: 123abc<>xyzabc<><def>789abc<>pqr
+
 # End of substitute tests

 # End of testinput2
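The new test expects `/(?<=abc)(|def)/g,replace=<$0>` to make four substitutions. The sketch below replays the global loop from the pcre2_substitute() hunks on that subject, so the result can be traced by hand. It is an illustration under stated assumptions, not PCRE2 code:

- `match_abc_lookbehind()` is a hypothetical hand-written matcher for this one pattern, standing in for pcre2_match().
- The `retry` flag stands in for the options the real loop puts in `goptions` after an empty match. It is assumed here to mean "anchored, and not empty at the start".
- The replacement `<$0>` is hard-wired.
- Only 8-bit, non-UTF input is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher for the single pattern (?<=abc)(|def), searching
   from start. The empty alternative is tried first. If not_empty_at_start
   is set, an empty match at start is rejected; if anchored is set, only
   start itself is tried. Stores the match bounds in *ms and *me. */
static bool match_abc_lookbehind(const char *s, size_t len, size_t start,
                                 bool not_empty_at_start, bool anchored,
                                 size_t *ms, size_t *me)
{
  for (size_t p = start; p <= len; p++)
    {
    if (p >= 3 && memcmp(s + p - 3, "abc", 3) == 0)
      {
      if (!(not_empty_at_start && p == start))
        { *ms = *me = p; return true; }          /* empty alternative */
      if (len - p >= 3 && memcmp(s + p, "def", 3) == 0)
        { *ms = p; *me = p + 3; return true; }   /* "def" alternative */
      }
    if (anchored) break;
    }
  return false;
}

/* Global substitution of <$0>, following the loop structure in the hunks.
   Returns the number of substitutions, or -1 if out is too small. */
static int substitute_global(const char *s, size_t len,
                             char *out, size_t outsize)
{
  size_t start = 0, n = 0, ms, me;
  bool retry = false;      /* set after an empty match, like goptions */
  int subs = 0;

  for (;;)
    {
    if (!match_abc_lookbehind(s, len, start, retry, retry, &ms, &me))
      {
      if (!retry || start >= len) break;
      if (n + 1 >= outsize) return -1;
      out[n++] = s[start++];       /* advance one character, copying it */
      retry = false;
      continue;
      }
    subs++;
    if (n + (ms - start) + (me - ms) + 2 >= outsize) return -1;
    memcpy(out + n, s + start, ms - start);   /* text before the match */
    n += ms - start;
    out[n++] = '<';
    memcpy(out + n, s + ms, me - ms);         /* the match itself ($0) */
    n += me - ms;
    out[n++] = '>';
    start = me;
    retry = (ms == me);            /* empty match: retry non-empty here */
    }

  /* Copy the rest of the subject and terminate. */
  if (n + (len - start) + 1 > outsize) return -1;
  memcpy(out + n, s + start, len - start);
  n += len - start;
  out[n] = '\0';
  return subs;
}
```

Tracing it:

- After the first `abc`, the empty match is followed by a failed anchored retry, so `x` is copied and the search moves on.
- After the second `abc`, the retry finds `def`, which is why `<><def>` appears together.

With a 64-byte buffer the function returns 4 and builds the same string as the new testoutput2 line.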
@@ -4002,4 +4002,8 @@ Subject length lower bound = 1
 123ábc123
 1: 123X\x{1234}Z123

+/(?<=abc)(|def)/g,utf,replace=<$0>
+123abcáyzabcdef789abcሴqr
+4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
+
 # End of testinput5