Source and document file tidies for 10.20-RC1.
parent
a68ddd48b5
commit
07a8fdce25
|
@ -1,8 +1,8 @@
|
||||||
Change Log for PCRE2
|
Change Log for PCRE2
|
||||||
--------------------
|
--------------------
|
||||||
|
|
||||||
Version 10.20 xx-xx-2015
|
Version 10.20 16-June-2015
|
||||||
------------------------
|
--------------------------
|
||||||
|
|
||||||
1. Callouts with string arguments have been added.
|
1. Callouts with string arguments have been added.
|
||||||
|
|
||||||
|
|
30
HACKING
30
HACKING
|
@ -104,6 +104,21 @@ system stack used by the compile function, which uses recursive function calls
|
||||||
for nested parenthesized groups. This is a safety feature for environments with
|
for nested parenthesized groups. This is a safety feature for environments with
|
||||||
small stacks where the patterns are provided by users.
|
small stacks where the patterns are provided by users.
|
||||||
|
|
||||||
|
History repeated itself for release 10.20. A number of bugs relating to named
|
||||||
|
subpatterns had been discovered by fuzzers. Most of these were related to the
|
||||||
|
handling of forward references when it was not known if the named pattern was
|
||||||
|
unique. (References to non-unique names use a different opcode and more
|
||||||
|
memory.) The use of duplicate group numbers (the (?| facility) also caused
|
||||||
|
issues.
|
||||||
|
|
||||||
|
To get around these problems I adopted a new approach by adding a third pass,
|
||||||
|
really a "pre-pass", over the pattern, which does nothing other than identify
|
||||||
|
all the named subpatterns and their corresponding group numbers. This means
|
||||||
|
that the actual compile (both pre-pass and real compile) have full knowledge of
|
||||||
|
group names and numbers throughout. Several dozen lines of messy code were
|
||||||
|
eliminated, though the new pre-pass is not short (skipping over [] classes is
|
||||||
|
complicated).
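The idea of a name-collecting pre-pass can be sketched as follows. This is a simplified Python model and not PCRE2's C code: the function name `find_named_groups` is hypothetical, and the sketch deliberately ignores `(?|` duplicate numbering, duplicate names, `\Q...\E`, POSIX classes inside `[]`, and extended-mode comments, all of which the real pre-pass has to handle.

```python
def find_named_groups(pattern: str) -> dict:
    """Map each group name to its capture group number (simplified model)."""
    names = {}
    group = 0
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "[":
            # Skip a character class; a leading ] (after an optional ^) is literal.
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1
            continue
        if ch == "(":
            rest = pattern[i + 1:]
            if rest.startswith("*"):
                pass  # a verb such as (*UTF), not a group
            elif not rest.startswith("?"):
                group += 1  # plain capturing group
            else:
                for opener, closer in (("?P<", ">"), ("?<", ">"), ("?'", "'")):
                    start = len(opener)
                    # (?<= and (?<! are lookbehinds, not named groups.
                    if rest.startswith(opener) and rest[start:start + 1] not in ("=", "!"):
                        end = rest.find(closer, start)
                        if end != -1:
                            group += 1
                            names[rest[start:end]] = group
                        break
        i += 1
    return names
```

With the table built up front, a forward reference such as `(?&day)` can be resolved to a group number before the group itself has been compiled, which is the property the text above relies on.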
|
||||||
|
|
||||||
|
|
||||||
Traditional matching function
|
Traditional matching function
|
||||||
-----------------------------
|
-----------------------------
|
||||||
|
@ -343,8 +358,9 @@ do.
|
||||||
|
|
||||||
For classes containing characters with values greater than 255 or that contain
|
For classes containing characters with values greater than 255 or that contain
|
||||||
\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable
|
\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable
|
||||||
code points are less than 256, followed by a list of pairs (for a range) and
|
code points are less than 256, followed by a list of pairs (for a range) and/or
|
||||||
single characters. In caseless mode, both cases are explicitly listed.
|
single characters and/or properties. In caseless mode, both cases are
|
||||||
|
explicitly listed.
|
||||||
|
|
||||||
OP_XCLASS is followed by a LINK_SIZE value containing the total length of the
|
OP_XCLASS is followed by a LINK_SIZE value containing the total length of the
|
||||||
opcode and its data. This is followed by a code unit containing flag bits:
|
opcode and its data. This is followed by a code unit containing flag bits:
|
||||||
|
@ -431,7 +447,7 @@ bracket opcode.
|
||||||
If a subpattern is quantified such that it is permitted to match zero times, it
|
If a subpattern is quantified such that it is permitted to match zero times, it
|
||||||
is preceded by one of OP_BRAZERO, OP_BRAMINZERO, or OP_SKIPZERO. These are
|
is preceded by one of OP_BRAZERO, OP_BRAMINZERO, or OP_SKIPZERO. These are
|
||||||
single-unit opcodes that tell the matcher that skipping the following
|
single-unit opcodes that tell the matcher that skipping the following
|
||||||
subpattern entirely is a valid branch. In the case of the first two, not
|
subpattern entirely is a valid match. In the case of the first two, not
|
||||||
skipping the pattern is also valid (greedy and non-greedy). The third is used
|
skipping the pattern is also valid (greedy and non-greedy). The third is used
|
||||||
when a pattern has the quantifier {0,0}. It cannot be entirely discarded,
|
when a pattern has the quantifier {0,0}. It cannot be entirely discarded,
|
||||||
because it may be called as a subroutine from elsewhere in the pattern.
|
because it may be called as a subroutine from elsewhere in the pattern.
|
||||||
|
@ -487,9 +503,9 @@ Forward assertions are also just like other subpatterns, but starting with one
|
||||||
of the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
|
of the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
|
||||||
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
|
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
|
||||||
is OP_REVERSE, followed by a count of the number of characters to move back the
|
is OP_REVERSE, followed by a count of the number of characters to move back the
|
||||||
pointer in the subject string. In ASCII or UTF-32 mode, the count is a number
|
pointer in the subject string. In ASCII or UTF-32 mode, the count is also the
|
||||||
of code units, but in UTF-8/16 mode each character may occupy more than one
|
number of code units, but in UTF-8/16 mode each character may occupy more than
|
||||||
code unit. A separate count is present in each alternative of a lookbehind
|
one code unit. A separate count is present in each alternative of a lookbehind
|
||||||
assertion, allowing them to have different (but fixed) lengths.
|
assertion, allowing them to have different (but fixed) lengths.
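To illustrate why a character count is not the same as a code unit count in UTF-8, here is a small Python sketch that steps a pointer back by a number of characters. It is an illustration only: `move_back` is a hypothetical helper, not a PCRE2 function, and it assumes the subject is valid UTF-8.

```python
def move_back(subject: bytes, pos: int, nchars: int) -> int:
    """Return the offset reached by stepping back nchars characters in UTF-8."""
    for _ in range(nchars):
        if pos == 0:
            raise ValueError("not enough characters before the current position")
        pos -= 1
        # Continuation bytes have the form 10xxxxxx; keep going to the lead byte.
        while pos > 0 and (subject[pos] & 0xC0) == 0x80:
            pos -= 1
    return pos
```

For a subject made of a one-byte, a two-byte, and a three-byte character (six code units in total), moving back two characters from the end moves the pointer back five code units.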
|
||||||
|
|
||||||
|
|
||||||
|
@ -585,4 +601,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
|
||||||
correct length, in order to catch updating errors.
|
correct length, in order to catch updating errors.
|
||||||
|
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
March 2015
|
June 2015
|
||||||
|
|
20
NEWS
20
NEWS
|
@ -1,6 +1,26 @@
|
||||||
News about PCRE2 releases
|
News about PCRE2 releases
|
||||||
-------------------------
|
-------------------------
|
||||||
|
|
||||||
|
Version 10.20 16-June-2015
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
1. Callouts with string arguments and the pcre2_callout_enumerate() function
|
||||||
|
have been implemented.
|
||||||
|
|
||||||
|
2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added.
|
||||||
|
|
||||||
|
3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a
|
||||||
|
subject in multiline mode.
|
||||||
|
|
||||||
|
4. The way named subpatterns are handled has been refactored. The previous
|
||||||
|
approach had several bugs.
|
||||||
|
|
||||||
|
5. The handling of \c in EBCDIC environments has been changed to conform to the
|
||||||
|
perlebcdic document. This is an incompatible change.
|
||||||
|
|
||||||
|
6. Bugs have been mended, many of them discovered by fuzzers.
|
||||||
|
|
||||||
|
|
||||||
Version 10.10 06-March-2015
|
Version 10.10 06-March-2015
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
|
|
|
@ -11,15 +11,15 @@ dnl be defined as -RC2, for example. For real releases, it should be empty.
|
||||||
m4_define(pcre2_major, [10])
|
m4_define(pcre2_major, [10])
|
||||||
m4_define(pcre2_minor, [20])
|
m4_define(pcre2_minor, [20])
|
||||||
m4_define(pcre2_prerelease, [-RC1])
|
m4_define(pcre2_prerelease, [-RC1])
|
||||||
m4_define(pcre2_date, [2015-03-11])
|
m4_define(pcre2_date, [2015-06-16])
|
||||||
|
|
||||||
# NOTE: The CMakeLists.txt file searches for the above variables in the first
|
# NOTE: The CMakeLists.txt file searches for the above variables in the first
|
||||||
# 50 lines of this file. Please update that if the variables above are moved.
|
# 50 lines of this file. Please update that if the variables above are moved.
|
||||||
|
|
||||||
# Libtool shared library interface versions (current:revision:age)
|
# Libtool shared library interface versions (current:revision:age)
|
||||||
m4_define(libpcre2_8_version, [1:0:1])
|
m4_define(libpcre2_8_version, [2:0:0])
|
||||||
m4_define(libpcre2_16_version, [1:0:1])
|
m4_define(libpcre2_16_version, [2:0:0])
|
||||||
m4_define(libpcre2_32_version, [1:0:1])
|
m4_define(libpcre2_32_version, [2:0:0])
|
||||||
m4_define(libpcre2_posix_version, [0:0:0])
|
m4_define(libpcre2_posix_version, [0:0:0])
|
||||||
|
|
||||||
AC_PREREQ(2.57)
|
AC_PREREQ(2.57)
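As a rough illustration of what the current:revision:age change means, the following Python sketch derives the shared library names that libtool's Linux-style versioning produces. The helper name is hypothetical and the naming scheme is an assumption that holds for Linux-style platforms only; other platforms map the triple differently.

```python
def libtool_linux_names(base: str, version: str):
    """Return (soname, full file name) for a libtool current:revision:age triple."""
    current, revision, age = (int(part) for part in version.split(":"))
    # On Linux-style platforms the soname major number is current - age.
    major = current - age
    return f"{base}.so.{major}", f"{base}.so.{major}.{age}.{revision}"
```

Under that assumption, 1:0:1 gives a soname ending in `.so.0`, while 2:0:0 gives one ending in `.so.2`, so programs linked against the earlier library would not load the new one without relinking.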
|
||||||
|
|
|
@ -294,6 +294,9 @@ library. They are also documented in the pcre2build man page.
|
||||||
which specifies that the code value for the EBCDIC NL character is 0x25
|
which specifies that the code value for the EBCDIC NL character is 0x25
|
||||||
instead of the default 0x15.
|
instead of the default 0x15.
|
||||||
|
|
||||||
|
. If you specify --enable-debug, additional debugging code is included in the
|
||||||
|
build. This option is intended for use by the PCRE2 maintainers.
|
||||||
|
|
||||||
. In environments where valgrind is installed, if you specify
|
. In environments where valgrind is installed, if you specify
|
||||||
|
|
||||||
--enable-valgrind
|
--enable-valgrind
|
||||||
|
@ -829,4 +832,4 @@ The distribution should contain the files listed below.
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
Email local part: ph10
|
Email local part: ph10
|
||||||
Email domain: cam.ac.uk
|
Email domain: cam.ac.uk
|
||||||
Last updated: 26 January 2015
|
Last updated: 24 April 2015
|
||||||
|
|
|
@ -108,8 +108,14 @@ lose performance.
|
||||||
<P>
|
<P>
|
||||||
One way of guarding against this possibility is to use the
|
One way of guarding against this possibility is to use the
|
||||||
<b>pcre2_pattern_info()</b> function to check the compiled pattern's options for
|
<b>pcre2_pattern_info()</b> function to check the compiled pattern's options for
|
||||||
UTF. Alternatively, you can set the PCRE2_NEVER_UTF option at compile time.
|
PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling
|
||||||
This causes an compile time error if a pattern contains a UTF-setting sequence.
|
<b>pcre2_compile()</b>. This causes a compile-time error if a pattern contains
|
||||||
|
a UTF-setting sequence.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The use of Unicode properties for character types such as \d can also be
|
||||||
|
enabled from within the pattern, by specifying "(*UCP)". This feature can be
|
||||||
|
disallowed by setting the PCRE2_NEVER_UCP option.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
If your application is one that supports UTF, be aware that validity checking
|
If your application is one that supports UTF, be aware that validity checking
|
||||||
|
@ -118,6 +124,12 @@ the PCRE2_NO_UTF_CHECK option for the second and subsequent matches to avoid
|
||||||
running redundant checks.
|
running redundant checks.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead to
|
||||||
|
problems, because it may leave the current matching point in the middle of a
|
||||||
|
multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used to
|
||||||
|
lock out the use of \C, causing a compile-time error if it is encountered.
|
||||||
|
</P>
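The problem with \C can be seen with a short Python sketch that tests whether an offset in a UTF-8 subject falls on a character boundary. The helper `is_char_boundary` is hypothetical and is not part of the PCRE2 API; it assumes valid UTF-8 and an offset between 0 and the subject length.

```python
def is_char_boundary(subject: bytes, offset: int) -> bool:
    """True if offset does not point into the middle of a UTF-8 character."""
    if offset == 0 or offset == len(subject):
        return True
    # A continuation byte (10xxxxxx) means we are inside a character.
    return (subject[offset] & 0xC0) != 0x80
```

If \C consumes only the first code unit of a two-unit character, the matching point ends up at offset 1, which this check reports as not being a boundary.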
|
||||||
|
<P>
|
||||||
Another way that performance can be hit is by running a pattern that has a very
|
Another way that performance can be hit is by running a pattern that has a very
|
||||||
large search tree against a string that will never match. Nested unlimited
|
large search tree against a string that will never match. Nested unlimited
|
||||||
repeats in a pattern are a common example. PCRE2 provides some protection
|
repeats in a pattern are a common example. PCRE2 provides some protection
|
||||||
|
@ -175,9 +187,9 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC5" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 18 November 2014
|
Last updated: 13 April 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2014 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
|
@ -49,6 +49,7 @@ or provide an external function for stack size checking. The option bits are:
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_ANCHORED Force pattern anchoring
|
PCRE2_ANCHORED Force pattern anchoring
|
||||||
PCRE2_ALT_BSUX Alternative handling of \u, \U, and \x
|
PCRE2_ALT_BSUX Alternative handling of \u, \U, and \x
|
||||||
|
PCRE2_ALT_CIRCUMFLEX Alternative handling of ^ in multiline mode
|
||||||
PCRE2_AUTO_CALLOUT Compile automatic callouts
|
PCRE2_AUTO_CALLOUT Compile automatic callouts
|
||||||
PCRE2_CASELESS Do caseless matching
|
PCRE2_CASELESS Do caseless matching
|
||||||
PCRE2_DOLLAR_ENDONLY $ not to match newline at end
|
PCRE2_DOLLAR_ENDONLY $ not to match newline at end
|
||||||
|
@ -58,6 +59,7 @@ or provide an external function for stack size checking. The option bits are:
|
||||||
PCRE2_FIRSTLINE Force matching to be before newline
|
PCRE2_FIRSTLINE Force matching to be before newline
|
||||||
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
||||||
PCRE2_MULTILINE ^ and $ match newlines within data
|
PCRE2_MULTILINE ^ and $ match newlines within data
|
||||||
|
PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns
|
||||||
PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. via (*UCP)
|
PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. via (*UCP)
|
||||||
PCRE2_NEVER_UTF Lock out PCRE2_UTF, e.g. via (*UTF)
|
PCRE2_NEVER_UTF Lock out PCRE2_UTF, e.g. via (*UTF)
|
||||||
PCRE2_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
PCRE2_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||||
|
|
|
@ -1074,6 +1074,15 @@ hexadecimal digits, in which case the hexadecimal number defines the code point
|
||||||
to match. By default, as in Perl, a hexadecimal number is always expected after
|
to match. By default, as in Perl, a hexadecimal number is always expected after
|
||||||
\x, but it may have zero, one, or two digits (so, for example, \xz matches a
|
\x, but it may have zero, one, or two digits (so, for example, \xz matches a
|
||||||
binary zero character followed by z).
|
binary zero character followed by z).
|
||||||
|
<pre>
|
||||||
|
PCRE2_ALT_CIRCUMFLEX
|
||||||
|
</pre>
|
||||||
|
In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
|
||||||
|
matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
|
||||||
|
after any internal newline. However, it does not match after a newline at the
|
||||||
|
end of the subject, for compatibility with Perl. If you want a multiline
|
||||||
|
circumflex also to match after a terminating newline, you must set
|
||||||
|
PCRE2_ALT_CIRCUMFLEX.
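The difference can be modelled with a few lines of Python. This is a sketch of the rule as described above, not PCRE2 code: the function name is hypothetical, newlines are assumed to be single LF characters, and PCRE2_NOTBOL is assumed not to be set.

```python
def circumflex_positions(subject: str, multiline: bool = False,
                         alt_circumflex: bool = False) -> list:
    """Return the offsets at which ^ can match under the modelled options."""
    positions = [0]  # the start of the subject always qualifies
    if multiline:
        for i, ch in enumerate(subject):
            if ch == "\n":
                after = i + 1
                # After a terminating newline, ^ matches only with the ALT option.
                if after < len(subject) or alt_circumflex:
                    positions.append(after)
    return positions
```

For the subject "abc\ndef\n", the model gives offsets 0 and 4 in plain multiline mode, and 0, 4, and 8 when the alternative handling is also enabled.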
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_AUTO_CALLOUT
|
PCRE2_AUTO_CALLOUT
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1174,8 +1183,19 @@ When PCRE2_MULTILINE it is set, the "start of line" and "end of line"
|
||||||
constructs match immediately following or immediately before internal newlines
|
constructs match immediately following or immediately before internal newlines
|
||||||
in the subject string, respectively, as well as at the very start and end. This
|
in the subject string, respectively, as well as at the very start and end. This
|
||||||
is equivalent to Perl's /m option, and it can be changed within a pattern by a
|
is equivalent to Perl's /m option, and it can be changed within a pattern by a
|
||||||
(?m) option setting. If there are no newlines in a subject string, or no
|
(?m) option setting. Note that the "start of line" metacharacter does not match
|
||||||
occurrences of ^ or $ in a pattern, setting PCRE2_MULTILINE has no effect.
|
after a newline at the end of the subject, for compatibility with Perl.
|
||||||
|
However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If
|
||||||
|
there are no newlines in a subject string, or no occurrences of ^ or $ in a
|
||||||
|
pattern, setting PCRE2_MULTILINE has no effect.
|
||||||
|
<pre>
|
||||||
|
PCRE2_NEVER_BACKSLASH_C
|
||||||
|
</pre>
|
||||||
|
This option locks out the use of \C in the pattern that is being compiled.
|
||||||
|
This escape can cause unpredictable behaviour in UTF-8 or UTF-16 modes, because
|
||||||
|
it may leave the current matching point in the middle of a multi-code-unit
|
||||||
|
character. This option may be useful in applications that process patterns from
|
||||||
|
external sources.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_NEVER_UCP
|
PCRE2_NEVER_UCP
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -1183,17 +1203,17 @@ This option locks out the use of Unicode properties for handling \B, \b, \D,
|
||||||
\d, \S, \s, \W, \w, and some of the POSIX character classes, as described
|
\d, \S, \s, \W, \w, and some of the POSIX character classes, as described
|
||||||
for the PCRE2_UCP option below. In particular, it prevents the creator of the
|
for the PCRE2_UCP option below. In particular, it prevents the creator of the
|
||||||
pattern from enabling this facility by starting the pattern with (*UCP). This
|
pattern from enabling this facility by starting the pattern with (*UCP). This
|
||||||
may be useful in applications that process patterns from external sources. The
|
option may be useful in applications that process patterns from external
|
||||||
option combination PCRE_UCP and PCRE_NEVER_UCP causes an error.
|
sources. The option combination PCRE2_UCP and PCRE2_NEVER_UCP causes an error.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_NEVER_UTF
|
PCRE2_NEVER_UTF
|
||||||
</pre>
|
</pre>
|
||||||
This option locks out interpretation of the pattern as UTF-8, UTF-16, or
|
This option locks out interpretation of the pattern as UTF-8, UTF-16, or
|
||||||
UTF-32, depending on which library is in use. In particular, it prevents the
|
UTF-32, depending on which library is in use. In particular, it prevents the
|
||||||
creator of the pattern from switching to UTF interpretation by starting the
|
creator of the pattern from switching to UTF interpretation by starting the
|
||||||
pattern with (*UTF). This may be useful in applications that process patterns
|
pattern with (*UTF). This option may be useful in applications that process
|
||||||
from external sources. The combination of PCRE2_UTF and PCRE2_NEVER_UTF causes
|
patterns from external sources. The combination of PCRE2_UTF and
|
||||||
an error.
|
PCRE2_NEVER_UTF causes an error.
|
||||||
<pre>
|
<pre>
|
||||||
PCRE2_NO_AUTO_CAPTURE
|
PCRE2_NO_AUTO_CAPTURE
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -2863,7 +2883,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC40" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC40" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 23 March 2015
|
Last updated: 22 April 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -29,11 +29,12 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<li><a name="TOC14" href="#SEC14">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
|
<li><a name="TOC14" href="#SEC14">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
|
||||||
<li><a name="TOC15" href="#SEC15">PCRE2GREP BUFFER SIZE</a>
|
<li><a name="TOC15" href="#SEC15">PCRE2GREP BUFFER SIZE</a>
|
||||||
<li><a name="TOC16" href="#SEC16">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
|
<li><a name="TOC16" href="#SEC16">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
|
||||||
<li><a name="TOC17" href="#SEC17">DEBUGGING WITH VALGRIND SUPPORT</a>
|
<li><a name="TOC17" href="#SEC17">INCLUDING DEBUGGING CODE</a>
|
||||||
<li><a name="TOC18" href="#SEC18">CODE COVERAGE REPORTING</a>
|
<li><a name="TOC18" href="#SEC18">DEBUGGING WITH VALGRIND SUPPORT</a>
|
||||||
<li><a name="TOC19" href="#SEC19">SEE ALSO</a>
|
<li><a name="TOC19" href="#SEC19">CODE COVERAGE REPORTING</a>
|
||||||
<li><a name="TOC20" href="#SEC20">AUTHOR</a>
|
<li><a name="TOC20" href="#SEC20">SEE ALSO</a>
|
||||||
<li><a name="TOC21" href="#SEC21">REVISION</a>
|
<li><a name="TOC21" href="#SEC21">AUTHOR</a>
|
||||||
|
<li><a name="TOC22" href="#SEC22">REVISION</a>
|
||||||
</ul>
|
</ul>
|
||||||
<br><a name="SEC1" href="#TOC1">BUILDING PCRE2</a><br>
|
<br><a name="SEC1" href="#TOC1">BUILDING PCRE2</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -147,6 +148,12 @@ properties. The application can request that they do by setting the PCRE2_UCP
|
||||||
option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also
|
option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also
|
||||||
request this by starting with (*UCP).
|
request this by starting with (*UCP).
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
The \C escape sequence, which matches a single code unit, even in a UTF mode,
|
||||||
|
can cause unpredictable behaviour because it may leave the current matching
|
||||||
|
point in the middle of a multi-code-unit character. It can be locked out by
|
||||||
|
setting the PCRE2_NEVER_BACKSLASH_C option.
|
||||||
|
</P>
|
||||||
<br><a name="SEC6" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
|
<br><a name="SEC6" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
|
||||||
<P>
|
<P>
|
||||||
Just-in-time compiler support is included in the build by specifying
|
Just-in-time compiler support is included in the build by specifying
|
||||||
|
@ -397,7 +404,16 @@ automatically included, you may need to add something like
|
||||||
</pre>
|
</pre>
|
||||||
immediately before the <b>configure</b> command.
|
immediately before the <b>configure</b> command.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC17" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
|
<br><a name="SEC17" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
|
||||||
|
<P>
|
||||||
|
If you add
|
||||||
|
<pre>
|
||||||
|
--enable-debug
|
||||||
|
</pre>
|
||||||
|
to the <b>configure</b> command, additional debugging code is included in the
|
||||||
|
build. This feature is intended for use by the PCRE2 maintainers.
|
||||||
|
</P>
|
||||||
|
<br><a name="SEC18" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
|
||||||
<P>
|
<P>
|
||||||
If you add
|
If you add
|
||||||
<pre>
|
<pre>
|
||||||
|
@ -407,7 +423,7 @@ to the <b>configure</b> command, PCRE2 will use valgrind annotations to mark
|
||||||
certain memory regions as unaddressable. This allows it to detect invalid
|
certain memory regions as unaddressable. This allows it to detect invalid
|
||||||
memory accesses, and is mostly useful for debugging PCRE2 itself.
|
memory accesses, and is mostly useful for debugging PCRE2 itself.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC18" href="#TOC1">CODE COVERAGE REPORTING</a><br>
|
<br><a name="SEC19" href="#TOC1">CODE COVERAGE REPORTING</a><br>
|
||||||
<P>
|
<P>
|
||||||
If your C compiler is gcc, you can build a version of PCRE2 that can generate a
|
If your C compiler is gcc, you can build a version of PCRE2 that can generate a
|
||||||
code coverage report for its test suite. To enable this, you must install
|
code coverage report for its test suite. To enable this, you must install
|
||||||
|
@ -464,11 +480,11 @@ This cleans all coverage data including the generated coverage report. For more
|
||||||
information about code coverage, see the <b>gcov</b> and <b>lcov</b>
|
information about code coverage, see the <b>gcov</b> and <b>lcov</b>
|
||||||
documentation.
|
documentation.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
|
<br><a name="SEC20" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2api</b>(3), <b>pcre2-config</b>(3).
|
<b>pcre2api</b>(3), <b>pcre2-config</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC21" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
Philip Hazel
|
Philip Hazel
|
||||||
<br>
|
<br>
|
||||||
|
@ -477,9 +493,9 @@ University Computing Service
|
||||||
Cambridge, England.
|
Cambridge, England.
|
||||||
<br>
|
<br>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC22" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 26 January 2015
|
Last updated: 24 April 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -357,10 +357,11 @@ A second use of backslash provides a way of encoding non-printing characters
|
||||||
in patterns in a visible manner. There is no restriction on the appearance of
|
in patterns in a visible manner. There is no restriction on the appearance of
|
||||||
non-printing characters in a pattern, but when a pattern is being prepared by
|
non-printing characters in a pattern, but when a pattern is being prepared by
|
||||||
text editing, it is often easier to use one of the following escape sequences
|
text editing, it is often easier to use one of the following escape sequences
|
||||||
than the binary character it represents:
|
than the binary character it represents. In an ASCII or Unicode environment,
|
||||||
|
these escapes are as follows:
|
||||||
<pre>
|
<pre>
|
||||||
\a alarm, that is, the BEL character (hex 07)
|
\a alarm, that is, the BEL character (hex 07)
|
||||||
\cx "control-x", where x is any ASCII character
|
\cx "control-x", where x is any printable ASCII character
|
||||||
\e escape (hex 1B)
|
\e escape (hex 1B)
|
||||||
\f form feed (hex 0C)
|
\f form feed (hex 0C)
|
||||||
\n linefeed (hex 0A)
|
\n linefeed (hex 0A)
|
||||||
|
@ -377,23 +378,38 @@ The precise effect of \cx on ASCII characters is as follows: if x is a lower
|
||||||
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
case letter, it is converted to upper case. Then bit 6 of the character (hex
|
||||||
40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
|
||||||
but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
|
but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
|
||||||
code unit following \c has a value greater than 127, a compile-time error
|
code unit following \c has a value less than 32 or greater than 126, a
|
||||||
occurs. This locks out non-ASCII characters in all modes.
|
compile-time error occurs. This locks out non-printable ASCII characters in all
|
||||||
|
modes.
|
||||||
</P>
|
</P>
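The ASCII rule just described (fold lower case to upper case, then invert hex 40) can be written as a short Python sketch. The function name is hypothetical and the code models the documented behaviour rather than reproducing PCRE2's implementation.

```python
def control_escape_ascii(x: str) -> int:
    """Return the code value produced by \\cx for a printable ASCII character x."""
    code = ord(x)
    if code < 32 or code > 126:
        raise ValueError("\\c must be followed by a printable ASCII character")
    if "a" <= x <= "z":
        code -= 32  # convert lower case to upper case
    return code ^ 0x40  # invert bit 6
```

This reproduces the values quoted above: A and Z give hex 01 and hex 1A, { gives hex 3B, and ; gives hex 7B.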
|
||||||
<P>
|
<P>
|
||||||
The \c facility was designed for use with ASCII characters, but with the
|
When PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
|
||||||
extension to Unicode it is even less useful than it once was. It is, however,
|
generate the appropriate EBCDIC code values. The \c escape is processed
|
||||||
recognized when PCRE2 is compiled in EBCDIC mode, where data items are always
|
as specified for Perl in the <b>perlebcdic</b> document. The only characters
|
||||||
bytes. In this mode, all values are valid after \c. If the next character is a
|
that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
|
||||||
lower case letter, it is converted to upper case. Then the 0xc0 bits of the
|
other character provokes a compile-time error. The sequence \c@ encodes
|
||||||
byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
|
character code 0; the letters (in either case) encode characters 1-26 (hex 01
|
||||||
the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
|
to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
|
||||||
characters also generate different values.
|
\c? becomes either 255 (hex FF) or 95 (hex 5F).
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Thus, apart from \c?, these escapes generate the same character code values as
|
||||||
|
they do in an ASCII environment, though the meanings of the values mostly
|
||||||
|
differ. For example, \cG always generates code value 7, which is BEL in ASCII
|
||||||
|
but DEL in EBCDIC.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
|
||||||
|
because 127 is not a control character in EBCDIC, Perl makes it generate the
|
||||||
|
APC character. Unfortunately, there are several variants of EBCDIC. In most of
|
||||||
|
them the APC character has the value 255 (hex FF), but in the one Perl calls
|
||||||
|
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
|
||||||
|
values, PCRE2 makes \c? generate 95; otherwise it generates 255.
|
||||||
</P>
|
</P>
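The EBCDIC rules for \c can be summarized as a lookup, sketched here in Python. The function name and the `posix_bc` flag are hypothetical: PCRE2 itself decides between 255 and 95 from the character values it was built with. The argument is the character that follows \c, identified by its symbol rather than by its EBCDIC code value.

```python
def control_escape_ebcdic(x: str, posix_bc: bool = False) -> int:
    """Return the code value for \\c followed by x under the EBCDIC rules."""
    if len(x) != 1:
        raise ValueError("exactly one character must follow \\c")
    if x == "@":
        return 0
    if x.isascii() and x.isalpha():
        return ord(x.upper()) - ord("A") + 1  # letters give 1-26
    specials = "[\\]^_"
    if x in specials:
        return 27 + specials.index(x)  # 27-31
    if x == "?":
        return 95 if posix_bc else 255  # the APC character
    raise ValueError("character not allowed after \\c in EBCDIC mode")
```

In this model, \c followed by G yields 7 in either environment; only the meaning attached to that value differs.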
|
||||||
<P>
|
<P>
|
||||||
After \0 up to two further octal digits are read. If there are fewer than two
|
After \0 up to two further octal digits are read. If there are fewer than two
|
||||||
digits, just those that are present are used. Thus the sequence \0\x\07
|
digits, just those that are present are used. Thus the sequence \0\x\015
|
||||||
specifies two binary zeros followed by a BEL character (code value 7). Make
|
specifies two binary zeros followed by a CR character (code value 13). Make
|
||||||
sure you supply two digits after the initial zero if the pattern character that
|
sure you supply two digits after the initial zero if the pattern character that
|
||||||
follows is itself an octal digit.
|
follows is itself an octal digit.
|
||||||
</P>
|
</P>
|
||||||
|
@ -412,21 +428,24 @@ describe the old, ambiguous syntax.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The handling of a backslash followed by a digit other than 0 is complicated,
|
The handling of a backslash followed by a digit other than 0 is complicated,
|
||||||
and Perl has changed in recent releases, causing PCRE2 also to change. Outside
|
and Perl has changed over time, causing PCRE2 also to change.
|
||||||
a character class, PCRE2 reads the digit and any following digits as a decimal
|
</P>
|
||||||
number. If the number is less than 8, or if there have been at least that many
|
<P>
|
||||||
previous capturing left parentheses in the expression, the entire sequence is
|
Outside a character class, PCRE2 reads the digit and any following digits as a
|
||||||
taken as a <i>back reference</i>. A description of how this works is given
|
decimal number. If the number is less than 10, begins with the digit 8 or 9, or
|
||||||
|
if there are at least that many previous capturing left parentheses in the
|
||||||
|
expression, the entire sequence is taken as a <i>back reference</i>. A
|
||||||
|
description of how this works is given
|
||||||
<a href="#backreferences">later,</a>
|
<a href="#backreferences">later,</a>
|
||||||
following the discussion of
|
following the discussion of
|
||||||
<a href="#subpattern">parenthesized subpatterns.</a>
|
<a href="#subpattern">parenthesized subpatterns.</a>
|
||||||
|
Otherwise, up to three octal digits are read to form a character code.
|
||||||
</P>
|
</P>
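The decision made outside a character class can be modelled in Python as follows. This is a sketch of the rule as stated, under the assumption that the digit string starts with a digit other than 0; the function name and the tuple return format are hypothetical.

```python
def classify_backslash_digits(digits: str, groups_so_far: int):
    """Classify \\ followed by digits (first digit non-zero), outside a class."""
    if not digits or digits[0] == "0" or any(c not in "0123456789" for c in digits):
        raise ValueError("expected decimal digits starting with 1-9")
    number = int(digits)
    if number < 10 or digits[0] in "89" or number <= groups_so_far:
        return ("backref", number)
    # Otherwise read up to three octal digits; the rest stand for themselves.
    octal = ""
    for ch in digits[:3]:
        if ch not in "01234567":
            break
        octal += ch
    return ("octal", int(octal, 8), digits[len(octal):])
```

For example, the digits 40 are a character code (an ASCII space) when fewer than 40 capturing groups precede them, and a back reference otherwise, whereas 81 is classified as a back reference regardless of the group count.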
|
||||||
<P>
|
<P>
|
||||||
Inside a character class, or if the decimal number following \ is greater than
|
Inside a character class, PCRE2 handles \8 and \9 as the literal characters
|
||||||
7 and there have not been that many capturing subpatterns, PCRE2 handles \8
|
"8" and "9", and otherwise reads up to three octal digits following the
|
||||||
and \9 as the literal characters "8" and "9", and otherwise re-reads up to
|
backslash, using them to generate a data character. Any subsequent digits stand
|
||||||
three octal digits following the backslash, using them to generate a data
|
for themselves. For example, outside a character class:
|
||||||
character. Any subsequent digits stand for themselves. For example:
|
|
||||||
<pre>
|
<pre>
|
||||||
\040 is another way of writing an ASCII space
|
\040 is another way of writing an ASCII space
|
||||||
\40 is the same, provided there are fewer than 40 previous capturing subpatterns
|
\40 is the same, provided there are fewer than 40 previous capturing subpatterns
|
||||||
|
@ -436,7 +455,7 @@ character. Any subsequent digits stand for themselves. For example:
|
||||||
\0113 is a tab followed by the character "3"
|
\0113 is a tab followed by the character "3"
|
||||||
\113 might be a back reference, otherwise the character with octal code 113
|
\113 might be a back reference, otherwise the character with octal code 113
|
||||||
\377 might be a back reference, otherwise the value 255 (decimal)
|
\377 might be a back reference, otherwise the value 255 (decimal)
|
||||||
\81 is either a back reference, or the two characters "8" and "1"
|
\81 is always a back reference
|
||||||
</pre>
|
</pre>
|
||||||
Note that octal values of 100 or greater that are specified using this syntax
|
Note that octal values of 100 or greater that are specified using this syntax
|
||||||
must not be introduced by a leading zero, because no more than three octal
|
must not be introduced by a leading zero, because no more than three octal
|
||||||
|
@ -1105,15 +1124,19 @@ regular expression.
|
||||||
<P>
|
<P>
|
||||||
The circumflex and dollar metacharacters are zero-width assertions. That is,
|
The circumflex and dollar metacharacters are zero-width assertions. That is,
|
||||||
they test for a particular condition being true without consuming any
|
they test for a particular condition being true without consuming any
|
||||||
characters from the subject string.
|
characters from the subject string. These two metacharacters are concerned with
|
||||||
|
matching the starts and ends of lines. If the newline convention is set so that
|
||||||
|
only the two-character sequence CRLF is recognized as a newline, isolated CR
|
||||||
|
and LF characters are treated as ordinary data characters, and are not
|
||||||
|
recognized as newlines.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Outside a character class, in the default matching mode, the circumflex
|
Outside a character class, in the default matching mode, the circumflex
|
||||||
character is an assertion that is true only if the current matching point is at
|
character is an assertion that is true only if the current matching point is at
|
||||||
the start of the subject string. If the <i>startoffset</i> argument of
|
the start of the subject string. If the <i>startoffset</i> argument of
|
||||||
<b>pcre2_match()</b> is non-zero, circumflex can never match if the
|
<b>pcre2_match()</b> is non-zero, or if PCRE2_NOTBOL is set, circumflex can
|
||||||
PCRE2_MULTILINE option is unset. Inside a character class, circumflex has an
|
never match if the PCRE2_MULTILINE option is unset. Inside a character class,
|
||||||
entirely different meaning
|
circumflex has an entirely different meaning
|
||||||
<a href="#characterclass">(see below).</a>
|
<a href="#characterclass">(see below).</a>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@ -1128,10 +1151,11 @@ to be anchored.)
|
||||||
<P>
|
<P>
|
||||||
The dollar character is an assertion that is true only if the current matching
|
The dollar character is an assertion that is true only if the current matching
|
||||||
point is at the end of the subject string, or immediately before a newline at
|
point is at the end of the subject string, or immediately before a newline at
|
||||||
the end of the string (by default). Note, however, that it does not actually
|
the end of the string (by default), unless PCRE2_NOTEOL is set. Note, however,
|
||||||
match the newline. Dollar need not be the last character of the pattern if a
|
that it does not actually match the newline. Dollar need not be the last
|
||||||
number of alternatives are involved, but it should be the last item in any
|
character of the pattern if a number of alternatives are involved, but it
|
||||||
branch in which it appears. Dollar has no special meaning in a character class.
|
should be the last item in any branch in which it appears. Dollar has no
|
||||||
|
special meaning in a character class.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The meaning of dollar can be changed so that it matches only at the very end of
|
The meaning of dollar can be changed so that it matches only at the very end of
|
||||||
|
@ -1139,13 +1163,13 @@ the string, by setting the PCRE2_DOLLAR_ENDONLY option at compile time. This
|
||||||
does not affect the \Z assertion.
|
does not affect the \Z assertion.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The meanings of the circumflex and dollar characters are changed if the
|
The meanings of the circumflex and dollar metacharacters are changed if the
|
||||||
PCRE2_MULTILINE option is set. When this is the case, a circumflex matches
|
PCRE2_MULTILINE option is set. When this is the case, a dollar character
|
||||||
immediately after internal newlines as well as at the start of the subject
|
matches before any newlines in the string, as well as at the very end, and a
|
||||||
string. It does not match after a newline that ends the string. A dollar
|
circumflex matches immediately after internal newlines as well as at the start
|
||||||
matches before any newlines in the string, as well as at the very end, when
|
of the subject string. It does not match after a newline that ends the string,
|
||||||
PCRE2_MULTILINE is set. When newline is specified as the two-character
|
for compatibility with Perl. However, this can be changed by setting the
|
||||||
sequence CRLF, isolated CR and LF characters do not indicate newlines.
|
PCRE2_ALT_CIRCUMFLEX option.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
For example, the pattern /^abc$/ matches the subject string "def\nabc" (where
|
For example, the pattern /^abc$/ matches the subject string "def\nabc" (where
|
||||||
|
@ -1198,12 +1222,16 @@ whether or not a UTF mode is set. In the 8-bit library, one code unit is one
|
||||||
byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is a
|
byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is a
|
||||||
32-bit unit. Unlike a dot, \C always matches line-ending characters. The
|
32-bit unit. Unlike a dot, \C always matches line-ending characters. The
|
||||||
feature is provided in Perl in order to match individual bytes in UTF-8 mode,
|
feature is provided in Perl in order to match individual bytes in UTF-8 mode,
|
||||||
but it is unclear how it can usefully be used. Because \C breaks up characters
|
but it is unclear how it can usefully be used.
|
||||||
into individual code units, matching one unit with \C in a UTF mode means that
|
</P>
|
||||||
the rest of the string may start with a malformed UTF character. This has
|
<P>
|
||||||
undefined results, because PCRE2 assumes that it is dealing with valid UTF
|
Because \C breaks up characters into individual code units, matching one unit
|
||||||
strings (and by default it checks this at the start of processing unless the
|
with \C in UTF-8 or UTF-16 mode means that the rest of the string may start
|
||||||
PCRE2_NO_UTF_CHECK option is used).
|
with a malformed UTF character. This has undefined results, because PCRE2
|
||||||
|
assumes that it is matching character by character in a valid UTF string (by
|
||||||
|
default it checks the subject string's validity at the start of processing
|
||||||
|
unless the PCRE2_NO_UTF_CHECK option is used). An application can lock out the
|
||||||
|
use of \C by setting the PCRE2_NEVER_BACKSLASH_C option.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
PCRE2 does not allow \C to appear in lookbehind assertions
|
PCRE2 does not allow \C to appear in lookbehind assertions
|
||||||
|
@ -1475,7 +1503,8 @@ unset these options by preceding the letter with a hyphen, and a combined
|
||||||
setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS and
|
setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS and
|
||||||
PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also
|
||||||
permitted. If a letter appears both before and after the hyphen, the option is
|
permitted. If a letter appears both before and after the hyphen, the option is
|
||||||
unset.
|
unset. An empty options setting "(?)" is allowed. Needless to say, it has no
|
||||||
|
effect.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in
|
||||||
|
@ -1508,11 +1537,20 @@ option settings happen at compile time. There would be some very weird
|
||||||
behaviour otherwise.
|
behaviour otherwise.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
As a convenient shorthand, if any option settings are required at the start of
|
||||||
|
a non-capturing subpattern (see the next section), the option letters may
|
||||||
|
appear between the "?" and the ":". Thus the two patterns
|
||||||
|
<pre>
|
||||||
|
(?i:saturday|sunday)
|
||||||
|
(?:(?i)saturday|sunday)
|
||||||
|
</pre>
|
||||||
|
match exactly the same set of strings.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
<b>Note:</b> There are other PCRE2-specific options that can be set by the
|
<b>Note:</b> There are other PCRE2-specific options that can be set by the
|
||||||
application when the compiling function is called.
|
application when the compiling function is called. The pattern can contain
|
||||||
The pattern can contain special leading sequences such as (*CRLF) to override
|
special leading sequences such as (*CRLF) to override what the application has
|
||||||
what the application has set or what has been defaulted. Details are given in
|
set or what has been defaulted. Details are given in the section entitled
|
||||||
the section entitled
|
|
||||||
<a href="#newlineseq">"Newline sequences"</a>
|
<a href="#newlineseq">"Newline sequences"</a>
|
||||||
above. There are also the (*UTF) and (*UCP) leading sequences that can be used
|
above. There are also the (*UTF) and (*UCP) leading sequences that can be used
|
||||||
to set UTF and Unicode property modes; they are equivalent to setting the
|
to set UTF and Unicode property modes; they are equivalent to setting the
|
||||||
|
@ -3285,7 +3323,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 15 March 2015
|
Last updated: 13 June 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -15,7 +15,7 @@ please consult the man page, in case the conversion went wrong.
|
||||||
<ul>
|
<ul>
|
||||||
<li><a name="TOC1" href="#SEC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a>
|
<li><a name="TOC1" href="#SEC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a>
|
||||||
<li><a name="TOC2" href="#SEC2">QUOTING</a>
|
<li><a name="TOC2" href="#SEC2">QUOTING</a>
|
||||||
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
|
<li><a name="TOC3" href="#SEC3">ESCAPED CHARACTERS</a>
|
||||||
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
|
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
|
||||||
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
|
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
|
||||||
<li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
|
<li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
|
||||||
|
@ -55,11 +55,12 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||||
\Q...\E treat enclosed characters as literal
|
\Q...\E treat enclosed characters as literal
|
||||||
</PRE>
|
</PRE>
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
|
<br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
This table applies to ASCII and Unicode environments.
|
||||||
<pre>
|
<pre>
|
||||||
\a alarm, that is, the BEL character (hex 07)
|
\a alarm, that is, the BEL character (hex 07)
|
||||||
\cx "control-x", where x is any ASCII character
|
\cx "control-x", where x is any ASCII printing character
|
||||||
\e escape (hex 1B)
|
\e escape (hex 1B)
|
||||||
\f form feed (hex 0C)
|
\f form feed (hex 0C)
|
||||||
\n newline (hex 0A)
|
\n newline (hex 0A)
|
||||||
|
@ -68,18 +69,32 @@ documentation. This document contains a quick-reference summary of the syntax.
|
||||||
\0dd character with octal code 0dd
|
\0dd character with octal code 0dd
|
||||||
\ddd character with octal code ddd, or backreference
|
\ddd character with octal code ddd, or backreference
|
||||||
\o{ddd..} character with octal code ddd..
|
\o{ddd..} character with octal code ddd..
|
||||||
|
\U "U" if PCRE2_ALT_BSUX is set (otherwise is an error)
|
||||||
|
\uhhhh character with hex code hhhh (if PCRE2_ALT_BSUX is set)
|
||||||
\xhh character with hex code hh
|
\xhh character with hex code hh
|
||||||
\x{hhh..} character with hex code hhh..
|
\x{hhh..} character with hex code hhh..
|
||||||
</pre>
|
</pre>
|
||||||
Note that \0dd is always an octal code, and that \8 and \9 are the literal
|
Note that \0dd is always an octal code. The treatment of backslash followed by
|
||||||
characters "8" and "9".
|
a non-zero digit is complicated; for details see the section
|
||||||
|
<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
|
||||||
|
in the
|
||||||
|
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
|
||||||
|
documentation, where details of escape processing in EBCDIC environments are
|
||||||
|
also given.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
When \x is not followed by {, from zero to two hexadecimal digits are read,
|
||||||
|
but if PCRE2_ALT_BSUX is set, \x must be followed by two hexadecimal digits to
|
||||||
|
be recognized as a hexadecimal escape; otherwise it matches a literal "x".
|
||||||
|
Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits,
|
||||||
|
it matches a literal "u".
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
|
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
|
||||||
<P>
|
<P>
|
||||||
<pre>
|
<pre>
|
||||||
. any character except newline;
|
. any character except newline;
|
||||||
in dotall mode, any character whatsoever
|
in dotall mode, any character whatsoever
|
||||||
\C one data unit, even in UTF mode (best avoided)
|
\C one code unit, even in UTF mode (best avoided)
|
||||||
\d a decimal digit
|
\d a decimal digit
|
||||||
\D a character that is not a decimal digit
|
\D a character that is not a decimal digit
|
||||||
\h a horizontal white space character
|
\h a horizontal white space character
|
||||||
|
@ -96,6 +111,11 @@ characters "8" and "9".
|
||||||
\W a "non-word" character
|
\W a "non-word" character
|
||||||
\X a Unicode extended grapheme cluster
|
\X a Unicode extended grapheme cluster
|
||||||
</pre>
|
</pre>
|
||||||
|
The application can lock out the use of \C by setting the
|
||||||
|
PCRE2_NEVER_BACKSLASH_C option. It is dangerous because it may leave the
|
||||||
|
current matching point in the middle of a UTF-8 or UTF-16 character.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
|
By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
|
||||||
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
|
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
|
||||||
happening, \s and \w may also match characters with code points in the range
|
happening, \s and \w may also match characters with code points in the range
|
||||||
|
@ -348,7 +368,8 @@ but some of them use Unicode properties if PCRE2_UCP is set. You can use
|
||||||
\b word boundary
|
\b word boundary
|
||||||
\B not a word boundary
|
\B not a word boundary
|
||||||
^ start of subject
|
^ start of subject
|
||||||
also after internal newline in multiline mode
|
also after an internal newline in multiline mode
|
||||||
|
(after any newline if PCRE2_ALT_CIRCUMFLEX is set)
|
||||||
\A start of subject
|
\A start of subject
|
||||||
$ end of subject
|
$ end of subject
|
||||||
also before newline at end of subject
|
also before newline at end of subject
|
||||||
|
@ -423,7 +444,9 @@ appear.
|
||||||
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
|
(*UCP) set PCRE2_UCP (use Unicode properties for \d etc)
|
||||||
</pre>
|
</pre>
|
||||||
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
|
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
|
||||||
limits set by the caller of pcre2_match(), not increase them.
|
limits set by the caller of pcre2_match(), not increase them. The application
|
||||||
|
can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
|
||||||
|
PCRE2_NEVER_UCP options, respectively, at compile time.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
|
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@ -559,7 +582,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 15 March 2015
|
Last updated: 13 June 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -284,13 +284,20 @@ following commands are recognized:
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
</pre>
|
</pre>
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP
|
||||||
options set, which locks out the use of UTF and Unicode property features. This
|
options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
|
||||||
is a trigger guard that is used in test files to ensure that UTF or Unicode
|
the use of (*UTF) and (*UCP) at the start of patterns. This command also forces
|
||||||
property tests are not accidentally added to files that are used when Unicode
|
an error if a subsequent pattern contains any occurrences of \P, \p, or \X,
|
||||||
support is not included in the library. This effect can also be obtained by the
|
which are still supported when PCRE2_UTF is not set, but which require Unicode
|
||||||
use of <b>#pattern</b>; the difference is that <b>#forbid_utf</b> cannot be
|
property support to be included in the library.
|
||||||
unset, and the automatic options are not displayed in pattern information, to
|
</P>
|
||||||
avoid cluttering up test output.
|
<P>
|
||||||
|
This is a trigger guard that is used in test files to ensure that UTF or
|
||||||
|
Unicode property tests are not accidentally added to files that are used when
|
||||||
|
Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and
|
||||||
|
PCRE2_NEVER_UCP as a default can also be obtained by the use of <b>#pattern</b>;
|
||||||
|
the difference is that <b>#forbid_utf</b> cannot be unset, and the automatic
|
||||||
|
options are not displayed in pattern information, to avoid cluttering up test
|
||||||
|
output.
|
||||||
<pre>
|
<pre>
|
||||||
#load <filename>
|
#load <filename>
|
||||||
</pre>
|
</pre>
|
||||||
|
@ -471,6 +478,7 @@ for a description of their effects.
|
||||||
<pre>
|
<pre>
|
||||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||||
alt_bsux set PCRE2_ALT_BSUX
|
alt_bsux set PCRE2_ALT_BSUX
|
||||||
|
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||||
anchored set PCRE2_ANCHORED
|
anchored set PCRE2_ANCHORED
|
||||||
auto_callout set PCRE2_AUTO_CALLOUT
|
auto_callout set PCRE2_AUTO_CALLOUT
|
||||||
/i caseless set PCRE2_CASELESS
|
/i caseless set PCRE2_CASELESS
|
||||||
|
@ -481,6 +489,7 @@ for a description of their effects.
|
||||||
firstline set PCRE2_FIRSTLINE
|
firstline set PCRE2_FIRSTLINE
|
||||||
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
||||||
/m multiline set PCRE2_MULTILINE
|
/m multiline set PCRE2_MULTILINE
|
||||||
|
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
||||||
never_ucp set PCRE2_NEVER_UCP
|
never_ucp set PCRE2_NEVER_UCP
|
||||||
never_utf set PCRE2_NEVER_UTF
|
never_utf set PCRE2_NEVER_UTF
|
||||||
no_auto_capture set PCRE2_NO_AUTO_CAPTURE
|
no_auto_capture set PCRE2_NO_AUTO_CAPTURE
|
||||||
|
@ -1460,7 +1469,7 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 22 March 2015
|
Last updated: 20 May 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2015 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
|
|
|
@ -87,16 +87,26 @@ SECURITY CONSIDERATIONS
|
||||||
mance.
|
mance.
|
||||||
|
|
||||||
One way of guarding against this possibility is to use the pcre2_pat-
|
One way of guarding against this possibility is to use the pcre2_pat-
|
||||||
tern_info() function to check the compiled pattern's options for UTF.
|
tern_info() function to check the compiled pattern's options for
|
||||||
Alternatively, you can set the PCRE2_NEVER_UTF option at compile time.
|
PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when
|
||||||
This causes a compile-time error if a pattern contains a UTF-setting
|
calling pcre2_compile(). This causes a compile-time error if a pattern
|
||||||
sequence.
|
contains a UTF-setting sequence.
|
||||||
|
|
||||||
|
The use of Unicode properties for character types such as \d can also
|
||||||
|
be enabled from within the pattern, by specifying "(*UCP)". This fea-
|
||||||
|
ture can be disallowed by setting the PCRE2_NEVER_UCP option.
|
||||||
|
|
||||||
If your application is one that supports UTF, be aware that validity
|
If your application is one that supports UTF, be aware that validity
|
||||||
checking can take time. If the same data string is to be matched many
|
checking can take time. If the same data string is to be matched many
|
||||||
times, you can use the PCRE2_NO_UTF_CHECK option for the second and
|
times, you can use the PCRE2_NO_UTF_CHECK option for the second and
|
||||||
subsequent matches to avoid running redundant checks.
|
subsequent matches to avoid running redundant checks.
|
||||||
|
|
||||||
|
The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead
|
||||||
|
to problems, because it may leave the current matching point in the
|
||||||
|
middle of a multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C
|
||||||
|
option can be used to lock out the use of \C, causing a compile-time
|
||||||
|
error if it is encountered.
|
||||||
|
|
||||||
Another way that performance can be hit is by running a pattern that
|
Another way that performance can be hit is by running a pattern that
|
||||||
has a very large search tree against a string that will never match.
|
has a very large search tree against a string that will never match.
|
||||||
Nested unlimited repeats in a pattern are a common example. PCRE2 pro-
|
Nested unlimited repeats in a pattern are a common example. PCRE2 pro-
|
||||||
|
@ -155,8 +165,8 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 18 November 2014
|
Last updated: 13 April 2015
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
@ -1109,6 +1119,15 @@ COMPILING A PATTERN
|
||||||
always expected after \x, but it may have zero, one, or two digits (so,
|
always expected after \x, but it may have zero, one, or two digits (so,
|
||||||
for example, \xz matches a binary zero character followed by z).
|
for example, \xz matches a binary zero character followed by z).
|
||||||
|
|
||||||
|
PCRE2_ALT_CIRCUMFLEX
|
||||||
|
|
||||||
|
In multiline mode (when PCRE2_MULTILINE is set), the circumflex
|
||||||
|
metacharacter matches at the start of the subject (unless PCRE2_NOTBOL
|
||||||
|
is set), and also after any internal newline. However, it does not
|
||||||
|
match after a newline at the end of the subject, for compatibility with
|
||||||
|
Perl. If you want a multiline circumflex also to match after a termi-
|
||||||
|
nating newline, you must set PCRE2_ALT_CIRCUMFLEX.
|
||||||
|
|
||||||
PCRE2_AUTO_CALLOUT
|
PCRE2_AUTO_CALLOUT
|
||||||
|
|
||||||
If this bit is set, pcre2_compile() automatically inserts callout
|
If this bit is set, pcre2_compile() automatically inserts callout
|
||||||
|
@ -1204,9 +1223,20 @@ COMPILING A PATTERN
|
||||||
constructs match immediately following or immediately before internal
|
constructs match immediately following or immediately before internal
|
||||||
newlines in the subject string, respectively, as well as at the very
|
newlines in the subject string, respectively, as well as at the very
|
||||||
start and end. This is equivalent to Perl's /m option, and it can be
|
start and end. This is equivalent to Perl's /m option, and it can be
|
||||||
changed within a pattern by a (?m) option setting. If there are no new-
|
changed within a pattern by a (?m) option setting. Note that the "start
|
||||||
lines in a subject string, or no occurrences of ^ or $ in a pattern,
|
of line" metacharacter does not match after a newline at the end of the
|
||||||
setting PCRE2_MULTILINE has no effect.
|
subject, for compatibility with Perl. However, you can change this by
|
||||||
|
setting the PCRE2_ALT_CIRCUMFLEX option. If there are no newlines in a
|
||||||
|
subject string, or no occurrences of ^ or $ in a pattern, setting
|
||||||
|
PCRE2_MULTILINE has no effect.
|
||||||
|
|
||||||
|
PCRE2_NEVER_BACKSLASH_C
|
||||||
|
|
||||||
|
This option locks out the use of \C in the pattern that is being com-
|
||||||
|
piled. This escape can cause unpredictable behaviour in UTF-8 or
|
||||||
|
UTF-16 modes, because it may leave the current matching point in the
|
||||||
|
middle of a multi-code-unit character. This option may be useful in
|
||||||
|
applications that process patterns from external sources.
|
||||||
|
|
||||||
PCRE2_NEVER_UCP
|
PCRE2_NEVER_UCP
|
||||||
|
|
||||||
|
@ -1214,18 +1244,18 @@ COMPILING A PATTERN
|
||||||
\b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes, as
|
\b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes, as
|
||||||
described for the PCRE2_UCP option below. In particular, it prevents
|
described for the PCRE2_UCP option below. In particular, it prevents
|
||||||
the creator of the pattern from enabling this facility by starting the
|
the creator of the pattern from enabling this facility by starting the
|
||||||
pattern with (*UCP). This may be useful in applications that process
|
pattern with (*UCP). This option may be useful in applications that
|
||||||
patterns from external sources. The option combination PCRE2_UCP and
|
process patterns from external sources. The option combination PCRE2_UCP
|
||||||
PCRE2_NEVER_UCP causes an error.
|
and PCRE2_NEVER_UCP causes an error.
|
||||||
|
|
||||||
PCRE2_NEVER_UTF
|
PCRE2_NEVER_UTF
|
||||||
|
|
||||||
This option locks out interpretation of the pattern as UTF-8, UTF-16,
|
This option locks out interpretation of the pattern as UTF-8, UTF-16,
|
||||||
or UTF-32, depending on which library is in use. In particular, it pre-
|
or UTF-32, depending on which library is in use. In particular, it pre-
|
||||||
vents the creator of the pattern from switching to UTF interpretation
|
vents the creator of the pattern from switching to UTF interpretation
|
||||||
by starting the pattern with (*UTF). This may be useful in applications
|
by starting the pattern with (*UTF). This option may be useful in
|
||||||
that process patterns from external sources. The combination of
|
applications that process patterns from external sources. The combina-
|
||||||
PCRE2_UTF and PCRE2_NEVER_UTF causes an error.
|
tion of PCRE2_UTF and PCRE2_NEVER_UTF causes an error.
|
||||||
|
|
||||||
PCRE2_NO_AUTO_CAPTURE
|
PCRE2_NO_AUTO_CAPTURE
|
||||||
|
|
||||||
|
@ -2796,7 +2826,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 23 March 2015
|
Last updated: 22 April 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
@ -2916,6 +2946,11 @@ UNICODE AND UTF SUPPORT
|
||||||
PCRE2_UCP option. Unless the application has set PCRE2_NEVER_UCP, a
|
PCRE2_UCP option. Unless the application has set PCRE2_NEVER_UCP, a
|
||||||
pattern may also request this by starting with (*UCP).
|
pattern may also request this by starting with (*UCP).
|
||||||
|
|
||||||
|
The \C escape sequence, which matches a single code unit, even in a UTF
|
||||||
|
mode, can cause unpredictable behaviour because it may leave the cur-
|
||||||
|
rent matching point in the middle of a multi-code-unit character. It
|
||||||
|
can be locked out by setting the PCRE2_NEVER_BACKSLASH_C option.
|
||||||
|
|
||||||
|
|
||||||
JUST-IN-TIME COMPILER SUPPORT
|
JUST-IN-TIME COMPILER SUPPORT
|
||||||
|
|
||||||
|
@ -3175,6 +3210,16 @@ PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
|
||||||
immediately before the configure command.
|
immediately before the configure command.
|
||||||
|
|
||||||
|
|
||||||
|
INCLUDING DEBUGGING CODE
|
||||||
|
|
||||||
|
If you add
|
||||||
|
|
||||||
|
--enable-debug
|
||||||
|
|
||||||
|
to the configure command, additional debugging code is included in the
|
||||||
|
build. This feature is intended for use by the PCRE2 maintainers.
|
||||||
|
|
||||||
|
|
||||||
DEBUGGING WITH VALGRIND SUPPORT
|
DEBUGGING WITH VALGRIND SUPPORT
|
||||||
|
|
||||||
If you add
|
If you add
|
||||||
|
@ -3257,7 +3302,7 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 26 January 2015
|
Last updated: 24 April 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
------------------------------------------------------------------------------
|
------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -47,7 +47,7 @@ or provide an external function for stack size checking. The option bits are:
|
||||||
PCRE2_FIRSTLINE Force matching to be before newline
|
PCRE2_FIRSTLINE Force matching to be before newline
|
||||||
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
PCRE2_MATCH_UNSET_BACKREF Match unset back references
|
||||||
PCRE2_MULTILINE ^ and $ match newlines within data
|
PCRE2_MULTILINE ^ and $ match newlines within data
|
||||||
PCRE2_NEVER_BACKSLASH_C Lock out the use of \C in patterns
|
PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns
|
||||||
PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. via (*UCP)
|
PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. via (*UCP)
|
||||||
PCRE2_NEVER_UTF Lock out PCRE2_UTF, e.g. via (*UTF)
|
PCRE2_NEVER_UTF Lock out PCRE2_UTF, e.g. via (*UTF)
|
||||||
PCRE2_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
PCRE2_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||||
|
|
|
@ -1161,7 +1161,6 @@ after a newline at the end of the subject, for compatibility with Perl.
|
||||||
However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If
|
However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If
|
||||||
there are no newlines in a subject string, or no occurrences of ^ or $ in a
|
there are no newlines in a subject string, or no occurrences of ^ or $ in a
|
||||||
pattern, setting PCRE2_MULTILINE has no effect.
|
pattern, setting PCRE2_MULTILINE has no effect.
|
||||||
|
|
||||||
.sp
|
.sp
|
||||||
PCRE2_NEVER_BACKSLASH_C
|
PCRE2_NEVER_BACKSLASH_C
|
||||||
.sp
|
.sp
|
||||||
|
|
|
@ -226,14 +226,20 @@ COMMAND LINES
|
||||||
#forbid_utf
|
#forbid_utf
|
||||||
|
|
||||||
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
Subsequent patterns automatically have the PCRE2_NEVER_UTF and
|
||||||
PCRE2_NEVER_UCP options set, which locks out the use of UTF and Unicode
|
PCRE2_NEVER_UCP options set, which locks out the use of the PCRE2_UTF
|
||||||
property features. This is a trigger guard that is used in test files
|
and PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start of
|
||||||
to ensure that UTF or Unicode property tests are not accidentally added
|
patterns. This command also forces an error if a subsequent pattern
|
||||||
to files that are used when Unicode support is not included in the
|
contains any occurrences of \P, \p, or \X, which are still supported
|
||||||
library. This effect can also be obtained by the use of #pattern; the
|
when PCRE2_UTF is not set, but which require Unicode property support
|
||||||
difference is that #forbid_utf cannot be unset, and the automatic
|
to be included in the library.
|
||||||
options are not displayed in pattern information, to avoid cluttering
|
|
||||||
up test output.
|
This is a trigger guard that is used in test files to ensure that UTF
|
||||||
|
or Unicode property tests are not accidentally added to files that are
|
||||||
|
used when Unicode support is not included in the library. Setting
|
||||||
|
PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default can also be obtained
|
||||||
|
by the use of #pattern; the difference is that #forbid_utf cannot be
|
||||||
|
unset, and the automatic options are not displayed in pattern informa-
|
||||||
|
tion, to avoid cluttering up test output.
|
||||||
|
|
||||||
#load <filename>
|
#load <filename>
|
||||||
|
|
||||||
|
@ -417,6 +423,7 @@ PATTERN MODIFIERS
|
||||||
|
|
||||||
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
|
||||||
alt_bsux set PCRE2_ALT_BSUX
|
alt_bsux set PCRE2_ALT_BSUX
|
||||||
|
alt_circumflex set PCRE2_ALT_CIRCUMFLEX
|
||||||
anchored set PCRE2_ANCHORED
|
anchored set PCRE2_ANCHORED
|
||||||
auto_callout set PCRE2_AUTO_CALLOUT
|
auto_callout set PCRE2_AUTO_CALLOUT
|
||||||
/i caseless set PCRE2_CASELESS
|
/i caseless set PCRE2_CASELESS
|
||||||
|
@ -427,6 +434,7 @@ PATTERN MODIFIERS
|
||||||
firstline set PCRE2_FIRSTLINE
|
firstline set PCRE2_FIRSTLINE
|
||||||
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
|
||||||
/m multiline set PCRE2_MULTILINE
|
/m multiline set PCRE2_MULTILINE
|
||||||
|
never_backslash_c set PCRE2_NEVER_BACKSLASH_C
|
||||||
never_ucp set PCRE2_NEVER_UCP
|
never_ucp set PCRE2_NEVER_UCP
|
||||||
never_utf set PCRE2_NEVER_UTF
|
never_utf set PCRE2_NEVER_UTF
|
||||||
no_auto_capture set PCRE2_NO_AUTO_CAPTURE
|
no_auto_capture set PCRE2_NO_AUTO_CAPTURE
|
||||||
|
@ -1322,5 +1330,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 22 March 2015
|
Last updated: 20 May 2015
|
||||||
Copyright (c) 1997-2015 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
|
|
|
@ -200,7 +200,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_NAME "PCRE2"
|
#define PACKAGE_NAME "PCRE2"
|
||||||
|
|
||||||
/* Define to the full name and version of this package. */
|
/* Define to the full name and version of this package. */
|
||||||
#define PACKAGE_STRING "PCRE2 10.10"
|
#define PACKAGE_STRING "PCRE2 10.20-RC1"
|
||||||
|
|
||||||
/* Define to the one symbol short name of this package. */
|
/* Define to the one symbol short name of this package. */
|
||||||
#define PACKAGE_TARNAME "pcre2"
|
#define PACKAGE_TARNAME "pcre2"
|
||||||
|
@ -209,7 +209,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PACKAGE_URL ""
|
#define PACKAGE_URL ""
|
||||||
|
|
||||||
/* Define to the version of this package. */
|
/* Define to the version of this package. */
|
||||||
#define PACKAGE_VERSION "10.10"
|
#define PACKAGE_VERSION "10.20-RC1"
|
||||||
|
|
||||||
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested
|
||||||
parentheses (of any kind) in a pattern. This limits the amount of system
|
parentheses (of any kind) in a pattern. This limits the amount of system
|
||||||
|
@ -227,6 +227,9 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
#define PCRE2GREP_BUFSIZE 20480
|
#define PCRE2GREP_BUFSIZE 20480
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
/* Define to any value to include debugging code. */
|
||||||
|
/* #undef PCRE2_DEBUG */
|
||||||
|
|
||||||
/* If you are compiling for a system other than a Unix-like system or
|
/* If you are compiling for a system other than a Unix-like system or
|
||||||
Win32, and it needs some magic to be inserted before the definition
|
Win32, and it needs some magic to be inserted before the definition
|
||||||
of a function that is exported by the library, define this macro to
|
of a function that is exported by the library, define this macro to
|
||||||
|
@ -287,7 +290,7 @@ sure both macros are undefined; an emulation function will then be used. */
|
||||||
/* #undef SUPPORT_VALGRIND */
|
/* #undef SUPPORT_VALGRIND */
|
||||||
|
|
||||||
/* Version number of package */
|
/* Version number of package */
|
||||||
#define VERSION "10.10"
|
#define VERSION "10.20-RC1"
|
||||||
|
|
||||||
/* Define to empty if `const' does not conform to ANSI C. */
|
/* Define to empty if `const' does not conform to ANSI C. */
|
||||||
/* #undef const */
|
/* #undef const */
|
||||||
|
|
|
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
|
||||||
/* The current PCRE version information. */
|
/* The current PCRE version information. */
|
||||||
|
|
||||||
#define PCRE2_MAJOR 10
|
#define PCRE2_MAJOR 10
|
||||||
#define PCRE2_MINOR 10
|
#define PCRE2_MINOR 20
|
||||||
#define PCRE2_PRERELEASE
|
#define PCRE2_PRERELEASE -RC1
|
||||||
#define PCRE2_DATE 2015-03-06
|
#define PCRE2_DATE 2015-06-16
|
||||||
|
|
||||||
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
/* When an application links to a PCRE DLL in Windows, the symbols that are
|
||||||
imported have to be identified as such. When building PCRE2, the appropriate
|
imported have to be identified as such. When building PCRE2, the appropriate
|
||||||
|
@ -118,6 +118,8 @@ D is inspected during pcre2_dfa_match() execution
|
||||||
#define PCRE2_UCP 0x00020000u /* C J M D */
|
#define PCRE2_UCP 0x00020000u /* C J M D */
|
||||||
#define PCRE2_UNGREEDY 0x00040000u /* C */
|
#define PCRE2_UNGREEDY 0x00040000u /* C */
|
||||||
#define PCRE2_UTF 0x00080000u /* C J M D */
|
#define PCRE2_UTF 0x00080000u /* C J M D */
|
||||||
|
#define PCRE2_NEVER_BACKSLASH_C 0x00100000u /* C */
|
||||||
|
#define PCRE2_ALT_CIRCUMFLEX 0x00200000u /* J M D */
|
||||||
|
|
||||||
/* These are for pcre2_jit_compile(). */
|
/* These are for pcre2_jit_compile(). */
|
||||||
|
|
||||||
|
@ -125,9 +127,10 @@ D is inspected during pcre2_dfa_match() execution
|
||||||
#define PCRE2_JIT_PARTIAL_SOFT 0x00000002u
|
#define PCRE2_JIT_PARTIAL_SOFT 0x00000002u
|
||||||
#define PCRE2_JIT_PARTIAL_HARD 0x00000004u
|
#define PCRE2_JIT_PARTIAL_HARD 0x00000004u
|
||||||
|
|
||||||
/* These are for pcre2_match() and pcre2_dfa_match(). Note that PCRE2_ANCHORED,
|
/* These are for pcre2_match(), pcre2_dfa_match(), and pcre2_jit_match(). Note
|
||||||
and PCRE2_NO_UTF_CHECK can also be passed to these functions, so take care not
|
that PCRE2_ANCHORED and PCRE2_NO_UTF_CHECK can also be passed to these
|
||||||
to define synonyms by mistake. */
|
functions (though pcre2_jit_match() ignores the latter since it bypasses all
|
||||||
|
sanity checks). */
|
||||||
|
|
||||||
#define PCRE2_NOTBOL 0x00000001u
|
#define PCRE2_NOTBOL 0x00000001u
|
||||||
#define PCRE2_NOTEOL 0x00000002u
|
#define PCRE2_NOTEOL 0x00000002u
|
||||||
|
@ -337,8 +340,24 @@ typedef struct pcre2_callout_block { \
|
||||||
PCRE2_SIZE current_position; /* Where we currently are in the subject */ \
|
PCRE2_SIZE current_position; /* Where we currently are in the subject */ \
|
||||||
PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \
|
PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \
|
||||||
PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \
|
PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \
|
||||||
|
/* ------------------- Added for Version 1 -------------------------- */ \
|
||||||
|
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
|
||||||
|
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
|
||||||
|
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
|
||||||
/* ------------------------------------------------------------------ */ \
|
/* ------------------------------------------------------------------ */ \
|
||||||
} pcre2_callout_block;
|
} pcre2_callout_block; \
|
||||||
|
\
|
||||||
|
typedef struct pcre2_callout_enumerate_block { \
|
||||||
|
uint32_t version; /* Identifies version of block */ \
|
||||||
|
/* ------------------------ Version 0 ------------------------------- */ \
|
||||||
|
PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \
|
||||||
|
PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \
|
||||||
|
uint32_t callout_number; /* Number compiled into pattern */ \
|
||||||
|
PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
|
||||||
|
PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
|
||||||
|
PCRE2_SPTR callout_string; /* String compiled into pattern */ \
|
||||||
|
/* ------------------------------------------------------------------ */ \
|
||||||
|
} pcre2_callout_enumerate_block;
|
||||||
|
|
||||||
|
|
||||||
/* List the generic forms of all other functions in macros, which will be
|
/* List the generic forms of all other functions in macros, which will be
|
||||||
|
@ -406,6 +425,9 @@ PCRE2_EXP_DECL void pcre2_code_free(pcre2_code *);
|
||||||
|
|
||||||
#define PCRE2_PATTERN_INFO_FUNCTIONS \
|
#define PCRE2_PATTERN_INFO_FUNCTIONS \
|
||||||
PCRE2_EXP_DECL int pcre2_pattern_info(const pcre2_code *, uint32_t, \
|
PCRE2_EXP_DECL int pcre2_pattern_info(const pcre2_code *, uint32_t, \
|
||||||
|
void *); \
|
||||||
|
PCRE2_EXP_DECL int pcre2_callout_enumerate(const pcre2_code *, \
|
||||||
|
int (*)(pcre2_callout_enumerate_block *, void *), \
|
||||||
void *);
|
void *);
|
||||||
|
|
||||||
|
|
||||||
|
@ -535,6 +557,7 @@ pcre2_compile are called by application code. */
|
||||||
/* Data blocks */
|
/* Data blocks */
|
||||||
|
|
||||||
#define pcre2_callout_block PCRE2_SUFFIX(pcre2_callout_block_)
|
#define pcre2_callout_block PCRE2_SUFFIX(pcre2_callout_block_)
|
||||||
|
#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_)
|
||||||
#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_)
|
#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_)
|
||||||
#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_)
|
#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_)
|
||||||
#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_)
|
#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_)
|
||||||
|
@ -543,6 +566,7 @@ pcre2_compile are called by application code. */
|
||||||
|
|
||||||
/* Functions: the complete list in alphabetical order */
|
/* Functions: the complete list in alphabetical order */
|
||||||
|
|
||||||
|
#define pcre2_callout_enumerate PCRE2_SUFFIX(pcre2_callout_enumerate_)
|
||||||
#define pcre2_code_free PCRE2_SUFFIX(pcre2_code_free_)
|
#define pcre2_code_free PCRE2_SUFFIX(pcre2_code_free_)
|
||||||
#define pcre2_compile PCRE2_SUFFIX(pcre2_compile_)
|
#define pcre2_compile PCRE2_SUFFIX(pcre2_compile_)
|
||||||
#define pcre2_compile_context_copy PCRE2_SUFFIX(pcre2_compile_context_copy_)
|
#define pcre2_compile_context_copy PCRE2_SUFFIX(pcre2_compile_context_copy_)
|
||||||
|
@ -550,7 +574,6 @@ pcre2_compile are called by application code. */
|
||||||
#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_)
|
#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_)
|
||||||
#define pcre2_config PCRE2_SUFFIX(pcre2_config_)
|
#define pcre2_config PCRE2_SUFFIX(pcre2_config_)
|
||||||
#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_)
|
#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_)
|
||||||
#define pcre2_match PCRE2_SUFFIX(pcre2_match_)
|
|
||||||
#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_)
|
#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_)
|
||||||
#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_)
|
#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_)
|
||||||
#define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_)
|
#define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_)
|
||||||
|
@ -566,6 +589,7 @@ pcre2_compile are called by application code. */
|
||||||
#define pcre2_jit_stack_create PCRE2_SUFFIX(pcre2_jit_stack_create_)
|
#define pcre2_jit_stack_create PCRE2_SUFFIX(pcre2_jit_stack_create_)
|
||||||
#define pcre2_jit_stack_free PCRE2_SUFFIX(pcre2_jit_stack_free_)
|
#define pcre2_jit_stack_free PCRE2_SUFFIX(pcre2_jit_stack_free_)
|
||||||
#define pcre2_maketables PCRE2_SUFFIX(pcre2_maketables_)
|
#define pcre2_maketables PCRE2_SUFFIX(pcre2_maketables_)
|
||||||
|
#define pcre2_match PCRE2_SUFFIX(pcre2_match_)
|
||||||
#define pcre2_match_context_copy PCRE2_SUFFIX(pcre2_match_context_copy_)
|
#define pcre2_match_context_copy PCRE2_SUFFIX(pcre2_match_context_copy_)
|
||||||
#define pcre2_match_context_create PCRE2_SUFFIX(pcre2_match_context_create_)
|
#define pcre2_match_context_create PCRE2_SUFFIX(pcre2_match_context_create_)
|
||||||
#define pcre2_match_context_free PCRE2_SUFFIX(pcre2_match_context_free_)
|
#define pcre2_match_context_free PCRE2_SUFFIX(pcre2_match_context_free_)
|
||||||
|
|