Documentation update.
parent
7fe5e441ff
commit
424bba4d15
|
@@ -1,4 +1,4 @@
|
||||||
.TH PCRE2COMPAT 3 "18 October 2016" "PCRE2 10.23"
|
.TH PCRE2COMPAT 3 "29 March 2017" "PCRE2 10.30"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
PCRE2 - Perl-compatible regular expressions (revised API)
|
PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
.SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
|
||||||
|
@@ -6,7 +6,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
|
||||||
.sp
|
.sp
|
||||||
This document describes the differences in the ways that PCRE2 and Perl handle
|
This document describes the differences in the ways that PCRE2 and Perl handle
|
||||||
regular expressions. The differences described here are with respect to Perl
|
regular expressions. The differences described here are with respect to Perl
|
||||||
versions 5.10 and above.
|
version 5.24, but as both Perl and PCRE2 are continually changing, the
|
||||||
|
information may sometimes be out of date.
|
||||||
.P
|
.P
|
||||||
1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
|
1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
|
||||||
have are given in the
|
have are given in the
|
||||||
|
@@ -15,16 +16,17 @@ have are given in the
|
||||||
.\"
|
.\"
|
||||||
page.
|
page.
|
||||||
.P
|
.P
|
||||||
2. PCRE2 allows repeat quantifiers only on parenthesized assertions, but they
|
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
|
||||||
do not mean what you might think. For example, (?!a){3} does not assert that
|
they do not mean what you might think. For example, (?!a){3} does not assert
|
||||||
the next three characters are not "a". It just asserts that the next character
|
that the next three characters are not "a". It just asserts that the next
|
||||||
is not "a" three times (in principle: PCRE2 optimizes this to run the assertion
|
character is not "a" three times (in principle: PCRE2 optimizes this to run the
|
||||||
just once). Perl allows repeat quantifiers on other assertions such as \eb, but
|
assertion just once). Perl allows some repeat quantifiers on other assertions,
|
||||||
these do not seem to have any use.
|
for example, \eb* (but not \eb{3}), but these do not seem to have any use.
|
||||||
.P
|
.P
|
||||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
3. Capturing subpatterns that occur inside negative lookaround assertions are
|
||||||
counted, but their entries in the offsets vector are never set. Perl sometimes
|
counted, but their entries in the offsets vector are set only if the assertion
|
||||||
(but not always) sets its numerical variables from inside negative assertions.
|
is a condition. Perl has changed its behaviour in this regard from time to
|
||||||
|
time.
|
||||||
.P
|
.P
|
||||||
4. The following Perl escape sequences are not supported: \el, \eu, \eL,
|
4. The following Perl escape sequences are not supported: \el, \eu, \eL,
|
||||||
\eU, and \eN when followed by a character name or Unicode value. (\eN on its
|
\eU, and \eN when followed by a character name or Unicode value. (\eN on its
|
||||||
|
@@ -35,13 +37,13 @@ generated by default. However, if the PCRE2_ALT_BSUX option is set,
|
||||||
\eU and \eu are interpreted as ECMAScript interprets them.
|
\eU and \eu are interpreted as ECMAScript interprets them.
|
||||||
.P
|
.P
|
||||||
5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
|
5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
|
||||||
built with Unicode support. The properties that can be tested with \ep and \eP
|
built with Unicode support (the default). The properties that can be tested
|
||||||
are limited to the general category properties such as Lu and Nd, script names
|
with \ep and \eP are limited to the general category properties such as Lu and
|
||||||
such as Greek or Han, and the derived properties Any and L&. PCRE2 does support
|
Nd, script names such as Greek or Han, and the derived properties Any and L&.
|
||||||
the Cs (surrogate) property, which Perl does not; the Perl documentation says
|
PCRE2 does support the Cs (surrogate) property, which Perl does not; the Perl
|
||||||
"Because Perl hides the need for the user to understand the internal
|
documentation says "Because Perl hides the need for the user to understand the
|
||||||
representation of Unicode characters, there is no need to implement the
|
internal representation of Unicode characters, there is no need to implement
|
||||||
somewhat messy concept of surrogates."
|
the somewhat messy concept of surrogates."
|
||||||
.P
|
.P
|
||||||
6. PCRE2 does support the \eQ...\eE escape for quoting substrings. Characters
|
6. PCRE2 does support the \eQ...\eE escape for quoting substrings. Characters
|
||||||
in between are treated as literals. This is slightly different from Perl in
|
in between are treated as literals. This is slightly different from Perl in
|
||||||
|
@@ -60,29 +62,16 @@ Note the following examples:
|
||||||
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
||||||
.P
|
.P
|
||||||
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
|
||||||
constructions. However, there is support for recursive patterns. This is not
|
constructions. However, there is support for PCRE2's "callout" feature, which
|
||||||
available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE2 "callout"
|
allows an external function to be called during pattern matching. See the
|
||||||
feature allows an external function to be called during pattern matching. See
|
|
||||||
the
|
|
||||||
.\" HREF
|
.\" HREF
|
||||||
\fBpcre2callout\fP
|
\fBpcre2callout\fP
|
||||||
.\"
|
.\"
|
||||||
documentation for details.
|
documentation for details.
|
||||||
.P
|
.P
|
||||||
8. Subroutine calls (whether recursive or not) are treated as atomic groups.
|
8. Subroutine calls (whether recursive or not) were treated as atomic groups up
|
||||||
Atomic recursion is like Python, but unlike Perl. Captured values that are set
|
to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
|
||||||
outside a subroutine call can be referenced from inside in PCRE2, but not in
|
into subroutine calls is now supported, as in Perl.
|
||||||
Perl. There is a discussion that explains these differences in more detail in
|
|
||||||
the
|
|
||||||
.\" HTML <a href="pcre2pattern.html#recursiondifference">
|
|
||||||
.\" </a>
|
|
||||||
section on recursion differences from Perl
|
|
||||||
.\"
|
|
||||||
in the
|
|
||||||
.\" HREF
|
|
||||||
\fBpcre2pattern\fP
|
|
||||||
.\"
|
|
||||||
page.
|
|
||||||
.P
|
.P
|
||||||
9. If any of the backtracking control verbs are used in a subpattern that is
|
9. If any of the backtracking control verbs are used in a subpattern that is
|
||||||
called as a subroutine (whether or not recursively), their effect is confined
|
called as a subroutine (whether or not recursively), their effect is confined
|
||||||
|
@@ -130,13 +119,13 @@ certainly user mistakes.
|
||||||
16. In PCRE2, the upper/lower case character properties Lu and Ll are not
|
16. In PCRE2, the upper/lower case character properties Lu and Ll are not
|
||||||
affected when case-independent matching is specified. For example, \ep{Lu}
|
affected when case-independent matching is specified. For example, \ep{Lu}
|
||||||
always matches an upper case letter. I think Perl has changed in this respect;
|
always matches an upper case letter. I think Perl has changed in this respect;
|
||||||
in the release at the time of writing (5.16), \ep{Lu} and \ep{Ll} match all
|
in the release at the time of writing (5.24), \ep{Lu} and \ep{Ll} match all
|
||||||
letters, regardless of case, when case independence is specified.
|
letters, regardless of case, when case independence is specified.
|
||||||
.P
|
.P
|
||||||
17. PCRE2 provides some extensions to the Perl regular expression facilities.
|
17. PCRE2 provides some extensions to the Perl regular expression facilities.
|
||||||
Perl 5.10 includes new features that are not in earlier versions of Perl, some
|
Perl 5.10 includes new features that are not in earlier versions of Perl, some
|
||||||
of which (such as named parentheses) have been in PCRE2 for some time. This
|
of which (such as named parentheses) were in PCRE2 for some time before. This
|
||||||
list is with respect to Perl 5.10:
|
list is with respect to Perl 5.24:
|
||||||
.sp
|
.sp
|
||||||
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
|
(a) Although lookbehind assertions in PCRE2 must match fixed length strings,
|
||||||
each alternative branch of a lookbehind assertion can match a different length
|
each alternative branch of a lookbehind assertion can match a different length
|
||||||
|
@@ -190,6 +179,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 18 October 2016
|
Last updated: 29 March 2017
|
||||||
Copyright (c) 1997-2016 University of Cambridge.
|
Copyright (c) 1997-2017 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
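Note on item 2 of the changed text: the point about (?!a){3} can be illustrated with a small sketch. It uses Python's re module rather than PCRE2 itself, and it writes the assertion out three times instead of quantifying it, so it is only a stand-in for the behaviour the man page describes. The pattern names and the helper function are hypothetical, chosen for this illustration.

```python
import re

# Stand-in for (?!a){3}: the same zero-width assertion written out three
# times. Each copy inspects the same next character, so the repetition adds
# nothing. (Assumption: this mirrors the semantics described for PCRE2.)
REPEATED_ASSERTION = re.compile(r"(?!a)(?!a)(?!a)")

# What a reader might have expected instead: a lookahead asserting that
# none of the next three characters is "a".
NEXT_THREE_NOT_A = re.compile(r"(?=[^a]{3})")


def matches_at_start(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches at position 0 of text."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    for subject in ("baa", "bcd", "abc"):
        print(
            subject,
            matches_at_start(REPEATED_ASSERTION, subject),
            matches_at_start(NEXT_THREE_NOT_A, subject),
        )
```

For the subject "baa" the repeated assertion succeeds, because only the first character is checked, while the three-character lookahead fails.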