From ed9f34b06b260323c07b5f34c40ec75d82b39412 Mon Sep 17 00:00:00 2001
From: "Philip.Hazel"
Date: Fri, 31 Mar 2017 16:49:33 +0000
Subject: [PATCH] Documentation update

---
 Makefile.am                |   2 -
 doc/html/index.html        |   3 -
 doc/html/pcre2compat.html  |  64 ++++----
 doc/html/pcre2jit.html     |   6 +-
 doc/html/pcre2limits.html  |  12 +-
 doc/html/pcre2perform.html | 121 +++++++++++----
 doc/index.html.src         |   3 -
 doc/pcre2.txt              | 311 +++++++++++++++++++++----------------
 doc/pcre2perform.3         | 130 +++++++++++-----
 doc/pcre2stack.3           | 212 -------------------------
 10 files changed, 391 insertions(+), 473 deletions(-)
 delete mode 100644 doc/pcre2stack.3

diff --git a/Makefile.am b/Makefile.am
index a370db2..fa57eeb 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -103,7 +103,6 @@ dist_html_DATA = \
   doc/html/pcre2posix.html \
   doc/html/pcre2sample.html \
   doc/html/pcre2serialize.html \
-  doc/html/pcre2stack.html \
   doc/html/pcre2syntax.html \
   doc/html/pcre2test.html \
   doc/html/pcre2unicode.html
@@ -187,7 +186,6 @@ dist_man_MANS = \
   doc/pcre2posix.3 \
   doc/pcre2sample.3 \
   doc/pcre2serialize.3 \
-  doc/pcre2stack.3 \
   doc/pcre2syntax.3 \
   doc/pcre2test.1 \
   doc/pcre2unicode.3
diff --git a/doc/html/index.html b/doc/html/index.html
index 3920426..3517671 100644
--- a/doc/html/index.html
+++ b/doc/html/index.html
@@ -68,9 +68,6 @@ first.
 pcre2serialize   Serializing functions for saving precompiled patterns
-pcre2stack
-   Discussion of PCRE2's stack usage
-
 pcre2syntax   Syntax quick-reference summary
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
index 993dfd1..b55ab82 100644
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@@ -18,7 +18,8 @@ DIFFERENCES BETWEEN PCRE2 AND PERL

 This document describes the differences in the ways that PCRE2 and Perl handle regular expressions. The differences described here are with respect to Perl
-versions 5.10 and above.
+version 5.24, but as both Perl and PCRE2 are continually changing, the
+information may sometimes be out of date.

 1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
@@ -27,17 +28,18 @@ have are given in the page.

-2. PCRE2 allows repeat quantifiers only on parenthesized assertions, but they
-do not mean what you might think. For example, (?!a){3} does not assert that
-the next three characters are not "a". It just asserts that the next character
-is not "a" three times (in principle: PCRE2 optimizes this to run the assertion
-just once). Perl allows repeat quantifiers on other assertions such as \b, but
-these do not seem to have any use.
+2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
+they do not mean what you might think. For example, (?!a){3} does not assert
+that the next three characters are not "a". It just asserts that the next
+character is not "a" three times (in principle: PCRE2 optimizes this to run the
+assertion just once). Perl allows some repeat quantifiers on other assertions,
+for example, \b* (but not \b{3}), but these do not seem to have any use.
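Item 2's point, that a quantifier on a lookahead re-tests the same position rather than advancing, can be tried out in a few lines. The sketch below assumes Python's `re` module as a stand-in engine: it is not PCRE2, but it likewise accepts a quantifier on a lookahead. The names `repeated` and `three_not_a` are hypothetical; `three_not_a` shows what "the next three characters are not 'a'" would actually require.

```python
import re

# Stand-in engine: Python's re, not PCRE2 (assumption made for illustration).
# (?!a){3} repeats a zero-width test at the SAME position, so it only
# checks that the next character is not "a", however many repeats.
repeated = re.compile(r"(?!a){3}")

# Hypothetical contrast: to assert that the next three characters are all
# not "a", the lookahead must step over them, one non-"a" at a time.
three_not_a = re.compile(r"(?=(?:(?!a).){3})")

subject = "baa"
print(bool(repeated.match(subject)))     # True: "b" is not "a"
print(bool(three_not_a.match(subject)))  # False: the 2nd character is "a"
```

Both patterns are zero-width, so a successful match has span (0, 0); only the second one ever looks past the first character.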

-3. Capturing subpatterns that occur inside negative lookahead assertions are
-counted, but their entries in the offsets vector are never set. Perl sometimes
-(but not always) sets its numerical variables from inside negative assertions.
+3. Capturing subpatterns that occur inside negative lookaround assertions are
+counted, but their entries in the offsets vector are set only if the assertion
+is a condition. Perl has changed its behaviour in this regard from time to
+time.
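The "counted but not set" half of item 3 can be observed with Python's `re` module as a stand-in engine. This is an assumption for illustration only: Python does not accept a lookaround as a condition, so the PCRE2 rule about conditions is not modelled here.

```python
import re

# Stand-in engine: Python's re, not PCRE2 (assumption for illustration).
# Group 1 sits inside a negative lookahead.
pattern = re.compile(r"(?!(a))b")

m = pattern.match("b")
print(pattern.groups)  # 1: the group is counted in the numbering
print(m.group(1))      # None: it is never set, because a negative
                       # assertion only succeeds when its contents fail
```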

 4. The following Perl escape sequences are not supported: \l, \u, \L,
@@ -50,13 +52,13 @@ generated by default. However, if the PCRE2_ALT_BSUX option is set,

 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
-built with Unicode support. The properties that can be tested with \p and \P
-are limited to the general category properties such as Lu and Nd, script names
-such as Greek or Han, and the derived properties Any and L&. PCRE2 does support
-the Cs (surrogate) property, which Perl does not; the Perl documentation says
-"Because Perl hides the need for the user to understand the internal
-representation of Unicode characters, there is no need to implement the
-somewhat messy concept of surrogates."
+built with Unicode support (the default). The properties that can be tested
+with \p and \P are limited to the general category properties such as Lu and
+Nd, script names such as Greek or Han, and the derived properties Any and L&.
+PCRE2 does support the Cs (surrogate) property, which Perl does not; the Perl
+documentation says "Because Perl hides the need for the user to understand the
+internal representation of Unicode characters, there is no need to implement
+the somewhat messy concept of surrogates."

 6. PCRE2 does support the \Q...\E escape for quoting substrings. Characters
@@ -75,23 +77,15 @@ The \Q...\E sequence is recognized both inside and outside character classes.

 7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
-constructions. However, there is support for recursive patterns. This is not
-available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE2 "callout"
-feature allows an external function to be called during pattern matching. See
-the
+constructions. However, there is support for PCRE2's "callout" feature, which
+allows an external function to be called during pattern matching. See the
 pcre2callout documentation for details.

-8. Subroutine calls (whether recursive or not) are treated as atomic groups.
-Atomic recursion is like Python, but unlike Perl. Captured values that are set
-outside a subroutine call can be referenced from inside in PCRE2, but not in
-Perl. There is a discussion that explains these differences in more detail in
-the
-section on recursion differences from Perl
-in the
-pcre2pattern
-page.
+8. Subroutine calls (whether recursive or not) were treated as atomic groups up
+to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
+into subroutine calls is now supported, as in Perl.

 9. If any of the backtracking control verbs are used in a subpattern that is
@@ -147,14 +141,14 @@ certainly user mistakes.
 16. In PCRE2, the upper/lower case character properties Lu and Ll are not affected when case-independent matching is specified. For example, \p{Lu} always matches an upper case letter. I think Perl has changed in this respect;
-in the release at the time of writing (5.16), \p{Lu} and \p{Ll} match all
+in the release at the time of writing (5.24), \p{Lu} and \p{Ll} match all
 letters, regardless of case, when case independence is specified.

 17. PCRE2 provides some extensions to the Perl regular expression facilities. Perl 5.10 includes new features that are not in earlier versions of Perl, some
-of which (such as named parentheses) have been in PCRE2 for some time. This
-list is with respect to Perl 5.10:
+of which (such as named parentheses) were in PCRE2 for some time before. This
+list is with respect to Perl 5.24:

 (a) Although lookbehind assertions in PCRE2 must match fixed length strings,
@@ -220,9 +214,9 @@ Cambridge, England.
 REVISION

-Last updated: 18 October 2016
+Last updated: 29 March 2017
-Copyright © 1997-2016 University of Cambridge.
+Copyright © 1997-2017 University of Cambridge.

 Return to the PCRE2 index page.
diff --git a/doc/html/pcre2jit.html b/doc/html/pcre2jit.html
index 4a6d4ff..5eae042 100644
--- a/doc/html/pcre2jit.html
+++ b/doc/html/pcre2jit.html
@@ -173,7 +173,7 @@ below for a discussion of JIT stack usage.
 The error code PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if searching a very large pattern tree goes on for too long, as it is in the same circumstance when JIT is not used, but the details of exactly what is counted
-are not the same. The PCRE2_ERROR_RECURSIONLIMIT error code is never returned
+are not the same. The PCRE2_ERROR_DEPTHLIMIT error code is never returned
 when JIT matching is used.


CONTROLLING THE JIT STACK
@@ -436,9 +436,9 @@ Cambridge, England.


REVISION

-Last updated: 05 June 2016
+Last updated: 30 March 2017
-Copyright © 1997-2016 University of Cambridge.
+Copyright © 1997-2017 University of Cambridge.

 Return to the PCRE2 index page.
diff --git a/doc/html/pcre2limits.html b/doc/html/pcre2limits.html
index d7e382b..640fe3d 100644
--- a/doc/html/pcre2limits.html
+++ b/doc/html/pcre2limits.html
@@ -44,14 +44,6 @@ integer type, usually defined as size_t. Its maximum value (that is
 and unset offsets.

-Note that when using the traditional matching function, PCRE2 uses recursion to
-handle subpatterns and indefinite repetition. This means that the available
-stack space may limit the size of a subject string that can be processed by
-certain patterns. For a discussion of stack issues, see the
-pcre2stack
-documentation.
-

-

All values in repeating quantifiers must be less than 65536.

@@ -94,9 +86,9 @@ Cambridge, England.
 REVISION

-Last updated: 26 October 2016
+Last updated: 30 March 2017
-Copyright © 1997-2016 University of Cambridge.
+Copyright © 1997-2017 University of Cambridge.

 Return to the PCRE2 index page.
diff --git a/doc/html/pcre2perform.html b/doc/html/pcre2perform.html
index ac9d23c..ad5d065 100644
--- a/doc/html/pcre2perform.html
+++ b/doc/html/pcre2perform.html
@@ -15,7 +15,7 @@ please consult the man page, in case the conversion went wrong.