diff --git a/HACKING b/HACKING
index cad11b3..67841c7 100644
--- a/HACKING
+++ b/HACKING
@@ -8,8 +8,8 @@ library is referred to as PCRE1 below. For information about testing PCRE2, see
 the pcre2test documentation and the comment at the head of the RunTest file.
 
 PCRE1 releases were up to 8.3x when PCRE2 was developed, and later bug fix
-releases remain in the 8.xx series. PCRE2 releases started at 10.00 to avoid
-confusion with PCRE1.
+releases carried on the 8.xx series, up to the final 8.45 release. PCRE2
+releases started at 10.00 to avoid confusion with PCRE1.
 
 
 Historical note 1
@@ -38,8 +38,8 @@ Historical note 2
 By contrast, the code originally written by Henry Spencer (which was
 subsequently heavily modified for Perl) compiles the expression twice: once in
 a dummy mode in order to find out how much store will be needed, and then for
-real. (The Perl version probably doesn't do this any more; I'm talking about
-the original library.) The execution function operates by backtracking and
+real. (The Perl version may or may not still do this; I'm talking about the
+original library.) The execution function operates by backtracking and
 maximizing (or, optionally, minimizing, in Perl) the amount of the subject that
 matches individual wild portions of the pattern. This is an "NFA algorithm" in
 Friedl's terminology.
@@ -151,8 +151,8 @@ of code units in the item itself. The exception is the aforementioned large
 advance to check for such values. When auto-callouts are enabled, the generous
 assumption is made that there will be a callout for each pattern code unit
 (which of course is only actually true if all code units are literals) plus one
-at the end. There is a default parsed pattern vector on the system stack, but
-if this is not big enough, heap memory is used.
+at the end. A default parsed pattern vector is defined on the system stack, to
+minimize memory handling, but if this is not big enough, heap memory is used.
 
 As before, the actual compiling function is run twice, the first time to
 determine the amount of memory needed for the final compiled pattern. It
@@ -187,7 +187,7 @@ META_CLASS_EMPTY [] empty class - only with PCRE2_ALLOW_EMPTY_CLASS
 META_CLASS_EMPTY_NOT [^] negative empty class - ditto
 META_CLASS_END ] end of non-empty class
 META_CLASS_NOT [^ start non-empty negative class
-META_COMMIT (*COMMIT)
+META_COMMIT (*COMMIT) - no argument (see below for with argument)
 META_COND_ASSERT (?(?assertion)
 META_DOLLAR $ metacharacter
 META_DOT . metacharacter
@@ -201,14 +201,14 @@ META_NOCAPTURE (?: no capture parens
 META_PLUS +
 META_PLUS_PLUS ++
 META_PLUS_QUERY +?
-META_PRUNE (*PRUNE) - no argument
+META_PRUNE (*PRUNE) - no argument (see below for with argument)
 META_QUERY ?
 META_QUERY_PLUS ?+
 META_QUERY_QUERY ??
 META_RANGE_ESCAPED hyphen in class range with at least one escape
 META_RANGE_LITERAL hyphen in class range defined literally
-META_SKIP (*SKIP) - no argument
-META_THEN (*THEN) - no argument
+META_SKIP (*SKIP) - no argument (see below for with argument)
+META_THEN (*THEN) - no argument (see below for with argument)
 
 The two RANGE values occur only in character classes. They are positioned
 between two literals that define the start and end of the range. In an EBCDIC
@@ -229,7 +229,8 @@ If the data for META_ALT is non-zero, it is inside a lookbehind, and the data
 is the length of its branch, for which OP_REVERSE must be generated.
 
 META_BACKREF, META_CAPTURE, and META_RECURSE have the capture group number as
-their data in the lower 16 bits of the element.
+their data in the lower 16 bits of the element. META_RECURSE is followed by an
+offset, for use in error messages.
 
 META_BACKREF is followed by an offset if the back reference group number is 10
 or more. The offsets of the first ocurrences of references to groups whose
@@ -238,8 +239,6 @@ occurrence is useful). On 64-bit systems this avoids using more than two
 parsed pattern elements for items such as \3. The offset is used when an error
 occurs because the reference is to a non-existent group.
 
-META_RECURSE is always followed by an offset, for use in error messages.
-
 META_ESCAPE has an ESC_xxx value as its data. For ESC_P and ESC_p, the next
 element contains the 16-bit type and data property values, packed together.
 ESC_g and ESC_k are used only for named references - numerical ones are turned
@@ -291,9 +290,9 @@ META_LOOKBEHIND (?<= start of lookbehind
 META_LOOKBEHIND_NA (*naplb: start of non-atomic lookbehind
 META_LOOKBEHINDNOT (?