Documentation correction.

Philip.Hazel 2015-02-20 09:38:54 +00:00
parent 52ba34a73c
commit 8fe95cf804
1 changed files with 99 additions and 90 deletions

HACKING

@@ -263,10 +263,13 @@ of repeat make use of these opcodes:
   OP_POSUPTO  OP_POSUPTOI
   OP_EXACT    OP_EXACTI
 
-Each of these is followed by a count and then the repeated character. OP_UPTO
-matches from 0 to the given number. A repeat with a non-zero minimum and a
-fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or OP_MINUPTO or
-OPT_POSUPTO).
+Each of these is followed by a count and then the repeated character. The count
+is two bytes long in 8-bit mode (most significant byte first), or one code unit
+in 16-bit and 32-bit modes.
+
+OP_UPTO matches from 0 to the given number. A repeat with a non-zero minimum
+and a fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or
+OP_MINUPTO or OPT_POSUPTO).
 
 Another set of matching repeating opcodes (called OP_NOTSTAR, OP_NOTSTARI,
 etc.) are used for repeated, negated, single-character classes such as [^a]*.
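
The count layout described in this hunk can be sketched in C as follows. This is a minimal illustration, not code from the PCRE sources: the function names read_count_8bit and read_count_16bit are hypothetical, and each takes a pointer to the first code unit after the opcode.

```c
#include <stdint.h>

/* Hypothetical helper (not PCRE's own code): in 8-bit mode the repeat count
   that follows OP_UPTO, OP_EXACT, etc. occupies two bytes, most significant
   byte first. */
static unsigned int read_count_8bit(const uint8_t *code)
{
  return ((unsigned int)code[0] << 8) | (unsigned int)code[1];
}

/* Hypothetical helper: in 16-bit mode the count is a single code unit.
   The 32-bit case is the same with uint32_t. */
static unsigned int read_count_16bit(const uint16_t *code)
{
  return code[0];
}
```

In both cases the repeated character follows immediately after the count.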
@@ -330,19 +333,21 @@ negative one. In either case, the opcode is followed by a 32-byte (16-short,
 bits are counted from the least significant end of each unit. In caseless mode,
 bits for both cases are set.
 
-The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8/16/32
-mode, subject characters with values greater than 255 can be handled correctly.
-For OP_CLASS they do not match, whereas for OP_NCLASS they do.
+The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 and
+16-bit and 32-bit modes, subject characters with values greater than 255 can be
+handled correctly. For OP_CLASS they do not match, whereas for OP_NCLASS they
+do.
 
 For classes containing characters with values greater than 255 or that contain
-\p or \P, OP_XCLASS is used. It optionally uses a bit map if any code points
-are less than 256, followed by a list of pairs (for a range) and single
-characters. In caseless mode, both cases are explicitly listed.
+\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable
+code points are less than 256, followed by a list of pairs (for a range) and
+single characters. In caseless mode, both cases are explicitly listed.
 
-OP_XCLASS is followed by a code unit containing flag bits: XCL_NOT indicates
-that this is a negative class, and XCL_MAP indicates that a bit map is present.
-There follows the bit map, if XCL_MAP is set, and then a sequence of items
-coded as follows:
+OP_XCLASS is followed by a LINK_SIZE item containing the total length of the
+opcode and its data. This is followed by a code unit containing flag bits:
+XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
+bit map is present. There follows the bit map, if XCL_MAP is set, and then a
+sequence of items coded as follows:
 
   XCL_END      marks the end of the list
   XCL_SINGLE   one character follows
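
The difference between OP_CLASS and OP_NCLASS for subject characters above 255 can be sketched as below for 8-bit mode. The function class_matches and its parameters are hypothetical names chosen for illustration; the sketch assumes the 32-byte bit map has already been located after the opcode.

```c
#include <stdint.h>

/* Hypothetical sketch for 8-bit mode, not PCRE's own code. 'is_nclass' is
   non-zero for OP_NCLASS and zero for OP_CLASS; 'map' is the 32-byte bit map
   that follows the opcode. */
static int class_matches(int is_nclass, const uint8_t map[32], uint32_t c)
{
  /* Characters above 255 have no bit in the map: OP_CLASS never matches
     them, OP_NCLASS always does. */
  if (c > 255) return is_nclass != 0;

  /* Bits are counted from the least significant end of each byte. */
  return (map[c / 8] >> (c % 8)) & 1;
}
```

For both opcodes the bit map itself lists the characters that are allowed, so the lookup for values below 256 is the same.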
@@ -354,6 +359,10 @@ If a range starts with a code point less than 256 and ends with one greater
 than 256, it is split into two ranges, with characters less than 256 being
 indicated in the bit map, and the rest with XCL_RANGE.
 
+When XCL_NOT is set, the bit map, if present, contains bits for characters that
+are allowed (exactly as for OP_NCLASS), but the list of items that follow it
+specifies characters and properties that are not allowed.
+
 
 Back references
 ---------------
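
The XCL_NOT rule added in this hunk can be sketched as follows. This is a simplified illustration under several assumptions, not the library's matcher: the function name xclass_matches is hypothetical, the numeric values given to the XCL_ item codes are illustrative only, a bit map is assumed to be present (XCL_MAP set), the item list is stored one value per array element, and property items are left out.

```c
#include <stdint.h>

/* Illustrative item codes; the real values are defined in PCRE's internal
   headers and may differ. */
enum { XCL_END = 0, XCL_SINGLE = 1, XCL_RANGE = 2 };

/* Hypothetical, simplified matcher. 'negated' is non-zero when XCL_NOT is
   set. 'map' is the 32-byte bit map (assumed present). 'items' is the item
   list, terminated by XCL_END. */
static int xclass_matches(int negated, const uint8_t map[32],
                          const uint32_t *items, uint32_t c)
{
  /* Below 256 the bit map decides. It always holds the allowed characters,
     even for a negated class. */
  if (c < 256) return (map[c / 8] >> (c % 8)) & 1;

  /* The list names allowed characters, or disallowed ones under XCL_NOT. */
  for (;;)
  {
    uint32_t type = *items++;
    if (type == XCL_SINGLE)
    {
      if (c == *items++) return !negated;
    }
    else if (type == XCL_RANGE)
    {
      uint32_t lo = *items++;
      uint32_t hi = *items++;
      if (c >= lo && c <= hi) return !negated;
    }
    else break;  /* XCL_END, or an item type this sketch does not handle */
  }

  /* Not found in the list: a negated class matches, a positive one fails. */
  return negated != 0;
}
```

The asymmetry is the point of the paragraph: only the list is inverted by XCL_NOT, so the bit map lookup needs no special case for negated classes.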
@@ -545,4 +554,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
 correct length, in order to catch updating errors.
 
 Philip Hazel
-August 2014
+February 2015