Documentation correction.

Philip.Hazel 2015-02-20 09:38:54 +00:00
parent 52ba34a73c
commit 8fe95cf804
1 changed files with 99 additions and 90 deletions

HACKING

@ -263,10 +263,13 @@ of repeat make use of these opcodes:
OP_POSUPTO OP_POSUPTOI
OP_EXACT OP_EXACTI
Each of these is followed by a count and then the repeated character. OP_UPTO
matches from 0 to the given number. A repeat with a non-zero minimum and a
fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or OP_MINUPTO or
OP_POSUPTO).
Each of these is followed by a count and then the repeated character. The count
is two bytes long in 8-bit mode (most significant byte first), or one code unit
in 16-bit and 32-bit modes.
OP_UPTO matches from 0 to the given number. A repeat with a non-zero minimum
and a fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or
OP_MINUPTO or OP_POSUPTO).
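
As an illustration of the count encoding described above, here is a minimal
sketch in C. The helper names (read_count_8bit, read_count_16bit,
split_repeat) are hypothetical and are not PCRE's own macros or functions.
The OP_UPTO count being "max minus min" is an inference from the statement
that OP_UPTO matches from 0 to the given number.

```c
#include <stdint.h>

/* Hypothetical helper: read the repeat count that follows an opcode such
   as OP_UPTO or OP_EXACT in 8-bit mode, where it occupies two bytes with
   the most significant byte first. */
unsigned int read_count_8bit(const uint8_t *code)
{
    return ((unsigned int)code[0] << 8) | (unsigned int)code[1];
}

/* In 16-bit (and likewise 32-bit) mode the count is one code unit. */
unsigned int read_count_16bit(const uint16_t *code)
{
    return code[0];
}

/* Illustrative only: x{min,max} with min > 0 becomes an exact part
   followed by an "up to" part covering the remaining repeats. */
struct repeat_split { unsigned int exact; unsigned int upto; };

struct repeat_split split_repeat(unsigned int min, unsigned int max)
{
    struct repeat_split s;
    s.exact = min;        /* count for OP_EXACT */
    s.upto = max - min;   /* count for OP_UPTO (assumed: max minus min) */
    return s;
}
```

Under these assumptions, a repeat such as a{2,5} would split into an exact
count of 2 followed by an "up to" count of 3.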
Another set of matching repeating opcodes (called OP_NOTSTAR, OP_NOTSTARI,
etc.) are used for repeated, negated, single-character classes such as [^a]*.
@ -330,19 +333,21 @@ negative one. In either case, the opcode is followed by a 32-byte (16-short,
bits are counted from the least significant end of each unit. In caseless mode,
bits for both cases are set.
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8/16/32
mode, subject characters with values greater than 255 can be handled correctly.
For OP_CLASS they do not match, whereas for OP_NCLASS they do.
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode
and in the 16-bit and 32-bit modes, subject characters with values greater
than 255 can be handled correctly. For OP_CLASS they do not match, whereas for
OP_NCLASS they do.
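
The difference between the two opcodes can be sketched as follows. This is a
simplified illustration that assumes the 8-bit layout of the bit map (32
bytes, bits counted from the least significant end of each byte). The enum
and function names are hypothetical stand-ins, not PCRE identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two opcodes. */
enum class_kind { KIND_CLASS, KIND_NCLASS };

/* Test a character against a 32-byte class bit map. Characters above 255
   have no bit in the map, so the opcode alone decides the result: they
   never match for the positive form and always match for the negative
   form. */
bool class_matches(enum class_kind kind, const uint8_t map[32], uint32_t c)
{
    if (c > 255)
        return kind == KIND_NCLASS;
    return (map[c / 8] & (1u << (c & 7))) != 0;
}
```

For a class such as [^a], the map would have every bit set except the one
for 'a', and the negative form makes a character such as U+0100 match even
though it has no bit in the map.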
For classes containing characters with values greater than 255 or that contain
\p or \P, OP_XCLASS is used. It optionally uses a bit map if any code points
are less than 256, followed by a list of pairs (for a range) and single
characters. In caseless mode, both cases are explicitly listed.
\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable
code points are less than 256, followed by a list of pairs (for a range) and
single characters. In caseless mode, both cases are explicitly listed.
OP_XCLASS is followed by a code unit containing flag bits: XCL_NOT indicates
that this is a negative class, and XCL_MAP indicates that a bit map is present.
There follows the bit map, if XCL_MAP is set, and then a sequence of items
coded as follows:
OP_XCLASS is followed by a LINK_SIZE item containing the total length of the
opcode and its data. This is followed by a code unit containing flag bits:
XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
bit map is present. There follows the bit map, if XCL_MAP is set, and then a
sequence of items coded as follows:
XCL_END marks the end of the list
XCL_SINGLE one character follows
@ -354,6 +359,10 @@ If a range starts with a code point less than 256 and ends with one greater
than 256, it is split into two ranges, with characters less than 256 being
indicated in the bit map, and the rest with XCL_RANGE.
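
The range splitting described above might look like this in outline. It is a
sketch only: the struct and function names are hypothetical, and the real
compiler emits the high part as an XCL_RANGE item in the compiled pattern
rather than returning it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the part of a range that lies above 255. */
struct high_range { bool present; uint32_t start; uint32_t end; };

/* Record the low part of [start, end] in the bit map and return the high
   part, which would be coded as an XCL_RANGE item. */
struct high_range split_range(uint8_t map[32], uint32_t start, uint32_t end)
{
    struct high_range hr = { false, 0, 0 };
    uint32_t c;

    for (c = start; c <= end && c < 256; c++)
        map[c / 8] |= (uint8_t)(1u << (c & 7));

    if (end >= 256) {
        hr.present = true;
        hr.start = (start < 256) ? 256 : start;
        hr.end = end;
    }
    return hr;
}
```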
When XCL_NOT is set, the bit map, if present, contains bits for characters that
are allowed (exactly as for OP_NCLASS), but the list of items that follow it
specifies characters and properties that are not allowed.
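
A simplified matcher makes the asymmetry concrete. The sketch below works on
a hypothetical in-memory struct rather than the real encoded item list, and
it leaves out properties (\p and \P) entirely. The names are assumptions for
illustration, not PCRE's own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint32_t start, end; };  /* a single character has start == end */

/* Hypothetical decoded form of an extended class. */
struct xclass {
    bool negated;               /* corresponds to XCL_NOT */
    bool has_map;               /* corresponds to XCL_MAP */
    uint8_t map[32];            /* bits for allowed characters below 256 */
    const struct range *items;  /* listed characters and ranges */
    size_t n_items;
};

bool xclass_matches(const struct xclass *xc, uint32_t c)
{
    size_t i;

    /* The map always holds allowed characters, even for a negative class. */
    if (c < 256 && xc->has_map)
        return (xc->map[c / 8] & (1u << (c & 7))) != 0;

    /* The list holds allowed items for a positive class, but disallowed
       items for a negative one. */
    for (i = 0; i < xc->n_items; i++)
        if (c >= xc->items[i].start && c <= xc->items[i].end)
            return !xc->negated;

    return xc->negated;
}
```

For a class such as [^\x{100}-\x{1ff}], the map would have every bit set and
the list would hold the one excluded range, so 'a' and U+0300 match while
U+0150 does not.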
Back references
---------------
@ -545,4 +554,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
correct length, in order to catch updating errors.
Philip Hazel
August 2014
February 2015