Documentation correction.
parent 52ba34a73c
commit 8fe95cf804
39 HACKING
@@ -263,10 +263,13 @@ of repeat make use of these opcodes:
 OP_POSUPTO OP_POSUPTOI
 OP_EXACT OP_EXACTI
 
-Each of these is followed by a count and then the repeated character. OP_UPTO
-matches from 0 to the given number. A repeat with a non-zero minimum and a
-fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or OP_MINUPTO or
-OPT_POSUPTO).
+Each of these is followed by a count and then the repeated character. The count
+is two bytes long in 8-bit mode (most significant byte first), or one code unit
+in 16-bit and 32-bit modes.
+
+OP_UPTO matches from 0 to the given number. A repeat with a non-zero minimum
+and a fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or
+OP_MINUPTO or OPT_POSUPTO).
 
 Another set of matching repeating opcodes (called OP_NOTSTAR, OP_NOTSTARI,
 etc.) are used for repeated, negated, single-character classes such as [^a]*.
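Note on the hunk above: the count encoding it documents can be illustrated with
a minimal Python sketch. The helper names read_repeat_count and split_repeat
are hypothetical and are not part of PCRE. The compiled code is assumed to be
available as a sequence of integer code units. That the OP_UPTO count is the
maximum minus the minimum is a reading of "OP_UPTO matches from 0 to the given
number", not something the text states directly.

```python
def read_repeat_count(code, pos, width):
    """Return (count, next_pos) for the repeat count stored at code[pos].

    code  -- sequence of integer code units (hypothetical representation)
    pos   -- index of the first code unit of the count
    width -- code unit width of the library: 8, 16 or 32
    """
    if width == 8:
        # Two bytes, most significant byte first.
        return (code[pos] << 8) | code[pos + 1], pos + 2
    if width in (16, 32):
        # The whole count fits in a single code unit.
        return code[pos], pos + 1
    raise ValueError("width must be 8, 16 or 32")


def split_repeat(minimum, maximum):
    """Split x{minimum,maximum} into the two opcodes described above.

    Only the case named in the text is handled: a non-zero minimum and a
    fixed maximum that is larger than the minimum.
    """
    if not 0 < minimum < maximum:
        raise ValueError("expected 0 < minimum < maximum")
    # OP_EXACT consumes the mandatory part; OP_UPTO covers the optional rest.
    return [("OP_EXACT", minimum), ("OP_UPTO", maximum - minimum)]
```

For example, read_repeat_count([0x01, 0x2C, ord("a")], 0, 8) reads the two
bytes 0x01 0x2C as the count 300 and leaves the position at the repeated
character.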
@@ -330,19 +333,21 @@ negative one. In either case, the opcode is followed by a 32-byte (16-short,
 bits are counted from the least significant end of each unit. In caseless mode,
 bits for both cases are set.
 
-The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8/16/32
-mode, subject characters with values greater than 255 can be handled correctly.
-For OP_CLASS they do not match, whereas for OP_NCLASS they do.
+The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 and
+16-bit and 32-bit modes, subject characters with values greater than 255 can be
+handled correctly. For OP_CLASS they do not match, whereas for OP_NCLASS they
+do.
 
 For classes containing characters with values greater than 255 or that contain
-\p or \P, OP_XCLASS is used. It optionally uses a bit map if any code points
-are less than 256, followed by a list of pairs (for a range) and single
-characters. In caseless mode, both cases are explicitly listed.
+\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable
+code points are less than 256, followed by a list of pairs (for a range) and
+single characters. In caseless mode, both cases are explicitly listed.
 
-OP_XCLASS is followed by a code unit containing flag bits: XCL_NOT indicates
-that this is a negative class, and XCL_MAP indicates that a bit map is present.
-There follows the bit map, if XCL_MAP is set, and then a sequence of items
-coded as follows:
+OP_XCLASS is followed by a LINK_SIZE item containing the total length of the
+opcode and its data. This is followed by a code unit containing flag bits:
+XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
+bit map is present. There follows the bit map, if XCL_MAP is set, and then a
+sequence of items coded as follows:
 
 XCL_END marks the end of the list
 XCL_SINGLE one character follows
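Note on the hunk above: the corrected OP_XCLASS layout (length item, flag code
unit, optional bit map, item list) can be sketched in Python as follows. This
is an illustration under stated assumptions, not the library's code: it covers
8-bit mode only, and LINK_SIZE = 2, the big-endian storage of the length item
and the numeric values of XCL_NOT and XCL_MAP are all assumed here. The
function names are hypothetical.

```python
LINK_SIZE = 2      # assumed size of the length item, in bytes
XCL_NOT = 0x01     # assumed flag value: negative class
XCL_MAP = 0x02     # assumed flag value: a bit map is present
BITMAP_BYTES = 32  # 256 bits, one per code point below 256


def parse_xclass_header(code, pos):
    """Parse the data that follows an OP_XCLASS opcode in 8-bit mode.

    code -- bytes of the compiled pattern (hypothetical representation)
    pos  -- index of the first byte after the OP_XCLASS opcode
    """
    # Total length of the opcode and its data (assumed most significant
    # byte first).
    length = int.from_bytes(code[pos:pos + LINK_SIZE], "big")
    pos += LINK_SIZE

    # One code unit of flag bits.
    flags = code[pos]
    pos += 1

    # The bit map is present only when XCL_MAP is set.
    bitmap = None
    if flags & XCL_MAP:
        bitmap = code[pos:pos + BITMAP_BYTES]
        pos += BITMAP_BYTES

    return {
        "length": length,
        "negated": bool(flags & XCL_NOT),
        "bitmap": bitmap,
        "items_at": pos,  # index of the first XCL_* item
    }


def bitmap_has(bitmap, c):
    """Test the bit for code point c (< 256); bits are counted from the
    least significant end of each byte."""
    return bool(bitmap[c >> 3] & (1 << (c & 7)))
```

The returned "items_at" index is where a reader of the item list (XCL_SINGLE,
XCL_RANGE and so on, up to XCL_END) would start.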
@@ -354,6 +359,10 @@ If a range starts with a code point less than 256 and ends with one greater
 than 256, it is split into two ranges, with characters less than 256 being
 indicated in the bit map, and the rest with XCL_RANGE.
 
+When XCL_NOT is set, the bit map, if present, contains bits for characters that
+are allowed (exactly as for OP_NCLASS), but the list of items that follow it
+specifies characters and properties that are not allowed.
+
 
 Back references
 ---------------
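Note on the hunk above: the asymmetry it adds (the bit map always lists allowed
characters, while the item list is inverted under XCL_NOT) can be made concrete
with a simplified Python sketch. The function xclass_matches and its parameters
are hypothetical. Properties (\p, \P) are left out, and so is the case of a
code point below 256 with no bit map, because the text above does not spell
that case out.

```python
def xclass_matches(c, negated, bitmap, singles, ranges):
    """Decide whether code point c matches a property-free extended class.

    negated -- True when XCL_NOT is set
    bitmap  -- 32 bytes of allowed code points below 256, or None
    singles -- code points listed as single characters
    ranges  -- (low, high) pairs listed as ranges, both ends inclusive
    """
    if c < 256:
        if bitmap is None:
            raise ValueError("case not covered by this sketch")
        # The bit map holds allowed characters whether or not the class is
        # negated, so it is consulted directly.
        return bool(bitmap[c >> 3] & (1 << (c & 7)))

    # At 256 and above only the item list applies. Under XCL_NOT the list
    # names what is NOT allowed, so the result is inverted.
    listed = c in singles or any(lo <= c <= hi for lo, hi in ranges)
    return listed != negated
```

With negated=True, an all-ones bit map and the single range (0x100, 0x200),
every code point below 256 matches, 0x150 does not match, and 0x300 matches.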
@@ -545,4 +554,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
 correct length, in order to catch updating errors.
 
 Philip Hazel
-August 2014
+February 2015