Documentation correction.

2015-02-20 09:38:54 +00:00
parent 52ba34a73c
commit 8fe95cf804
1 changed file with 99 additions and 90 deletions
--- a/39
+++ b/39
@@ -263,10 +263,13 @@ of repeat make use of these opcodes:
  OP_POSUPTO      OP_POSUPTOI
  OP_EXACT        OP_EXACTI

-Each of these is followed by a count and then the repeated character. OP_UPTO
-matches from 0 to the given number. A repeat with a non-zero minimum and a
-fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or OP_MINUPTO or
-OPT_POSUPTO).
+Each of these is followed by a count and then the repeated character. The count
+is two bytes long in 8-bit mode (most significant byte first), or one code unit
+in 16-bit and 32-bit modes.
+
+OP_UPTO matches from 0 to the given number of repetitions. A repeat with a
+non-zero minimum and a fixed maximum is coded as an OP_EXACT for the minimum
+followed by an OP_UPTO (or OP_MINUPTO or OP_POSUPTO) for the remainder.

 Another set of matching repeating opcodes (called OP_NOTSTAR, OP_NOTSTARI,
 etc.) are used for repeated, negated, single-character classes such as [^a]*.
@@ -330,19 +333,21 @@ negative one. In either case, the opcode is followed by a 32-byte (16-short,
 bits are counted from the least significant end of each unit. In caseless mode,
 bits for both cases are set.

-The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8/16/32
-mode, subject characters with values greater than 255 can be handled correctly.
-For OP_CLASS they do not match, whereas for OP_NCLASS they do.
+The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode
+and in 16-bit and 32-bit modes, subject characters with values greater than
+255 can be handled correctly. For OP_CLASS they do not match, whereas for
+OP_NCLASS they do.

 For classes containing characters with values greater than 255 or that contain
-\p or \P, OP_XCLASS is used. It optionally uses a bit map if any code points
-are less than 256, followed by a list of pairs (for a range) and single
-characters. In caseless mode, both cases are explicitly listed.
+\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable
+code points are less than 256, followed by a list of pairs (for a range) and
+single characters. In caseless mode, both cases are explicitly listed.

-OP_XCLASS is followed by a code unit containing flag bits: XCL_NOT indicates
-that this is a negative class, and XCL_MAP indicates that a bit map is present.
-There follows the bit map, if XCL_MAP is set, and then a sequence of items
-coded as follows:
+OP_XCLASS is followed by a LINK_SIZE item containing the total length of the
+opcode and its data. This is followed by a code unit containing flag bits:
+XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a
+bit map is present. There follows the bit map, if XCL_MAP is set, and then a
+sequence of items coded as follows:

  XCL_END      marks the end of the list
  XCL_SINGLE   one character follows
@@ -354,6 +359,10 @@ If a range starts with a code point less than 256 and ends with one greater
 than 256, it is split into two ranges, with characters less than 256 being
 indicated in the bit map, and the rest with XCL_RANGE.

+When XCL_NOT is set, the bit map, if present, contains bits for characters that
+are allowed (exactly as for OP_NCLASS), but the list of items that follow it
+specifies characters and properties that are not allowed.
+

 Back references
 ---------------
@@ -545,4 +554,4 @@ not a real opcode, but is used to check that tables indexed by opcode are the
 correct length, in order to catch updating errors.

 Philip Hazel
-August 2014
+February 2015
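The first hunk documents two points:

- In 8-bit mode a repeat count is two bytes, most significant byte first.
- A bounded repeat is coded as OP_EXACT for the minimum, followed by OP_UPTO for the rest.

The following is a minimal C sketch of that encoding. It assumes a single-byte repeated character. The names HYP_OP_EXACT and HYP_OP_UPTO, their values, and the helper functions are all hypothetical and are not PCRE's internal definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode values for illustration only; PCRE's real
   numbering is internal and is not reproduced here. */
enum { HYP_OP_EXACT = 1, HYP_OP_UPTO = 2 };

/* Store a repeat count in 8-bit mode: two bytes, most significant first. */
static size_t put_count8(uint8_t *p, unsigned count)
{
  assert(count <= 0xffff);
  p[0] = (uint8_t)(count >> 8);
  p[1] = (uint8_t)(count & 0xff);
  return 2;
}

/* Read a two-byte count back. */
static unsigned get_count8(const uint8_t *p)
{
  return ((unsigned)p[0] << 8) | p[1];
}

/* Encode c{min,max} (0 < min < max) as OP_EXACT min, then OP_UPTO for the
   remaining max - min repetitions. Returns the number of bytes written. */
static size_t encode_bounded_repeat8(uint8_t *out, uint8_t c,
                                     unsigned min, unsigned max)
{
  size_t n = 0;
  assert(min > 0 && max > min);
  out[n++] = HYP_OP_EXACT;
  n += put_count8(out + n, min);
  out[n++] = c;
  out[n++] = HYP_OP_UPTO;
  n += put_count8(out + n, max - min);
  out[n++] = c;
  return n;
}
```

For a{2,5} the sketch writes eight bytes: EXACT, count 0x00 0x02, 'a', then UPTO, count 0x00 0x03, 'a'. In 16-bit and 32-bit modes each count would occupy a single code unit instead.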
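The second hunk explains why OP_CLASS and OP_NCLASS both exist. The 32-byte map only has bits for characters 0 to 255, so for anything larger the opcode alone has to decide the result.

The sketch below illustrates that rule. It assumes the map holds a 1 bit for every acceptable character, counted from the least significant end of each byte. `class_map`, `class_match` and the other helpers are hypothetical names; the sketch is not PCRE's matcher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A 32-byte class bit map: one bit per character value 0-255, with bits
   counted from the least significant end of each byte. A set bit means the
   character is acceptable, for both positive and negated classes. */
typedef struct { uint8_t bits[32]; } class_map;

static void map_fill(class_map *m, bool all)
{
  memset(m->bits, all ? 0xff : 0x00, sizeof m->bits);
}

static void map_set(class_map *m, unsigned c)
{
  assert(c < 256);
  m->bits[c / 8] |= (uint8_t)(1u << (c % 8));
}

static void map_clear(class_map *m, unsigned c)
{
  assert(c < 256);
  m->bits[c / 8] &= (uint8_t)~(1u << (c % 8));
}

static bool map_test(const class_map *m, unsigned c)
{
  return (m->bits[c / 8] >> (c % 8)) & 1u;
}

/* Hypothetical matcher: is_nclass selects OP_NCLASS behaviour. Characters
   above 255 have no bit, so the opcode alone decides: OP_CLASS never
   matches them and OP_NCLASS always does. */
static bool class_match(const class_map *m, uint32_t c, bool is_nclass)
{
  if (c > 255) return is_nclass;
  return map_test(m, (unsigned)c);
}
```

As an example, [^a] would be stored as a map with every bit set except the one for 'a'. With OP_NCLASS, a character such as U+0100 then still matches even though the map has no bit for it.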
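The context at the start of the third hunk describes how a range that crosses the 256 boundary is split. Code points below 256 go into the bit map, and the rest become an XCL_RANGE item.

The following is a minimal C sketch of that split. The `xrange` type and `add_range` function are hypothetical. The sketch only collects ranges in an array and does not emit real PCRE code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A code point range destined for a (hypothetical) XCL_RANGE item. */
typedef struct { uint32_t lo, hi; } xrange;

/* Add lo..hi to a class under construction. Code points below 256 are set
   in the 32-byte bit map; whatever part of the range is 256 or above is
   appended to items[] as a separate range. Returns the new item count. */
static size_t add_range(uint8_t map[32], xrange *items, size_t n_items,
                        uint32_t lo, uint32_t hi)
{
  assert(lo <= hi);
  for (uint32_t c = lo; c <= hi && c < 256; c++)
    map[c / 8] |= (uint8_t)(1u << (c % 8));
  if (hi >= 256) {
    items[n_items].lo = lo < 256 ? 256 : lo;
    items[n_items].hi = hi;
    n_items++;
  }
  return n_items;
}
```

For example, a range from 0xF0 to 0x10F sets map bits 0xF0 to 0xFF and yields one range item from 0x100 to 0x10F. A range entirely below 256 adds no item at all.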
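The revised OP_XCLASS text gives this layout:

- a LINK_SIZE total length
- a flags unit (XCL_NOT, XCL_MAP)
- an optional bit map
- a list of items ending in XCL_END

The added paragraph also says that with XCL_NOT the map still lists allowed characters, while the item list names disallowed ones.

The sketch below walks such a class in 32-bit code units. It rests on several assumptions, each labelled hypothetical:

- The length field takes one code unit.
- The map is 8 words, with bits counted from the least significant end of each word.
- An XCL_RANGE item is followed by its two end points.
- The HYP_* constant values are made up.

Property items (\p, \P) are not handled. This follows the document's wording and is not PCRE's own matching function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; PCRE's real constants are internal. */
enum { HYP_OP_XCLASS = 0x70 };
enum { HYP_XCL_NOT = 0x01, HYP_XCL_MAP = 0x02 };
enum { HYP_XCL_END = 0, HYP_XCL_SINGLE = 1, HYP_XCL_RANGE = 2 };

/* Skip a whole XCLASS using its length field (assumed one code unit). */
static const uint32_t *xclass_next(const uint32_t *code)
{
  return code + code[1];
}

/* Test character c against the XCLASS starting at code[0]. */
static bool xclass_match(const uint32_t *code, uint32_t c)
{
  const uint32_t *p = code + 2;            /* skip opcode and length */
  uint32_t flags;
  bool negated;

  assert(code[0] == HYP_OP_XCLASS);
  flags = *p++;
  negated = (flags & HYP_XCL_NOT) != 0;

  if (flags & HYP_XCL_MAP) {
    /* The map holds allowed characters whether or not XCL_NOT is set. */
    if (c < 256) return (p[c / 32] >> (c % 32)) & 1u;
    p += 8;                                /* 32-byte map = 8 words */
  }

  /* A listed character or range is "found": allowed for a positive
     class, disallowed when XCL_NOT is set. */
  for (;;) {
    uint32_t item = *p++;
    if (item == HYP_XCL_END) return negated;
    if (item == HYP_XCL_SINGLE) {
      if (c == *p++) return !negated;
    } else if (item == HYP_XCL_RANGE) {
      uint32_t lo = *p++, hi = *p++;
      if (c >= lo && c <= hi) return !negated;
    } else {
      return false;                        /* property items: not handled */
    }
  }
}
```

Using the length field to skip the whole opcode, as `xclass_next` does, means a scanner can step over an XCLASS without decoding its items.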