Test binary zero in callout strings; change offset to PCRE2_SIZE; some documentation tidies.
Philip.Hazel 2015-03-16 15:38:26 +00:00
parent 2ec7cbf9b5
commit aa8d7342da
7 changed files with 51 additions and 16 deletions


@@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "15 March 2015" "PCRE2 10.20"
+.TH PCRE2CALLOUT 3 "16 March 2015" "PCRE2 10.20"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -197,8 +197,8 @@ documentation). The callout block structure contains the following fields:
 PCRE2_SIZE \fIpattern_position\fP;
 PCRE2_SIZE \fInext_item_length\fP;
 PCRE2_SIZE \fIcallout_string_offset\fP;
+PCRE2_SIZE \fIcallout_string_length\fP;
 PCRE2_SPTR \fIcallout_string\fP;
-uint32_t \fIcallout_string_length\fP;
 .sp
 The \fIversion\fP field contains the version number of the block format. The
@@ -225,11 +225,12 @@ For callouts with string arguments, \fIcallout_number\fP is always zero, and
 \fIcallout_string\fP points to the string that is contained within the compiled
 pattern. Its length is given by \fIcallout_string_length\fP. Duplicated ending
 delimiters that were present in the original pattern string have been turned
-into single characters. An additional code unit containing binary zero is
-present after the string, but is not included in the length. The delimiter that
-was used to start the string is also stored within the pattern, immediately
-before the string itself. You can therefore access this delimiter as
-\fIcallout_string\fP[-1] if you need it.
+into single characters, but there is no other processing of the callout string
+argument. An additional code unit containing binary zero is present after the
+string, but is not included in the length. The delimiter that was used to start
+the string is also stored within the pattern, immediately before the string
+itself. You can access this delimiter as \fIcallout_string\fP[-1] if you need
+it.
 .P
 The \fIcallout_string_offset\fP field is the code unit offset to the start of
 the callout argument string within the original pattern string. This is
@@ -327,6 +328,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 15 March 2015
+Last updated: 16 March 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi


@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "14 March 2015" "PCRE 10.20"
+.TH PCRE2TEST 1 "16 March 2015" "PCRE 10.20"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -61,11 +61,17 @@ names used in the libraries have a suffix _8, _16, or _32, as appropriate.
 .sp
 Input to \fBpcre2test\fP is processed line by line, either by calling the C
 library's \fBfgets()\fP function, or via the \fBlibreadline\fP library (see
-below). In Unix-like environments, \fBfgets()\fP treats any bytes other than
-newline as data characters. However, in some Windows environments character 26
-(hex 1A) causes an immediate end of file, and no further data is read. For
-maximum portability, therefore, it is safest to avoid non-printing characters
-in \fBpcre2test\fP input files.
+below). The input is processed using using C's string functions, so must not
+contain binary zeroes, even though in Unix-like environments, \fBfgets()\fP
+treats any bytes other than newline as data characters. In some Windows
+environments character 26 (hex 1A) causes an immediate end of file, and no
+further data is read.
+.P
+For maximum portability, therefore, it is safest to avoid non-printing
+characters in \fBpcre2test\fP input files. There is a facility for specifying a
+pattern's characters as hexadecimal pairs, thus making it possible to include
+binary zeroes in a pattern for testing purposes. Subject lines are processed
+for backslash escapes, which makes it possible to include any data value.
 .
 .
 .SH "COMMAND LINE OPTIONS"
@@ -1431,6 +1437,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 14 March 2015
+Last updated: 16 March 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi


@@ -339,8 +339,8 @@ typedef struct pcre2_callout_block { \
 PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \
 /* ------------------- Added for Version 1 -------------------------- */ \
 PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \
+PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \
 PCRE2_SPTR callout_string; /* String compiled into pattern */ \
-uint32_t callout_string_length; /* Length of string compiled into pattern */ \
 /* ------------------------------------------------------------------ */ \
 } pcre2_callout_block;

testdata/testinput2

@@ -4224,4 +4224,9 @@ a random value. /Ix
 /(?:a(?C`code`)){3}X/
 aaaXY
+# Binary zero in callout string
+# a ( ? C ' x z ' ) b
+/ 61 28 3f 43 27 78 00 7a 27 29 62/hex
+abcdefgh
 # End of testinput2

testdata/testinput6

@@ -4841,4 +4841,9 @@
 /(?:a(?C`code`)){3}X/
 aaaXY
+# Binary zero in callout string
+# a ( ? C ' x z ' ) b
+/ 61 28 3f 43 27 78 00 7a 27 29 62/hex
+abcdefgh
 # End of testinput6


@@ -14169,4 +14169,13 @@ Callout (8): `code`
 ^ ^ )
 0: aaaX
+# Binary zero in callout string
+# a ( ? C ' x z ' ) b
+/ 61 28 3f 43 27 78 00 7a 27 29 62/hex
+abcdefgh
+Callout (5): 'x\x00z'
+--->abcdefgh
+^^ b
+0: ab
 # End of testinput2


@@ -7910,4 +7910,13 @@ Callout (8): `code`
 ^ ^ )
 0: aaaX
+# Binary zero in callout string
+# a ( ? C ' x z ' ) b
+/ 61 28 3f 43 27 78 00 7a 27 29 62/hex
+abcdefgh
+Callout (5): 'x\x00z'
+--->abcdefgh
+^^ b
+0: ab
 # End of testinput6