From aa8d7342da1b1eb75171b4827a6e18bd497661ba Mon Sep 17 00:00:00 2001
From: "Philip.Hazel"
Date: Mon, 16 Mar 2015 15:38:26 +0000
Subject: [PATCH] Test binary zero in callout strings; change offset to PCRE2_SIZE; some documentation tidies.

---
 doc/pcre2callout.3 | 17 +++++++++--------
 doc/pcre2test.1 | 20 +++++++++++++-------
 src/pcre2.h.in | 2 +-
 testdata/testinput2 | 5 +++++
 testdata/testinput6 | 5 +++++
 testdata/testoutput2 | 9 +++++++++
 testdata/testoutput6 | 9 +++++++++
 7 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/doc/pcre2callout.3 b/doc/pcre2callout.3
index a485e1d..41ca8ab 100644
--- a/doc/pcre2callout.3
+++ b/doc/pcre2callout.3
@@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "15 March 2015" "PCRE2 10.20"
+.TH PCRE2CALLOUT 3 "16 March 2015" "PCRE2 10.20"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -197,8 +197,8 @@ documentation). The callout block structure contains the following fields:
  PCRE2_SIZE \fIpattern_position\fP;
  PCRE2_SIZE \fInext_item_length\fP;
  PCRE2_SIZE \fIcallout_string_offset\fP;
+ PCRE2_SIZE \fIcallout_string_length\fP;
  PCRE2_SPTR \fIcallout_string\fP;
- uint32_t \fIcallout_string_length\fP;
 .sp
 The \fIversion\fP field contains the version number of the block format. The
@@ -225,11 +225,12 @@ For callouts with string arguments, \fIcallout_number\fP is always zero, and
 \fIcallout_string\fP points to the string that is contained within the compiled
 pattern. Its length is given by \fIcallout_string_length\fP. Duplicated ending
 delimiters that were present in the original pattern string have been turned
-into single characters. An additional code unit containing binary zero is
-present after the string, but is not included in the length. The delimiter that
-was used to start the string is also stored within the pattern, immediately
-before the string itself. You can therefore access this delimiter as
-\fIcallout_string\fP[-1] if you need it.
+into single characters, but there is no other processing of the callout string
+argument. An additional code unit containing binary zero is present after the
+string, but is not included in the length. The delimiter that was used to start
+the string is also stored within the pattern, immediately before the string
+itself. You can access this delimiter as \fIcallout_string\fP[-1] if you need
+it.
 .P
 The \fIcallout_string_offset\fP field is the code unit offset to the start of
 the callout argument string within the original pattern string. This is
@@ -327,6 +328,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 15 March 2015
+Last updated: 16 March 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi
diff --git a/doc/pcre2test.1 b/doc/pcre2test.1
index a1fccb0..e5b3462 100644
--- a/doc/pcre2test.1
+++ b/doc/pcre2test.1
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "14 March 2015" "PCRE 10.20"
+.TH PCRE2TEST 1 "16 March 2015" "PCRE 10.20"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -61,11 +61,17 @@ names used in the libraries have a suffix _8, _16, or _32, as appropriate.
 .sp
 Input to \fBpcre2test\fP is processed line by line, either by calling the C
 library's \fBfgets()\fP function, or via the \fBlibreadline\fP library (see
-below). In Unix-like environments, \fBfgets()\fP treats any bytes other than
-newline as data characters. However, in some Windows environments character 26
-(hex 1A) causes an immediate end of file, and no further data is read. For
-maximum portability, therefore, it is safest to avoid non-printing characters
-in \fBpcre2test\fP input files.
+below). The input is processed using C's string functions, so must not
+contain binary zeroes, even though in Unix-like environments, \fBfgets()\fP
+treats any bytes other than newline as data characters. In some Windows
+environments character 26 (hex 1A) causes an immediate end of file, and no
+further data is read.
+.P +For maximum portability, therefore, it is safest to avoid non-printing +characters in \fBpcre2test\fP input files. There is a facility for specifying a +pattern's characters as hexadecimal pairs, thus making it possible to include +binary zeroes in a pattern for testing purposes. Subject lines are processed +for backslash escapes, which makes it possible to include any data value. . . .SH "COMMAND LINE OPTIONS" @@ -1431,6 +1437,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 14 March 2015 +Last updated: 16 March 2015 Copyright (c) 1997-2015 University of Cambridge. .fi diff --git a/src/pcre2.h.in b/src/pcre2.h.in index 04b82c6..d73cdda 100644 --- a/src/pcre2.h.in +++ b/src/pcre2.h.in @@ -339,8 +339,8 @@ typedef struct pcre2_callout_block { \ PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \ /* ------------------- Added for Version 1 -------------------------- */ \ PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \ + PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \ PCRE2_SPTR callout_string; /* String compiled into pattern */ \ - uint32_t callout_string_length; /* Length of string compiled into pattern */ \ /* ------------------------------------------------------------------ */ \ } pcre2_callout_block; diff --git a/testdata/testinput2 b/testdata/testinput2 index 259625e..9adfd4b 100644 --- a/testdata/testinput2 +++ b/testdata/testinput2 @@ -4224,4 +4224,9 @@ a random value. /Ix /(?:a(?C`code`)){3}X/ aaaXY +# Binary zero in callout string +# a ( ? C ' x z ' ) b +/ 61 28 3f 43 27 78 00 7a 27 29 62/hex + abcdefgh + # End of testinput2 diff --git a/testdata/testinput6 b/testdata/testinput6 index be6205a..65796ca 100644 --- a/testdata/testinput6 +++ b/testdata/testinput6 @@ -4841,4 +4841,9 @@ /(?:a(?C`code`)){3}X/ aaaXY +# Binary zero in callout string +# a ( ? 
C ' x z ' ) b +/ 61 28 3f 43 27 78 00 7a 27 29 62/hex + abcdefgh + # End of testinput6 diff --git a/testdata/testoutput2 b/testdata/testoutput2 index 5e22343..357e0c7 100644 --- a/testdata/testoutput2 +++ b/testdata/testoutput2 @@ -14169,4 +14169,13 @@ Callout (8): `code` ^ ^ ) 0: aaaX +# Binary zero in callout string +# a ( ? C ' x z ' ) b +/ 61 28 3f 43 27 78 00 7a 27 29 62/hex + abcdefgh +Callout (5): 'x\x00z' +--->abcdefgh + ^^ b + 0: ab + # End of testinput2 diff --git a/testdata/testoutput6 b/testdata/testoutput6 index 1470d2c..d9244bd 100644 --- a/testdata/testoutput6 +++ b/testdata/testoutput6 @@ -7910,4 +7910,13 @@ Callout (8): `code` ^ ^ ) 0: aaaX +# Binary zero in callout string +# a ( ? C ' x z ' ) b +/ 61 28 3f 43 27 78 00 7a 27 29 62/hex + abcdefgh +Callout (5): 'x\x00z' +--->abcdefgh + ^^ b + 0: ab + # End of testinput6
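
Reviewer note: the documentation change says the callout string's length comes from callout_string_length and that the string may now contain a binary zero, so a callout function should walk the string by length rather than rely on strlen(). The following is only a minimal sketch of that idea, not code from this patch or from pcre2test. The type demo_callout_block and the function demo_format_callout are hypothetical stand-ins that mirror the three string-related fields for the 8-bit library, with size_t used in place of PCRE2_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the string-related fields of
   pcre2_callout_block (8-bit library); NOT the real structure. */
typedef struct {
  size_t callout_string_offset;         /* PCRE2_SIZE in the real header */
  size_t callout_string_length;         /* PCRE2_SIZE after this patch */
  const unsigned char *callout_string;  /* PCRE2_SPTR in the 8-bit library */
} demo_callout_block;

/* Hypothetical helper: write a printable form of the callout string into
   out (capacity outsize), visiting exactly callout_string_length code units
   instead of stopping at the first zero. Non-printing code units are shown
   as \xhh. Returns the number of characters written (excluding the
   terminating zero), or -1 if out is too small. */
static int demo_format_callout(const demo_callout_block *cb,
                               char *out, size_t outsize)
{
  size_t used = 0;

  assert(cb != NULL && out != NULL);
  if (outsize == 0) return -1;

  for (size_t i = 0; i < cb->callout_string_length; i++) {
    unsigned char c = cb->callout_string[i];
    int printable = (c >= 32 && c < 127);
    size_t need = printable ? 1 : 4;      /* "\xhh" takes four characters */

    /* Always keep room for the terminating zero. */
    if (used + need + 1 > outsize) return -1;

    if (printable) {
      out[used++] = (char)c;
    } else {
      snprintf(out + used, outsize - used, "\\x%02x", (unsigned int)c);
      used += 4;
    }
  }

  out[used] = '\0';
  return (int)used;
}
```

With the test pattern added here, the stored bytes would be the delimiter ', then x, a zero byte, and z, followed by the extra terminating zero. Under the assumptions above, formatting by length yields x\x00z, whereas strlen() on the same pointer would report a length of 1.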