Remove save/load from pcre2test, as they will not be implemented just yet (if
at all).
Philip.Hazel 2014-08-12 17:41:11 +00:00
parent 803c38f004
commit d631f4025c
4 changed files with 26 additions and 349 deletions
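
For reference, the save-file layout described by the documentation and code that this commit removes is an 8-byte header followed by the compiled pattern: two 32-bit lengths (compiled pattern, then study data), each written most significant byte first. The following is a minimal standalone C sketch of that header encoding only. It is not code from this commit, and the names `put_be32`, `get_be32`, `encode_header` and `decode_header` are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value, most significant byte first. */
static void put_be32(uint8_t *p, uint32_t v)
{
p[0] = (uint8_t)((v >> 24) & 255);
p[1] = (uint8_t)((v >> 16) & 255);
p[2] = (uint8_t)((v >> 8) & 255);
p[3] = (uint8_t)(v & 255);
}

/* Hypothetical helper: fetch a 32-bit big-endian value. Each byte is cast to
uint32_t before shifting, so a top byte of 0x80 or more is never shifted as a
promoted signed int. */
static uint32_t get_be32(const uint8_t *p)
{
return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Build the 8-byte header: pattern length first, then study-data length
(zero when there is no study data). */
static void encode_header(uint8_t *sbuf, uint32_t pattern_size,
  uint32_t study_size)
{
assert(sbuf != NULL);
put_be32(sbuf, pattern_size);
put_be32(sbuf + 4, study_size);
}

/* Recover both lengths from an 8-byte header, independent of host byte order. */
static void decode_header(const uint8_t *sbuf, uint32_t *pattern_size,
  uint32_t *study_size)
{
assert(sbuf != NULL && pattern_size != NULL && study_size != NULL);
*pattern_size = get_be32(sbuf);
*study_size = get_be32(sbuf + 4);
}
```

Because the header is assembled byte by byte, it reads back identically on little-endian and big-endian hosts; only the pattern body that follows it would need the byte flipping that the removed text describes.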

@ -1,4 +1,4 @@
.TH PCRE2TEST 1 "07 August 2014" "PCRE 10.00"
.TH PCRE2TEST 1 "12 August 2014" "PCRE 10.00"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@ -231,12 +231,6 @@ included in the library. This effect can also be obtained by the use of
\fB#pattern\fP; the difference is that \fB#forbid_utf\fP cannot be unset, and
the automatic options are not displayed in pattern information, to avoid
cluttering up test output.
.sp
#load <file name>
.sp
Load a pre-compiled pattern that has been saved in a file. This command must be
followed immediately by any subject lines that are to be matched by the
pattern.
.sp
#pattern <modifier-list>
.sp
@ -435,7 +429,6 @@ about the pattern:
bsr=[anycrlf|unicode] specify \eR handling
/B bincode show binary code without lengths
debug same as info,fullbincode
flipbytes flip endianness
fullbincode show binary code with lengths
/I info show info about compiled pattern
hex pattern is coded in hexadecimal
@ -446,7 +439,6 @@ about the pattern:
parens_nest_limit=<n> set maximum parentheses depth
perlcompat lock out non-Perl modifiers
posix use the POSIX API
save=<file name> save compiled pattern
stackguard=<number> test the stackguard feature
tables=[0|1|2] select internal tables
use_length use the pattern's length
@ -492,18 +484,6 @@ The \fBinfo\fP modifier requests information about the compiled pattern
information is obtained from the \fBpcre2_pattern_info()\fP function.
.
.
.SS "Changing byte order"
.rs
.sp
The \fBflipbytes\fP modifier causes \fBpcre2test\fP to flip the byte order of
the 2-byte and 4-byte fields in the compiled pattern. This facility is for
testing the feature that allows PCRE2 to use patterns that were compiled on a
host with a different endianness. This feature is not available when the POSIX
interface is being used, that is, when the \fBposix\fP pattern modifier is
specified. See also the section about saving and reloading compiled patterns
below.
.
.
.SS "Specifying a pattern in hex"
.rs
.sp
@ -1139,77 +1119,6 @@ characters.
.
.
.
.SH "SAVING AND RELOADING COMPILED PATTERNS"
.rs
.sp
FIXME FIXME
The facilities described in this section are not available when the POSIX
interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is
specified.
.P
When the POSIX interface is not in use, you can cause \fBpcre2test\fP to write a
compiled pattern to a file, by following the modifiers with > and a file name.
For example:
.sp
/pattern/im >/some/file
.sp
See the
.\" HREF
\fBpcreprecompile\fP
.\"
documentation for a discussion about saving and re-using compiled patterns.
Note that if the pattern was successfully studied with JIT optimization, the
JIT data cannot be saved.
.P
The data that is written is binary. The first eight bytes are the length of the
compiled pattern data followed by the length of the optional study data, each
written as four bytes in big-endian order (most significant byte first). If
there is no study data (either the pattern was not studied, or studying did not
return any data), the second length is zero. The lengths are followed by an
exact copy of the compiled pattern. If there is additional study data, this
(excluding any JIT data) follows immediately after the compiled pattern. After
writing the file, \fBpcre2test\fP expects to read a new pattern.
.P
A saved pattern can be reloaded into \fBpcre2test\fP by specifying < and a file
name instead of a pattern. There must be no space between < and the file name,
which must not contain a < character, as otherwise \fBpcre2test\fP will
interpret the line as a pattern delimited by < characters. For example:
.sp
re> </some/file
Compiled pattern loaded from /some/file
No study data
.sp
If the pattern was previously studied with the JIT optimization, the JIT
information cannot be saved and restored, and so is lost. When the pattern has
been loaded, \fBpcre2test\fP proceeds to read data lines in the usual way.
.P
You can copy a file written by \fBpcre2test\fP to a different host and reload it
there, even if the new host has opposite endianness to the one on which the
pattern was compiled. For example, you can compile on an i86 machine and run on
a SPARC machine. When a pattern is reloaded on a host with different
endianness, the confirmation message is changed to:
.sp
Compiled pattern (byte-inverted) loaded from /some/file
.sp
The test suite contains some saved pre-compiled patterns with different
endianness. These are reloaded using "<!" instead of just "<". This suppresses
the "(byte-inverted)" text so that the output is the same on all hosts. It also
forces debugging output once the pattern has been reloaded.
.P
File names for saving and reloading can be absolute or relative, but note that
the shell facility of expanding a file name that starts with a tilde (~) is not
available.
.P
The ability to save and reload files in \fBpcre2test\fP is intended for testing
and experimentation. It is not intended for production use because only a
single pattern can be written to a file. Furthermore, there is no facility for
supplying custom character tables for use with a reloaded pattern. If the
original pattern was compiled with custom tables, an attempt to match a subject
string using a reloaded pattern is likely to cause \fBpcre2test\fP to crash.
Finally, if you attempt to load a file that is not in the correct format, the
result is undefined.
.
.
.SH "SEE ALSO"
.rs
.sp
@ -1233,6 +1142,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
Last updated: 07 August 2014
Last updated: 12 August 2014
Copyright (c) 1997-2014 University of Cambridge.
.fi

@ -42,13 +42,6 @@ POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* FIXME: These are the as-yet-unimplemented features:
. save code and #load
. JIT - compile, time, verify
. memory handling testing
*/
/* This program supports testing of the 8-bit, 16-bit, and 32-bit PCRE2
libraries in a single program, though its input and output are always 8-bit.
@ -325,29 +318,28 @@ enum { MOD_CTB, /* Applies to a compile or a match context */
/* Control bits. Some apply to compiling, some to matching, but some can be set
either on a pattern or a data line, so they must all be distinct. */
#define CTL_AFTERTEXT 0x00000001
#define CTL_ALLAFTERTEXT 0x00000002
#define CTL_ALLCAPTURES 0x00000004
#define CTL_ALTGLOBAL 0x00000008
#define CTL_BINCODE 0x00000010
#define CTL_CALLOUT_CAPTURE 0x00000020
#define CTL_CALLOUT_NONE 0x00000040
#define CTL_DFA 0x00000080
#define CTL_FINDLIMITS 0x00000100
#define CTL_FLIPBYTES 0x00000200
#define CTL_FULLBINCODE 0x00000400
#define CTL_GETALL 0x00000800
#define CTL_GLOBAL 0x00001000
#define CTL_HEXPAT 0x00002000
#define CTL_INFO 0x00004000
#define CTL_JITVERIFY 0x00008000
#define CTL_MARK 0x00010000
#define CTL_MEMORY 0x00020000
#define CTL_PATLEN 0x00040000
#define CTL_POSIX 0x00080000
#define CTL_AFTERTEXT 0x00000001u
#define CTL_ALLAFTERTEXT 0x00000002u
#define CTL_ALLCAPTURES 0x00000004u
#define CTL_ALTGLOBAL 0x00000008u
#define CTL_BINCODE 0x00000010u
#define CTL_CALLOUT_CAPTURE 0x00000020u
#define CTL_CALLOUT_NONE 0x00000040u
#define CTL_DFA 0x00000080u
#define CTL_FINDLIMITS 0x00000100u
#define CTL_FULLBINCODE 0x00000200u
#define CTL_GETALL 0x00000400u
#define CTL_GLOBAL 0x00000800u
#define CTL_HEXPAT 0x00001000u
#define CTL_INFO 0x00002000u
#define CTL_JITVERIFY 0x00004000u
#define CTL_MARK 0x00008000u
#define CTL_MEMORY 0x00010000u
#define CTL_PATLEN 0x00020000u
#define CTL_POSIX 0x00040000u
#define CTL_BSR_SET 0x00100000 /* This is informational */
#define CTL_NL_SET 0x00200000 /* This is informational */
#define CTL_BSR_SET 0x00080000u /* This is informational */
#define CTL_NL_SET 0x00100000u /* This is informational */
#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */
#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE) /* For testing */
@ -367,7 +359,6 @@ typedef struct patctl { /* Structure for pattern modifiers. */
uint32_t stackguard_test;
uint32_t tables_id;
uint8_t locale[32];
uint8_t save[64];
} patctl;
#define MAXCPYGET 10
@ -440,7 +431,6 @@ static modstruct modlist[] = {
{ "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
{ "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
{ "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
{ "flipbytes", MOD_PAT, MOD_CTL, CTL_FLIPBYTES, PO(control) },
{ "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
{ "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
@ -476,7 +466,6 @@ static modstruct modlist[] = {
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
{ "save", MOD_PAT, MOD_STR, 0, PO(save) },
{ "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
{ "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
{ "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
@ -2870,14 +2859,13 @@ Returns: nothing
static void
show_compile_controls(uint32_t controls, const char *before, const char *after)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
((controls & CTL_ALLCAPTURES) != 0)? " allcaptures" : "",
((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "",
((controls & CTL_BINCODE) != 0)? " bincode" : "",
((controls & CTL_FLIPBYTES) != 0)? " flipbytes" : "",
((controls & CTL_FULLBINCODE) != 0)? " fullbincode" : "",
((controls & CTL_GLOBAL) != 0)? " global" : "",
((controls & CTL_HEXPAT) != 0)? " hex" : "",
@ -2997,8 +2985,8 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s",
* Show information about a pattern *
*************************************************/
/* This function is called after a pattern has been compiled or loaded from a
file, if any of the information-requesting controls have been set.
/* This function is called after a pattern has been compiled if any of the
information-requesting controls have been set.
Arguments: none
@ -3290,149 +3278,6 @@ else if (strncmp((char *)buffer, "#subject", 8) == 0 && isspace(buffer[8]))
{
(void)decode_modifiers(buffer + 8, CTX_DEFDAT, NULL, &def_datctl);
}
else if (strncmp((char *)buffer, "#load", 5) == 0 && isspace(buffer[5]))
{
/* FIXME */
fprintf(outfile, "** #load not yet implemented\n");
return PR_ABEND;
#ifdef FIXME
/* See if the pattern is to be loaded pre-compiled from a file. */
if (*p == '<' && strchr((char *)(p+1), '<') == NULL)
{
uint32_t magic;
uint8_t sbuf[8];
FILE *f;
p++;
if (*p == '!')
{
do_debug = TRUE;
do_showinfo = TRUE;
p++;
}
pp = p + (int)strlen((char *)p);
while (isspace(pp[-1])) pp--;
*pp = 0;
f = fopen((char *)p, "rb");
if (f == NULL)
{
fprintf(outfile, "Failed to open %s: %s\n", p, strerror(errno));
continue;
}
if (fread(sbuf, 1, 8, f) != 8) goto FAIL_READ;
true_size =
(sbuf[0] << 24) | (sbuf[1] << 16) | (sbuf[2] << 8) | sbuf[3];
true_study_size =
(sbuf[4] << 24) | (sbuf[5] << 16) | (sbuf[6] << 8) | sbuf[7];
re = (pcre *)new_malloc(true_size);
if (re == NULL)
{
printf("** Failed to get %d bytes of memory for pcre object\n",
(int)true_size);
yield = 1;
goto EXIT;
}
if (fread(re, 1, true_size, f) != true_size) goto FAIL_READ;
magic = REAL_PCRE_MAGIC(re);
if (magic != MAGIC_NUMBER)
{
if (swap_uint32(magic) == MAGIC_NUMBER)
{
do_flip = 1;
}
else
{
fprintf(outfile, "Data in %s is not a compiled PCRE regex\n", p);
new_free(re);
fclose(f);
continue;
}
}
/* We hide the byte-invert info for little and big endian tests. */
fprintf(outfile, "Compiled pattern%s loaded from %s\n",
do_flip && (p[-1] == '<') ? " (byte-inverted)" : "", p);
/* Now see if there is any following study data. */
if (true_study_size != 0)
{
pcre_study_data *psd;
extra = (pcre_extra *)new_malloc(sizeof(pcre_extra) + true_study_size);
extra->flags = PCRE_EXTRA_STUDY_DATA;
psd = (pcre_study_data *)(((char *)extra) + sizeof(pcre_extra));
extra->study_data = psd;
if (fread(psd, 1, true_study_size, f) != true_study_size)
{
FAIL_READ:
fprintf(outfile, "Failed to read data from %s\n", p);
if (extra != NULL)
{
PCRE_FREE_STUDY(extra);
}
new_free(re);
fclose(f);
continue;
}
fprintf(outfile, "Study data loaded from %s\n", p);
do_study = 1; /* To get the data output if requested */
}
else fprintf(outfile, "No study data\n");
/* Flip the necessary bytes. */
if (do_flip)
{
int rc;
PCRE_PATTERN_TO_HOST_BYTE_ORDER(rc, re, extra, NULL);
if (rc == PCRE_ERROR_BADMODE)
{
uint32_t flags_in_host_byte_order;
if (REAL_PCRE_MAGIC(re) == MAGIC_NUMBER)
flags_in_host_byte_order = REAL_PCRE_FLAGS(re);
else
flags_in_host_byte_order = swap_uint32(REAL_PCRE_FLAGS(re));
/* Simulate the result of the function call below. */
fprintf(outfile, "Error %d from pcre%s_fullinfo(%d)\n", rc,
test_mode == PCRE32_MODE ? "32" : test_mode == PCRE16_MODE ? "16" : "",
PCRE_INFO_OPTIONS);
fprintf(outfile, "Running in %d-bit mode but pattern was compiled in "
"%d-bit mode\n", test_mode, 8 * (flags_in_host_byte_order & test_mode_MASK));
new_free(re);
fclose(f);
continue;
}
}
/* Need to know if UTF-8 for printing data strings. */
if (new_info(re, NULL, PCRE_INFO_OPTIONS, &get_options) < 0)
{
new_free(re);
fclose(f);
continue;
}
use_utf = (get_options & PCRE_UTF8) != 0;
fclose(f);
goto SHOW_INFO;
}
#endif /* FIXME */
}
else
{
fprintf(outfile, "** Unknown command: %s", buffer);
@ -3619,7 +3464,6 @@ if ((pat_patctl.control & CTL_POSIX) != 0)
if (pat_patctl.stackguard_test != 0) prmsg(&msg, "stackguard");
if (timeit > 0) prmsg(&msg, "timing");
if (pat_patctl.jit != 0) prmsg(&msg, "JIT");
if (pat_patctl.save[0] != 0) prmsg(&msg, "save");
if ((pat_patctl.options & ~POSIX_SUPPORTED_COMPILE_OPTIONS) != 0)
{
@ -3779,74 +3623,6 @@ if ((pat_patctl.control & CTL_ANYINFO) != 0)
if (rc != PR_OK) return rc;
}
#ifdef FIXME
/* If the '>' option was present, we write out the regex to a file, and
that is all. The first 8 bytes of the file are the regex length and then
the study length, in big-endian order. */
if (to_file != NULL)
{
FILE *f = fopen((char *)to_file, "wb");
if (f == NULL)
{
fprintf(outfile, "Unable to open %s: %s\n", to_file, strerror(errno));
}
else
{
uint8_t sbuf[8];
/* Extract the size for possible writing before possibly flipping it,
and remember the store that was got. */
true_size = REAL_PCRE_SIZE(re);
if (do_flip) regexflip(re, extra);
sbuf[0] = (uint8_t)((true_size >> 24) & 255);
sbuf[1] = (uint8_t)((true_size >> 16) & 255);
sbuf[2] = (uint8_t)((true_size >> 8) & 255);
sbuf[3] = (uint8_t)((true_size) & 255);
sbuf[4] = (uint8_t)((true_study_size >> 24) & 255);
sbuf[5] = (uint8_t)((true_study_size >> 16) & 255);
sbuf[6] = (uint8_t)((true_study_size >> 8) & 255);
sbuf[7] = (uint8_t)((true_study_size) & 255);
if (fwrite(sbuf, 1, 8, f) < 8 ||
fwrite(re, 1, true_size, f) < true_size)
{
fprintf(outfile, "Write error on %s: %s\n", to_file, strerror(errno));
}
else
{
fprintf(outfile, "Compiled pattern written to %s\n", to_file);
/* If there is study data, write it. */
if (extra != NULL)
{
if (fwrite(extra->study_data, 1, true_study_size, f) <
true_study_size)
{
fprintf(outfile, "Write error on %s: %s\n", to_file,
strerror(errno));
}
else fprintf(outfile, "Study data written to %s\n", to_file);
}
}
fclose(f);
}
new_free(re);
if (extra != NULL)
{
PCRE_FREE_STUDY(extra);
}
continue; /* With next regex */
}
#endif /* FIXME */
return PR_OK;
}

testdata/testinput2 (vendored, 2 lines changed)

@ -1391,8 +1391,6 @@
"<(\w+)/?>(.)*</(\1)>"Igms
<!DOCTYPE seite SYSTEM "http://www.lco.lineas.de/xmlCms.dtd">\n<seite>\n<dokumenteninformation>\n<seitentitel>Partner der LCO</seitentitel>\n<sprache>de</sprache>\n<seitenbeschreibung>Partner der LINEAS Consulting\nGmbH</seitenbeschreibung>\n<schluesselworte>LINEAS Consulting GmbH Hamburg\nPartnerfirmen</schluesselworte>\n<revisit>30 days</revisit>\n<robots>index,follow</robots>\n<menueinformation>\n<aktiv>ja</aktiv>\n<menueposition>3</menueposition>\n<menuetext>Partner</menuetext>\n</menueinformation>\n<lastedited>\n<autor>LCO</autor>\n<firma>LINEAS Consulting</firma>\n<datum>15.10.2003</datum>\n</lastedited>\n</dokumenteninformation>\n<inhalt>\n\n<absatzueberschrift>Die Partnerfirmen der LINEAS Consulting\nGmbH</absatzueberschrift>\n\n<absatz><link ziel="http://www.ca.com/" zielfenster="_blank">\n<bild name="logo_ca.gif" rahmen="no"/></link> <link\nziel="http://www.ey.com/" zielfenster="_blank"><bild\nname="logo_euy.gif" rahmen="no"/></link>\n</absatz>\n\n<absatz><link ziel="http://www.cisco.de/" zielfenster="_blank">\n<bild name="logo_cisco.gif" rahmen="ja"/></link></absatz>\n\n<absatz><link ziel="http://www.atelion.de/"\nzielfenster="_blank"><bild\nname="logo_atelion.gif" rahmen="no"/></link>\n</absatz>\n\n<absatz><link ziel="http://www.line-information.de/"\nzielfenster="_blank">\n<bild name="logo_line_information.gif" rahmen="no"/></link>\n</absatz>\n\n<absatz><bild name="logo_aw.gif" rahmen="no"/></absatz>\n\n<absatz><link ziel="http://www.incognis.de/"\nzielfenster="_blank"><bild\nname="logo_incognis.gif" rahmen="no"/></link></absatz>\n\n<absatz><link ziel="http://www.addcraft.com/"\nzielfenster="_blank"><bild\nname="logo_addcraft.gif" rahmen="no"/></link></absatz>\n\n<absatz><link ziel="http://www.comendo.com/"\nzielfenster="_blank"><bild\nname="logo_comendo.gif" rahmen="no"/></link></absatz>\n\n</inhalt>\n</seite>\=jitstack=1024
/^a/I,flipbytes
/line\nbreak/I
this is a line\nbreak
line one\nthis is a line\nbreak in the second line

@ -5373,12 +5373,6 @@ Subject length lower bound = 7
2: \x0a
3: seite
/^a/I,flipbytes
Capturing subpattern count = 0
Compile options: <none>
Overall options: anchored
Subject length lower bound = 1
/line\nbreak/I
Capturing subpattern count = 0
Contains explicit CR or LF match