Convert pcre2grep to use new pcre2_compile() options, thereby fixing two minor
(?) bugs.
Philip.Hazel 2017-06-17 11:32:06 +00:00
parent 69eab9cfe7
commit 76a57bd839
9 changed files with 232 additions and 184 deletions


@ -192,6 +192,12 @@ pattern lines.
42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
of pcre2grep.
43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:
(a) The -F option did not work for fixed strings containing \E.
(b) The -w option did not work for patterns with multiple branches.
Version 10.23 14-February-2017
------------------------------


@ -602,6 +602,19 @@ echo "---------------------------- Test 120 ------------------------------" >>te
(cd $srcdir; $valgrind $vjs $pcre2grep -HO '$0:$2$1$3' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 121 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $vjs $pcre2grep -F '\E and (regex)' testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $vjs $pcre2grep -w 'cat|dog' testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $vjs $pcre2grep -w 'dog|cat' testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep
# Now compare the results.
$cf $srcdir/testdata/grepoutput testtrygrep


@ -740,20 +740,21 @@ the patterns are the ones that are found.
</P>
<P>
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
Force the patterns to match only whole words. This is equivalent to having \b
at the start and end of the pattern. This option applies only to the patterns
that are matched against the contents of files; it does not apply to patterns
specified by any of the <b>--include</b> or <b>--exclude</b> options.
Force the patterns only to match "words". That is, there must be a word
boundary at the start and end of each matched string. This is equivalent to
having "\b(?:" at the start of each pattern, and ")\b" at the end. This
option applies only to the patterns that are matched against the contents of
files; it does not apply to patterns specified by any of the <b>--include</b> or
<b>--exclude</b> options.
</P>
<P>
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. In multiline mode
the match may be more than one line. This is equivalent to having \A and \Z
characters at the start and end of each alternative top-level branch in every
pattern. This option applies only to the patterns that are matched against the
contents of files; it does not apply to patterns specified by any of the
<b>--include</b> or <b>--exclude</b> options.
Force the patterns to start matching only at the beginnings of lines, and in
addition, require them to match entire lines. In multiline mode the match may
be more than one line. This is equivalent to having "^(?:" at the start of each
pattern and ")$" at the end. This option applies only to the patterns that are
matched against the contents of files; it does not apply to patterns specified
by any of the <b>--include</b> or <b>--exclude</b> options.
</P>
<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
<P>
@ -936,7 +937,7 @@ Cambridge, England.
</P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 May 2017
Last updated: 17 June 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>


@ -1,4 +1,4 @@
.TH PCRE2GREP 1 "26 May 2017" "PCRE2 10.30"
.TH PCRE2GREP 1 "17 June 2017" "PCRE2 10.30"
.SH NAME
pcre2grep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
@ -639,19 +639,20 @@ Invert the sense of the match, so that lines which do \fInot\fP match any of
the patterns are the ones that are found.
.TP
\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
Force the patterns to match only whole words. This is equivalent to having \eb
at the start and end of the pattern. This option applies only to the patterns
that are matched against the contents of files; it does not apply to patterns
specified by any of the \fB--include\fP or \fB--exclude\fP options.
Force the patterns only to match "words". That is, there must be a word
boundary at the start and end of each matched string. This is equivalent to
having "\eb(?:" at the start of each pattern, and ")\eb" at the end. This
option applies only to the patterns that are matched against the contents of
files; it does not apply to patterns specified by any of the \fB--include\fP or
\fB--exclude\fP options.
.TP
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. In multiline mode
the match may be more than one line. This is equivalent to having \eA and \eZ
characters at the start and end of each alternative top-level branch in every
pattern. This option applies only to the patterns that are matched against the
contents of files; it does not apply to patterns specified by any of the
\fB--include\fP or \fB--exclude\fP options.
Force the patterns to start matching only at the beginnings of lines, and in
addition, require them to match entire lines. In multiline mode the match may
be more than one line. This is equivalent to having "^(?:" at the start of each
pattern and ")$" at the end. This option applies only to the patterns that are
matched against the contents of files; it does not apply to patterns specified
by any of the \fB--include\fP or \fB--exclude\fP options.
.
.
.SH "ENVIRONMENT VARIABLES"
@ -850,6 +851,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 26 May 2017
Last updated: 17 June 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi


@ -718,22 +718,23 @@ OPTIONS
match any of the patterns are the ones that are found.
-w, --word-regex, --word-regexp
Force the patterns to match only whole words. This is equiva-
lent to having \b at the start and end of the pattern. This
option applies only to the patterns that are matched against
the contents of files; it does not apply to patterns speci-
fied by any of the --include or --exclude options.
Force the patterns only to match "words". That is, there must
be a word boundary at the start and end of each matched
string. This is equivalent to having "\b(?:" at the start of
each pattern, and ")\b" at the end. This option applies only
to the patterns that are matched against the contents of
files; it does not apply to patterns specified by any of the
--include or --exclude options.
-x, --line-regex, --line-regexp
Force the patterns to be anchored (each must start matching
at the beginning of a line) and in addition, require them to
match entire lines. In multiline mode the match may be more
than one line. This is equivalent to having \A and \Z charac-
ters at the start and end of each alternative top-level
branch in every pattern. This option applies only to the pat-
terns that are matched against the contents of files; it does
not apply to patterns specified by any of the --include or
--exclude options.
Force the patterns to start matching only at the beginnings
of lines, and in addition, require them to match entire
lines. In multiline mode the match may be more than one line.
This is equivalent to having "^(?:" at the start of each pat-
tern and ")$" at the end. This option applies only to the
patterns that are matched against the contents of files; it
does not apply to patterns specified by any of the --include
or --exclude options.
ENVIRONMENT VARIABLES
@ -917,5 +918,5 @@ AUTHOR
REVISION
Last updated: 26 May 2017
Last updated: 17 June 2017
Copyright (c) 1997-2017 University of Cambridge.


@ -103,7 +103,8 @@ typedef int BOOL;
#define MAXPATLEN 8192
#endif
#define PATBUFSIZE (MAXPATLEN + 10) /* Allows for prefix+suffix */
#define FNBUFSIZ 1024
#define ERRBUFSIZ 256
/* Values for the "filenames" variable, which specifies options for file name
output. The order is important; it is assumed that a file name is wanted for
@ -211,7 +212,7 @@ static BOOL use_jit = FALSE;
static const uint8_t *character_tables = NULL;
static uint32_t pcre2_options = 0;
static uint32_t process_options = 0;
static uint32_t extra_options = 0;
static PCRE2_SIZE heap_limit = PCRE2_UNSET;
static uint32_t match_limit = 0;
static uint32_t depth_limit = 0;
@ -441,19 +442,6 @@ of PCRE2_NEWLINE_xx in pcre2.h. */
static const char *newlines[] = {
"DEFAULT", "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL" };
/* Tables for prefixing and suffixing patterns, according to the -w, -x, and -F
options. These set the 1, 2, and 4 bits in process_options, respectively. Note
that the combination of -w and -x has the same effect as -x on its own, so we
can treat them as the same. Note that the MAXPATLEN macro assumes the longest
prefix+suffix is 10 characters; if anything longer is added, it must be
adjusted. */
static const char *prefix[] = {
"", "\\b", "^(?:", "^(?:", "\\Q", "\\b\\Q", "^(?:\\Q", "^(?:\\Q" };
static const char *suffix[] = {
"", "\\b", ")$", ")$", "\\E", "\\E\\b", "\\E)$", "\\E)$" };
/* UTF-8 tables - used only when the newline setting is "any". */
const int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
@ -3224,7 +3212,7 @@ switch(letter)
case N_NOJIT: use_jit = FALSE; break;
case 'a': binary_files = BIN_TEXT; break;
case 'c': count_only = TRUE; break;
case 'F': process_options |= PO_FIXED_STRINGS; break;
case 'F': options |= PCRE2_LITERAL; break;
case 'H': filenames = FN_FORCE; break;
case 'I': binary_files = BIN_NOMATCH; break;
case 'h': filenames = FN_NONE; break;
@ -3245,8 +3233,8 @@ switch(letter)
case 't': show_total_count = TRUE; break;
case 'u': options |= PCRE2_UTF; utf = TRUE; break;
case 'v': invert = TRUE; break;
case 'w': process_options |= PO_WORD_MATCH; break;
case 'x': process_options |= PO_LINE_MATCH; break;
case 'w': extra_options |= PCRE2_EXTRA_MATCH_WORD; break;
case 'x': extra_options |= PCRE2_EXTRA_MATCH_LINE; break;
case 'V':
{
@ -3309,7 +3297,6 @@ pattern chain.
Arguments:
p points to the pattern block
options the PCRE options
popts the processing options
fromfile TRUE if the pattern was read from a file
fromtext file name or identifying text (e.g. "include")
count 0 if this is the only command line pattern, or
@ -3320,18 +3307,20 @@ Returns: TRUE on success, FALSE after an error
*/
static BOOL
compile_pattern(patstr *p, int options, int popts, int fromfile,
const char *fromtext, int count)
compile_pattern(patstr *p, int options, int fromfile, const char *fromtext,
int count)
{
unsigned char buffer[PATBUFSIZE];
PCRE2_SIZE erroffset;
char *ps = p->string;
unsigned int patlen = strlen(ps);
char *ps;
int errcode;
PCRE2_SIZE patlen, erroffset;
PCRE2_UCHAR errmessbuffer[ERRBUFSIZ];
if (p->compiled != NULL) return TRUE;
if ((popts & PO_FIXED_STRINGS) != 0)
ps = p->string;
patlen = strlen(ps);
if ((options & PCRE2_LITERAL) != 0)
{
int ellength;
char *eop = ps + patlen;
@ -3344,8 +3333,7 @@ if ((popts & PO_FIXED_STRINGS) != 0)
}
}
sprintf((char *)buffer, "%s%.*s%s", prefix[popts], patlen, ps, suffix[popts]);
p->compiled = pcre2_compile(buffer, PCRE2_ZERO_TERMINATED, options, &errcode,
p->compiled = pcre2_compile((PCRE2_SPTR)ps, patlen, options, &errcode,
&erroffset, compile_context);
/* Handle successful compile. Try JIT-compiling if supported and enabled. We
@ -3362,23 +3350,22 @@ if (p->compiled != NULL)
/* Handle compile errors */
erroffset -= (int)strlen(prefix[popts]);
if (erroffset > patlen) erroffset = patlen;
pcre2_get_error_message(errcode, buffer, PATBUFSIZE);
pcre2_get_error_message(errcode, errmessbuffer, sizeof(errmessbuffer));
if (fromfile)
{
fprintf(stderr, "pcre2grep: Error in regex in line %d of %s "
"at offset %d: %s\n", count, fromtext, (int)erroffset, buffer);
"at offset %d: %s\n", count, fromtext, (int)erroffset, errmessbuffer);
}
else
{
if (count == 0)
fprintf(stderr, "pcre2grep: Error in %s regex at offset %d: %s\n",
fromtext, (int)erroffset, buffer);
fromtext, (int)erroffset, errmessbuffer);
else
fprintf(stderr, "pcre2grep: Error in %s %s regex at offset %d: %s\n",
ordin(count), fromtext, (int)erroffset, buffer);
ordin(count), fromtext, (int)erroffset, errmessbuffer);
}
return FALSE;
@ -3396,18 +3383,17 @@ Arguments:
name the name of the file; "-" is stdin
patptr pointer to the pattern chain anchor
patlastptr pointer to the last pattern pointer
popts the process options to pass to pattern_compile()
Returns: TRUE if all went well
*/
static BOOL
read_pattern_file(char *name, patstr **patptr, patstr **patlastptr, int popts)
read_pattern_file(char *name, patstr **patptr, patstr **patlastptr)
{
int linenumber = 0;
FILE *f;
const char *filename;
char buffer[PATBUFSIZE];
char buffer[MAXPATLEN+20];
if (strcmp(name, "-") == 0)
{
@ -3425,7 +3411,7 @@ else
filename = name;
}
while (fgets(buffer, PATBUFSIZE, f) != NULL)
while (fgets(buffer, sizeof(buffer), f) != NULL)
{
char *s = buffer + (int)strlen(buffer);
while (s > buffer && isspace((unsigned char)(s[-1]))) s--;
@ -3453,7 +3439,7 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL)
for(;;)
{
if (!compile_pattern(*patlastptr, pcre2_options, popts, TRUE, filename,
if (!compile_pattern(*patlastptr, pcre2_options, TRUE, filename,
linenumber))
{
if (f != stdin) fclose(f);
@ -3978,6 +3964,10 @@ if (DEE_option != NULL)
}
}
/* Set the extra options */
(void)pcre2_set_compile_extra_options(compile_context, extra_options);
/* Check the values for Jeffrey Friedl's debugging options. */
#ifdef JFRIEDL_DEBUG
@ -4038,7 +4028,7 @@ chain, so we must not access the next pointer till after the compile. */
for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
{
if (!compile_pattern(cp, pcre2_options, process_options, FALSE, "command-line",
if (!compile_pattern(cp, pcre2_options, FALSE, "command-line",
(j == 1 && patterns->next == NULL)? 0 : j))
goto EXIT2;
}
@ -4047,29 +4037,11 @@ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
for (fn = pattern_files; fn != NULL; fn = fn->next)
{
if (!read_pattern_file(fn->name, &patterns, &patterns_last, process_options))
goto EXIT2;
if (!read_pattern_file(fn->name, &patterns, &patterns_last)) goto EXIT2;
}
/* Unless JIT has been explicitly disabled, arrange a stack for it to use. */
#ifdef NEVER
#ifdef SUPPORT_PCRE2GREP_JIT
if (use_jit)
jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL);
#endif
for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
{
#ifdef SUPPORT_PCRE2GREP_JIT
if (jit_stack != NULL && cp->compiled != NULL)
pcre2_jit_stack_assign(match_context, NULL, jit_stack);
#endif
}
#endif
#ifdef SUPPORT_PCRE2GREP_JIT
if (use_jit)
{
@ -4079,16 +4051,21 @@ if (use_jit)
}
#endif
/* -F, -w, and -x do not apply to include or exclude patterns, so we must
adjust the options. */
pcre2_options &= ~PCRE2_LITERAL;
(void)pcre2_set_compile_extra_options(compile_context, 0);
/* If there are include or exclude patterns read from the command line, compile
them. -F, -w, and -x do not apply, so the third argument of compile_pattern is
0. */
them. */
for (j = 0; j < 4; j++)
{
int k;
for (k = 1, cp = *(incexlist[j]); cp != NULL; k++, cp = cp->next)
{
if (!compile_pattern(cp, pcre2_options, 0, FALSE, incexname[j],
if (!compile_pattern(cp, pcre2_options, FALSE, incexname[j],
(k == 1 && cp->next == NULL)? 0 : k))
goto EXIT2;
}
@ -4098,13 +4075,13 @@ for (j = 0; j < 4; j++)
for (fn = include_from; fn != NULL; fn = fn->next)
{
if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last, 0))
if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last))
goto EXIT2;
}
for (fn = exclude_from; fn != NULL; fn = fn->next)
{
if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last, 0))
if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last))
goto EXIT2;
}
@ -4123,7 +4100,7 @@ read them line by line and search the given files. */
for (fn = file_lists; fn != NULL; fn = fn->next)
{
char buffer[PATBUFSIZE];
char buffer[FNBUFSIZ];
FILE *fl;
if (strcmp(fn->name, "-") == 0) fl = stdin; else
{
@ -4135,7 +4112,7 @@ for (fn = file_lists; fn != NULL; fn = fn->next)
goto EXIT2;
}
}
while (fgets(buffer, PATBUFSIZE, fl) != NULL)
while (fgets(buffer, sizeof(buffer), fl) != NULL)
{
int frc;
char *end = buffer + (int)strlen(buffer);

5
testdata/grepinputv vendored

@ -2,3 +2,8 @@ The quick brown
fox jumps
over the lazy dog.
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate

32
testdata/grepoutput vendored

@ -454,6 +454,11 @@ RC=1
---------------------------- Test 51 ------------------------------
over the lazy dog.
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
RC=0
---------------------------- Test 52 ------------------------------
fox jumps
@ -788,32 +793,32 @@ RC=0
37216,12
RC=0
---------------------------- Test 113 -----------------------------
476
478
RC=0
---------------------------- Test 114 -----------------------------
testdata/grepinput:469
testdata/grepinput3:0
testdata/grepinput8:0
testdata/grepinputv:1
testdata/grepinputv:3
testdata/grepinputx:6
TOTAL:476
TOTAL:478
RC=0
---------------------------- Test 115 -----------------------------
testdata/grepinput:469
testdata/grepinputv:1
testdata/grepinputv:3
testdata/grepinputx:6
TOTAL:476
TOTAL:478
RC=0
---------------------------- Test 116 -----------------------------
476
478
RC=0
---------------------------- Test 117 -----------------------------
469
0
0
1
3
6
476
478
RC=0
---------------------------- Test 118 -----------------------------
testdata/grepinput3
@ -834,3 +839,14 @@ RC=0
./testdata/grepinput:a binary zero:zeroa
./testdata/grepinput:the binary zero.:zerothe.
RC=0
---------------------------- Test 121 -----------------------------
This line contains \E and (regex) *meta* [characters].
RC=0
---------------------------- Test 122 -----------------------------
over the lazy dog.
The word is cat in this line
RC=0
---------------------------- Test 122 -----------------------------
over the lazy dog.
The word is cat in this line
RC=0

28
testdata/grepoutputC vendored
View File

@ -1,14 +1,42 @@
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
Arg1: [T] [his] [s] Arg2: |T| () () (0)
Arg1: [T] [his] [s] Arg2: |T| () () (0)
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
Arg1: [qu] [qu]
Arg1: [ t] [ t]
Arg1: [ l] [ l]
Arg1: [wo] [wo]
Arg1: [ca] [ca]
Arg1: [sn] [sn]
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
0:T
The quick brown
0:T
This time it jumps and jumps and jumps.
0:T
This line contains \E and (regex) *meta* [characters].
0:T
The word is cat in this line
0:T
The caterpillar sat on the mat
0:T
The snowcat is not an animal
T
T
T
T
T
T