Convert pcre2grep to use new pcre2_compile() options, thereby fixing two minor (?) bugs.
Philip.Hazel 2017-06-17 11:32:06 +00:00
parent 69eab9cfe7
commit 76a57bd839
9 changed files with 232 additions and 184 deletions
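
Both bugs fixed here come from the previous approach of wrapping the pattern text in a prefix and a suffix before compiling it. The C sketch below is illustrative only: wrap_pattern() and quoted_length() are hypothetical helpers, not code from pcre2grep. It merely reproduces the strings that the removed prefix/suffix tables would have built, so the two failure cases can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from pcre2grep): concatenate prefix + pattern +
   suffix into out, as the removed sprintf() call did, and return the length
   of the result. This sketch does not handle truncation; it asserts instead. */
static size_t wrap_pattern(char *out, size_t outsize, const char *prefix,
                           const char *pattern, const char *suffix)
{
  int n = snprintf(out, outsize, "%s%s%s", prefix, pattern, suffix);
  assert(n >= 0 && (size_t)n < outsize);
  return (size_t)n;
}

/* Hypothetical helper: for a string that starts with \Q, return how many
   characters remain quoted. Quoting ends at the first \E after the opening
   \Q, regardless of whether that \E came from the suffix or from the user's
   own text. */
static size_t quoted_length(const char *wrapped)
{
  const char *start, *end;
  assert(strncmp(wrapped, "\\Q", 2) == 0);
  start = wrapped + 2;
  end = strstr(start, "\\E");
  return (end == NULL) ? strlen(start) : (size_t)(end - start);
}
```

Wrapping 'cat|dog' with the old -w prefix and suffix ("\\b" and "\\b") gives \bcat|dog\b, which is read as the two branches \bcat and dog\b, so a word such as "caterpillar" can match. The grouped form \b(?:cat|dog)\b, which the revised documentation describes as the equivalent of -w, does not have this problem. Wrapping the fixed string '\E and (regex)' with the old -F prefix and suffix ("\\Q" and "\\E") gives \Q\E and (regex)\E, for which quoted_length() returns 0: the user's own \E ends the quoting at once and the remainder is treated as a regular expression.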


@@ -192,6 +192,12 @@ pattern lines.
42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
of pcre2grep.
43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:
(a) The -F option did not work for fixed strings containing \E.
(b) The -w option did not work for patterns with multiple branches.
Version 10.23 14-February-2017
------------------------------


@@ -602,6 +602,19 @@ echo "---------------------------- Test 120 ------------------------------" >>te
(cd $srcdir; $valgrind $vjs $pcre2grep -HO '$0:$2$1$3' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 121 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $vjs $pcre2grep -F '\E and (regex)' testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $vjs $pcre2grep -w 'cat|dog' testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $vjs $pcre2grep -w 'dog|cat' testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep
# Now compare the results.
$cf $srcdir/testdata/grepoutput testtrygrep


@@ -740,20 +740,21 @@ the patterns are the ones that are found.
</P>
<P>
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
Force the patterns to match only whole words. This is equivalent to having \b
at the start and end of the pattern. This option applies only to the patterns
that are matched against the contents of files; it does not apply to patterns
specified by any of the <b>--include</b> or <b>--exclude</b> options.
Force the patterns only to match "words". That is, there must be a word
boundary at the start and end of each matched string. This is equivalent to
having "\b(?:" at the start of each pattern, and ")\b" at the end. This
option applies only to the patterns that are matched against the contents of
files; it does not apply to patterns specified by any of the <b>--include</b> or
<b>--exclude</b> options.
</P>
<P>
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. In multiline mode
the match may be more than one line. This is equivalent to having \A and \Z
characters at the start and end of each alternative top-level branch in every
pattern. This option applies only to the patterns that are matched against the
contents of files; it does not apply to patterns specified by any of the
<b>--include</b> or <b>--exclude</b> options.
Force the patterns to start matching only at the beginnings of lines, and in
addition, require them to match entire lines. In multiline mode the match may
be more than one line. This is equivalent to having "^(?:" at the start of each
pattern and ")$" at the end. This option applies only to the patterns that are
matched against the contents of files; it does not apply to patterns specified
by any of the <b>--include</b> or <b>--exclude</b> options.
</P>
<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
<P>
@@ -936,7 +937,7 @@ Cambridge, England.
</P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 May 2017
Last updated: 17 June 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>


@@ -1,4 +1,4 @@
.TH PCRE2GREP 1 "26 May 2017" "PCRE2 10.30"
.TH PCRE2GREP 1 "17 June 2017" "PCRE2 10.30"
.SH NAME
pcre2grep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -639,19 +639,20 @@ Invert the sense of the match, so that lines which do \fInot\fP match any of
the patterns are the ones that are found.
.TP
\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
Force the patterns to match only whole words. This is equivalent to having \eb
at the start and end of the pattern. This option applies only to the patterns
that are matched against the contents of files; it does not apply to patterns
specified by any of the \fB--include\fP or \fB--exclude\fP options.
Force the patterns only to match "words". That is, there must be a word
boundary at the start and end of each matched string. This is equivalent to
having "\eb(?:" at the start of each pattern, and ")\eb" at the end. This
option applies only to the patterns that are matched against the contents of
files; it does not apply to patterns specified by any of the \fB--include\fP or
\fB--exclude\fP options.
.TP
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. In multiline mode
the match may be more than one line. This is equivalent to having \eA and \eZ
characters at the start and end of each alternative top-level branch in every
pattern. This option applies only to the patterns that are matched against the
contents of files; it does not apply to patterns specified by any of the
\fB--include\fP or \fB--exclude\fP options.
Force the patterns to start matching only at the beginnings of lines, and in
addition, require them to match entire lines. In multiline mode the match may
be more than one line. This is equivalent to having "^(?:" at the start of each
pattern and ")$" at the end. This option applies only to the patterns that are
matched against the contents of files; it does not apply to patterns specified
by any of the \fB--include\fP or \fB--exclude\fP options.
.
.
.SH "ENVIRONMENT VARIABLES"
@@ -850,6 +851,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 26 May 2017
Last updated: 17 June 2017
Copyright (c) 1997-2017 University of Cambridge.
.fi


@@ -718,29 +718,30 @@ OPTIONS
match any of the patterns are the ones that are found.
-w, --word-regex, --word-regexp
Force the patterns to match only whole words. This is equiva-
lent to having \b at the start and end of the pattern. This
option applies only to the patterns that are matched against
the contents of files; it does not apply to patterns speci-
fied by any of the --include or --exclude options.
Force the patterns only to match "words". That is, there must
be a word boundary at the start and end of each matched
string. This is equivalent to having "\b(?:" at the start of
each pattern, and ")\b" at the end. This option applies only
to the patterns that are matched against the contents of
files; it does not apply to patterns specified by any of the
--include or --exclude options.
-x, --line-regex, --line-regexp
Force the patterns to be anchored (each must start matching
at the beginning of a line) and in addition, require them to
match entire lines. In multiline mode the match may be more
than one line. This is equivalent to having \A and \Z charac-
ters at the start and end of each alternative top-level
branch in every pattern. This option applies only to the pat-
terns that are matched against the contents of files; it does
not apply to patterns specified by any of the --include or
--exclude options.
Force the patterns to start matching only at the beginnings
of lines, and in addition, require them to match entire
lines. In multiline mode the match may be more than one line.
This is equivalent to having "^(?:" at the start of each pat-
tern and ")$" at the end. This option applies only to the
patterns that are matched against the contents of files; it
does not apply to patterns specified by any of the --include
or --exclude options.
ENVIRONMENT VARIABLES
The environment variables LC_ALL and LC_CTYPE are examined, in that
order, for a locale. The first one that is set is used. This can be
overridden by the --locale option. If no locale is set, the PCRE2
The environment variables LC_ALL and LC_CTYPE are examined, in that
order, for a locale. The first one that is set is used. This can be
overridden by the --locale option. If no locale is set, the PCRE2
library's default (usually the "C" locale) is used.
@@ -748,99 +749,99 @@ NEWLINES
The -N (--newline) option allows pcre2grep to scan files with different
newline conventions from the default. Any parts of the input files that
are written to the standard output are copied identically, with what-
ever newline sequences they have in the input. However, the setting of
this option does not affect the interpretation of files specified by
are written to the standard output are copied identically, with what-
ever newline sequences they have in the input. However, the setting of
this option does not affect the interpretation of files specified by
the -f, --exclude-from, or --include-from options, which are assumed to
use the operating system's standard newline sequence, nor does it
affect the way in which pcre2grep writes informational messages to the
use the operating system's standard newline sequence, nor does it
affect the way in which pcre2grep writes informational messages to the
standard error and output streams. For these it uses the string "\n" to
indicate newlines, relying on the C I/O library to convert this to an
indicate newlines, relying on the C I/O library to convert this to an
appropriate sequence.
OPTIONS COMPATIBILITY
Many of the short and long forms of pcre2grep's options are the same as
in the GNU grep program. Any long option of the form --xxx-regexp (GNU
in the GNU grep program. Any long option of the form --xxx-regexp (GNU
terminology) is also available as --xxx-regex (PCRE2 terminology). How-
ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
--include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
are specific to pcre2grep, as is the use of the --only-matching option
ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
--include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
are specific to pcre2grep, as is the use of the --only-matching option
with a capturing parentheses number.
Although most of the common options work the same way, a few are dif-
ferent in pcre2grep. For example, the --include option's argument is a
glob for GNU grep, but a regular expression for pcre2grep. If both the
-c and -l options are given, GNU grep lists only file names, without
Although most of the common options work the same way, a few are dif-
ferent in pcre2grep. For example, the --include option's argument is a
glob for GNU grep, but a regular expression for pcre2grep. If both the
-c and -l options are given, GNU grep lists only file names, without
counts, but pcre2grep gives the counts as well.
OPTIONS WITH DATA
There are four different ways in which an option with data can be spec-
ified. If a short form option is used, the data may follow immedi-
ified. If a short form option is used, the data may follow immedi-
ately, or (with one exception) in the next command line item. For exam-
ple:
-f/some/file
-f /some/file
The exception is the -o option, which may appear with or without data.
Because of this, if data is present, it must follow immediately in the
The exception is the -o option, which may appear with or without data.
Because of this, if data is present, it must follow immediately in the
same item, for example -o3.
If a long form option is used, the data may appear in the same command
line item, separated by an equals character, or (with two exceptions)
If a long form option is used, the data may appear in the same command
line item, separated by an equals character, or (with two exceptions)
it may appear in the next command line item. For example:
--file=/some/file
--file /some/file
Note, however, that if you want to supply a file name beginning with ~
as data in a shell command, and have the shell expand ~ to a home
Note, however, that if you want to supply a file name beginning with ~
as data in a shell command, and have the shell expand ~ to a home
directory, you must separate the file name from the option, because the
shell does not treat ~ specially unless it is at the start of an item.
The exceptions to the above are the --colour (or --color) and --only-
matching options, for which the data is optional. If one of these
options does have data, it must be given in the first form, using an
The exceptions to the above are the --colour (or --color) and --only-
matching options, for which the data is optional. If one of these
options does have data, it must be given in the first form, using an
equals character. Otherwise pcre2grep will assume that it has no data.
USING PCRE2'S CALLOUT FACILITY
pcre2grep has, by default, support for calling external programs or
scripts or echoing specific strings during matching by making use of
PCRE2's callout facility. However, this support can be disabled when
pcre2grep is built. You can find out whether your binary has support
for callouts by running it with the --help option. If the support is
pcre2grep has, by default, support for calling external programs or
scripts or echoing specific strings during matching by making use of
PCRE2's callout facility. However, this support can be disabled when
pcre2grep is built. You can find out whether your binary has support
for callouts by running it with the --help option. If the support is
not enabled, all callouts in patterns are ignored by pcre2grep.
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
ment is either a number or a quoted string (see the pcre2callout docu-
mentation for details). Numbered callouts are ignored by pcre2grep;
A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
ment is either a number or a quoted string (see the pcre2callout docu-
mentation for details). Numbered callouts are ignored by pcre2grep;
only callouts with string arguments are useful.
Calling external programs or scripts
If the callout string does not start with a pipe (vertical bar) charac-
ter, it is parsed into a list of substrings separated by pipe charac-
ters. The first substring must be an executable name, with the follow-
ter, it is parsed into a list of substrings separated by pipe charac-
ters. The first substring must be an executable name, with the follow-
ing substrings specifying arguments:
executable_name|arg1|arg2|...
Any substring (including the executable name) may contain escape
sequences started by a dollar character: $<digits> or ${<digits>} is
replaced by the captured substring of the given decimal number, which
must be greater than zero. If the number is greater than the number of
capturing substrings, or if the capture is unset, the replacement is
Any substring (including the executable name) may contain escape
sequences started by a dollar character: $<digits> or ${<digits>} is
replaced by the captured substring of the given decimal number, which
must be greater than zero. If the number is greater than the number of
capturing substrings, or if the capture is unset, the replacement is
empty.
Any other character is substituted by itself. In particular, $$ is
replaced by a single dollar and $| is replaced by a pipe character.
Any other character is substituted by itself. In particular, $$ is
replaced by a single dollar and $| is replaced by a pipe character.
Here is an example:
echo -e "abcde\n12345" | pcre2grep \
@@ -856,49 +857,49 @@ USING PCRE2'S CALLOUT FACILITY
The parameters for the execv() system call that is used to run the pro-
gram or script are zero-terminated strings. This means that binary zero
characters in the callout argument will cause premature termination of
their substrings, and therefore should not be present. Any syntax
errors in the string (for example, a dollar not followed by another
character) cause the callout to be ignored. If running the program
characters in the callout argument will cause premature termination of
their substrings, and therefore should not be present. Any syntax
errors in the string (for example, a dollar not followed by another
character) cause the callout to be ignored. If running the program
fails for any reason (including the non-existence of the executable), a
local matching failure occurs and the matcher backtracks in the normal
local matching failure occurs and the matcher backtracks in the normal
way.
Echoing a specific string
If the callout string starts with a pipe (vertical bar) character, the
If the callout string starts with a pipe (vertical bar) character, the
rest of the string is written to the output, having been passed through
the same escape processing as text from the --output option. This pro-
the same escape processing as text from the --output option. This pro-
vides a simple echoing facility that avoids calling an external program
or script. No terminator is added to the string, so if you want a new-
line, you must include it explicitly. Matching continues normally
after the string is output. If you want to see only the callout output
but not any output from an actual match, you should end the relevant
or script. No terminator is added to the string, so if you want a new-
line, you must include it explicitly. Matching continues normally
after the string is output. If you want to see only the callout output
but not any output from an actual match, you should end the relevant
pattern with (*FAIL).
MATCHING ERRORS
It is possible to supply a regular expression that takes a very long
time to fail to match certain lines. Such patterns normally involve
nested indefinite repeats, for example: (a+)*\d when matched against a
line of a's with no final digit. The PCRE2 matching function has a
resource limit that causes it to abort in these circumstances. If this
happens, pcre2grep outputs an error message and the line that caused
the problem to the standard error stream. If there are more than 20
It is possible to supply a regular expression that takes a very long
time to fail to match certain lines. Such patterns normally involve
nested indefinite repeats, for example: (a+)*\d when matched against a
line of a's with no final digit. The PCRE2 matching function has a
resource limit that causes it to abort in these circumstances. If this
happens, pcre2grep outputs an error message and the line that caused
the problem to the standard error stream. If there are more than 20
such errors, pcre2grep gives up.
The --match-limit option of pcre2grep can be used to set the overall
resource limit. There are also other limits that affect the amount of
memory used during matching; see the discussion of --heap-limit and
The --match-limit option of pcre2grep can be used to set the overall
resource limit. There are also other limits that affect the amount of
memory used during matching; see the discussion of --heap-limit and
--depth-limit above.
DIAGNOSTICS
Exit status is 0 if any matches were found, 1 if no matches were found,
and 2 for syntax errors, overlong lines, non-existent or inaccessible
files (even if matches were found in other files) or too many matching
and 2 for syntax errors, overlong lines, non-existent or inaccessible
files (even if matches were found in other files) or too many matching
errors. Using the -s option to suppress error messages about inaccessi-
ble files does not affect the return code.
@@ -917,5 +918,5 @@ AUTHOR
REVISION
Last updated: 26 May 2017
Last updated: 17 June 2017
Copyright (c) 1997-2017 University of Cambridge.


@@ -103,7 +103,8 @@ typedef int BOOL;
#define MAXPATLEN 8192
#endif
#define PATBUFSIZE (MAXPATLEN + 10) /* Allows for prefix+suffix */
#define FNBUFSIZ 1024
#define ERRBUFSIZ 256
/* Values for the "filenames" variable, which specifies options for file name
output. The order is important; it is assumed that a file name is wanted for
@@ -211,7 +212,7 @@ static BOOL use_jit = FALSE;
static const uint8_t *character_tables = NULL;
static uint32_t pcre2_options = 0;
static uint32_t process_options = 0;
static uint32_t extra_options = 0;
static PCRE2_SIZE heap_limit = PCRE2_UNSET;
static uint32_t match_limit = 0;
static uint32_t depth_limit = 0;
@@ -441,19 +442,6 @@ of PCRE2_NEWLINE_xx in pcre2.h. */
static const char *newlines[] = {
"DEFAULT", "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL" };
/* Tables for prefixing and suffixing patterns, according to the -w, -x, and -F
options. These set the 1, 2, and 4 bits in process_options, respectively. Note
that the combination of -w and -x has the same effect as -x on its own, so we
can treat them as the same. Note that the MAXPATLEN macro assumes the longest
prefix+suffix is 10 characters; if anything longer is added, it must be
adjusted. */
static const char *prefix[] = {
"", "\\b", "^(?:", "^(?:", "\\Q", "\\b\\Q", "^(?:\\Q", "^(?:\\Q" };
static const char *suffix[] = {
"", "\\b", ")$", ")$", "\\E", "\\E\\b", "\\E)$", "\\E)$" };
/* UTF-8 tables - used only when the newline setting is "any". */
const int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
@@ -2339,7 +2327,7 @@ file. However, when the newline convention is binary zero, we can't do this. */
if (binary_files != BIN_TEXT)
{
if (endlinetype != PCRE2_NEWLINE_NUL)
binary = memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength)
binary = memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength)
!= NULL;
if (binary && binary_files == BIN_NOMATCH) return 1;
}
@@ -3224,7 +3212,7 @@ switch(letter)
case N_NOJIT: use_jit = FALSE; break;
case 'a': binary_files = BIN_TEXT; break;
case 'c': count_only = TRUE; break;
case 'F': process_options |= PO_FIXED_STRINGS; break;
case 'F': options |= PCRE2_LITERAL; break;
case 'H': filenames = FN_FORCE; break;
case 'I': binary_files = BIN_NOMATCH; break;
case 'h': filenames = FN_NONE; break;
@@ -3245,8 +3233,8 @@ switch(letter)
case 't': show_total_count = TRUE; break;
case 'u': options |= PCRE2_UTF; utf = TRUE; break;
case 'v': invert = TRUE; break;
case 'w': process_options |= PO_WORD_MATCH; break;
case 'x': process_options |= PO_LINE_MATCH; break;
case 'w': extra_options |= PCRE2_EXTRA_MATCH_WORD; break;
case 'x': extra_options |= PCRE2_EXTRA_MATCH_LINE; break;
case 'V':
{
@@ -3309,7 +3297,6 @@ pattern chain.
Arguments:
p points to the pattern block
options the PCRE options
popts the processing options
fromfile TRUE if the pattern was read from a file
fromtext file name or identifying text (e.g. "include")
count 0 if this is the only command line pattern, or
@@ -3320,18 +3307,20 @@ Returns: TRUE on success, FALSE after an error
*/
static BOOL
compile_pattern(patstr *p, int options, int popts, int fromfile,
const char *fromtext, int count)
compile_pattern(patstr *p, int options, int fromfile, const char *fromtext,
int count)
{
unsigned char buffer[PATBUFSIZE];
PCRE2_SIZE erroffset;
char *ps = p->string;
unsigned int patlen = strlen(ps);
char *ps;
int errcode;
PCRE2_SIZE patlen, erroffset;
PCRE2_UCHAR errmessbuffer[ERRBUFSIZ];
if (p->compiled != NULL) return TRUE;
if ((popts & PO_FIXED_STRINGS) != 0)
ps = p->string;
patlen = strlen(ps);
if ((options & PCRE2_LITERAL) != 0)
{
int ellength;
char *eop = ps + patlen;
@@ -3344,8 +3333,7 @@ if ((popts & PO_FIXED_STRINGS) != 0)
}
}
sprintf((char *)buffer, "%s%.*s%s", prefix[popts], patlen, ps, suffix[popts]);
p->compiled = pcre2_compile(buffer, PCRE2_ZERO_TERMINATED, options, &errcode,
p->compiled = pcre2_compile((PCRE2_SPTR)ps, patlen, options, &errcode,
&erroffset, compile_context);
/* Handle successful compile. Try JIT-compiling if supported and enabled. We
@@ -3362,23 +3350,22 @@ if (p->compiled != NULL)
/* Handle compile errors */
erroffset -= (int)strlen(prefix[popts]);
if (erroffset > patlen) erroffset = patlen;
pcre2_get_error_message(errcode, buffer, PATBUFSIZE);
pcre2_get_error_message(errcode, errmessbuffer, sizeof(errmessbuffer));
if (fromfile)
{
fprintf(stderr, "pcre2grep: Error in regex in line %d of %s "
"at offset %d: %s\n", count, fromtext, (int)erroffset, buffer);
"at offset %d: %s\n", count, fromtext, (int)erroffset, errmessbuffer);
}
else
{
if (count == 0)
fprintf(stderr, "pcre2grep: Error in %s regex at offset %d: %s\n",
fromtext, (int)erroffset, buffer);
fromtext, (int)erroffset, errmessbuffer);
else
fprintf(stderr, "pcre2grep: Error in %s %s regex at offset %d: %s\n",
ordin(count), fromtext, (int)erroffset, buffer);
ordin(count), fromtext, (int)erroffset, errmessbuffer);
}
return FALSE;
@@ -3396,18 +3383,17 @@ Arguments:
name the name of the file; "-" is stdin
patptr pointer to the pattern chain anchor
patlastptr pointer to the last pattern pointer
popts the process options to pass to pattern_compile()
Returns: TRUE if all went well
*/
static BOOL
read_pattern_file(char *name, patstr **patptr, patstr **patlastptr, int popts)
read_pattern_file(char *name, patstr **patptr, patstr **patlastptr)
{
int linenumber = 0;
FILE *f;
const char *filename;
char buffer[PATBUFSIZE];
char buffer[MAXPATLEN+20];
if (strcmp(name, "-") == 0)
{
@@ -3425,7 +3411,7 @@ else
filename = name;
}
while (fgets(buffer, PATBUFSIZE, f) != NULL)
while (fgets(buffer, sizeof(buffer), f) != NULL)
{
char *s = buffer + (int)strlen(buffer);
while (s > buffer && isspace((unsigned char)(s[-1]))) s--;
@@ -3453,7 +3439,7 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL)
for(;;)
{
if (!compile_pattern(*patlastptr, pcre2_options, popts, TRUE, filename,
if (!compile_pattern(*patlastptr, pcre2_options, TRUE, filename,
linenumber))
{
if (f != stdin) fclose(f);
@@ -3823,7 +3809,7 @@ for (i = 1; i < argc; i++)
{
unsigned long int n = decode_number(option_data, op, longop);
if (op->type == OP_U32NUMBER) *((uint32_t *)op->dataptr) = n;
else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n;
else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n;
else *((int *)op->dataptr) = n;
}
}
@@ -3978,6 +3964,10 @@ if (DEE_option != NULL)
}
}
/* Set the extra options */
(void)pcre2_set_compile_extra_options(compile_context, extra_options);
/* Check the values for Jeffrey Friedl's debugging options. */
#ifdef JFRIEDL_DEBUG
@@ -4038,7 +4028,7 @@ chain, so we must not access the next pointer till after the compile. */
for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
{
if (!compile_pattern(cp, pcre2_options, process_options, FALSE, "command-line",
if (!compile_pattern(cp, pcre2_options, FALSE, "command-line",
(j == 1 && patterns->next == NULL)? 0 : j))
goto EXIT2;
}
@@ -4047,48 +4037,35 @@ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
for (fn = pattern_files; fn != NULL; fn = fn->next)
{
if (!read_pattern_file(fn->name, &patterns, &patterns_last, process_options))
goto EXIT2;
if (!read_pattern_file(fn->name, &patterns, &patterns_last)) goto EXIT2;
}
/* Unless JIT has been explicitly disabled, arrange a stack for it to use. */
#ifdef NEVER
#ifdef SUPPORT_PCRE2GREP_JIT
if (use_jit)
jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL);
#endif
for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
{
#ifdef SUPPORT_PCRE2GREP_JIT
if (jit_stack != NULL && cp->compiled != NULL)
pcre2_jit_stack_assign(match_context, NULL, jit_stack);
#endif
}
#endif
#ifdef SUPPORT_PCRE2GREP_JIT
if (use_jit)
{
jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL);
if (jit_stack != NULL )
pcre2_jit_stack_assign(match_context, NULL, jit_stack);
}
}
#endif
/* -F, -w, and -x do not apply to include or exclude patterns, so we must
adjust the options. */
pcre2_options &= ~PCRE2_LITERAL;
(void)pcre2_set_compile_extra_options(compile_context, 0);
/* If there are include or exclude patterns read from the command line, compile
them. -F, -w, and -x do not apply, so the third argument of compile_pattern is
0. */
them. */
for (j = 0; j < 4; j++)
{
int k;
for (k = 1, cp = *(incexlist[j]); cp != NULL; k++, cp = cp->next)
{
if (!compile_pattern(cp, pcre2_options, 0, FALSE, incexname[j],
if (!compile_pattern(cp, pcre2_options, FALSE, incexname[j],
(k == 1 && cp->next == NULL)? 0 : k))
goto EXIT2;
}
@@ -4098,13 +4075,13 @@ for (j = 0; j < 4; j++)
for (fn = include_from; fn != NULL; fn = fn->next)
{
if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last, 0))
if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last))
goto EXIT2;
}
for (fn = exclude_from; fn != NULL; fn = fn->next)
{
if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last, 0))
if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last))
goto EXIT2;
}
@@ -4123,7 +4100,7 @@ read them line by line and search the given files. */
for (fn = file_lists; fn != NULL; fn = fn->next)
{
char buffer[PATBUFSIZE];
char buffer[FNBUFSIZ];
FILE *fl;
if (strcmp(fn->name, "-") == 0) fl = stdin; else
{
@@ -4135,7 +4112,7 @@ for (fn = file_lists; fn != NULL; fn = fn->next)
goto EXIT2;
}
}
while (fgets(buffer, PATBUFSIZE, fl) != NULL)
while (fgets(buffer, sizeof(buffer), fl) != NULL)
{
int frc;
char *end = buffer + (int)strlen(buffer);

5
testdata/grepinputv vendored

@@ -2,3 +2,8 @@ The quick brown
fox jumps
over the lazy dog.
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate

32
testdata/grepoutput vendored

@@ -454,6 +454,11 @@ RC=1
---------------------------- Test 51 ------------------------------
over the lazy dog.
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
RC=0
---------------------------- Test 52 ------------------------------
fox jumps
@@ -788,32 +793,32 @@ RC=0
37216,12
RC=0
---------------------------- Test 113 -----------------------------
476
478
RC=0
---------------------------- Test 114 -----------------------------
testdata/grepinput:469
testdata/grepinput3:0
testdata/grepinput8:0
testdata/grepinputv:1
testdata/grepinputv:3
testdata/grepinputx:6
TOTAL:476
TOTAL:478
RC=0
---------------------------- Test 115 -----------------------------
testdata/grepinput:469
testdata/grepinputv:1
testdata/grepinputv:3
testdata/grepinputx:6
TOTAL:476
TOTAL:478
RC=0
---------------------------- Test 116 -----------------------------
476
478
RC=0
---------------------------- Test 117 -----------------------------
469
0
0
1
3
6
476
478
RC=0
---------------------------- Test 118 -----------------------------
testdata/grepinput3
@@ -834,3 +839,14 @@ RC=0
./testdata/grepinput:a binary zero:zeroa
./testdata/grepinput:the binary zero.:zerothe.
RC=0
---------------------------- Test 121 -----------------------------
This line contains \E and (regex) *meta* [characters].
RC=0
---------------------------- Test 122 -----------------------------
over the lazy dog.
The word is cat in this line
RC=0
---------------------------- Test 122 -----------------------------
over the lazy dog.
The word is cat in this line
RC=0

28
testdata/grepoutputC vendored

@@ -1,14 +1,42 @@
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
Arg1: [T] [his] [s] Arg2: |T| () () (0)
Arg1: [T] [his] [s] Arg2: |T| () () (0)
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
Arg1: [qu] [qu]
Arg1: [ t] [ t]
Arg1: [ l] [ l]
Arg1: [wo] [wo]
Arg1: [ca] [ca]
Arg1: [sn] [sn]
The quick brown
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
0:T
The quick brown
0:T
This time it jumps and jumps and jumps.
0:T
This line contains \E and (regex) *meta* [characters].
0:T
The word is cat in this line
0:T
The caterpillar sat on the mat
0:T
The snowcat is not an animal
T
T
T
T
T
T