Convert pcre2grep to use new pcre2_compile() options, thereby fixing two minor
(?) bugs.
parent 69eab9cfe7
commit 76a57bd839
@@ -192,6 +192,12 @@ pattern lines.
 42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
 of pcre2grep.

+43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
+PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:
+
+(a) The -F option did not work for fixed strings containing \E.
+(b) The -w option did not work for patterns with multiple branches.
+

 Version 10.23 14-February-2017
------------------------------
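The two bugs listed in item 43 both come from the way pcre2grep used to build its patterns: the pattern text was wrapped in a prefix and suffix selected by the -w, -x, and -F bits. The following is a minimal sketch of that wrapping, not the original function. It reuses the prefix and suffix tables that this commit deletes from src/pcre2grep.c; the helper name wrap_pattern() is hypothetical, and the code only builds strings without calling PCRE2.

```c
#include <stdio.h>

/* Tables copied from the code this commit deletes. The index is a bit mask:
   1 = -w, 2 = -x, 4 = -F. */
static const char *prefix[] = {
  "", "\\b", "^(?:", "^(?:", "\\Q", "\\b\\Q", "^(?:\\Q", "^(?:\\Q" };
static const char *suffix[] = {
  "", "\\b", ")$", ")$", "\\E", "\\E\\b", "\\E)$", "\\E)$" };

/* Hypothetical helper: build the wrapped pattern text that the old code
   passed to pcre2_compile(). Returns 0 on success, or -1 if popts is out of
   range or the result does not fit in out. */
static int wrap_pattern(char *out, size_t outsize, int popts, const char *pat)
{
  int n;
  if (popts < 0 || popts > 7) return -1;
  n = snprintf(out, outsize, "%s%s%s", prefix[popts], pat, suffix[popts]);
  return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

With -F (mask 4), the fixed string `\E and (regex)` becomes `\Q\E and (regex)\E`: the embedded \E ends the quoted section at once, so `(regex)` is compiled as a group instead of being matched literally. With -w (mask 1), `cat|dog` becomes `\bcat|dog\b`, where each \b applies to only one branch. PCRE2_LITERAL and PCRE2_EXTRA_MATCH_WORD avoid both problems because no text is added to the pattern.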
13
RunGrepTest
@@ -602,6 +602,19 @@ echo "---------------------------- Test 120 ------------------------------" >>te
 (cd $srcdir; $valgrind $vjs $pcre2grep -HO '$0:$2$1$3' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep
 echo "RC=$?" >>testtrygrep

+echo "---------------------------- Test 121 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -F '\E and (regex)' testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -w 'cat|dog' testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -w 'dog|cat' testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+
 # Now compare the results.

 $cf $srcdir/testdata/grepoutput testtrygrep
@@ -740,20 +740,21 @@ the patterns are the ones that are found.
 </P>
 <P>
 <b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
-Force the patterns to match only whole words. This is equivalent to having \b
-at the start and end of the pattern. This option applies only to the patterns
-that are matched against the contents of files; it does not apply to patterns
-specified by any of the <b>--include</b> or <b>--exclude</b> options.
+Force the patterns only to match "words". That is, there must be a word
+boundary at the start and end of each matched string. This is equivalent to
+having "\b(?:" at the start of each pattern, and ")\b" at the end. This
+option applies only to the patterns that are matched against the contents of
+files; it does not apply to patterns specified by any of the <b>--include</b> or
+<b>--exclude</b> options.
 </P>
 <P>
 <b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
-Force the patterns to be anchored (each must start matching at the beginning of
-a line) and in addition, require them to match entire lines. In multiline mode
-the match may be more than one line. This is equivalent to having \A and \Z
-characters at the start and end of each alternative top-level branch in every
-pattern. This option applies only to the patterns that are matched against the
-contents of files; it does not apply to patterns specified by any of the
-<b>--include</b> or <b>--exclude</b> options.
+Force the patterns to start matching only at the beginnings of lines, and in
+addition, require them to match entire lines. In multiline mode the match may
+be more than one line. This is equivalent to having "^(?:" at the start of each
+pattern and ")$" at the end. This option applies only to the patterns that are
+matched against the contents of files; it does not apply to patterns specified
+by any of the <b>--include</b> or <b>--exclude</b> options.
 </P>
 <br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
 <P>
@@ -936,7 +937,7 @@ Cambridge, England.
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 May 2017
+Last updated: 17 June 2017
 <br>
 Copyright © 1997-2017 University of Cambridge.
 <br>
@@ -1,4 +1,4 @@
-.TH PCRE2GREP 1 "26 May 2017" "PCRE2 10.30"
+.TH PCRE2GREP 1 "17 June 2017" "PCRE2 10.30"
 .SH NAME
 pcre2grep - a grep with Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -639,19 +639,20 @@ Invert the sense of the match, so that lines which do \fInot\fP match any of
 the patterns are the ones that are found.
 .TP
 \fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
-Force the patterns to match only whole words. This is equivalent to having \eb
-at the start and end of the pattern. This option applies only to the patterns
-that are matched against the contents of files; it does not apply to patterns
-specified by any of the \fB--include\fP or \fB--exclude\fP options.
+Force the patterns only to match "words". That is, there must be a word
+boundary at the start and end of each matched string. This is equivalent to
+having "\eb(?:" at the start of each pattern, and ")\eb" at the end. This
+option applies only to the patterns that are matched against the contents of
+files; it does not apply to patterns specified by any of the \fB--include\fP or
+\fB--exclude\fP options.
 .TP
 \fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
-Force the patterns to be anchored (each must start matching at the beginning of
-a line) and in addition, require them to match entire lines. In multiline mode
-the match may be more than one line. This is equivalent to having \eA and \eZ
-characters at the start and end of each alternative top-level branch in every
-pattern. This option applies only to the patterns that are matched against the
-contents of files; it does not apply to patterns specified by any of the
-\fB--include\fP or \fB--exclude\fP options.
+Force the patterns to start matching only at the beginnings of lines, and in
+addition, require them to match entire lines. In multiline mode the match may
+be more than one line. This is equivalent to having "^(?:" at the start of each
+pattern and ")$" at the end. This option applies only to the patterns that are
+matched against the contents of files; it does not apply to patterns specified
+by any of the \fB--include\fP or \fB--exclude\fP options.
 .
 .
 .SH "ENVIRONMENT VARIABLES"
@@ -850,6 +851,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 26 May 2017
+Last updated: 17 June 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi
@@ -718,29 +718,30 @@ OPTIONS
 match any of the patterns are the ones that are found.

 -w, --word-regex, --word-regexp
-Force the patterns to match only whole words. This is equiva-
-lent to having \b at the start and end of the pattern. This
-option applies only to the patterns that are matched against
-the contents of files; it does not apply to patterns speci-
-fied by any of the --include or --exclude options.
+Force the patterns only to match "words". That is, there must
+be a word boundary at the start and end of each matched
+string. This is equivalent to having "\b(?:" at the start of
+each pattern, and ")\b" at the end. This option applies only
+to the patterns that are matched against the contents of
+files; it does not apply to patterns specified by any of the
+--include or --exclude options.

 -x, --line-regex, --line-regexp
-Force the patterns to be anchored (each must start matching
-at the beginning of a line) and in addition, require them to
-match entire lines. In multiline mode the match may be more
-than one line. This is equivalent to having \A and \Z charac-
-ters at the start and end of each alternative top-level
-branch in every pattern. This option applies only to the pat-
-terns that are matched against the contents of files; it does
-not apply to patterns specified by any of the --include or
---exclude options.
+Force the patterns to start matching only at the beginnings
+of lines, and in addition, require them to match entire
+lines. In multiline mode the match may be more than one line.
+This is equivalent to having "^(?:" at the start of each pat-
+tern and ")$" at the end. This option applies only to the
+patterns that are matched against the contents of files; it
+does not apply to patterns specified by any of the --include
+or --exclude options.


 ENVIRONMENT VARIABLES

-The environment variables LC_ALL and LC_CTYPE are examined, in that
-order, for a locale. The first one that is set is used. This can be
-overridden by the --locale option. If no locale is set, the PCRE2
+The environment variables LC_ALL and LC_CTYPE are examined, in that
+order, for a locale. The first one that is set is used. This can be
+overridden by the --locale option. If no locale is set, the PCRE2
 library's default (usually the "C" locale) is used.

@@ -748,99 +749,99 @@ NEWLINES

 The -N (--newline) option allows pcre2grep to scan files with different
 newline conventions from the default. Any parts of the input files that
-are written to the standard output are copied identically, with what-
-ever newline sequences they have in the input. However, the setting of
-this option does not affect the interpretation of files specified by
+are written to the standard output are copied identically, with what-
+ever newline sequences they have in the input. However, the setting of
+this option does not affect the interpretation of files specified by
 the -f, --exclude-from, or --include-from options, which are assumed to
-use the operating system's standard newline sequence, nor does it
-affect the way in which pcre2grep writes informational messages to the
+use the operating system's standard newline sequence, nor does it
+affect the way in which pcre2grep writes informational messages to the
 standard error and output streams. For these it uses the string "\n" to
-indicate newlines, relying on the C I/O library to convert this to an
+indicate newlines, relying on the C I/O library to convert this to an
 appropriate sequence.


 OPTIONS COMPATIBILITY

 Many of the short and long forms of pcre2grep's options are the same as
-in the GNU grep program. Any long option of the form --xxx-regexp (GNU
+in the GNU grep program. Any long option of the form --xxx-regexp (GNU
 terminology) is also available as --xxx-regex (PCRE2 terminology). How-
-ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
---include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
-line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
-are specific to pcre2grep, as is the use of the --only-matching option
+ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
+--include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
+line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
+are specific to pcre2grep, as is the use of the --only-matching option
 with a capturing parentheses number.

-Although most of the common options work the same way, a few are dif-
-ferent in pcre2grep. For example, the --include option's argument is a
-glob for GNU grep, but a regular expression for pcre2grep. If both the
--c and -l options are given, GNU grep lists only file names, without
+Although most of the common options work the same way, a few are dif-
+ferent in pcre2grep. For example, the --include option's argument is a
+glob for GNU grep, but a regular expression for pcre2grep. If both the
+-c and -l options are given, GNU grep lists only file names, without
 counts, but pcre2grep gives the counts as well.


 OPTIONS WITH DATA

 There are four different ways in which an option with data can be spec-
-ified. If a short form option is used, the data may follow immedi-
+ified. If a short form option is used, the data may follow immedi-
 ately, or (with one exception) in the next command line item. For exam-
 ple:

 -f/some/file
 -f /some/file

-The exception is the -o option, which may appear with or without data.
-Because of this, if data is present, it must follow immediately in the
+The exception is the -o option, which may appear with or without data.
+Because of this, if data is present, it must follow immediately in the
 same item, for example -o3.

-If a long form option is used, the data may appear in the same command
-line item, separated by an equals character, or (with two exceptions)
+If a long form option is used, the data may appear in the same command
+line item, separated by an equals character, or (with two exceptions)
 it may appear in the next command line item. For example:

 --file=/some/file
 --file /some/file

-Note, however, that if you want to supply a file name beginning with ~
-as data in a shell command, and have the shell expand ~ to a home
+Note, however, that if you want to supply a file name beginning with ~
+as data in a shell command, and have the shell expand ~ to a home
 directory, you must separate the file name from the option, because the
 shell does not treat ~ specially unless it is at the start of an item.

-The exceptions to the above are the --colour (or --color) and --only-
-matching options, for which the data is optional. If one of these
-options does have data, it must be given in the first form, using an
+The exceptions to the above are the --colour (or --color) and --only-
+matching options, for which the data is optional. If one of these
+options does have data, it must be given in the first form, using an
 equals character. Otherwise pcre2grep will assume that it has no data.


 USING PCRE2'S CALLOUT FACILITY

-pcre2grep has, by default, support for calling external programs or
-scripts or echoing specific strings during matching by making use of
-PCRE2's callout facility. However, this support can be disabled when
-pcre2grep is built. You can find out whether your binary has support
-for callouts by running it with the --help option. If the support is
+pcre2grep has, by default, support for calling external programs or
+scripts or echoing specific strings during matching by making use of
+PCRE2's callout facility. However, this support can be disabled when
+pcre2grep is built. You can find out whether your binary has support
+for callouts by running it with the --help option. If the support is
 not enabled, all callouts in patterns are ignored by pcre2grep.

-A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
-ment is either a number or a quoted string (see the pcre2callout docu-
-mentation for details). Numbered callouts are ignored by pcre2grep;
+A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
+ment is either a number or a quoted string (see the pcre2callout docu-
+mentation for details). Numbered callouts are ignored by pcre2grep;
 only callouts with string arguments are useful.

 Calling external programs or scripts

 If the callout string does not start with a pipe (vertical bar) charac-
-ter, it is parsed into a list of substrings separated by pipe charac-
-ters. The first substring must be an executable name, with the follow-
+ter, it is parsed into a list of substrings separated by pipe charac-
+ters. The first substring must be an executable name, with the follow-
 ing substrings specifying arguments:

 executable_name|arg1|arg2|...

-Any substring (including the executable name) may contain escape
-sequences started by a dollar character: $<digits> or ${<digits>} is
-replaced by the captured substring of the given decimal number, which
-must be greater than zero. If the number is greater than the number of
-capturing substrings, or if the capture is unset, the replacement is
+Any substring (including the executable name) may contain escape
+sequences started by a dollar character: $<digits> or ${<digits>} is
+replaced by the captured substring of the given decimal number, which
+must be greater than zero. If the number is greater than the number of
+capturing substrings, or if the capture is unset, the replacement is
 empty.

-Any other character is substituted by itself. In particular, $$ is
-replaced by a single dollar and $| is replaced by a pipe character.
+Any other character is substituted by itself. In particular, $$ is
+replaced by a single dollar and $| is replaced by a pipe character.
 Here is an example:

 echo -e "abcde\n12345" | pcre2grep \
@@ -856,49 +857,49 @@ USING PCRE2'S CALLOUT FACILITY

 The parameters for the execv() system call that is used to run the pro-
 gram or script are zero-terminated strings. This means that binary zero
-characters in the callout argument will cause premature termination of
-their substrings, and therefore should not be present. Any syntax
-errors in the string (for example, a dollar not followed by another
-character) cause the callout to be ignored. If running the program
+characters in the callout argument will cause premature termination of
+their substrings, and therefore should not be present. Any syntax
+errors in the string (for example, a dollar not followed by another
+character) cause the callout to be ignored. If running the program
 fails for any reason (including the non-existence of the executable), a
-local matching failure occurs and the matcher backtracks in the normal
+local matching failure occurs and the matcher backtracks in the normal
 way.

 Echoing a specific string

-If the callout string starts with a pipe (vertical bar) character, the
+If the callout string starts with a pipe (vertical bar) character, the
 rest of the string is written to the output, having been passed through
-the same escape processing as text from the --output option. This pro-
+the same escape processing as text from the --output option. This pro-
 vides a simple echoing facility that avoids calling an external program
-or script. No terminator is added to the string, so if you want a new-
-line, you must include it explicitly. Matching continues normally
-after the string is output. If you want to see only the callout output
-but not any output from an actual match, you should end the relevant
+or script. No terminator is added to the string, so if you want a new-
+line, you must include it explicitly. Matching continues normally
+after the string is output. If you want to see only the callout output
+but not any output from an actual match, you should end the relevant
 pattern with (*FAIL).


 MATCHING ERRORS

-It is possible to supply a regular expression that takes a very long
-time to fail to match certain lines. Such patterns normally involve
-nested indefinite repeats, for example: (a+)*\d when matched against a
-line of a's with no final digit. The PCRE2 matching function has a
-resource limit that causes it to abort in these circumstances. If this
-happens, pcre2grep outputs an error message and the line that caused
-the problem to the standard error stream. If there are more than 20
+It is possible to supply a regular expression that takes a very long
+time to fail to match certain lines. Such patterns normally involve
+nested indefinite repeats, for example: (a+)*\d when matched against a
+line of a's with no final digit. The PCRE2 matching function has a
+resource limit that causes it to abort in these circumstances. If this
+happens, pcre2grep outputs an error message and the line that caused
+the problem to the standard error stream. If there are more than 20
 such errors, pcre2grep gives up.

-The --match-limit option of pcre2grep can be used to set the overall
-resource limit. There are also other limits that affect the amount of
-memory used during matching; see the discussion of --heap-limit and
+The --match-limit option of pcre2grep can be used to set the overall
+resource limit. There are also other limits that affect the amount of
+memory used during matching; see the discussion of --heap-limit and
 --depth-limit above.


 DIAGNOSTICS

 Exit status is 0 if any matches were found, 1 if no matches were found,
-and 2 for syntax errors, overlong lines, non-existent or inaccessible
-files (even if matches were found in other files) or too many matching
+and 2 for syntax errors, overlong lines, non-existent or inaccessible
+files (even if matches were found in other files) or too many matching
 errors. Using the -s option to suppress error messages about inaccessi-
 ble files does not affect the return code.

@@ -917,5 +918,5 @@ AUTHOR

 REVISION

-Last updated: 26 May 2017
+Last updated: 17 June 2017
 Copyright (c) 1997-2017 University of Cambridge.
113
src/pcre2grep.c
@@ -103,7 +103,8 @@ typedef int BOOL;
 #define MAXPATLEN 8192
 #endif

-#define PATBUFSIZE (MAXPATLEN + 10) /* Allows for prefix+suffix */
+#define FNBUFSIZ 1024
+#define ERRBUFSIZ 256

 /* Values for the "filenames" variable, which specifies options for file name
 output. The order is important; it is assumed that a file name is wanted for
@@ -211,7 +212,7 @@ static BOOL use_jit = FALSE;
 static const uint8_t *character_tables = NULL;

 static uint32_t pcre2_options = 0;
-static uint32_t process_options = 0;
+static uint32_t extra_options = 0;
 static PCRE2_SIZE heap_limit = PCRE2_UNSET;
 static uint32_t match_limit = 0;
 static uint32_t depth_limit = 0;
@@ -441,19 +442,6 @@ of PCRE2_NEWLINE_xx in pcre2.h. */
 static const char *newlines[] = {
 "DEFAULT", "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL" };

-/* Tables for prefixing and suffixing patterns, according to the -w, -x, and -F
-options. These set the 1, 2, and 4 bits in process_options, respectively. Note
-that the combination of -w and -x has the same effect as -x on its own, so we
-can treat them as the same. Note that the MAXPATLEN macro assumes the longest
-prefix+suffix is 10 characters; if anything longer is added, it must be
-adjusted. */
-
-static const char *prefix[] = {
-"", "\\b", "^(?:", "^(?:", "\\Q", "\\b\\Q", "^(?:\\Q", "^(?:\\Q" };
-
-static const char *suffix[] = {
-"", "\\b", ")$", ")$", "\\E", "\\E\\b", "\\E)$", "\\E)$" };
-
 /* UTF-8 tables - used only when the newline setting is "any". */

 const int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
@@ -2339,7 +2327,7 @@ file. However, when the newline convention is binary zero, we can't do this. */
 if (binary_files != BIN_TEXT)
 {
 if (endlinetype != PCRE2_NEWLINE_NUL)
-binary = memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength)
+binary = memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength)
 != NULL;
 if (binary && binary_files == BIN_NOMATCH) return 1;
 }
@@ -3224,7 +3212,7 @@ switch(letter)
 case N_NOJIT: use_jit = FALSE; break;
 case 'a': binary_files = BIN_TEXT; break;
 case 'c': count_only = TRUE; break;
-case 'F': process_options |= PO_FIXED_STRINGS; break;
+case 'F': options |= PCRE2_LITERAL; break;
 case 'H': filenames = FN_FORCE; break;
 case 'I': binary_files = BIN_NOMATCH; break;
 case 'h': filenames = FN_NONE; break;
@@ -3245,8 +3233,8 @@ switch(letter)
 case 't': show_total_count = TRUE; break;
 case 'u': options |= PCRE2_UTF; utf = TRUE; break;
 case 'v': invert = TRUE; break;
-case 'w': process_options |= PO_WORD_MATCH; break;
-case 'x': process_options |= PO_LINE_MATCH; break;
+case 'w': extra_options |= PCRE2_EXTRA_MATCH_WORD; break;
+case 'x': extra_options |= PCRE2_EXTRA_MATCH_LINE; break;

 case 'V':
 {
@@ -3309,7 +3297,6 @@ pattern chain.
 Arguments:
 p points to the pattern block
 options the PCRE options
-popts the processing options
 fromfile TRUE if the pattern was read from a file
 fromtext file name or identifying text (e.g. "include")
 count 0 if this is the only command line pattern, or
@@ -3320,18 +3307,20 @@ Returns: TRUE on success, FALSE after an error
 */

 static BOOL
-compile_pattern(patstr *p, int options, int popts, int fromfile,
-const char *fromtext, int count)
+compile_pattern(patstr *p, int options, int fromfile, const char *fromtext,
+int count)
 {
-unsigned char buffer[PATBUFSIZE];
-PCRE2_SIZE erroffset;
-char *ps = p->string;
-unsigned int patlen = strlen(ps);
+char *ps;
 int errcode;
+PCRE2_SIZE patlen, erroffset;
+PCRE2_UCHAR errmessbuffer[ERRBUFSIZ];

 if (p->compiled != NULL) return TRUE;

-if ((popts & PO_FIXED_STRINGS) != 0)
+ps = p->string;
+patlen = strlen(ps);
+
+if ((options & PCRE2_LITERAL) != 0)
 {
 int ellength;
 char *eop = ps + patlen;
@@ -3344,8 +3333,7 @@ if ((popts & PO_FIXED_STRINGS) != 0)
 }
 }

-sprintf((char *)buffer, "%s%.*s%s", prefix[popts], patlen, ps, suffix[popts]);
-p->compiled = pcre2_compile(buffer, PCRE2_ZERO_TERMINATED, options, &errcode,
+p->compiled = pcre2_compile((PCRE2_SPTR)ps, patlen, options, &errcode,
 &erroffset, compile_context);

 /* Handle successful compile. Try JIT-compiling if supported and enabled. We
@@ -3362,23 +3350,22 @@ if (p->compiled != NULL)

 /* Handle compile errors */

-erroffset -= (int)strlen(prefix[popts]);
 if (erroffset > patlen) erroffset = patlen;
-pcre2_get_error_message(errcode, buffer, PATBUFSIZE);
+pcre2_get_error_message(errcode, errmessbuffer, sizeof(errmessbuffer));

 if (fromfile)
 {
 fprintf(stderr, "pcre2grep: Error in regex in line %d of %s "
-"at offset %d: %s\n", count, fromtext, (int)erroffset, buffer);
+"at offset %d: %s\n", count, fromtext, (int)erroffset, errmessbuffer);
 }
 else
 {
 if (count == 0)
 fprintf(stderr, "pcre2grep: Error in %s regex at offset %d: %s\n",
-fromtext, (int)erroffset, buffer);
+fromtext, (int)erroffset, errmessbuffer);
 else
 fprintf(stderr, "pcre2grep: Error in %s %s regex at offset %d: %s\n",
-ordin(count), fromtext, (int)erroffset, buffer);
+ordin(count), fromtext, (int)erroffset, errmessbuffer);
 }

 return FALSE;
@@ -3396,18 +3383,17 @@ Arguments:
 name the name of the file; "-" is stdin
 patptr pointer to the pattern chain anchor
 patlastptr pointer to the last pattern pointer
-popts the process options to pass to pattern_compile()

 Returns: TRUE if all went well
 */

 static BOOL
-read_pattern_file(char *name, patstr **patptr, patstr **patlastptr, int popts)
+read_pattern_file(char *name, patstr **patptr, patstr **patlastptr)
 {
 int linenumber = 0;
 FILE *f;
 const char *filename;
-char buffer[PATBUFSIZE];
+char buffer[MAXPATLEN+20];

 if (strcmp(name, "-") == 0)
 {
@@ -3425,7 +3411,7 @@ else
 filename = name;
 }

-while (fgets(buffer, PATBUFSIZE, f) != NULL)
+while (fgets(buffer, sizeof(buffer), f) != NULL)
 {
 char *s = buffer + (int)strlen(buffer);
while (s > buffer && isspace((unsigned char)(s[-1]))) s--;
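The read loop in this hunk sizes the fgets() call from the buffer itself and then steps back over trailing white space. A minimal sketch of that trimming step on its own follows. The helper name trim_trailing_space() is hypothetical, and writing a terminating zero at the cut point is an assumption, since the rest of the loop body is not shown in the hunk.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove trailing white space (including any newline)
   from a zero-terminated line, in place. Returns the new length. */
static size_t trim_trailing_space(char *line)
{
  char *s = line + strlen(line);

  /* Same scan as in the hunk: back up while the previous character is
     white space. The cast avoids undefined behaviour for negative chars. */
  while (s > line && isspace((unsigned char)(s[-1]))) s--;

  *s = '\0';   /* Assumed: terminate the string at the cut point */
  return (size_t)(s - line);
}
```

A caller would typically apply this to each line returned by fgets(buffer, sizeof(buffer), f) before treating the remainder as a pattern.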
@@ -3453,7 +3439,7 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL)

 for(;;)
 {
-if (!compile_pattern(*patlastptr, pcre2_options, popts, TRUE, filename,
+if (!compile_pattern(*patlastptr, pcre2_options, TRUE, filename,
 linenumber))
 {
 if (f != stdin) fclose(f);
@@ -3823,7 +3809,7 @@ for (i = 1; i < argc; i++)
 {
 unsigned long int n = decode_number(option_data, op, longop);
 if (op->type == OP_U32NUMBER) *((uint32_t *)op->dataptr) = n;
-else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n;
+else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n;
 else *((int *)op->dataptr) = n;
 }
 }
@@ -3978,6 +3964,10 @@ if (DEE_option != NULL)
 }
 }

+/* Set the extra options */
+
+(void)pcre2_set_compile_extra_options(compile_context, extra_options);
+
 /* Check the values for Jeffrey Friedl's debugging options. */

 #ifdef JFRIEDL_DEBUG
@@ -4038,7 +4028,7 @@ chain, so we must not access the next pointer till after the compile. */

 for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
 {
-if (!compile_pattern(cp, pcre2_options, process_options, FALSE, "command-line",
+if (!compile_pattern(cp, pcre2_options, FALSE, "command-line",
 (j == 1 && patterns->next == NULL)? 0 : j))
 goto EXIT2;
 }
@@ -4047,48 +4037,35 @@ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)

 for (fn = pattern_files; fn != NULL; fn = fn->next)
 {
-if (!read_pattern_file(fn->name, &patterns, &patterns_last, process_options))
-goto EXIT2;
+if (!read_pattern_file(fn->name, &patterns, &patterns_last)) goto EXIT2;
 }

 /* Unless JIT has been explicitly disabled, arrange a stack for it to use. */
-
-
-#ifdef NEVER
-#ifdef SUPPORT_PCRE2GREP_JIT
-if (use_jit)
-jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL);
-#endif
-
-for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
-{
-#ifdef SUPPORT_PCRE2GREP_JIT
-if (jit_stack != NULL && cp->compiled != NULL)
-pcre2_jit_stack_assign(match_context, NULL, jit_stack);
-#endif
-}
-#endif
-

 #ifdef SUPPORT_PCRE2GREP_JIT
 if (use_jit)
 {
 jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL);
 if (jit_stack != NULL )
 pcre2_jit_stack_assign(match_context, NULL, jit_stack);
 }
-}
 #endif
|
||||
|
||||
/* -F, -w, and -x do not apply to include or exclude patterns, so we must
|
||||
adjust the options. */
|
||||
|
||||
pcre2_options &= ~PCRE2_LITERAL;
|
||||
(void)pcre2_set_compile_extra_options(compile_context, 0);
|
||||
|
||||
/* If there are include or exclude patterns read from the command line, compile
|
||||
them. -F, -w, and -x do not apply, so the third argument of compile_pattern is
|
||||
0. */
|
||||
them. */
|
||||
|
||||
for (j = 0; j < 4; j++)
|
||||
{
|
||||
int k;
|
||||
for (k = 1, cp = *(incexlist[j]); cp != NULL; k++, cp = cp->next)
|
||||
{
|
||||
if (!compile_pattern(cp, pcre2_options, 0, FALSE, incexname[j],
|
||||
if (!compile_pattern(cp, pcre2_options, FALSE, incexname[j],
|
||||
(k == 1 && cp->next == NULL)? 0 : k))
|
||||
goto EXIT2;
|
||||
}
|
||||
@@ -4098,13 +4075,13 @@ for (j = 0; j < 4; j++)

 for (fn = include_from; fn != NULL; fn = fn->next)
   {
-  if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last, 0))
+  if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last))
     goto EXIT2;
   }

 for (fn = exclude_from; fn != NULL; fn = fn->next)
   {
-  if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last, 0))
+  if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last))
     goto EXIT2;
   }

@@ -4123,7 +4100,7 @@ read them line by line and search the given files. */

 for (fn = file_lists; fn != NULL; fn = fn->next)
   {
-  char buffer[PATBUFSIZE];
+  char buffer[FNBUFSIZ];
   FILE *fl;
   if (strcmp(fn->name, "-") == 0) fl = stdin; else
     {
@@ -4135,7 +4112,7 @@ for (fn = file_lists; fn != NULL; fn = fn->next)
       goto EXIT2;
       }
     }
-  while (fgets(buffer, PATBUFSIZE, fl) != NULL)
+  while (fgets(buffer, sizeof(buffer), fl) != NULL)
     {
     int frc;
     char *end = buffer + (int)strlen(buffer);

@@ -2,3 +2,8 @@ The quick brown
 fox jumps
 over the lazy dog.
 This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
+A buried feline in the syndicate

@@ -454,6 +454,11 @@ RC=1
 ---------------------------- Test 51 ------------------------------
 over the lazy dog.
 This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
+A buried feline in the syndicate
 RC=0
 ---------------------------- Test 52 ------------------------------
 fox [1;31mjumps[0m
@@ -788,32 +793,32 @@ RC=0
 37216,12
 RC=0
 ---------------------------- Test 113 -----------------------------
-476
+478
 RC=0
 ---------------------------- Test 114 -----------------------------
 testdata/grepinput:469
 testdata/grepinput3:0
 testdata/grepinput8:0
-testdata/grepinputv:1
+testdata/grepinputv:3
 testdata/grepinputx:6
-TOTAL:476
+TOTAL:478
 RC=0
 ---------------------------- Test 115 -----------------------------
 testdata/grepinput:469
-testdata/grepinputv:1
+testdata/grepinputv:3
 testdata/grepinputx:6
-TOTAL:476
+TOTAL:478
 RC=0
 ---------------------------- Test 116 -----------------------------
-476
+478
 RC=0
 ---------------------------- Test 117 -----------------------------
 469
 0
 0
-1
+3
 6
-476
+478
 RC=0
 ---------------------------- Test 118 -----------------------------
 testdata/grepinput3
@@ -834,3 +839,14 @@ RC=0
 ./testdata/grepinput:a binary zero:zeroa
 ./testdata/grepinput:the binary zero.:zerothe.
 RC=0
+---------------------------- Test 121 -----------------------------
+This line contains \E and (regex) *meta* [characters].
+RC=0
+---------------------------- Test 122 -----------------------------
+over the lazy dog.
+The word is cat in this line
+RC=0
+---------------------------- Test 122 -----------------------------
+over the lazy dog.
+The word is cat in this line
+RC=0

@@ -1,14 +1,42 @@
 Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
 Arg1: [T] [his] [s] Arg2: |T| () () (0)
+Arg1: [T] [his] [s] Arg2: |T| () () (0)
+Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
+Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
+Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
 The quick brown
 This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
 Arg1: [qu] [qu]
 Arg1: [ t] [ t]
+Arg1: [ l] [ l]
+Arg1: [wo] [wo]
+Arg1: [ca] [ca]
+Arg1: [sn] [sn]
 The quick brown
 This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
 0:T
 The quick brown
 0:T
 This time it jumps and jumps and jumps.
+0:T
+This line contains \E and (regex) *meta* [characters].
+0:T
+The word is cat in this line
+0:T
+The caterpillar sat on the mat
+0:T
+The snowcat is not an animal
 T
 T
+T
+T
+T
+T