diff --git a/ChangeLog b/ChangeLog index 3812c23..67f393c 100644 --- a/ChangeLog +++ b/ChangeLog @@ -192,6 +192,12 @@ pattern lines. 42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit of pcre2grep. +43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL, +PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs: + + (a) The -F option did not work for fixed strings containing \E. + (b) The -w option did not work for patterns with multiple branches. + Version 10.23 14-February-2017 ------------------------------ diff --git a/RunGrepTest b/RunGrepTest index 205caf0..7c3498a 100755 --- a/RunGrepTest +++ b/RunGrepTest @@ -602,6 +602,19 @@ echo "---------------------------- Test 120 ------------------------------" >>te (cd $srcdir; $valgrind $vjs $pcre2grep -HO '$0:$2$1$3' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep echo "RC=$?" >>testtrygrep +echo "---------------------------- Test 121 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -F '\E and (regex)' testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 122 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -w 'cat|dog' testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 122 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -w 'dog|cat' testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + + # Now compare the results. $cf $srcdir/testdata/grepoutput testtrygrep diff --git a/doc/html/pcre2grep.html b/doc/html/pcre2grep.html index ec2f726..0a028a0 100644 --- a/doc/html/pcre2grep.html +++ b/doc/html/pcre2grep.html @@ -740,20 +740,21 @@ the patterns are the ones that are found.

-w, --word-regex, --word-regexp -Force the patterns to match only whole words. This is equivalent to having \b -at the start and end of the pattern. This option applies only to the patterns -that are matched against the contents of files; it does not apply to patterns -specified by any of the --include or --exclude options. +Force the patterns only to match "words". That is, there must be a word +boundary at the start and end of each matched string. This is equivalent to +having "\b(?:" at the start of each pattern, and ")\b" at the end. This +option applies only to the patterns that are matched against the contents of +files; it does not apply to patterns specified by any of the --include or +--exclude options.

-x, --line-regex, --line-regexp -Force the patterns to be anchored (each must start matching at the beginning of -a line) and in addition, require them to match entire lines. In multiline mode -the match may be more than one line. This is equivalent to having \A and \Z -characters at the start and end of each alternative top-level branch in every -pattern. This option applies only to the patterns that are matched against the -contents of files; it does not apply to patterns specified by any of the ---include or --exclude options. +Force the patterns to start matching only at the beginnings of lines, and in +addition, require them to match entire lines. In multiline mode the match may +be more than one line. This is equivalent to having "^(?:" at the start of each +pattern and ")$" at the end. This option applies only to the patterns that are +matched against the contents of files; it does not apply to patterns specified +by any of the --include or --exclude options.


ENVIRONMENT VARIABLES

@@ -936,7 +937,7 @@ Cambridge, England.


REVISION

-Last updated: 26 May 2017 +Last updated: 17 June 2017
Copyright © 1997-2017 University of Cambridge.
diff --git a/doc/pcre2grep.1 b/doc/pcre2grep.1 index 2b23347..f6f79b4 100644 --- a/doc/pcre2grep.1 +++ b/doc/pcre2grep.1 @@ -1,4 +1,4 @@ -.TH PCRE2GREP 1 "26 May 2017" "PCRE2 10.30" +.TH PCRE2GREP 1 "17 June 2017" "PCRE2 10.30" .SH NAME pcre2grep - a grep with Perl-compatible regular expressions. .SH SYNOPSIS @@ -639,19 +639,20 @@ Invert the sense of the match, so that lines which do \fInot\fP match any of the patterns are the ones that are found. .TP \fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP -Force the patterns to match only whole words. This is equivalent to having \eb -at the start and end of the pattern. This option applies only to the patterns -that are matched against the contents of files; it does not apply to patterns -specified by any of the \fB--include\fP or \fB--exclude\fP options. +Force the patterns only to match "words". That is, there must be a word +boundary at the start and end of each matched string. This is equivalent to +having "\eb(?:" at the start of each pattern, and ")\eb" at the end. This +option applies only to the patterns that are matched against the contents of +files; it does not apply to patterns specified by any of the \fB--include\fP or +\fB--exclude\fP options. .TP \fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP -Force the patterns to be anchored (each must start matching at the beginning of -a line) and in addition, require them to match entire lines. In multiline mode -the match may be more than one line. This is equivalent to having \eA and \eZ -characters at the start and end of each alternative top-level branch in every -pattern. This option applies only to the patterns that are matched against the -contents of files; it does not apply to patterns specified by any of the -\fB--include\fP or \fB--exclude\fP options. +Force the patterns to start matching only at the beginnings of lines, and in +addition, require them to match entire lines. In multiline mode the match may +be more than one line. 
This is equivalent to having "^(?:" at the start of each +pattern and ")$" at the end. This option applies only to the patterns that are +matched against the contents of files; it does not apply to patterns specified +by any of the \fB--include\fP or \fB--exclude\fP options. . . .SH "ENVIRONMENT VARIABLES" @@ -850,6 +851,6 @@ Cambridge, England. .rs .sp .nf -Last updated: 26 May 2017 +Last updated: 17 June 2017 Copyright (c) 1997-2017 University of Cambridge. .fi diff --git a/doc/pcre2grep.txt b/doc/pcre2grep.txt index 612fbd9..7dee204 100644 --- a/doc/pcre2grep.txt +++ b/doc/pcre2grep.txt @@ -718,29 +718,30 @@ OPTIONS match any of the patterns are the ones that are found. -w, --word-regex, --word-regexp - Force the patterns to match only whole words. This is equiva- - lent to having \b at the start and end of the pattern. This - option applies only to the patterns that are matched against - the contents of files; it does not apply to patterns speci- - fied by any of the --include or --exclude options. + Force the patterns only to match "words". That is, there must + be a word boundary at the start and end of each matched + string. This is equivalent to having "\b(?:" at the start of + each pattern, and ")\b" at the end. This option applies only + to the patterns that are matched against the contents of + files; it does not apply to patterns specified by any of the + --include or --exclude options. -x, --line-regex, --line-regexp - Force the patterns to be anchored (each must start matching - at the beginning of a line) and in addition, require them to - match entire lines. In multiline mode the match may be more - than one line. This is equivalent to having \A and \Z charac- - ters at the start and end of each alternative top-level - branch in every pattern. This option applies only to the pat- - terns that are matched against the contents of files; it does - not apply to patterns specified by any of the --include or - --exclude options. 
+ Force the patterns to start matching only at the beginnings + of lines, and in addition, require them to match entire + lines. In multiline mode the match may be more than one line. + This is equivalent to having "^(?:" at the start of each pat- + tern and ")$" at the end. This option applies only to the + patterns that are matched against the contents of files; it + does not apply to patterns specified by any of the --include + or --exclude options. ENVIRONMENT VARIABLES - The environment variables LC_ALL and LC_CTYPE are examined, in that - order, for a locale. The first one that is set is used. This can be - overridden by the --locale option. If no locale is set, the PCRE2 + The environment variables LC_ALL and LC_CTYPE are examined, in that + order, for a locale. The first one that is set is used. This can be + overridden by the --locale option. If no locale is set, the PCRE2 library's default (usually the "C" locale) is used. @@ -748,99 +749,99 @@ NEWLINES The -N (--newline) option allows pcre2grep to scan files with different newline conventions from the default. Any parts of the input files that - are written to the standard output are copied identically, with what- - ever newline sequences they have in the input. However, the setting of - this option does not affect the interpretation of files specified by + are written to the standard output are copied identically, with what- + ever newline sequences they have in the input. However, the setting of + this option does not affect the interpretation of files specified by the -f, --exclude-from, or --include-from options, which are assumed to - use the operating system's standard newline sequence, nor does it - affect the way in which pcre2grep writes informational messages to the + use the operating system's standard newline sequence, nor does it + affect the way in which pcre2grep writes informational messages to the standard error and output streams. 
For these it uses the string "\n" to - indicate newlines, relying on the C I/O library to convert this to an + indicate newlines, relying on the C I/O library to convert this to an appropriate sequence. OPTIONS COMPATIBILITY Many of the short and long forms of pcre2grep's options are the same as - in the GNU grep program. Any long option of the form --xxx-regexp (GNU + in the GNU grep program. Any long option of the form --xxx-regexp (GNU terminology) is also available as --xxx-regex (PCRE2 terminology). How- - ever, the --depth-limit, --file-list, --file-offsets, --heap-limit, - --include-dir, --line-offsets, --locale, --match-limit, -M, --multi- - line, -N, --newline, --om-separator, --output, -u, and --utf-8 options - are specific to pcre2grep, as is the use of the --only-matching option + ever, the --depth-limit, --file-list, --file-offsets, --heap-limit, + --include-dir, --line-offsets, --locale, --match-limit, -M, --multi- + line, -N, --newline, --om-separator, --output, -u, and --utf-8 options + are specific to pcre2grep, as is the use of the --only-matching option with a capturing parentheses number. - Although most of the common options work the same way, a few are dif- - ferent in pcre2grep. For example, the --include option's argument is a - glob for GNU grep, but a regular expression for pcre2grep. If both the - -c and -l options are given, GNU grep lists only file names, without + Although most of the common options work the same way, a few are dif- + ferent in pcre2grep. For example, the --include option's argument is a + glob for GNU grep, but a regular expression for pcre2grep. If both the + -c and -l options are given, GNU grep lists only file names, without counts, but pcre2grep gives the counts as well. OPTIONS WITH DATA There are four different ways in which an option with data can be spec- - ified. If a short form option is used, the data may follow immedi- + ified. 
If a short form option is used, the data may follow immedi- ately, or (with one exception) in the next command line item. For exam- ple: -f/some/file -f /some/file - The exception is the -o option, which may appear with or without data. - Because of this, if data is present, it must follow immediately in the + The exception is the -o option, which may appear with or without data. + Because of this, if data is present, it must follow immediately in the same item, for example -o3. - If a long form option is used, the data may appear in the same command - line item, separated by an equals character, or (with two exceptions) + If a long form option is used, the data may appear in the same command + line item, separated by an equals character, or (with two exceptions) it may appear in the next command line item. For example: --file=/some/file --file /some/file - Note, however, that if you want to supply a file name beginning with ~ - as data in a shell command, and have the shell expand ~ to a home + Note, however, that if you want to supply a file name beginning with ~ + as data in a shell command, and have the shell expand ~ to a home directory, you must separate the file name from the option, because the shell does not treat ~ specially unless it is at the start of an item. - The exceptions to the above are the --colour (or --color) and --only- - matching options, for which the data is optional. If one of these - options does have data, it must be given in the first form, using an + The exceptions to the above are the --colour (or --color) and --only- + matching options, for which the data is optional. If one of these + options does have data, it must be given in the first form, using an equals character. Otherwise pcre2grep will assume that it has no data. USING PCRE2'S CALLOUT FACILITY - pcre2grep has, by default, support for calling external programs or - scripts or echoing specific strings during matching by making use of - PCRE2's callout facility. 
However, this support can be disabled when - pcre2grep is built. You can find out whether your binary has support - for callouts by running it with the --help option. If the support is + pcre2grep has, by default, support for calling external programs or + scripts or echoing specific strings during matching by making use of + PCRE2's callout facility. However, this support can be disabled when + pcre2grep is built. You can find out whether your binary has support + for callouts by running it with the --help option. If the support is not enabled, all callouts in patterns are ignored by pcre2grep. - A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu- - ment is either a number or a quoted string (see the pcre2callout docu- - mentation for details). Numbered callouts are ignored by pcre2grep; + A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu- + ment is either a number or a quoted string (see the pcre2callout docu- + mentation for details). Numbered callouts are ignored by pcre2grep; only callouts with string arguments are useful. Calling external programs or scripts If the callout string does not start with a pipe (vertical bar) charac- - ter, it is parsed into a list of substrings separated by pipe charac- - ters. The first substring must be an executable name, with the follow- + ter, it is parsed into a list of substrings separated by pipe charac- + ters. The first substring must be an executable name, with the follow- ing substrings specifying arguments: executable_name|arg1|arg2|... - Any substring (including the executable name) may contain escape - sequences started by a dollar character: $<digits> or ${<digits>} is - replaced by the captured substring of the given decimal number, which - must be greater than zero. 
If the number is greater than the number of - capturing substrings, or if the capture is unset, the replacement is + Any substring (including the executable name) may contain escape + sequences started by a dollar character: $<digits> or ${<digits>} is + replaced by the captured substring of the given decimal number, which + must be greater than zero. If the number is greater than the number of + capturing substrings, or if the capture is unset, the replacement is empty. - Any other character is substituted by itself. In particular, $$ is - replaced by a single dollar and $| is replaced by a pipe character. + Any other character is substituted by itself. In particular, $$ is + replaced by a single dollar and $| is replaced by a pipe character. Here is an example: echo -e "abcde\n12345" | pcre2grep \ @@ -856,49 +857,49 @@ USING PCRE2'S CALLOUT FACILITY The parameters for the execv() system call that is used to run the pro- gram or script are zero-terminated strings. This means that binary zero - characters in the callout argument will cause premature termination of - their substrings, and therefore should not be present. Any syntax - errors in the string (for example, a dollar not followed by another - character) cause the callout to be ignored. If running the program + characters in the callout argument will cause premature termination of + their substrings, and therefore should not be present. Any syntax + errors in the string (for example, a dollar not followed by another + character) cause the callout to be ignored. If running the program fails for any reason (including the non-existence of the executable), a - local matching failure occurs and the matcher backtracks in the normal + local matching failure occurs and the matcher backtracks in the normal way. 
Echoing a specific string - If the callout string starts with a pipe (vertical bar) character, the + If the callout string starts with a pipe (vertical bar) character, the rest of the string is written to the output, having been passed through - the same escape processing as text from the --output option. This pro- + the same escape processing as text from the --output option. This pro- vides a simple echoing facility that avoids calling an external program - or script. No terminator is added to the string, so if you want a new- - line, you must include it explicitly. Matching continues normally - after the string is output. If you want to see only the callout output - but not any output from an actual match, you should end the relevant + or script. No terminator is added to the string, so if you want a new- + line, you must include it explicitly. Matching continues normally + after the string is output. If you want to see only the callout output + but not any output from an actual match, you should end the relevant pattern with (*FAIL). MATCHING ERRORS - It is possible to supply a regular expression that takes a very long - time to fail to match certain lines. Such patterns normally involve - nested indefinite repeats, for example: (a+)*\d when matched against a - line of a's with no final digit. The PCRE2 matching function has a - resource limit that causes it to abort in these circumstances. If this - happens, pcre2grep outputs an error message and the line that caused - the problem to the standard error stream. If there are more than 20 + It is possible to supply a regular expression that takes a very long + time to fail to match certain lines. Such patterns normally involve + nested indefinite repeats, for example: (a+)*\d when matched against a + line of a's with no final digit. The PCRE2 matching function has a + resource limit that causes it to abort in these circumstances. 
If this + happens, pcre2grep outputs an error message and the line that caused + the problem to the standard error stream. If there are more than 20 such errors, pcre2grep gives up. - The --match-limit option of pcre2grep can be used to set the overall - resource limit. There are also other limits that affect the amount of - memory used during matching; see the discussion of --heap-limit and + The --match-limit option of pcre2grep can be used to set the overall + resource limit. There are also other limits that affect the amount of + memory used during matching; see the discussion of --heap-limit and --depth-limit above. DIAGNOSTICS Exit status is 0 if any matches were found, 1 if no matches were found, - and 2 for syntax errors, overlong lines, non-existent or inaccessible - files (even if matches were found in other files) or too many matching + and 2 for syntax errors, overlong lines, non-existent or inaccessible + files (even if matches were found in other files) or too many matching errors. Using the -s option to suppress error messages about inaccessi- ble files does not affect the return code. @@ -917,5 +918,5 @@ AUTHOR REVISION - Last updated: 26 May 2017 + Last updated: 17 June 2017 Copyright (c) 1997-2017 University of Cambridge. diff --git a/src/pcre2grep.c b/src/pcre2grep.c index 8e4b1e8..eda9daa 100644 --- a/src/pcre2grep.c +++ b/src/pcre2grep.c @@ -103,7 +103,8 @@ typedef int BOOL; #define MAXPATLEN 8192 #endif -#define PATBUFSIZE (MAXPATLEN + 10) /* Allows for prefix+suffix */ +#define FNBUFSIZ 1024 +#define ERRBUFSIZ 256 /* Values for the "filenames" variable, which specifies options for file name output. 
The order is important; it is assumed that a file name is wanted for @@ -211,7 +212,7 @@ static BOOL use_jit = FALSE; static const uint8_t *character_tables = NULL; static uint32_t pcre2_options = 0; -static uint32_t process_options = 0; +static uint32_t extra_options = 0; static PCRE2_SIZE heap_limit = PCRE2_UNSET; static uint32_t match_limit = 0; static uint32_t depth_limit = 0; @@ -441,19 +442,6 @@ of PCRE2_NEWLINE_xx in pcre2.h. */ static const char *newlines[] = { "DEFAULT", "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL" }; -/* Tables for prefixing and suffixing patterns, according to the -w, -x, and -F -options. These set the 1, 2, and 4 bits in process_options, respectively. Note -that the combination of -w and -x has the same effect as -x on its own, so we -can treat them as the same. Note that the MAXPATLEN macro assumes the longest -prefix+suffix is 10 characters; if anything longer is added, it must be -adjusted. */ - -static const char *prefix[] = { - "", "\\b", "^(?:", "^(?:", "\\Q", "\\b\\Q", "^(?:\\Q", "^(?:\\Q" }; - -static const char *suffix[] = { - "", "\\b", ")$", ")$", "\\E", "\\E\\b", "\\E)$", "\\E)$" }; - /* UTF-8 tables - used only when the newline setting is "any". */ const int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01}; @@ -2339,7 +2327,7 @@ file. However, when the newline convention is binary zero, we can't do this. */ if (binary_files != BIN_TEXT) { if (endlinetype != PCRE2_NEWLINE_NUL) - binary = memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength) + binary = memchr(main_buffer, 0, (bufflength > 1024)? 
1024 : bufflength) != NULL; if (binary && binary_files == BIN_NOMATCH) return 1; } @@ -3224,7 +3212,7 @@ switch(letter) case N_NOJIT: use_jit = FALSE; break; case 'a': binary_files = BIN_TEXT; break; case 'c': count_only = TRUE; break; - case 'F': process_options |= PO_FIXED_STRINGS; break; + case 'F': options |= PCRE2_LITERAL; break; case 'H': filenames = FN_FORCE; break; case 'I': binary_files = BIN_NOMATCH; break; case 'h': filenames = FN_NONE; break; @@ -3245,8 +3233,8 @@ switch(letter) case 't': show_total_count = TRUE; break; case 'u': options |= PCRE2_UTF; utf = TRUE; break; case 'v': invert = TRUE; break; - case 'w': process_options |= PO_WORD_MATCH; break; - case 'x': process_options |= PO_LINE_MATCH; break; + case 'w': extra_options |= PCRE2_EXTRA_MATCH_WORD; break; + case 'x': extra_options |= PCRE2_EXTRA_MATCH_LINE; break; case 'V': { @@ -3309,7 +3297,6 @@ pattern chain. Arguments: p points to the pattern block options the PCRE options - popts the processing options fromfile TRUE if the pattern was read from a file fromtext file name or identifying text (e.g. 
"include") count 0 if this is the only command line pattern, or @@ -3320,18 +3307,20 @@ Returns: TRUE on success, FALSE after an error */ static BOOL -compile_pattern(patstr *p, int options, int popts, int fromfile, - const char *fromtext, int count) +compile_pattern(patstr *p, int options, int fromfile, const char *fromtext, + int count) { -unsigned char buffer[PATBUFSIZE]; -PCRE2_SIZE erroffset; -char *ps = p->string; -unsigned int patlen = strlen(ps); +char *ps; int errcode; +PCRE2_SIZE patlen, erroffset; +PCRE2_UCHAR errmessbuffer[ERRBUFSIZ]; if (p->compiled != NULL) return TRUE; -if ((popts & PO_FIXED_STRINGS) != 0) +ps = p->string; +patlen = strlen(ps); + +if ((options & PCRE2_LITERAL) != 0) { int ellength; char *eop = ps + patlen; @@ -3344,8 +3333,7 @@ if ((popts & PO_FIXED_STRINGS) != 0) } } -sprintf((char *)buffer, "%s%.*s%s", prefix[popts], patlen, ps, suffix[popts]); -p->compiled = pcre2_compile(buffer, PCRE2_ZERO_TERMINATED, options, &errcode, +p->compiled = pcre2_compile((PCRE2_SPTR)ps, patlen, options, &errcode, &erroffset, compile_context); /* Handle successful compile. Try JIT-compiling if supported and enabled. 
We @@ -3362,23 +3350,22 @@ if (p->compiled != NULL) /* Handle compile errors */ -erroffset -= (int)strlen(prefix[popts]); if (erroffset > patlen) erroffset = patlen; -pcre2_get_error_message(errcode, buffer, PATBUFSIZE); +pcre2_get_error_message(errcode, errmessbuffer, sizeof(errmessbuffer)); if (fromfile) { fprintf(stderr, "pcre2grep: Error in regex in line %d of %s " - "at offset %d: %s\n", count, fromtext, (int)erroffset, buffer); + "at offset %d: %s\n", count, fromtext, (int)erroffset, errmessbuffer); } else { if (count == 0) fprintf(stderr, "pcre2grep: Error in %s regex at offset %d: %s\n", - fromtext, (int)erroffset, buffer); + fromtext, (int)erroffset, errmessbuffer); else fprintf(stderr, "pcre2grep: Error in %s %s regex at offset %d: %s\n", - ordin(count), fromtext, (int)erroffset, buffer); + ordin(count), fromtext, (int)erroffset, errmessbuffer); } return FALSE; @@ -3396,18 +3383,17 @@ Arguments: name the name of the file; "-" is stdin patptr pointer to the pattern chain anchor patlastptr pointer to the last pattern pointer - popts the process options to pass to pattern_compile() Returns: TRUE if all went well */ static BOOL -read_pattern_file(char *name, patstr **patptr, patstr **patlastptr, int popts) +read_pattern_file(char *name, patstr **patptr, patstr **patlastptr) { int linenumber = 0; FILE *f; const char *filename; -char buffer[PATBUFSIZE]; +char buffer[MAXPATLEN+20]; if (strcmp(name, "-") == 0) { @@ -3425,7 +3411,7 @@ else filename = name; } -while (fgets(buffer, PATBUFSIZE, f) != NULL) +while (fgets(buffer, sizeof(buffer), f) != NULL) { char *s = buffer + (int)strlen(buffer); while (s > buffer && isspace((unsigned char)(s[-1]))) s--; @@ -3453,7 +3439,7 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL) for(;;) { - if (!compile_pattern(*patlastptr, pcre2_options, popts, TRUE, filename, + if (!compile_pattern(*patlastptr, pcre2_options, TRUE, filename, linenumber)) { if (f != stdin) fclose(f); @@ -3823,7 +3809,7 @@ for (i = 1; i < argc; i++) { 
unsigned long int n = decode_number(option_data, op, longop); if (op->type == OP_U32NUMBER) *((uint32_t *)op->dataptr) = n; - else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n; + else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n; else *((int *)op->dataptr) = n; } } @@ -3978,6 +3964,10 @@ if (DEE_option != NULL) } } +/* Set the extra options */ + +(void)pcre2_set_compile_extra_options(compile_context, extra_options); + /* Check the values for Jeffrey Friedl's debugging options. */ #ifdef JFRIEDL_DEBUG @@ -4038,7 +4028,7 @@ chain, so we must not access the next pointer till after the compile. */ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next) { - if (!compile_pattern(cp, pcre2_options, process_options, FALSE, "command-line", + if (!compile_pattern(cp, pcre2_options, FALSE, "command-line", (j == 1 && patterns->next == NULL)? 0 : j)) goto EXIT2; } @@ -4047,48 +4037,35 @@ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next) for (fn = pattern_files; fn != NULL; fn = fn->next) { - if (!read_pattern_file(fn->name, &patterns, &patterns_last, process_options)) - goto EXIT2; + if (!read_pattern_file(fn->name, &patterns, &patterns_last)) goto EXIT2; } /* Unless JIT has been explicitly disabled, arrange a stack for it to use. */ - -#ifdef NEVER -#ifdef SUPPORT_PCRE2GREP_JIT -if (use_jit) - jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL); -#endif - -for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next) - { -#ifdef SUPPORT_PCRE2GREP_JIT - if (jit_stack != NULL && cp->compiled != NULL) - pcre2_jit_stack_assign(match_context, NULL, jit_stack); -#endif - } -#endif - - #ifdef SUPPORT_PCRE2GREP_JIT if (use_jit) { jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL); if (jit_stack != NULL ) pcre2_jit_stack_assign(match_context, NULL, jit_stack); - } + } #endif +/* -F, -w, and -x do not apply to include or exclude patterns, so we must +adjust the options. 
*/ + +pcre2_options &= ~PCRE2_LITERAL; +(void)pcre2_set_compile_extra_options(compile_context, 0); + /* If there are include or exclude patterns read from the command line, compile -them. -F, -w, and -x do not apply, so the third argument of compile_pattern is -0. */ +them. */ for (j = 0; j < 4; j++) { int k; for (k = 1, cp = *(incexlist[j]); cp != NULL; k++, cp = cp->next) { - if (!compile_pattern(cp, pcre2_options, 0, FALSE, incexname[j], + if (!compile_pattern(cp, pcre2_options, FALSE, incexname[j], (k == 1 && cp->next == NULL)? 0 : k)) goto EXIT2; } @@ -4098,13 +4075,13 @@ for (j = 0; j < 4; j++) for (fn = include_from; fn != NULL; fn = fn->next) { - if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last, 0)) + if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last)) goto EXIT2; } for (fn = exclude_from; fn != NULL; fn = fn->next) { - if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last, 0)) + if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last)) goto EXIT2; } @@ -4123,7 +4100,7 @@ read them line by line and search the given files. */ for (fn = file_lists; fn != NULL; fn = fn->next) { - char buffer[PATBUFSIZE]; + char buffer[FNBUFSIZ]; FILE *fl; if (strcmp(fn->name, "-") == 0) fl = stdin; else { @@ -4135,7 +4112,7 @@ for (fn = file_lists; fn != NULL; fn = fn->next) goto EXIT2; } } - while (fgets(buffer, PATBUFSIZE, fl) != NULL) + while (fgets(buffer, sizeof(buffer), fl) != NULL) { int frc; char *end = buffer + (int)strlen(buffer); diff --git a/testdata/grepinputv b/testdata/grepinputv index d33d326..366d4fb 100644 --- a/testdata/grepinputv +++ b/testdata/grepinputv @@ -2,3 +2,8 @@ The quick brown fox jumps over the lazy dog. This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. 
+The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal +A buried feline in the syndicate diff --git a/testdata/grepoutput b/testdata/grepoutput index 2fdd660..52e0d17 100644 --- a/testdata/grepoutput +++ b/testdata/grepoutput @@ -454,6 +454,11 @@ RC=1 ---------------------------- Test 51 ------------------------------ over the lazy dog. This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal +A buried feline in the syndicate RC=0 ---------------------------- Test 52 ------------------------------ fox jumps @@ -788,32 +793,32 @@ RC=0 37216,12 RC=0 ---------------------------- Test 113 ----------------------------- -476 +478 RC=0 ---------------------------- Test 114 ----------------------------- testdata/grepinput:469 testdata/grepinput3:0 testdata/grepinput8:0 -testdata/grepinputv:1 +testdata/grepinputv:3 testdata/grepinputx:6 -TOTAL:476 +TOTAL:478 RC=0 ---------------------------- Test 115 ----------------------------- testdata/grepinput:469 -testdata/grepinputv:1 +testdata/grepinputv:3 testdata/grepinputx:6 -TOTAL:476 +TOTAL:478 RC=0 ---------------------------- Test 116 ----------------------------- -476 +478 RC=0 ---------------------------- Test 117 ----------------------------- 469 0 0 -1 +3 6 -476 +478 RC=0 ---------------------------- Test 118 ----------------------------- testdata/grepinput3 @@ -834,3 +839,14 @@ RC=0 ./testdata/grepinput:a binary zero:zeroa ./testdata/grepinput:the binary zero.:zerothe. RC=0 +---------------------------- Test 121 ----------------------------- +This line contains \E and (regex) *meta* [characters]. +RC=0 +---------------------------- Test 122 ----------------------------- +over the lazy dog. +The word is cat in this line +RC=0 +---------------------------- Test 122 ----------------------------- +over the lazy dog. 
+The word is cat in this line +RC=0 diff --git a/testdata/grepoutputC b/testdata/grepoutputC index 2545079..60f249f 100644 --- a/testdata/grepoutputC +++ b/testdata/grepoutputC @@ -1,14 +1,42 @@ Arg1: [T] [he ] [ ] Arg2: |T| () () (0) Arg1: [T] [his] [s] Arg2: |T| () () (0) +Arg1: [T] [his] [s] Arg2: |T| () () (0) +Arg1: [T] [he ] [ ] Arg2: |T| () () (0) +Arg1: [T] [he ] [ ] Arg2: |T| () () (0) +Arg1: [T] [he ] [ ] Arg2: |T| () () (0) The quick brown This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal Arg1: [qu] [qu] Arg1: [ t] [ t] +Arg1: [ l] [ l] +Arg1: [wo] [wo] +Arg1: [ca] [ca] +Arg1: [sn] [sn] The quick brown This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal 0:T The quick brown 0:T This time it jumps and jumps and jumps. +0:T +This line contains \E and (regex) *meta* [characters]. +0:T +The word is cat in this line +0:T +The caterpillar sat on the mat +0:T +The snowcat is not an animal +T +T +T +T T T