Update pcre2grep documentation to give more details of -M matching.
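The -M details added by this commit (output runs from the line where a match starts to the line where it ends, -c counts matches rather than lines, and only the first line of a multi-line match carries the -n prefix) can be illustrated with a small sketch. This is a toy model built on Python's `re` module, not PCRE2 and not pcre2grep's actual implementation; the names `multiline_spans`, `render_numbered`, `SAMPLE`, and `PATTERN` are hypothetical and exist only for this illustration.

```python
import bisect
import re

# Hypothetical sample input: "regular" ends line 1, "expression" starts line 2.
SAMPLE = (
    "This is a regular\n"
    "expression test.\n"
    "No match here.\n"
    "regular expression again\n"
)
PATTERN = r"regular\s+expression"


def multiline_spans(text, pattern):
    """Return 1-based (first_line, last_line) pairs, one per match.

    Toy model of the documented -M behaviour: a match may run past the
    end of the line it starts on, and scanning resumes on the line after
    the one in which the match ended.
    """
    regex = re.compile(pattern, re.MULTILINE)
    lines = text.splitlines(keepends=True)

    # Offset at which each line starts within the whole text.
    starts = []
    offset = 0
    for line in lines:
        starts.append(offset)
        offset += len(line)

    def line_of(pos):
        # Index of the line that contains character offset pos.
        return bisect.bisect_right(starts, pos) - 1

    spans = []
    pos = 0
    while pos < len(text):
        match = regex.search(text, pos)
        if match is None:
            break
        first = line_of(match.start())
        # A match that ends with a newline still ends on that line.
        last = line_of(max(match.end() - 1, match.start()))
        spans.append((first + 1, last + 1))
        pos = starts[last] + len(lines[last])
    return spans


def render_numbered(text, pattern):
    """Model -n with -M: only the first line of each match is numbered."""
    lines = text.splitlines()
    out = []
    for first, last in multiline_spans(text, pattern):
        for number in range(first, last + 1):
            prefix = f"{number}:" if number == first else ""
            out.append(prefix + lines[number - 1])
    return out


if __name__ == "__main__":
    spans = multiline_spans(SAMPLE, PATTERN)
    print("matches (what -c would count in this model):", len(spans))
    print("\n".join(render_numbered(SAMPLE, PATTERN)))
```

In this model the sample yields two matches but three output lines, which is the situation the revised -c text describes: with -M there may be more suppressed lines than matches.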
parent 5a18651441
commit 4819827879
|
@ -67,22 +67,23 @@ If no files are specified, <b>pcre2grep</b> reads the standard input. The
|
|||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
<pre>
|
||||
pcre2grep some-pattern /file1 - /file3
|
||||
pcre2grep some-pattern file1 - file3
|
||||
</pre>
|
||||
By default, each line that matches a pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line, followed by a colon. However, there are options that can
|
||||
change how <b>pcre2grep</b> behaves. In particular, the <b>-M</b> option makes it
|
||||
possible to search for patterns that span line boundaries. What defines a line
|
||||
boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
|
||||
Input files are searched line by line. By default, each line that matches a
|
||||
pattern is copied to the standard output, and if there is more than one file,
|
||||
the file name is output at the start of each line, followed by a colon.
|
||||
However, there are options that can change how <b>pcre2grep</b> behaves. In
|
||||
particular, the <b>-M</b> option makes it possible to search for strings that
|
||||
span line boundaries. What defines a line boundary is controlled by the
|
||||
<b>-N</b> (<b>--newline</b>) option.
|
||||
</P>
|
||||
<P>
|
||||
The amount of memory used for buffering files that are being scanned is
|
||||
controlled by a parameter that can be set by the <b>--buffer-size</b> option.
|
||||
The default value for this parameter is specified when <b>pcre2grep</b> is built,
|
||||
with the default default being 20K. A block of memory three times this size is
|
||||
used (to allow for buffering "before" and "after" lines). An error occurs if a
|
||||
line overflows the buffer.
|
||||
The default value for this parameter is specified when <b>pcre2grep</b> is
|
||||
built, with the default default being 20K. A block of memory three times this
|
||||
size is used (to allow for buffering "before" and "after" lines). An error
|
||||
occurs if a line overflows the buffer.
|
||||
</P>
|
||||
<P>
|
||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
||||
|
@ -149,11 +150,11 @@ to signify multiplication by 1024 or 1024*1024 respectively.
|
|||
<b>--</b>
|
||||
This terminates the list of options. It is useful if the next item on the
|
||||
command line starts with a hyphen but is not an option. This allows for the
|
||||
processing of patterns and filenames that start with hyphens.
|
||||
processing of patterns and file names that start with hyphens.
|
||||
</P>
|
||||
<P>
|
||||
<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context after each matching line. If filenames
|
||||
Output <i>number</i> lines of context after each matching line. If file names
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
|
@ -167,7 +168,7 @@ Treat binary files as text. This is equivalent to
|
|||
</P>
|
||||
<P>
|
||||
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context before each matching line. If filenames
|
||||
Output <i>number</i> lines of context before each matching line. If file names
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
|
@ -184,7 +185,8 @@ processed in the same way as any other file. In this case, when a match
|
|||
succeeds, the output may be binary garbage, which can have nasty effects if
|
||||
sent to a terminal. If the word is "without-match", which is equivalent to the
|
||||
<b>-I</b> option, binary files are not processed at all; they are assumed not to
|
||||
be of interest.
|
||||
be of interest and are skipped without causing any output or affecting the
|
||||
return code.
|
||||
</P>
|
||||
<P>
|
||||
<b>--buffer-size=</b><i>number</i>
|
||||
|
@ -198,10 +200,15 @@ This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
|
|||
</P>
|
||||
<P>
|
||||
<b>-c</b>, <b>--count</b>
|
||||
Do not output individual lines from the files that are being scanned; instead
|
||||
output the number of lines that would otherwise have been shown. If no lines
|
||||
are selected, the number zero is output. If several files are are being
|
||||
scanned, a count is output for each of them. However, if the
|
||||
Do not output lines from the files that are being scanned; instead output the
|
||||
number of matches (or non-matches if <b>-v</b> is used) that would otherwise
|
||||
have caused lines to be shown. By default, this count is the same as the number
|
||||
of suppressed lines, but if the <b>-M</b> (multiline) option is used (without
|
||||
<b>-v</b>), there may be more suppressed lines than the number of matches.
|
||||
<br>
|
||||
<br>
|
||||
If no lines are selected, the number zero is output. If several files are
|
||||
being scanned, a count is output for each of them. However, if the
|
||||
<b>--files-with-matches</b> option is also used, only those files whose counts
|
||||
are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
|
||||
<b>-B</b>, and <b>-C</b> options are ignored.
|
||||
|
@ -271,10 +278,10 @@ of the line that matched.
|
|||
Files (but not directories) whose names match the pattern are skipped without
|
||||
being processed. This applies to all files, whether listed on the command line,
|
||||
obtained from <b>--file-list</b>, or by scanning a directory. The pattern is a
|
||||
PCRE2 regular expression, and is matched against the final component of the file
|
||||
name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not
|
||||
apply to this pattern. The option may be given any number of times in order to
|
||||
specify multiple patterns. If a file name matches both an <b>--include</b>
|
||||
PCRE2 regular expression, and is matched against the final component of the
|
||||
file name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do
|
||||
not apply to this pattern. The option may be given any number of times in order
|
||||
to specify multiple patterns. If a file name matches both an <b>--include</b>
|
||||
and an <b>--exclude</b> pattern, it is excluded. There is no short form for this
|
||||
option.
|
||||
</P>
|
||||
|
@ -323,7 +330,7 @@ alternatives in the description of <b>-e</b> above.
|
|||
<br>
|
||||
<br>
|
||||
If this option is given more than once, all the specified files are
|
||||
read. A data line is output if any of the patterns match it. A filename can
|
||||
read. A data line is output if any of the patterns match it. A file name can
|
||||
be given as "-" to refer to the standard input. When <b>-f</b> is used, patterns
|
||||
specified on the command line using <b>-e</b> may also be present; they are
|
||||
tested before the file's patterns. However, no other pattern is taken from the
|
||||
|
@ -334,7 +341,7 @@ command line; all arguments are treated as the names of paths to be searched.
|
|||
Read a list of files and/or directories that are to be scanned from the given
|
||||
file, one per line. Trailing white space is removed from each line, and blank
|
||||
lines are ignored. These paths are processed before any that are listed on the
|
||||
command line. The filename can be given as "-" to refer to the standard input.
|
||||
command line. The file name can be given as "-" to refer to the standard input.
|
||||
If <b>--file</b> and <b>--file-list</b> are both specified as "-", patterns are
|
||||
read first. This is useful only when the standard input is a terminal, from
|
||||
which further lines (the list of files) can be read after an end-of-file
|
||||
|
@ -352,17 +359,18 @@ and <b>--only-matching</b>.
|
|||
</P>
|
||||
<P>
|
||||
<b>-H</b>, <b>--with-filename</b>
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name.
|
||||
Force the inclusion of the file name at the start of output lines when
|
||||
searching a single file. By default, the file name is not shown in this case.
|
||||
For matching lines, the file name is followed by a colon; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name. When the <b>-M</b> option causes a pattern to match more than one
|
||||
line, only the first is preceded by the file name.
|
||||
</P>
|
||||
<P>
|
||||
<b>-h</b>, <b>--no-filename</b>
|
||||
Suppress the output filenames when searching multiple files. By default,
|
||||
filenames are shown when multiple files are searched. For matching lines, the
|
||||
filename is followed by a colon; for context lines, a hyphen separator is used.
|
||||
Suppress the output file names when searching multiple files. By default,
|
||||
file names are shown when multiple files are searched. For matching lines, the
|
||||
file name is followed by a colon; for context lines, a hyphen separator is used.
|
||||
If a line number is also being output, it follows the file name.
|
||||
</P>
|
||||
<P>
|
||||
|
@ -373,7 +381,7 @@ ignored.
|
|||
</P>
|
||||
<P>
|
||||
<b>-I</b>
|
||||
Treat binary files as never matching. This is equivalent to
|
||||
Ignore binary files. This is equivalent to
|
||||
<b>--binary-files</b>=<i>without-match</i>.
|
||||
</P>
|
||||
<P>
|
||||
|
@ -406,8 +414,8 @@ If any <b>--include-dir</b> patterns are specified, the only directories that
|
|||
are processed are those that match one of the patterns (and do not match an
|
||||
<b>--exclude-dir</b> pattern). This applies to all directories, whether listed
|
||||
on the command line, obtained from <b>--file-list</b>, or by scanning a parent
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against the
|
||||
final component of the directory name, not the entire path. The <b>-F</b>,
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against
|
||||
the final component of the directory name, not the entire path. The <b>-F</b>,
|
||||
<b>-w</b>, and <b>-x</b> options do not apply to this pattern. The option may be
|
||||
given any number of times. If a directory matches both <b>--include-dir</b> and
|
||||
<b>--exclude-dir</b>, it is excluded. There is no short form for this option.
|
||||
|
@ -442,8 +450,8 @@ unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
|
|||
is currently possible only in Unix-like environments). Output to terminal is
|
||||
normally automatically flushed by the operating system. This option can be
|
||||
useful when the input or output is attached to a pipe and you do not want
|
||||
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will affect
|
||||
performance, and the <b>-M</b> (multiline) option ceases to work.
|
||||
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will
|
||||
affect performance, and the <b>-M</b> (multiline) option ceases to work.
|
||||
</P>
|
||||
<P>
|
||||
<b>--line-offsets</b>
|
||||
|
@ -497,18 +505,33 @@ when the PCRE2 library is compiled, with the default default being 10 million.
|
|||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for a successful match may consist of more than
|
||||
one line, the last of which is the one in which the match ended. If the matched
|
||||
string ends with a newline sequence the output ends at the end of that line.
|
||||
one line. The first is the line in which the match started, and the last is the
|
||||
line in which the match ended. If the matched string ends with a newline
|
||||
sequence the output ends at the end of that line.
|
||||
<br>
|
||||
<br>
|
||||
When this option is set, the PCRE2 library is called in "multiline" mode.
|
||||
However, <b>pcre2grep</b> still processes the input line by line. The difference
|
||||
is that a matched string may extend past the end of a line and continue on
|
||||
one or more subsequent lines. The newline sequence must be matched as part of
|
||||
the pattern. For example, to find the phrase "regular expression" in a file
|
||||
where "regular" might be at the end of a line and "expression" at the start of
|
||||
the next line, you could use this command:
|
||||
<pre>
|
||||
pcre2grep -M 'regular\s+expression' <file>
|
||||
</pre>
|
||||
The \s escape sequence matches any white space character, including newlines,
|
||||
and is followed by + so as to match trailing white space on the first line as
|
||||
well as possibly handling a two-character newline sequence.
|
||||
<br>
|
||||
<br>
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that <b>pcre2grep</b> buffers the input file as it scans it. However,
|
||||
<b>pcre2grep</b> ensures that at least 8K characters or the rest of the document
|
||||
<b>pcre2grep</b> ensures that at least 8K characters or the rest of the file
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions. This option does not
|
||||
work when input is read line by line (see \fP--line-buffered\fP.)
|
||||
are guaranteed to be available for lookbehind assertions. The <b>-M</b> option
|
||||
does not work when input is read line by line (see <b>--line-buffered</b>).
|
||||
</P>
|
||||
<P>
|
||||
<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
|
||||
|
@ -526,9 +549,9 @@ When the PCRE2 library is built, a default line-ending sequence is specified.
|
|||
This is normally the standard sequence for the operating system. Unless
|
||||
otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
|
||||
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
||||
makes it possible to use <b>pcre2grep</b> to scan files that have come from other
|
||||
environments without having to modify their line endings. If the data that is
|
||||
being scanned does not agree with the convention set by this option,
|
||||
makes it possible to use <b>pcre2grep</b> to scan files that have come from
|
||||
other environments without having to modify their line endings. If the data
|
||||
that is being scanned does not agree with the convention set by this option,
|
||||
<b>pcre2grep</b> may behave in strange ways. Note that this option does not
|
||||
apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
|
||||
<b>--include-from</b> options, which are expected to use the operating system's
|
||||
|
@ -537,9 +560,10 @@ standard newline sequence.
|
|||
<P>
|
||||
<b>-n</b>, <b>--line-number</b>
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
for matching lines or a hyphen for context lines. If the filename is also being
|
||||
output, it precedes the line number. This option is forced if
|
||||
<b>--line-offsets</b> is used.
|
||||
for matching lines or a hyphen for context lines. If the file name is also
|
||||
being output, it precedes the line number. When the <b>-M</b> option causes a
|
||||
pattern to match more than one line, only the first is preceded by its line
|
||||
number. This option is forced if <b>--line-offsets</b> is used.
|
||||
</P>
|
||||
<P>
|
||||
<b>--no-jit</b>
|
||||
|
@ -570,7 +594,7 @@ without an argument (see above), if an argument is present, it must be given in
|
|||
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
||||
for the non-argument case above also apply to this case. If the specified
|
||||
capturing parentheses do not exist in the pattern, or were not set in the
|
||||
match, nothing is output unless the file name or line number are being printed.
|
||||
match, nothing is output unless the file name or line number are being output.
|
||||
<br>
|
||||
<br>
|
||||
If this option is given multiple times, multiple substrings are output, in the
|
||||
|
@ -635,10 +659,10 @@ specified by any of the <b>--include</b> or <b>--exclude</b> options.
|
|||
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is equivalent
|
||||
to having ^ and $ characters at the start and end of each alternative branch in
|
||||
every pattern. This option applies only to the patterns that are matched
|
||||
against the contents of files; it does not apply to patterns specified by any
|
||||
of the <b>--include</b> or <b>--exclude</b> options.
|
||||
to having ^ and $ characters at the start and end of each alternative top-level
|
||||
branch in every pattern. This option applies only to the patterns that are
|
||||
matched against the contents of files; it does not apply to patterns specified
|
||||
by any of the <b>--include</b> or <b>--exclude</b> options.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
||||
<P>
|
||||
|
@ -677,7 +701,7 @@ Although most of the common options work the same way, a few are different in
|
|||
<b>pcre2grep</b>. For example, the <b>--include</b> option's argument is a glob
|
||||
for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
|
||||
<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
|
||||
without counts, but <b>pcre2grep</b> gives the counts.
|
||||
without counts, but <b>pcre2grep</b> gives the counts as well.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
|
||||
<P>
|
||||
|
@ -722,9 +746,9 @@ message and the line that caused the problem to the standard error stream. If
|
|||
there are more than 20 such errors, <b>pcre2grep</b> gives up.
|
||||
</P>
|
||||
<P>
|
||||
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the overall
|
||||
resource limit; there is a second option called <b>--recursion-limit</b> that
|
||||
sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the
|
||||
overall resource limit; there is a second option called <b>--recursion-limit</b>
|
||||
that sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
discussion of these options above).
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
|
||||
|
@ -737,7 +761,7 @@ affect the return code.
|
|||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2test</b>(1).
|
||||
<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
|
@ -750,9 +774,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 23 November 2014
|
||||
Last updated: 03 January 2015
|
||||
<br>
|
||||
Copyright © 1997-2014 University of Cambridge.
|
||||
Copyright © 1997-2015 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
146
doc/pcre2grep.1
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2GREP 1 "23 November 2014" "PCRE2 10.00"
|
||||
.TH PCRE2GREP 1 "03 January 2015" "PCRE2 10.00"
|
||||
.SH NAME
|
||||
pcre2grep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -41,21 +41,22 @@ If no files are specified, \fBpcre2grep\fP reads the standard input. The
|
|||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
.sp
|
||||
pcre2grep some-pattern /file1 - /file3
|
||||
pcre2grep some-pattern file1 - file3
|
||||
.sp
|
||||
By default, each line that matches a pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line, followed by a colon. However, there are options that can
|
||||
change how \fBpcre2grep\fP behaves. In particular, the \fB-M\fP option makes it
|
||||
possible to search for patterns that span line boundaries. What defines a line
|
||||
boundary is controlled by the \fB-N\fP (\fB--newline\fP) option.
|
||||
Input files are searched line by line. By default, each line that matches a
|
||||
pattern is copied to the standard output, and if there is more than one file,
|
||||
the file name is output at the start of each line, followed by a colon.
|
||||
However, there are options that can change how \fBpcre2grep\fP behaves. In
|
||||
particular, the \fB-M\fP option makes it possible to search for strings that
|
||||
span line boundaries. What defines a line boundary is controlled by the
|
||||
\fB-N\fP (\fB--newline\fP) option.
|
||||
.P
|
||||
The amount of memory used for buffering files that are being scanned is
|
||||
controlled by a parameter that can be set by the \fB--buffer-size\fP option.
|
||||
The default value for this parameter is specified when \fBpcre2grep\fP is built,
|
||||
with the default default being 20K. A block of memory three times this size is
|
||||
used (to allow for buffering "before" and "after" lines). An error occurs if a
|
||||
line overflows the buffer.
|
||||
The default value for this parameter is specified when \fBpcre2grep\fP is
|
||||
built, with the default default being 20K. A block of memory three times this
|
||||
size is used (to allow for buffering "before" and "after" lines). An error
|
||||
occurs if a line overflows the buffer.
|
||||
.P
|
||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
||||
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
||||
|
@ -122,10 +123,10 @@ to signify multiplication by 1024 or 1024*1024 respectively.
|
|||
\fB--\fP
|
||||
This terminates the list of options. It is useful if the next item on the
|
||||
command line starts with a hyphen but is not an option. This allows for the
|
||||
processing of patterns and filenames that start with hyphens.
|
||||
processing of patterns and file names that start with hyphens.
|
||||
.TP
|
||||
\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context after each matching line. If filenames
|
||||
Output \fInumber\fP lines of context after each matching line. If file names
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
|
@ -137,7 +138,7 @@ Treat binary files as text. This is equivalent to
|
|||
\fB--binary-files\fP=\fItext\fP.
|
||||
.TP
|
||||
\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context before each matching line. If filenames
|
||||
Output \fInumber\fP lines of context before each matching line. If file names
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
|
@ -153,7 +154,8 @@ processed in the same way as any other file. In this case, when a match
|
|||
succeeds, the output may be binary garbage, which can have nasty effects if
|
||||
sent to a terminal. If the word is "without-match", which is equivalent to the
|
||||
\fB-I\fP option, binary files are not processed at all; they are assumed not to
|
||||
be of interest.
|
||||
be of interest and are skipped without causing any output or affecting the
|
||||
return code.
|
||||
.TP
|
||||
\fB--buffer-size=\fP\fInumber\fP
|
||||
Set the parameter that controls how much memory is used for buffering files
|
||||
|
@ -164,10 +166,14 @@ Output \fInumber\fP lines of context both before and after each matching line.
|
|||
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
|
||||
.TP
|
||||
\fB-c\fP, \fB--count\fP
|
||||
Do not output individual lines from the files that are being scanned; instead
|
||||
output the number of lines that would otherwise have been shown. If no lines
|
||||
are selected, the number zero is output. If several files are are being
|
||||
scanned, a count is output for each of them. However, if the
|
||||
Do not output lines from the files that are being scanned; instead output the
|
||||
number of matches (or non-matches if \fB-v\fP is used) that would otherwise
|
||||
have caused lines to be shown. By default, this count is the same as the number
|
||||
of suppressed lines, but if the \fB-M\fP (multiline) option is used (without
|
||||
\fB-v\fP), there may be more suppressed lines than the number of matches.
|
||||
.sp
|
||||
If no lines are selected, the number zero is output. If several files are
|
||||
being scanned, a count is output for each of them. However, if the
|
||||
\fB--files-with-matches\fP option is also used, only those files whose counts
|
||||
are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP,
|
||||
\fB-B\fP, and \fB-C\fP options are ignored.
|
||||
|
@ -229,10 +235,10 @@ of the line that matched.
|
|||
Files (but not directories) whose names match the pattern are skipped without
|
||||
being processed. This applies to all files, whether listed on the command line,
|
||||
obtained from \fB--file-list\fP, or by scanning a directory. The pattern is a
|
||||
PCRE2 regular expression, and is matched against the final component of the file
|
||||
name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not
|
||||
apply to this pattern. The option may be given any number of times in order to
|
||||
specify multiple patterns. If a file name matches both an \fB--include\fP
|
||||
PCRE2 regular expression, and is matched against the final component of the
|
||||
file name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do
|
||||
not apply to this pattern. The option may be given any number of times in order
|
||||
to specify multiple patterns. If a file name matches both an \fB--include\fP
|
||||
and an \fB--exclude\fP pattern, it is excluded. There is no short form for this
|
||||
option.
|
||||
.TP
|
||||
|
@ -276,7 +282,7 @@ also the comments about multiple patterns versus a single pattern with
|
|||
alternatives in the description of \fB-e\fP above.
|
||||
.sp
|
||||
If this option is given more than once, all the specified files are
|
||||
read. A data line is output if any of the patterns match it. A filename can
|
||||
read. A data line is output if any of the patterns match it. A file name can
|
||||
be given as "-" to refer to the standard input. When \fB-f\fP is used, patterns
|
||||
specified on the command line using \fB-e\fP may also be present; they are
|
||||
tested before the file's patterns. However, no other pattern is taken from the
|
||||
|
@ -286,7 +292,7 @@ command line; all arguments are treated as the names of paths to be searched.
|
|||
Read a list of files and/or directories that are to be scanned from the given
|
||||
file, one per line. Trailing white space is removed from each line, and blank
|
||||
lines are ignored. These paths are processed before any that are listed on the
|
||||
command line. The filename can be given as "-" to refer to the standard input.
|
||||
command line. The file name can be given as "-" to refer to the standard input.
|
||||
If \fB--file\fP and \fB--file-list\fP are both specified as "-", patterns are
|
||||
read first. This is useful only when the standard input is a terminal, from
|
||||
which further lines (the list of files) can be read after an end-of-file
|
||||
|
@ -302,16 +308,17 @@ shown separately. This option is mutually exclusive with \fB--line-offsets\fP
|
|||
and \fB--only-matching\fP.
|
||||
.TP
|
||||
\fB-H\fP, \fB--with-filename\fP
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name.
|
||||
Force the inclusion of the file name at the start of output lines when
|
||||
searching a single file. By default, the file name is not shown in this case.
|
||||
For matching lines, the file name is followed by a colon; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name. When the \fB-M\fP option causes a pattern to match more than one
|
||||
line, only the first is preceded by the file name.
|
||||
.TP
|
||||
\fB-h\fP, \fB--no-filename\fP
|
||||
Suppress the output filenames when searching multiple files. By default,
|
||||
filenames are shown when multiple files are searched. For matching lines, the
|
||||
filename is followed by a colon; for context lines, a hyphen separator is used.
|
||||
Suppress the output file names when searching multiple files. By default,
|
||||
file names are shown when multiple files are searched. For matching lines, the
|
||||
file name is followed by a colon; for context lines, a hyphen separator is used.
|
||||
If a line number is also being output, it follows the file name.
|
||||
.TP
|
||||
\fB--help\fP
|
||||
|
@ -320,7 +327,7 @@ type support, and then exit. Anything else on the command line is
|
|||
ignored.
|
||||
.TP
|
||||
\fB-I\fP
|
||||
Treat binary files as never matching. This is equivalent to
|
||||
Ignore binary files. This is equivalent to
|
||||
\fB--binary-files\fP=\fIwithout-match\fP.
|
||||
.TP
|
||||
\fB-i\fP, \fB--ignore-case\fP
|
||||
|
@ -349,8 +356,8 @@ If any \fB--include-dir\fP patterns are specified, the only directories that
|
|||
are processed are those that match one of the patterns (and do not match an
|
||||
\fB--exclude-dir\fP pattern). This applies to all directories, whether listed
|
||||
on the command line, obtained from \fB--file-list\fP, or by scanning a parent
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against the
|
||||
final component of the directory name, not the entire path. The \fB-F\fP,
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against
|
||||
the final component of the directory name, not the entire path. The \fB-F\fP,
|
||||
\fB-w\fP, and \fB-x\fP options do not apply to this pattern. The option may be
|
||||
given any number of times. If a directory matches both \fB--include-dir\fP and
|
||||
\fB--exclude-dir\fP, it is excluded. There is no short form for this option.
|
||||
|
@ -381,8 +388,8 @@ unless \fBpcre2grep\fP can determine that it is reading from a terminal (which
|
|||
is currently possible only in Unix-like environments). Output to terminal is
|
||||
normally automatically flushed by the operating system. This option can be
|
||||
useful when the input or output is attached to a pipe and you do not want
|
||||
\fBpcre2grep\fP to buffer up large amounts of data. However, its use will affect
|
||||
performance, and the \fB-M\fP (multiline) option ceases to work.
|
||||
\fBpcre2grep\fP to buffer up large amounts of data. However, its use will
|
||||
affect performance, and the \fB-M\fP (multiline) option ceases to work.
|
||||
.TP
|
||||
\fB--line-offsets\fP
|
||||
Instead of showing lines or parts of lines that match, show each match as a
|
||||
|
@ -429,17 +436,31 @@ when the PCRE2 library is compiled, with the default default being 10 million.
|
|||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for a successful match may consist of more than
|
||||
one line, the last of which is the one in which the match ended. If the matched
|
||||
string ends with a newline sequence the output ends at the end of that line.
|
||||
one line. The first is the line in which the match started, and the last is the
|
||||
line in which the match ended. If the matched string ends with a newline
|
||||
sequence the output ends at the end of that line.
|
||||
.sp
|
||||
When this option is set, the PCRE2 library is called in "multiline" mode.
|
||||
However, \fBpcre2grep\fP still processes the input line by line. The difference
|
||||
is that a matched string may extend past the end of a line and continue on
|
||||
one or more subsequent lines. The newline sequence must be matched as part of
|
||||
the pattern. For example, to find the phrase "regular expression" in a file
|
||||
where "regular" might be at the end of a line and "expression" at the start of
|
||||
the next line, you could use this command:
|
||||
.sp
|
||||
pcre2grep -M 'regular\es+expression' <file>
|
||||
.sp
|
||||
The \es escape sequence matches any white space character, including newlines,
|
||||
and is followed by + so as to match trailing white space on the first line as
|
||||
well as possibly handling a two-character newline sequence.
|
||||
.sp
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that \fBpcre2grep\fP buffers the input file as it scans it. However,
|
||||
\fBpcre2grep\fP ensures that at least 8K characters or the rest of the document
|
||||
\fBpcre2grep\fP ensures that at least 8K characters or the rest of the file
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions. This option does not
|
||||
work when input is read line by line (see \fP--line-buffered\fP.)
|
||||
are guaranteed to be available for lookbehind assertions. The \fB-M\fP option
|
||||
does not work when input is read line by line (see \fB--line-buffered\fP.)
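The role of \s+ in the "regular expression" example can be illustrated outside pcre2grep. The following is a minimal sketch that uses Python's re module rather than the PCRE2 library, on the assumption that \s behaves the same way for spaces, CR and LF; the sample text is hypothetical.

```python
import re

# Hypothetical sample: "regular" ends one line with trailing spaces, and
# "expression" starts the next line after a two-character CRLF newline.
text = "PCRE2 supports regular  \r\nexpression matching.\r\nAnother line.\r\n"

# \s+ consumes the trailing spaces and both newline characters.
pattern = re.compile(r"regular\s+expression")

match = pattern.search(text)
spanned = text[match.start():match.end()]

# Number of input lines that the matched string touches.
lines_touched = spanned.count("\n") + 1

# A single literal space cannot bridge the line boundary.
literal_space = re.search(r"regular expression", text)

print(repr(spanned), lines_touched, literal_space)
```

The matched string covers two lines, which corresponds to the case where pcre2grep -M would output both the line in which the match started and the line in which it ended.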
|
||||
.TP
|
||||
\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
|
||||
The PCRE2 library supports five different conventions for indicating
|
||||
|
@ -455,9 +476,9 @@ When the PCRE2 library is built, a default line-ending sequence is specified.
|
|||
This is normally the standard sequence for the operating system. Unless
|
||||
otherwise specified by this option, \fBpcre2grep\fP uses the library's default.
|
||||
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
||||
makes it possible to use \fBpcre2grep\fP to scan files that have come from other
|
||||
environments without having to modify their line endings. If the data that is
|
||||
being scanned does not agree with the convention set by this option,
|
||||
makes it possible to use \fBpcre2grep\fP to scan files that have come from
|
||||
other environments without having to modify their line endings. If the data
|
||||
that is being scanned does not agree with the convention set by this option,
|
||||
\fBpcre2grep\fP may behave in strange ways. Note that this option does not
|
||||
apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or
|
||||
\fB--include-from\fP options, which are expected to use the operating system's
|
||||
|
@ -465,9 +486,10 @@ standard newline sequence.
|
|||
.TP
|
||||
\fB-n\fP, \fB--line-number\fP
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
for matching lines or a hyphen for context lines. If the filename is also being
|
||||
output, it precedes the line number. This option is forced if
|
||||
\fB--line-offsets\fP is used.
|
||||
for matching lines or a hyphen for context lines. If the file name is also
|
||||
being output, it precedes the line number. When the \fB-M\fP option causes a
|
||||
pattern to match more than one line, only the first is preceded by its line
|
||||
number. This option is forced if \fB--line-offsets\fP is used.
|
||||
.TP
|
||||
\fB--no-jit\fP
|
||||
If the PCRE2 library is built with support for just-in-time compiling (which
|
||||
|
@ -495,7 +517,7 @@ without an argument (see above), if an argument is present, it must be given in
|
|||
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
||||
for the non-argument case above also apply to this case. If the specified
|
||||
capturing parentheses do not exist in the pattern, or were not set in the
|
||||
match, nothing is output unless the file name or line number are being printed.
|
||||
match, nothing is output unless the file name or line number are being output.
|
||||
.sp
|
||||
If this option is given multiple times, multiple substrings are output, in the
|
||||
order the options are given. For example, -o3 -o1 -o3 causes the substrings
|
||||
|
@ -549,10 +571,10 @@ specified by any of the \fB--include\fP or \fB--exclude\fP options.
|
|||
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is equivalent
|
||||
to having ^ and $ characters at the start and end of each alternative branch in
|
||||
every pattern. This option applies only to the patterns that are matched
|
||||
against the contents of files; it does not apply to patterns specified by any
|
||||
of the \fB--include\fP or \fB--exclude\fP options.
|
||||
to having ^ and $ characters at the start and end of each alternative top-level
|
||||
branch in every pattern. This option applies only to the patterns that are
|
||||
matched against the contents of files; it does not apply to patterns specified
|
||||
by any of the \fB--include\fP or \fB--exclude\fP options.
|
||||
.
|
||||
.
|
||||
.SH "ENVIRONMENT VARIABLES"
|
||||
|
@ -596,7 +618,7 @@ Although most of the common options work the same way, a few are different in
|
|||
\fBpcre2grep\fP. For example, the \fB--include\fP option's argument is a glob
|
||||
for GNU \fBgrep\fP, but a regular expression for \fBpcre2grep\fP. If both the
|
||||
\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names,
|
||||
without counts, but \fBpcre2grep\fP gives the counts.
|
||||
without counts, but \fBpcre2grep\fP gives the counts as well.
|
||||
.
|
||||
.
|
||||
.SH "OPTIONS WITH DATA"
|
||||
|
@ -642,9 +664,9 @@ in these circumstances. If this happens, \fBpcre2grep\fP outputs an error
|
|||
message and the line that caused the problem to the standard error stream. If
|
||||
there are more than 20 such errors, \fBpcre2grep\fP gives up.
|
||||
.P
|
||||
The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the overall
|
||||
resource limit; there is a second option called \fB--recursion-limit\fP that
|
||||
sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the
|
||||
overall resource limit; there is a second option called \fB--recursion-limit\fP
|
||||
that sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
discussion of these options above).
|
||||
.
|
||||
.
|
||||
|
@ -661,7 +683,7 @@ affect the return code.
|
|||
.SH "SEE ALSO"
|
||||
.rs
|
||||
.sp
|
||||
\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2test\fP(1).
|
||||
\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3).
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
|
@ -678,6 +700,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 23 November 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
Last updated: 03 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -40,15 +40,15 @@ DESCRIPTION
|
|||
standard input can also be referenced by a name consisting of a single
|
||||
hyphen. For example:
|
||||
|
||||
pcre2grep some-pattern /file1 - /file3
|
||||
pcre2grep some-pattern file1 - file3
|
||||
|
||||
By default, each line that matches a pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at
|
||||
the start of each line, followed by a colon. However, there are options
|
||||
that can change how pcre2grep behaves. In particular, the -M option
|
||||
makes it possible to search for patterns that span line boundaries.
|
||||
What defines a line boundary is controlled by the -N (--newline)
|
||||
option.
|
||||
Input files are searched line by line. By default, each line that
|
||||
matches a pattern is copied to the standard output, and if there is
|
||||
more than one file, the file name is output at the start of each line,
|
||||
followed by a colon. However, there are options that can change how
|
||||
pcre2grep behaves. In particular, the -M option makes it possible to
|
||||
search for strings that span line boundaries. What defines a line
|
||||
boundary is controlled by the -N (--newline) option.
|
||||
|
||||
The amount of memory used for buffering files that are being scanned is
|
||||
controlled by a parameter that can be set by the --buffer-size option.
|
||||
|
@ -122,13 +122,13 @@ OPTIONS
|
|||
|
||||
-- This terminates the list of options. It is useful if the next
|
||||
item on the command line starts with a hyphen but is not an
|
||||
option. This allows for the processing of patterns and file-
|
||||
option. This allows for the processing of patterns and file
|
||||
names that start with hyphens.
|
||||
|
||||
-A number, --after-context=number
|
||||
Output number lines of context after each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
file names and/or line numbers are being output, a hyphen
|
||||
separator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
|
@ -141,8 +141,8 @@ OPTIONS
|
|||
|
||||
-B number, --before-context=number
|
||||
Output number lines of context before each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
file names and/or line numbers are being output, a hyphen
|
||||
separator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
|
@ -160,249 +160,258 @@ OPTIONS
|
|||
which can have nasty effects if sent to a terminal. If the
|
||||
word is "without-match", which is equivalent to the -I
|
||||
option, binary files are not processed at all; they are
|
||||
assumed not to be of interest.
|
||||
assumed not to be of interest and are skipped without causing
|
||||
any output or affecting the return code.
|
||||
|
||||
--buffer-size=number
|
||||
Set the parameter that controls how much memory is used for
|
||||
Set the parameter that controls how much memory is used for
|
||||
buffering files that are being scanned.
|
||||
|
||||
-C number, --context=number
|
||||
Output number lines of context both before and after each
|
||||
matching line. This is equivalent to setting both -A and -B
|
||||
Output number lines of context both before and after each
|
||||
matching line. This is equivalent to setting both -A and -B
|
||||
to the same value.
|
||||
|
||||
-c, --count
|
||||
Do not output individual lines from the files that are being
|
||||
scanned; instead output the number of lines that would other-
|
||||
wise have been shown. If no lines are selected, the number
|
||||
zero is output. If several files are are being scanned, a
|
||||
count is output for each of them. However, if the --files-
|
||||
with-matches option is also used, only those files whose
|
||||
counts are greater than zero are listed. When -c is used, the
|
||||
-A, -B, and -C options are ignored.
|
||||
Do not output lines from the files that are being scanned;
|
||||
instead output the number of matches (or non-matches if -v is
|
||||
used) that would otherwise have caused lines to be shown. By
|
||||
default, this count is the same as the number of suppressed
|
||||
lines, but if the -M (multiline) option is used (without -v),
|
||||
there may be more suppressed lines than the number of
|
||||
matches.
|
||||
|
||||
If no lines are selected, the number zero is output. If
|
||||
several files are being scanned, a count is output for each
|
||||
of them. However, if the --files-with-matches option is also
|
||||
used, only those files whose counts are greater than zero are
|
||||
listed. When -c is used, the -A, -B, and -C options are
|
||||
ignored.
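The difference between the number of matches and the number of suppressed lines under -M can be shown with a little arithmetic. This is a rough sketch in Python, not a model of how pcre2grep scans its buffer; the sample text and pattern are hypothetical.

```python
import re

# Hypothetical input: two places where "begin" is followed by "end"
# on the next line.
text = "alpha\nbegin\nend\ngamma\nbegin\nend\n"

# The newline is matched as part of the pattern, as -M requires.
pattern = re.compile(r"begin\nend")

matches = list(pattern.finditer(text))

# What -c would report: one count per match.
match_count = len(matches)

# How many lines would have been shown without -c: each match covers
# every line from where it starts to where it ends.
suppressed_lines = sum(m.group().count("\n") + 1 for m in matches)

print(match_count, suppressed_lines)
```

Here there are two matches but four suppressed lines, which is the situation the text describes.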
|
||||
|
||||
--colour, --color
|
||||
If this option is given without any data, it is equivalent to
|
||||
"--colour=auto". If data is required, it must be given in
|
||||
"--colour=auto". If data is required, it must be given in
|
||||
the same shell item, separated by an equals sign.
|
||||
|
||||
--colour=value, --color=value
|
||||
This option specifies under what circumstances the parts of a
|
||||
line that matched a pattern should be coloured in the output.
|
||||
By default, the output is not coloured. The value (which is
|
||||
optional, see above) may be "never", "always", or "auto". In
|
||||
the latter case, colouring happens only if the standard out-
|
||||
put is connected to a terminal. More resources are used when
|
||||
By default, the output is not coloured. The value (which is
|
||||
optional, see above) may be "never", "always", or "auto". In
|
||||
the latter case, colouring happens only if the standard out-
|
||||
put is connected to a terminal. More resources are used when
|
||||
colouring is enabled, because pcre2grep has to search for all
|
||||
possible matches in a line, not just one, in order to colour
|
||||
possible matches in a line, not just one, in order to colour
|
||||
them all.
|
||||
|
||||
The colour that is used can be specified by setting the envi-
|
||||
ronment variable PCRE2GREP_COLOUR or PCRE2GREP_COLOR. The
|
||||
value of this variable should be a string of two numbers,
|
||||
separated by a semicolon. They are copied directly into the
|
||||
control string for setting colour on a terminal, so it is
|
||||
your responsibility to ensure that they make sense. If nei-
|
||||
ther of the environment variables is set, the default is
|
||||
ronment variable PCRE2GREP_COLOUR or PCRE2GREP_COLOR. The
|
||||
value of this variable should be a string of two numbers,
|
||||
separated by a semicolon. They are copied directly into the
|
||||
control string for setting colour on a terminal, so it is
|
||||
your responsibility to ensure that they make sense. If nei-
|
||||
ther of the environment variables is set, the default is
|
||||
"1;31", which gives red.
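As an illustration of how a value such as "1;31" ends up in a terminal control string, here is a small Python sketch. It assumes the usual ANSI SGR form (ESC, "[", the value, "m") and a plain reset sequence afterwards; the function name colourise, the reset sequence, and the order in which the two variables are consulted are assumptions for illustration, not taken from the pcre2grep source.

```python
import os

def colourise(fragment, env=os.environ):
    """Wrap fragment in an ANSI colour sequence built from the environment.

    Assumption: PCRE2GREP_COLOUR is consulted before PCRE2GREP_COLOR,
    and "1;31" (bold red) is used when neither is set.
    """
    value = env.get("PCRE2GREP_COLOUR") or env.get("PCRE2GREP_COLOR") or "1;31"
    # The value is copied verbatim; nothing checks that it makes sense.
    return f"\x1b[{value}m{fragment}\x1b[0m"

default_output = colourise("match", env={})
green_output = colourise("match", env={"PCRE2GREP_COLOR": "1;32"})
print(repr(default_output), repr(green_output))
```

Because the value is pasted in unchecked, a malformed setting simply produces a malformed escape sequence, which is why the text makes the user responsible for it.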
|
||||
|
||||
-D action, --devices=action
|
||||
If an input path is not a regular file or a directory,
|
||||
"action" specifies how it is to be processed. Valid values
|
||||
If an input path is not a regular file or a directory,
|
||||
"action" specifies how it is to be processed. Valid values
|
||||
are "read" (the default) or "skip" (silently skip the path).
|
||||
|
||||
-d action, --directories=action
|
||||
If an input path is a directory, "action" specifies how it is
|
||||
to be processed. Valid values are "read" (the default in
|
||||
non-Windows environments, for compatibility with GNU grep),
|
||||
"recurse" (equivalent to the -r option), or "skip" (silently
|
||||
skip the path, the default in Windows environments). In the
|
||||
"read" case, directories are read as if they were ordinary
|
||||
files. In some operating systems the effect of reading a
|
||||
to be processed. Valid values are "read" (the default in
|
||||
non-Windows environments, for compatibility with GNU grep),
|
||||
"recurse" (equivalent to the -r option), or "skip" (silently
|
||||
skip the path, the default in Windows environments). In the
|
||||
"read" case, directories are read as if they were ordinary
|
||||
files. In some operating systems the effect of reading a
|
||||
directory like this is an immediate end-of-file; in others it
|
||||
may provoke an error.
|
||||
|
||||
-e pattern, --regex=pattern, --regexp=pattern
|
||||
Specify a pattern to be matched. This option can be used mul-
|
||||
tiple times in order to specify several patterns. It can also
|
||||
be used as a way of specifying a single pattern that starts
|
||||
with a hyphen. When -e is used, no argument pattern is taken
|
||||
from the command line; all arguments are treated as file
|
||||
names. There is no limit to the number of patterns. They are
|
||||
applied to each line in the order in which they are defined
|
||||
be used as a way of specifying a single pattern that starts
|
||||
with a hyphen. When -e is used, no argument pattern is taken
|
||||
from the command line; all arguments are treated as file
|
||||
names. There is no limit to the number of patterns. They are
|
||||
applied to each line in the order in which they are defined
|
||||
until one matches.
|
||||
|
||||
If -f is used with -e, the command line patterns are matched
|
||||
If -f is used with -e, the command line patterns are matched
|
||||
first, followed by the patterns from the file(s), independent
|
||||
of the order in which these options are specified. Note that
|
||||
multiple use of -e is not the same as a single pattern with
|
||||
of the order in which these options are specified. Note that
|
||||
multiple use of -e is not the same as a single pattern with
|
||||
alternatives. For example, X|Y finds the first character in a
|
||||
line that is X or Y, whereas if the two patterns are given
|
||||
line that is X or Y, whereas if the two patterns are given
|
||||
separately, with X first, pcre2grep finds X if it is present,
|
||||
even if it follows Y in the line. It finds Y only if there is
|
||||
no X in the line. This matters only if you are using -o or
|
||||
no X in the line. This matters only if you are using -o or
|
||||
--colo(u)r to show the part(s) of the line that matched.
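The distinction between X|Y and two separate -e patterns can be sketched as follows. This uses Python's re module as a stand-in for PCRE2, and the helper first_pattern_match is a hypothetical name that imitates "apply the patterns in order until one matches"; it is not pcre2grep's code.

```python
import re

line = "say Y then X"

# One pattern with alternatives: the leftmost match in the line wins.
single = re.search(r"X|Y", line).group()

def first_pattern_match(patterns, line):
    """Try each pattern in turn and return the first one that matches."""
    for p in patterns:
        m = re.search(p, line)
        if m:
            return m.group()
    return None

# Two separate patterns, X first: X is found even though it follows Y.
separate = first_pattern_match(["X", "Y"], line)

# Y is found only when there is no X in the line.
fallback = first_pattern_match(["X", "Y"], "only Y here")

print(single, separate, fallback)
```

Both approaches select the same lines; they differ only in which part of the line is reported as the match, which is why the difference is visible only with -o or --colo(u)r.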
|
||||
|
||||
--exclude=pattern
|
||||
Files (but not directories) whose names match the pattern are
|
||||
skipped without being processed. This applies to all files,
|
||||
whether listed on the command line, obtained from --file-
|
||||
skipped without being processed. This applies to all files,
|
||||
whether listed on the command line, obtained from --file-
|
||||
list, or by scanning a directory. The pattern is a PCRE2 reg-
|
||||
ular expression, and is matched against the final component
|
||||
of the file name, not the entire path. The -F, -w, and -x
|
||||
ular expression, and is matched against the final component
|
||||
of the file name, not the entire path. The -F, -w, and -x
|
||||
options do not apply to this pattern. The option may be given
|
||||
any number of times in order to specify multiple patterns. If
|
||||
a file name matches both an --include and an --exclude pat-
|
||||
a file name matches both an --include and an --exclude pat-
|
||||
tern, it is excluded. There is no short form for this option.
|
||||
|
||||
--exclude-from=filename
|
||||
Treat each non-empty line of the file as the data for an
|
||||
Treat each non-empty line of the file as the data for an
|
||||
--exclude option. What constitutes a newline when reading the
|
||||
file is the operating system's default. The --newline option
|
||||
has no effect on this option. This option may be given more
|
||||
file is the operating system's default. The --newline option
|
||||
has no effect on this option. This option may be given more
|
||||
than once in order to specify a number of files to read.
|
||||
|
||||
--exclude-dir=pattern
|
||||
Directories whose names match the pattern are skipped without
|
||||
being processed, whatever the setting of the --recursive
|
||||
option. This applies to all directories, whether listed on
|
||||
being processed, whatever the setting of the --recursive
|
||||
option. This applies to all directories, whether listed on
|
||||
the command line, obtained from --file-list, or by scanning a
|
||||
parent directory. The pattern is a PCRE2 regular expression,
|
||||
and is matched against the final component of the directory
|
||||
name, not the entire path. The -F, -w, and -x options do not
|
||||
apply to this pattern. The option may be given any number of
|
||||
times in order to specify more than one pattern. If a direc-
|
||||
tory matches both --include-dir and --exclude-dir, it is
|
||||
parent directory. The pattern is a PCRE2 regular expression,
|
||||
and is matched against the final component of the directory
|
||||
name, not the entire path. The -F, -w, and -x options do not
|
||||
apply to this pattern. The option may be given any number of
|
||||
times in order to specify more than one pattern. If a direc-
|
||||
tory matches both --include-dir and --exclude-dir, it is
|
||||
excluded. There is no short form for this option.
|
||||
|
||||
-F, --fixed-strings
|
||||
Interpret each data-matching pattern as a list of fixed
|
||||
strings, separated by newlines, instead of as a regular
|
||||
expression. What constitutes a newline for this purpose is
|
||||
controlled by the --newline option. The -w (match as a word)
|
||||
and -x (match whole line) options can be used with -F. They
|
||||
Interpret each data-matching pattern as a list of fixed
|
||||
strings, separated by newlines, instead of as a regular
|
||||
expression. What constitutes a newline for this purpose is
|
||||
controlled by the --newline option. The -w (match as a word)
|
||||
and -x (match whole line) options can be used with -F. They
|
||||
apply to each of the fixed strings. A line is selected if any
|
||||
of the fixed strings are found in it (subject to -w or -x, if
|
||||
present). This option applies only to the patterns that are
|
||||
matched against the contents of files; it does not apply to
|
||||
patterns specified by any of the --include or --exclude
|
||||
present). This option applies only to the patterns that are
|
||||
matched against the contents of files; it does not apply to
|
||||
patterns specified by any of the --include or --exclude
|
||||
options.
|
||||
|
||||
-f filename, --file=filename
|
||||
Read patterns from the file, one per line, and match them
|
||||
against each line of input. What constitutes a newline when
|
||||
reading the file is the operating system's default. The
|
||||
Read patterns from the file, one per line, and match them
|
||||
against each line of input. What constitutes a newline when
|
||||
reading the file is the operating system's default. The
|
||||
--newline option has no effect on this option. Trailing white
|
||||
space is removed from each line, and blank lines are ignored.
|
||||
An empty file contains no patterns and therefore matches
|
||||
An empty file contains no patterns and therefore matches
|
||||
nothing. See also the comments about multiple patterns versus
|
||||
a single pattern with alternatives in the description of -e
|
||||
a single pattern with alternatives in the description of -e
|
||||
above.
|
||||
|
||||
If this option is given more than once, all the specified
|
||||
files are read. A data line is output if any of the patterns
|
||||
match it. A filename can be given as "-" to refer to the
|
||||
standard input. When -f is used, patterns specified on the
|
||||
command line using -e may also be present; they are tested
|
||||
before the file's patterns. However, no other pattern is
|
||||
If this option is given more than once, all the specified
|
||||
files are read. A data line is output if any of the patterns
|
||||
match it. A file name can be given as "-" to refer to the
|
||||
standard input. When -f is used, patterns specified on the
|
||||
command line using -e may also be present; they are tested
|
||||
before the file's patterns. However, no other pattern is
|
||||
taken from the command line; all arguments are treated as the
|
||||
names of paths to be searched.
|
||||
|
||||
--file-list=filename
|
||||
Read a list of files and/or directories that are to be
|
||||
scanned from the given file, one per line. Trailing white
|
||||
Read a list of files and/or directories that are to be
|
||||
scanned from the given file, one per line. Trailing white
|
||||
space is removed from each line, and blank lines are ignored.
|
||||
These paths are processed before any that are listed on the
|
||||
command line. The filename can be given as "-" to refer to
|
||||
These paths are processed before any that are listed on the
|
||||
command line. The file name can be given as "-" to refer to
|
||||
the standard input. If --file and --file-list are both spec-
|
||||
ified as "-", patterns are read first. This is useful only
|
||||
when the standard input is a terminal, from which further
|
||||
lines (the list of files) can be read after an end-of-file
|
||||
indication. If this option is given more than once, all the
|
||||
ified as "-", patterns are read first. This is useful only
|
||||
when the standard input is a terminal, from which further
|
||||
lines (the list of files) can be read after an end-of-file
|
||||
indication. If this option is given more than once, all the
|
||||
specified files are read.
|
||||
|
||||
--file-offsets
|
||||
Instead of showing lines or parts of lines that match, show
|
||||
each match as an offset from the start of the file and a
|
||||
length, separated by a comma. In this mode, no context is
|
||||
shown. That is, the -A, -B, and -C options are ignored. If
|
||||
Instead of showing lines or parts of lines that match, show
|
||||
each match as an offset from the start of the file and a
|
||||
length, separated by a comma. In this mode, no context is
|
||||
shown. That is, the -A, -B, and -C options are ignored. If
|
||||
there is more than one match in a line, each of them is shown
|
||||
separately. This option is mutually exclusive with --line-
|
||||
separately. This option is mutually exclusive with --line-
|
||||
offsets and --only-matching.
|
||||
|
||||
-H, --with-filename
|
||||
Force the inclusion of the filename at the start of output
|
||||
lines when searching a single file. By default, the filename
|
||||
is not shown in this case. For matching lines, the filename
|
||||
Force the inclusion of the file name at the start of output
|
||||
lines when searching a single file. By default, the file name
|
||||
is not shown in this case. For matching lines, the file name
|
||||
is followed by a colon; for context lines, a hyphen separator
|
||||
is used. If a line number is also being output, it follows
|
||||
the file name.
|
||||
is used. If a line number is also being output, it follows
|
||||
the file name. When the -M option causes a pattern to match
|
||||
more than one line, only the first is preceded by the file
|
||||
name.
|
||||
|
||||
-h, --no-filename
|
||||
Suppress the output filenames when searching multiple files.
|
||||
By default, filenames are shown when multiple files are
|
||||
searched. For matching lines, the filename is followed by a
|
||||
colon; for context lines, a hyphen separator is used. If a
|
||||
Suppress the output file names when searching multiple files.
|
||||
By default, file names are shown when multiple files are
|
||||
searched. For matching lines, the file name is followed by a
|
||||
colon; for context lines, a hyphen separator is used. If a
|
||||
line number is also being output, it follows the file name.
|
||||
|
||||
--help Output a help message, giving brief details of the command
|
||||
options and file type support, and then exit. Anything else
|
||||
--help Output a help message, giving brief details of the command
|
||||
options and file type support, and then exit. Anything else
|
||||
on the command line is ignored.
|
||||
|
||||
-I Treat binary files as never matching. This is equivalent to
|
||||
--binary-files=without-match.
|
||||
-I Ignore binary files. This is equivalent to --binary-
|
||||
files=without-match.
|
||||
|
||||
-i, --ignore-case
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
|
||||
--include=pattern
|
||||
If any --include patterns are specified, the only files that
|
||||
are processed are those that match one of the patterns (and
|
||||
do not match an --exclude pattern). This option does not
|
||||
affect directories, but it applies to all files, whether
|
||||
listed on the command line, obtained from --file-list, or by
|
||||
scanning a directory. The pattern is a PCRE2 regular expres-
|
||||
sion, and is matched against the final component of the file
|
||||
name, not the entire path. The -F, -w, and -x options do not
|
||||
apply to this pattern. The option may be given any number of
|
||||
times. If a file name matches both an --include and an
|
||||
--exclude pattern, it is excluded. There is no short form
|
||||
If any --include patterns are specified, the only files that
|
||||
are processed are those that match one of the patterns (and
|
||||
do not match an --exclude pattern). This option does not
|
||||
affect directories, but it applies to all files, whether
|
||||
listed on the command line, obtained from --file-list, or by
|
||||
scanning a directory. The pattern is a PCRE2 regular expres-
|
||||
sion, and is matched against the final component of the file
|
||||
name, not the entire path. The -F, -w, and -x options do not
|
||||
apply to this pattern. The option may be given any number of
|
||||
times. If a file name matches both an --include and an
|
||||
--exclude pattern, it is excluded. There is no short form
|
||||
for this option.
|
||||
|
||||
--include-from=filename
|
||||
Treat each non-empty line of the file as the data for an
|
||||
Treat each non-empty line of the file as the data for an
|
||||
--include option. What constitutes a newline for this purpose
|
||||
is the operating system's default. The --newline option has
|
||||
is the operating system's default. The --newline option has
|
||||
no effect on this option. This option may be given any number
|
||||
of times; all the files are read.
|
||||
|
||||
--include-dir=pattern
|
||||
If any --include-dir patterns are specified, the only direc-
|
||||
tories that are processed are those that match one of the
|
||||
patterns (and do not match an --exclude-dir pattern). This
|
||||
applies to all directories, whether listed on the command
|
||||
line, obtained from --file-list, or by scanning a parent
|
||||
directory. The pattern is a PCRE2 regular expression, and is
|
||||
matched against the final component of the directory name,
|
||||
not the entire path. The -F, -w, and -x options do not apply
|
||||
If any --include-dir patterns are specified, the only direc-
|
||||
tories that are processed are those that match one of the
|
||||
patterns (and do not match an --exclude-dir pattern). This
|
||||
applies to all directories, whether listed on the command
|
||||
line, obtained from --file-list, or by scanning a parent
|
||||
directory. The pattern is a PCRE2 regular expression, and is
|
||||
matched against the final component of the directory name,
|
||||
not the entire path. The -F, -w, and -x options do not apply
|
||||
to this pattern. The option may be given any number of times.
|
||||
If a directory matches both --include-dir and --exclude-dir,
|
||||
If a directory matches both --include-dir and --exclude-dir,
|
||||
it is excluded. There is no short form for this option.
|
||||
|
||||
-L, --files-without-match
|
||||
Instead of outputting lines from the files, just output the
|
||||
names of the files that do not contain any lines that would
|
||||
have been output. Each file name is output once, on a sepa-
|
||||
Instead of outputting lines from the files, just output the
|
||||
names of the files that do not contain any lines that would
|
||||
have been output. Each file name is output once, on a sepa-
|
||||
rate line.
|
||||
|
||||
-l, --files-with-matches
|
||||
Instead of outputting lines from the files, just output the
|
||||
Instead of outputting lines from the files, just output the
|
||||
names of the files containing lines that would have been out-
|
||||
put. Each file name is output once, on a separate line.
|
||||
Searching normally stops as soon as a matching line is found
|
||||
in a file. However, if the -c (count) option is also used,
|
||||
matching continues in order to obtain the correct count, and
|
||||
those files that have at least one match are listed along
|
||||
put. Each file name is output once, on a separate line.
|
||||
Searching normally stops as soon as a matching line is found
|
||||
in a file. However, if the -c (count) option is also used,
|
||||
matching continues in order to obtain the correct count, and
|
||||
those files that have at least one match are listed along
|
||||
with their counts. Using this option with -c is a way of sup-
|
||||
pressing the listing of files with no matches.
|
||||
|
||||
|
@ -412,86 +421,103 @@ OPTIONS
|
|||
input)" is used. There is no short form for this option.
|
||||
|
||||
--line-buffered
|
||||
When this option is given, input is read and processed line
|
||||
by line, and the output is flushed after each write. By
|
||||
default, input is read in large chunks, unless pcre2grep can
|
||||
determine that it is reading from a terminal (which is cur-
|
||||
rently possible only in Unix-like environments). Output to
|
||||
terminal is normally automatically flushed by the operating
|
||||
When this option is given, input is read and processed line
|
||||
by line, and the output is flushed after each write. By
|
||||
default, input is read in large chunks, unless pcre2grep can
|
||||
determine that it is reading from a terminal (which is cur-
|
||||
rently possible only in Unix-like environments). Output to
|
||||
terminal is normally automatically flushed by the operating
|
||||
system. This option can be useful when the input or output is
|
||||
attached to a pipe and you do not want pcre2grep to buffer up
|
||||
large amounts of data. However, its use will affect perfor-
|
||||
large amounts of data. However, its use will affect perfor-
|
||||
mance, and the -M (multiline) option ceases to work.
|
||||
|
||||
--line-offsets
|
||||
Instead of showing lines or parts of lines that match, show
|
||||
Instead of showing lines or parts of lines that match, show
|
||||
each match as a line number, the offset from the start of the
|
||||
line, and a length. The line number is terminated by a colon
|
||||
(as usual; see the -n option), and the offset and length are
|
||||
separated by a comma. In this mode, no context is shown.
|
||||
That is, the -A, -B, and -C options are ignored. If there is
|
||||
more than one match in a line, each of them is shown sepa-
|
||||
line, and a length. The line number is terminated by a colon
|
||||
(as usual; see the -n option), and the offset and length are
|
||||
separated by a comma. In this mode, no context is shown.
|
||||
That is, the -A, -B, and -C options are ignored. If there is
|
||||
more than one match in a line, each of them is shown sepa-
|
||||
rately. This option is mutually exclusive with --file-offsets
|
||||
and --only-matching.
|
||||
|
||||
--locale=locale-name
|
||||
This option specifies a locale to be used for pattern match-
|
||||
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
|
||||
ronment variables. If no locale is specified, the PCRE2
|
||||
library's default (usually the "C" locale) is used. There is
|
||||
This option specifies a locale to be used for pattern match-
|
||||
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
|
||||
ronment variables. If no locale is specified, the PCRE2
|
||||
library's default (usually the "C" locale) is used. There is
|
||||
no short form for this option.
|
||||
|
||||
--match-limit=number
|
||||
Processing some regular expression patterns can require a
|
||||
very large amount of memory, leading in some cases to a pro-
|
||||
gram crash if not enough is available. Other patterns may
|
||||
take a very long time to search for all possible matching
|
||||
strings. The pcre2_match() function that is called by
|
||||
pcre2grep to do the matching has two parameters that can
|
||||
Processing some regular expression patterns can require a
|
||||
very large amount of memory, leading in some cases to a pro-
|
||||
gram crash if not enough is available. Other patterns may
|
||||
take a very long time to search for all possible matching
|
||||
strings. The pcre2_match() function that is called by
|
||||
pcre2grep to do the matching has two parameters that can
|
||||
limit the resources that it uses.
|
||||
|
||||
The --match-limit option provides a means of limiting
|
||||
The --match-limit option provides a means of limiting
|
||||
resource usage when processing patterns that are not going to
|
||||
match, but which have a very large number of possibilities in
|
||||
their search trees. The classic example is a pattern that
|
||||
their search trees. The classic example is a pattern that
|
||||
uses nested unlimited repeats. Internally, PCRE2 uses a func-
|
||||
tion called match() which it calls repeatedly (sometimes
|
||||
recursively). The limit set by --match-limit is imposed on
|
||||
the number of times this function is called during a match,
|
||||
which has the effect of limiting the amount of backtracking
|
||||
tion called match() which it calls repeatedly (sometimes
|
||||
recursively). The limit set by --match-limit is imposed on
|
||||
the number of times this function is called during a match,
|
||||
which has the effect of limiting the amount of backtracking
|
||||
that can take place.
|
||||
|
||||
The --recursion-limit option is similar to --match-limit, but
|
||||
instead of limiting the total number of times that match() is
|
||||
called, it limits the depth of recursive calls, which in turn
|
||||
limits the amount of memory that can be used. The recursion
|
||||
depth is a smaller number than the total number of calls,
|
||||
limits the amount of memory that can be used. The recursion
|
||||
depth is a smaller number than the total number of calls,
|
||||
because not all calls to match() are recursive. This limit is
|
||||
of use only if it is set smaller than --match-limit.
|
||||
|
||||
There are no short forms for these options. The default set-
|
||||
tings are specified when the PCRE2 library is compiled, with
|
||||
There are no short forms for these options. The default set-
|
||||
tings are specified when the PCRE2 library is compiled, with
|
||||
the default default being 10 million.
|
||||
|
||||
-M, --multiline
|
||||
Allow patterns to match more than one line. When this option
|
||||
Allow patterns to match more than one line. When this option
|
||||
is given, patterns may usefully contain literal newline char-
|
||||
acters and internal occurrences of ^ and $ characters. The
|
||||
output for a successful match may consist of more than one
|
||||
line, the last of which is the one in which the match ended.
|
||||
If the matched string ends with a newline sequence the output
|
||||
ends at the end of that line.
|
||||
acters and internal occurrences of ^ and $ characters. The
|
||||
output for a successful match may consist of more than one
|
||||
line. The first is the line in which the match started, and
|
||||
the last is the line in which the match ended. If the matched
|
||||
string ends with a newline sequence the output ends at the
|
||||
end of that line.
|
||||
|
||||
When this option is set, the PCRE2 library is called in "mul-
|
||||
tiline" mode. There is a limit to the number of lines that
|
||||
can be matched, imposed by the way that pcre2grep buffers the
|
||||
input file as it scans it. However, pcre2grep ensures that at
|
||||
least 8K characters or the rest of the document (whichever is
|
||||
the shorter) are available for forward matching, and simi-
|
||||
larly the previous 8K characters (or all the previous charac-
|
||||
ters, if fewer than 8K) are guaranteed to be available for
|
||||
lookbehind assertions. This option does not work when input
|
||||
is read line by line (see --line-buffered.)
|
||||
tiline" mode. However, pcre2grep still processes the input
|
||||
line by line. The difference is that a matched string may
|
||||
extend past the end of a line and continue on one or more
|
||||
subsequent lines. The newline sequence must be matched as
|
||||
part of the pattern. For example, to find the phrase "regular
|
||||
expression" in a file where "regular" might be at the end of
|
||||
a line and "expression" at the start of the next line, you
|
||||
could use this command:
|
||||
|
||||
pcre2grep -M 'regular\s+expression' <file>
|
||||
|
||||
The \s escape sequence matches any white space character,
|
||||
including newlines, and is followed by + so as to match
|
||||
trailing white space on the first line as well as possibly
|
||||
handling a two-character newline sequence.
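
As a rough illustration of the \s+ behaviour described above, the sketch below runs the same pattern over a few sample strings. It uses Python's re module as a stand-in for PCRE2, which is an assumption made for illustration only; it does not invoke pcre2grep, and the sample strings are hypothetical.

```python
import re

# Python's re module stands in for PCRE2 here (an assumption for illustration);
# for this simple pattern, \s covers space, CR, and LF in both engines.
PATTERN = re.compile(r"regular\s+expression")

# Hypothetical sample inputs, not taken from any real file.
samples = {
    "same line": "a regular expression here\n",
    "LF break": "a regular\nexpression here\n",
    "trailing space + CRLF": "a regular \r\nexpression here\n",
    "no white space": "a regularexpression here\n",
}

# Record whether the pattern is found anywhere in each sample.
results = {name: bool(PATTERN.search(text)) for name, text in samples.items()}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

The "trailing space + CRLF" sample shows why + is needed: three white space characters (space, CR, LF) sit between the two words, so a single \s would not be enough.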
|
||||
|
||||
There is a limit to the number of lines that can be matched,
|
||||
imposed by the way that pcre2grep buffers the input file as
|
||||
it scans it. However, pcre2grep ensures that at least 8K
|
||||
characters or the rest of the file (whichever is the shorter)
|
||||
are available for forward matching, and similarly the previ-
|
||||
ous 8K characters (or all the previous characters, if fewer
|
||||
than 8K) are guaranteed to be available for lookbehind asser-
|
||||
tions. The -M option does not work when input is read line by
|
||||
line (see --line-buffered.)
|
||||
|
||||
-N newline-type, --newline=newline-type
|
||||
The PCRE2 library supports five different conventions for
|
||||
|
@ -522,8 +548,10 @@ OPTIONS
|
|||
-n, --line-number
|
||||
Precede each output line by its line number in the file, fol-
|
||||
lowed by a colon for matching lines or a hyphen for context
|
||||
lines. If the filename is also being output, it precedes the
|
||||
line number. This option is forced if --line-offsets is used.
|
||||
lines. If the file name is also being output, it precedes the
|
||||
line number. When the -M option causes a pattern to match
|
||||
more than one line, only the first is preceded by its line
|
||||
number. This option is forced if --line-offsets is used.
|
||||
|
||||
--no-jit If the PCRE2 library is built with support for just-in-time
|
||||
compiling (which speeds up matching), pcre2grep automatically
|
||||
|
@ -555,8 +583,8 @@ OPTIONS
|
|||
The comments given for the non-argument case above also apply
|
||||
to this case. If the specified capturing parentheses do not
|
||||
exist in the pattern, or were not set in the match, nothing
|
||||
is output unless the file name or line number are being
|
||||
printed.
|
||||
is output unless the file name or line number are being out-
|
||||
put.
|
||||
|
||||
If this option is given multiple times, multiple substrings
|
||||
are output, in the order the options are given. For example,
|
||||
|
@ -617,11 +645,11 @@ OPTIONS
|
|||
Force the patterns to be anchored (each must start matching
|
||||
at the beginning of a line) and in addition, require them to
|
||||
match entire lines. This is equivalent to having ^ and $
|
||||
characters at the start and end of each alternative branch in
|
||||
every pattern. This option applies only to the patterns that
|
||||
are matched against the contents of files; it does not apply
|
||||
to patterns specified by any of the --include or --exclude
|
||||
options.
|
||||
characters at the start and end of each alternative top-level
|
||||
branch in every pattern. This option applies only to the pat-
|
||||
terns that are matched against the contents of files; it does
|
||||
not apply to patterns specified by any of the --include or
|
||||
--exclude options.
|
||||
|
||||
|
||||
ENVIRONMENT VARIABLES
|
||||
|
@ -662,7 +690,7 @@ OPTIONS COMPATIBILITY
|
|||
ferent in pcre2grep. For example, the --include option's argument is a
|
||||
glob for GNU grep, but a regular expression for pcre2grep. If both the
|
||||
-c and -l options are given, GNU grep lists only file names, without
|
||||
counts, but pcre2grep gives the counts.
|
||||
counts, but pcre2grep gives the counts as well.
|
||||
|
||||
|
||||
OPTIONS WITH DATA
|
||||
|
@ -725,7 +753,7 @@ DIAGNOSTICS
|
|||
|
||||
SEE ALSO
|
||||
|
||||
pcre2pattern(3), pcre2syntax(3), pcre2test(1).
|
||||
pcre2pattern(3), pcre2syntax(3).
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
@ -737,5 +765,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 23 November 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
Last updated: 03 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
|
|