Update pcre2grep documentation to give more details of -M matching.
parent 5a18651441
commit 4819827879
|
@@ -67,22 +67,23 @@ If no files are specified, <b>pcre2grep</b> reads the standard input. The
|
||||||
standard input can also be referenced by a name consisting of a single hyphen.
|
standard input can also be referenced by a name consisting of a single hyphen.
|
||||||
For example:
|
For example:
|
||||||
<pre>
|
<pre>
|
||||||
pcre2grep some-pattern /file1 - /file3
|
pcre2grep some-pattern file1 - file3
|
||||||
</pre>
|
</pre>
|
||||||
By default, each line that matches a pattern is copied to the standard
|
Input files are searched line by line. By default, each line that matches a
|
||||||
output, and if there is more than one file, the file name is output at the
|
pattern is copied to the standard output, and if there is more than one file,
|
||||||
start of each line, followed by a colon. However, there are options that can
|
the file name is output at the start of each line, followed by a colon.
|
||||||
change how <b>pcre2grep</b> behaves. In particular, the <b>-M</b> option makes it
|
However, there are options that can change how <b>pcre2grep</b> behaves. In
|
||||||
possible to search for patterns that span line boundaries. What defines a line
|
particular, the <b>-M</b> option makes it possible to search for strings that
|
||||||
boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
|
span line boundaries. What defines a line boundary is controlled by the
|
||||||
|
<b>-N</b> (<b>--newline</b>) option.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The amount of memory used for buffering files that are being scanned is
|
The amount of memory used for buffering files that are being scanned is
|
||||||
controlled by a parameter that can be set by the <b>--buffer-size</b> option.
|
controlled by a parameter that can be set by the <b>--buffer-size</b> option.
|
||||||
The default value for this parameter is specified when <b>pcre2grep</b> is built,
|
The default value for this parameter is specified when <b>pcre2grep</b> is
|
||||||
with the default default being 20K. A block of memory three times this size is
|
built, with the default default being 20K. A block of memory three times this
|
||||||
used (to allow for buffering "before" and "after" lines). An error occurs if a
|
size is used (to allow for buffering "before" and "after" lines). An error
|
||||||
line overflows the buffer.
|
occurs if a line overflows the buffer.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
||||||
|
@@ -149,11 +150,11 @@ to signify multiplication by 1024 or 1024*1024 respectively.
|
||||||
<b>--</b>
|
<b>--</b>
|
||||||
This terminates the list of options. It is useful if the next item on the
|
This terminates the list of options. It is useful if the next item on the
|
||||||
command line starts with a hyphen but is not an option. This allows for the
|
command line starts with a hyphen but is not an option. This allows for the
|
||||||
processing of patterns and filenames that start with hyphens.
|
processing of patterns and file names that start with hyphens.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
|
<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
|
||||||
Output <i>number</i> lines of context after each matching line. If filenames
|
Output <i>number</i> lines of context after each matching line. If file names
|
||||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||||
colon for the context lines. A line containing "--" is output between each
|
colon for the context lines. A line containing "--" is output between each
|
||||||
group of lines, unless they are in fact contiguous in the input file. The value
|
group of lines, unless they are in fact contiguous in the input file. The value
|
||||||
|
@@ -167,7 +168,7 @@ Treat binary files as text. This is equivalent to
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
|
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
|
||||||
Output <i>number</i> lines of context before each matching line. If filenames
|
Output <i>number</i> lines of context before each matching line. If file names
|
||||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||||
colon for the context lines. A line containing "--" is output between each
|
colon for the context lines. A line containing "--" is output between each
|
||||||
group of lines, unless they are in fact contiguous in the input file. The value
|
group of lines, unless they are in fact contiguous in the input file. The value
|
||||||
|
@@ -184,7 +185,8 @@ processed in the same way as any other file. In this case, when a match
|
||||||
succeeds, the output may be binary garbage, which can have nasty effects if
|
succeeds, the output may be binary garbage, which can have nasty effects if
|
||||||
sent to a terminal. If the word is "without-match", which is equivalent to the
|
sent to a terminal. If the word is "without-match", which is equivalent to the
|
||||||
<b>-I</b> option, binary files are not processed at all; they are assumed not to
|
<b>-I</b> option, binary files are not processed at all; they are assumed not to
|
||||||
be of interest.
|
be of interest and are skipped without causing any output or affecting the
|
||||||
|
return code.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>--buffer-size=</b><i>number</i>
|
<b>--buffer-size=</b><i>number</i>
|
||||||
|
@@ -198,10 +200,15 @@ This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-c</b>, <b>--count</b>
|
<b>-c</b>, <b>--count</b>
|
||||||
Do not output individual lines from the files that are being scanned; instead
|
Do not output lines from the files that are being scanned; instead output the
|
||||||
output the number of lines that would otherwise have been shown. If no lines
|
number of matches (or non-matches if <b>-v</b> is used) that would otherwise
|
||||||
are selected, the number zero is output. If several files are being
|
have caused lines to be shown. By default, this count is the same as the number
|
||||||
scanned, a count is output for each of them. However, if the
|
of suppressed lines, but if the <b>-M</b> (multiline) option is used (without
|
||||||
|
<b>-v</b>), there may be more suppressed lines than the number of matches.
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
|
If no lines are selected, the number zero is output. If several files are
|
||||||
|
being scanned, a count is output for each of them. However, if the
|
||||||
<b>--files-with-matches</b> option is also used, only those files whose counts
|
<b>--files-with-matches</b> option is also used, only those files whose counts
|
||||||
are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
|
are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
|
||||||
<b>-B</b>, and <b>-C</b> options are ignored.
|
<b>-B</b>, and <b>-C</b> options are ignored.
|
||||||
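The distinction the new <b>-c</b> text draws between the number of matches and the number of suppressed lines can be illustrated with a small sketch. This is not pcre2grep's implementation: it uses Python's `re` module as a stand-in for PCRE2, the function name `count_matches_and_lines` is hypothetical, and it assumes that scanning resumes on the line after the one in which a match ended.

```python
import re


def count_matches_and_lines(text, pattern):
    """Return (matches, lines_shown) for a pattern that may span lines.

    Illustration only: Python's re stands in for PCRE2, and a match that
    starts on a line already covered by an earlier match is skipped
    (an assumption about how scanning resumes after a match).
    """
    regex = re.compile(pattern, re.MULTILINE)
    matches = 0
    lines_shown = 0
    covered_until = -1  # index of the last line covered by a counted match
    for m in regex.finditer(text):
        # Line index = number of newlines before the offset.
        first = text.count("\n", 0, m.start())
        if first <= covered_until:
            continue
        # A match ending in a newline ends on that line, not the next one.
        last = text.count("\n", 0, max(m.start(), m.end() - 1))
        matches += 1
        lines_shown += last - first + 1
        covered_until = last
    return matches, lines_shown


sample = "regular\nexpression\nplain line\n"
print(count_matches_and_lines(sample, r"regular\s+expression"))
print(count_matches_and_lines(sample, r"plain"))
```

In the first call one match covers two lines, so the count that <b>-c</b> would report (matches) is smaller than the number of lines that would otherwise have been shown. In the second call the two numbers agree.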
|
@@ -271,10 +278,10 @@ of the line that matched.
|
||||||
Files (but not directories) whose names match the pattern are skipped without
|
Files (but not directories) whose names match the pattern are skipped without
|
||||||
being processed. This applies to all files, whether listed on the command line,
|
being processed. This applies to all files, whether listed on the command line,
|
||||||
obtained from <b>--file-list</b>, or by scanning a directory. The pattern is a
|
obtained from <b>--file-list</b>, or by scanning a directory. The pattern is a
|
||||||
PCRE2 regular expression, and is matched against the final component of the file
|
PCRE2 regular expression, and is matched against the final component of the
|
||||||
name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not
|
file name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do
|
||||||
apply to this pattern. The option may be given any number of times in order to
|
not apply to this pattern. The option may be given any number of times in order
|
||||||
specify multiple patterns. If a file name matches both an <b>--include</b>
|
to specify multiple patterns. If a file name matches both an <b>--include</b>
|
||||||
and an <b>--exclude</b> pattern, it is excluded. There is no short form for this
|
and an <b>--exclude</b> pattern, it is excluded. There is no short form for this
|
||||||
option.
|
option.
|
||||||
</P>
|
</P>
|
||||||
|
@@ -323,7 +330,7 @@ alternatives in the description of <b>-e</b> above.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
If this option is given more than once, all the specified files are
|
If this option is given more than once, all the specified files are
|
||||||
read. A data line is output if any of the patterns match it. A filename can
|
read. A data line is output if any of the patterns match it. A file name can
|
||||||
be given as "-" to refer to the standard input. When <b>-f</b> is used, patterns
|
be given as "-" to refer to the standard input. When <b>-f</b> is used, patterns
|
||||||
specified on the command line using <b>-e</b> may also be present; they are
|
specified on the command line using <b>-e</b> may also be present; they are
|
||||||
tested before the file's patterns. However, no other pattern is taken from the
|
tested before the file's patterns. However, no other pattern is taken from the
|
||||||
|
@@ -334,7 +341,7 @@ command line; all arguments are treated as the names of paths to be searched.
|
||||||
Read a list of files and/or directories that are to be scanned from the given
|
Read a list of files and/or directories that are to be scanned from the given
|
||||||
file, one per line. Trailing white space is removed from each line, and blank
|
file, one per line. Trailing white space is removed from each line, and blank
|
||||||
lines are ignored. These paths are processed before any that are listed on the
|
lines are ignored. These paths are processed before any that are listed on the
|
||||||
command line. The filename can be given as "-" to refer to the standard input.
|
command line. The file name can be given as "-" to refer to the standard input.
|
||||||
If <b>--file</b> and <b>--file-list</b> are both specified as "-", patterns are
|
If <b>--file</b> and <b>--file-list</b> are both specified as "-", patterns are
|
||||||
read first. This is useful only when the standard input is a terminal, from
|
read first. This is useful only when the standard input is a terminal, from
|
||||||
which further lines (the list of files) can be read after an end-of-file
|
which further lines (the list of files) can be read after an end-of-file
|
||||||
|
@@ -352,17 +359,18 @@ and <b>--only-matching</b>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-H</b>, <b>--with-filename</b>
|
<b>-H</b>, <b>--with-filename</b>
|
||||||
Force the inclusion of the filename at the start of output lines when searching
|
Force the inclusion of the file name at the start of output lines when
|
||||||
a single file. By default, the filename is not shown in this case. For matching
|
searching a single file. By default, the file name is not shown in this case.
|
||||||
lines, the filename is followed by a colon; for context lines, a hyphen
|
For matching lines, the file name is followed by a colon; for context lines, a
|
||||||
separator is used. If a line number is also being output, it follows the file
|
hyphen separator is used. If a line number is also being output, it follows the
|
||||||
name.
|
file name. When the <b>-M</b> option causes a pattern to match more than one
|
||||||
|
line, only the first is preceded by the file name.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-h</b>, <b>--no-filename</b>
|
<b>-h</b>, <b>--no-filename</b>
|
||||||
Suppress the output filenames when searching multiple files. By default,
|
Suppress the output file names when searching multiple files. By default,
|
||||||
filenames are shown when multiple files are searched. For matching lines, the
|
file names are shown when multiple files are searched. For matching lines, the
|
||||||
filename is followed by a colon; for context lines, a hyphen separator is used.
|
file name is followed by a colon; for context lines, a hyphen separator is used.
|
||||||
If a line number is also being output, it follows the file name.
|
If a line number is also being output, it follows the file name.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@@ -373,7 +381,7 @@ ignored.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>-I</b>
|
<b>-I</b>
|
||||||
Treat binary files as never matching. This is equivalent to
|
Ignore binary files. This is equivalent to
|
||||||
<b>--binary-files</b>=<i>without-match</i>.
|
<b>--binary-files</b>=<i>without-match</i>.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
|
@@ -406,8 +414,8 @@ If any <b>--include-dir</b> patterns are specified, the only directories that
|
||||||
are processed are those that match one of the patterns (and do not match an
|
are processed are those that match one of the patterns (and do not match an
|
||||||
<b>--exclude-dir</b> pattern). This applies to all directories, whether listed
|
<b>--exclude-dir</b> pattern). This applies to all directories, whether listed
|
||||||
on the command line, obtained from <b>--file-list</b>, or by scanning a parent
|
on the command line, obtained from <b>--file-list</b>, or by scanning a parent
|
||||||
directory. The pattern is a PCRE2 regular expression, and is matched against the
|
directory. The pattern is a PCRE2 regular expression, and is matched against
|
||||||
final component of the directory name, not the entire path. The <b>-F</b>,
|
the final component of the directory name, not the entire path. The <b>-F</b>,
|
||||||
<b>-w</b>, and <b>-x</b> options do not apply to this pattern. The option may be
|
<b>-w</b>, and <b>-x</b> options do not apply to this pattern. The option may be
|
||||||
given any number of times. If a directory matches both <b>--include-dir</b> and
|
given any number of times. If a directory matches both <b>--include-dir</b> and
|
||||||
<b>--exclude-dir</b>, it is excluded. There is no short form for this option.
|
<b>--exclude-dir</b>, it is excluded. There is no short form for this option.
|
||||||
|
@@ -442,8 +450,8 @@ unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
|
||||||
is currently possible only in Unix-like environments). Output to terminal is
|
is currently possible only in Unix-like environments). Output to terminal is
|
||||||
normally automatically flushed by the operating system. This option can be
|
normally automatically flushed by the operating system. This option can be
|
||||||
useful when the input or output is attached to a pipe and you do not want
|
useful when the input or output is attached to a pipe and you do not want
|
||||||
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will affect
|
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will
|
||||||
performance, and the <b>-M</b> (multiline) option ceases to work.
|
affect performance, and the <b>-M</b> (multiline) option ceases to work.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>--line-offsets</b>
|
<b>--line-offsets</b>
|
||||||
|
@@ -497,18 +505,33 @@ when the PCRE2 library is compiled, with the default default being 10 million.
|
||||||
Allow patterns to match more than one line. When this option is given, patterns
|
Allow patterns to match more than one line. When this option is given, patterns
|
||||||
may usefully contain literal newline characters and internal occurrences of ^
|
may usefully contain literal newline characters and internal occurrences of ^
|
||||||
and $ characters. The output for a successful match may consist of more than
|
and $ characters. The output for a successful match may consist of more than
|
||||||
one line, the last of which is the one in which the match ended. If the matched
|
one line. The first is the line in which the match started, and the last is the
|
||||||
string ends with a newline sequence the output ends at the end of that line.
|
line in which the match ended. If the matched string ends with a newline
|
||||||
|
sequence the output ends at the end of that line.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
When this option is set, the PCRE2 library is called in "multiline" mode.
|
When this option is set, the PCRE2 library is called in "multiline" mode.
|
||||||
|
However, <b>pcre2grep</b> still processes the input line by line. The difference
|
||||||
|
is that a matched string may extend past the end of a line and continue on
|
||||||
|
one or more subsequent lines. The newline sequence must be matched as part of
|
||||||
|
the pattern. For example, to find the phrase "regular expression" in a file
|
||||||
|
where "regular" might be at the end of a line and "expression" at the start of
|
||||||
|
the next line, you could use this command:
|
||||||
|
<pre>
|
||||||
|
pcre2grep -M 'regular\s+expression' <file>
|
||||||
|
</pre>
|
||||||
|
The \s escape sequence matches any white space character, including newlines,
|
||||||
|
and is followed by + so as to match trailing white space on the first line as
|
||||||
|
well as possibly handling a two-character newline sequence.
|
||||||
|
<br>
|
||||||
|
<br>
|
||||||
There is a limit to the number of lines that can be matched, imposed by the way
|
There is a limit to the number of lines that can be matched, imposed by the way
|
||||||
that <b>pcre2grep</b> buffers the input file as it scans it. However,
|
that <b>pcre2grep</b> buffers the input file as it scans it. However,
|
||||||
<b>pcre2grep</b> ensures that at least 8K characters or the rest of the document
|
<b>pcre2grep</b> ensures that at least 8K characters or the rest of the file
|
||||||
(whichever is the shorter) are available for forward matching, and similarly
|
(whichever is the shorter) are available for forward matching, and similarly
|
||||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||||
are guaranteed to be available for lookbehind assertions. This option does not
|
are guaranteed to be available for lookbehind assertions. The <b>-M</b> option
|
||||||
work when input is read line by line (see <b>--line-buffered</b>.)
|
does not work when input is read line by line (see <b>--line-buffered</b>.)
|
||||||
</P>
|
</P>
|
||||||
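The output rule described for <b>-M</b> (the first output line is the one in which the match started, the last is the one in which it ended, and only the first carries a line number) can be sketched as follows. This is an illustration under stated assumptions, not pcre2grep's code: Python's `re` module stands in for PCRE2, `multiline_grep` is a hypothetical name, the whole input is held in memory rather than in pcre2grep's fixed-size buffer, and scanning is assumed to resume on the line after a match.

```python
import bisect
import re


def multiline_grep(text, pattern):
    """Return a list of (first_line_number, lines) for each match.

    Illustration only: re stands in for PCRE2, and the whole text is
    kept in memory instead of a bounded buffer.
    """
    regex = re.compile(pattern, re.MULTILINE)
    lines = text.splitlines(keepends=True)
    if not lines:
        return []
    # Offset at which each line starts, used to map offsets to lines.
    starts = []
    pos = 0
    for line in lines:
        starts.append(pos)
        pos += len(line)

    def line_index(offset):
        return bisect.bisect_right(starts, offset) - 1

    results = []
    covered_until = -1  # last line index already output
    for m in regex.finditer(text):
        first = line_index(m.start())
        if first <= covered_until:
            continue  # assumed: scanning resumes after the previous match
        # If the match ends with a newline, output ends at that line.
        last = line_index(max(m.start(), m.end() - 1))
        block = [l.rstrip("\r\n") for l in lines[first:last + 1]]
        results.append((first + 1, block))
        covered_until = last
    return results


sample = (
    "A regular\n"
    "expression spans lines.\n"
    "Nothing here.\n"
    "regular   expression on one line.\n"
)
for number, block in multiline_grep(sample, r"regular\s+expression"):
    # Only the first line of a multi-line match gets the line number.
    print(f"{number}:{block[0]}")
    for extra in block[1:]:
        print(extra)
```

The `\s+` in the pattern is what lets the match cross the line boundary, as in the `regular\s+expression` command shown in the diff; without a construct that matches the newline sequence, the match would stay within one line.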
<P>
|
<P>
|
||||||
<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
|
<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
|
||||||
|
@@ -526,9 +549,9 @@ When the PCRE2 library is built, a default line-ending sequence is specified.
|
||||||
This is normally the standard sequence for the operating system. Unless
|
This is normally the standard sequence for the operating system. Unless
|
||||||
otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
|
otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
|
||||||
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
||||||
makes it possible to use <b>pcre2grep</b> to scan files that have come from other
|
makes it possible to use <b>pcre2grep</b> to scan files that have come from
|
||||||
environments without having to modify their line endings. If the data that is
|
other environments without having to modify their line endings. If the data
|
||||||
being scanned does not agree with the convention set by this option,
|
that is being scanned does not agree with the convention set by this option,
|
||||||
<b>pcre2grep</b> may behave in strange ways. Note that this option does not
|
<b>pcre2grep</b> may behave in strange ways. Note that this option does not
|
||||||
apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
|
apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
|
||||||
<b>--include-from</b> options, which are expected to use the operating system's
|
<b>--include-from</b> options, which are expected to use the operating system's
|
||||||
|
@@ -537,9 +560,10 @@ standard newline sequence.
|
||||||
<P>
|
<P>
|
||||||
<b>-n</b>, <b>--line-number</b>
|
<b>-n</b>, <b>--line-number</b>
|
||||||
Precede each output line by its line number in the file, followed by a colon
|
Precede each output line by its line number in the file, followed by a colon
|
||||||
for matching lines or a hyphen for context lines. If the filename is also being
|
for matching lines or a hyphen for context lines. If the file name is also
|
||||||
output, it precedes the line number. This option is forced if
|
being output, it precedes the line number. When the <b>-M</b> option causes a
|
||||||
<b>--line-offsets</b> is used.
|
pattern to match more than one line, only the first is preceded by its line
|
||||||
|
number. This option is forced if <b>--line-offsets</b> is used.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<b>--no-jit</b>
|
<b>--no-jit</b>
|
||||||
|
@@ -570,7 +594,7 @@ without an argument (see above), if an argument is present, it must be given in
|
||||||
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
||||||
for the non-argument case above also apply to this case. If the specified
|
for the non-argument case above also apply to this case. If the specified
|
||||||
capturing parentheses do not exist in the pattern, or were not set in the
|
capturing parentheses do not exist in the pattern, or were not set in the
|
||||||
match, nothing is output unless the file name or line number are being printed.
|
match, nothing is output unless the file name or line number are being output.
|
||||||
<br>
|
<br>
|
||||||
<br>
|
<br>
|
||||||
If this option is given multiple times, multiple substrings are output, in the
|
If this option is given multiple times, multiple substrings are output, in the
|
||||||
|
@@ -635,10 +659,10 @@ specified by any of the <b>--include</b> or <b>--exclude</b> options.
|
||||||
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
|
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
|
||||||
Force the patterns to be anchored (each must start matching at the beginning of
|
Force the patterns to be anchored (each must start matching at the beginning of
|
||||||
a line) and in addition, require them to match entire lines. This is equivalent
|
a line) and in addition, require them to match entire lines. This is equivalent
|
||||||
to having ^ and $ characters at the start and end of each alternative branch in
|
to having ^ and $ characters at the start and end of each alternative top-level
|
||||||
every pattern. This option applies only to the patterns that are matched
|
branch in every pattern. This option applies only to the patterns that are
|
||||||
against the contents of files; it does not apply to patterns specified by any
|
matched against the contents of files; it does not apply to patterns specified
|
||||||
of the <b>--include</b> or <b>--exclude</b> options.
|
by any of the <b>--include</b> or <b>--exclude</b> options.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@@ -677,7 +701,7 @@ Although most of the common options work the same way, a few are different in
|
||||||
<b>pcre2grep</b>. For example, the <b>--include</b> option's argument is a glob
|
<b>pcre2grep</b>. For example, the <b>--include</b> option's argument is a glob
|
||||||
for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
|
for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
|
||||||
<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
|
<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
|
||||||
without counts, but <b>pcre2grep</b> gives the counts.
|
without counts, but <b>pcre2grep</b> gives the counts as well.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
|
<br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@@ -722,9 +746,9 @@ message and the line that caused the problem to the standard error stream. If
|
||||||
there are more than 20 such errors, <b>pcre2grep</b> gives up.
|
there are more than 20 such errors, <b>pcre2grep</b> gives up.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the overall
|
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the
|
||||||
resource limit; there is a second option called <b>--recursion-limit</b> that
|
overall resource limit; there is a second option called <b>--recursion-limit</b>
|
||||||
sets a limit on the amount of memory (usually stack) that is used (see the
|
that sets a limit on the amount of memory (usually stack) that is used (see the
|
||||||
discussion of these options above).
|
discussion of these options above).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
|
<br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
|
||||||
|
@@ -737,7 +761,7 @@ affect the return code.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
|
<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
|
||||||
<P>
|
<P>
|
||||||
<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2test</b>(1).
|
<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3).
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
|
<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
|
||||||
<P>
|
<P>
|
||||||
|
@@ -750,9 +774,9 @@ Cambridge, England.
|
||||||
</P>
|
</P>
|
||||||
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
|
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
|
||||||
<P>
|
<P>
|
||||||
Last updated: 23 November 2014
|
Last updated: 03 January 2015
|
||||||
<br>
|
<br>
|
||||||
Copyright © 1997-2014 University of Cambridge.
|
Copyright © 1997-2015 University of Cambridge.
|
||||||
<br>
|
<br>
|
||||||
<p>
|
<p>
|
||||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||||
|
|
146
doc/pcre2grep.1
|
@@ -1,4 +1,4 @@
|
||||||
.TH PCRE2GREP 1 "23 November 2014" "PCRE2 10.00"
|
.TH PCRE2GREP 1 "03 January 2015" "PCRE2 10.00"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
pcre2grep - a grep with Perl-compatible regular expressions.
|
pcre2grep - a grep with Perl-compatible regular expressions.
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
|
@@ -41,21 +41,22 @@ If no files are specified, \fBpcre2grep\fP reads the standard input. The
|
||||||
standard input can also be referenced by a name consisting of a single hyphen.
|
standard input can also be referenced by a name consisting of a single hyphen.
|
||||||
For example:
|
For example:
|
||||||
.sp
|
.sp
|
||||||
pcre2grep some-pattern /file1 - /file3
|
pcre2grep some-pattern file1 - file3
|
||||||
.sp
|
.sp
|
||||||
By default, each line that matches a pattern is copied to the standard
|
Input files are searched line by line. By default, each line that matches a
|
||||||
output, and if there is more than one file, the file name is output at the
|
pattern is copied to the standard output, and if there is more than one file,
|
||||||
start of each line, followed by a colon. However, there are options that can
|
the file name is output at the start of each line, followed by a colon.
|
||||||
change how \fBpcre2grep\fP behaves. In particular, the \fB-M\fP option makes it
|
However, there are options that can change how \fBpcre2grep\fP behaves. In
|
||||||
possible to search for patterns that span line boundaries. What defines a line
|
particular, the \fB-M\fP option makes it possible to search for strings that
|
||||||
boundary is controlled by the \fB-N\fP (\fB--newline\fP) option.
|
span line boundaries. What defines a line boundary is controlled by the
|
||||||
|
\fB-N\fP (\fB--newline\fP) option.
|
||||||
.P
|
.P
|
||||||
The amount of memory used for buffering files that are being scanned is
|
The amount of memory used for buffering files that are being scanned is
|
||||||
controlled by a parameter that can be set by the \fB--buffer-size\fP option.
|
controlled by a parameter that can be set by the \fB--buffer-size\fP option.
|
||||||
The default value for this parameter is specified when \fBpcre2grep\fP is built,
|
The default value for this parameter is specified when \fBpcre2grep\fP is
|
||||||
with the default default being 20K. A block of memory three times this size is
|
built, with the default default being 20K. A block of memory three times this
|
||||||
used (to allow for buffering "before" and "after" lines). An error occurs if a
|
size is used (to allow for buffering "before" and "after" lines). An error
|
||||||
line overflows the buffer.
|
occurs if a line overflows the buffer.
|
||||||
.P
|
.P
|
||||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
||||||
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
||||||
|
@@ -122,10 +123,10 @@ to signify multiplication by 1024 or 1024*1024 respectively.
|
||||||
\fB--\fP
|
\fB--\fP
|
||||||
This terminates the list of options. It is useful if the next item on the
|
This terminates the list of options. It is useful if the next item on the
|
||||||
command line starts with a hyphen but is not an option. This allows for the
|
command line starts with a hyphen but is not an option. This allows for the
|
||||||
processing of patterns and filenames that start with hyphens.
|
processing of patterns and file names that start with hyphens.
|
||||||
.TP
|
.TP
|
||||||
\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
|
\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
|
||||||
Output \fInumber\fP lines of context after each matching line. If filenames
|
Output \fInumber\fP lines of context after each matching line. If file names
|
||||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||||
colon for the context lines. A line containing "--" is output between each
|
colon for the context lines. A line containing "--" is output between each
|
||||||
group of lines, unless they are in fact contiguous in the input file. The value
|
group of lines, unless they are in fact contiguous in the input file. The value
|
||||||
|
@@ -137,7 +138,7 @@ Treat binary files as text. This is equivalent to
|
||||||
\fB--binary-files\fP=\fItext\fP.
|
\fB--binary-files\fP=\fItext\fP.
|
||||||
.TP
|
.TP
|
||||||
\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
|
\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
|
||||||
Output \fInumber\fP lines of context before each matching line. If filenames
|
Output \fInumber\fP lines of context before each matching line. If file names
|
||||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||||
colon for the context lines. A line containing "--" is output between each
|
colon for the context lines. A line containing "--" is output between each
|
||||||
group of lines, unless they are in fact contiguous in the input file. The value
|
group of lines, unless they are in fact contiguous in the input file. The value
|
||||||
|
@ -153,7 +154,8 @@ processed in the same way as any other file. In this case, when a match
|
||||||
succeeds, the output may be binary garbage, which can have nasty effects if
|
succeeds, the output may be binary garbage, which can have nasty effects if
|
||||||
sent to a terminal. If the word is "without-match", which is equivalent to the
|
sent to a terminal. If the word is "without-match", which is equivalent to the
|
||||||
\fB-I\fP option, binary files are not processed at all; they are assumed not to
|
\fB-I\fP option, binary files are not processed at all; they are assumed not to
|
||||||
be of interest.
|
be of interest and are skipped without causing any output or affecting the
|
||||||
|
return code.
|
||||||
.TP
|
.TP
|
||||||
\fB--buffer-size=\fP\fInumber\fP
|
\fB--buffer-size=\fP\fInumber\fP
|
||||||
Set the parameter that controls how much memory is used for buffering files
|
Set the parameter that controls how much memory is used for buffering files
|
||||||
|
@ -164,10 +166,14 @@ Output \fInumber\fP lines of context both before and after each matching line.
|
||||||
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
|
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
|
||||||
.TP
|
.TP
|
||||||
\fB-c\fP, \fB--count\fP
|
\fB-c\fP, \fB--count\fP
|
||||||
Do not output individual lines from the files that are being scanned; instead
|
Do not output lines from the files that are being scanned; instead output the
|
||||||
output the number of lines that would otherwise have been shown. If no lines
|
number of matches (or non-matches if \fB-v\fP is used) that would otherwise
|
||||||
are selected, the number zero is output. If several files are being
|
have caused lines to be shown. By default, this count is the same as the number
|
||||||
scanned, a count is output for each of them. However, if the
|
of suppressed lines, but if the \fB-M\fP (multiline) option is used (without
|
||||||
|
\fB-v\fP), there may be more suppressed lines than the number of matches.
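The gap between the number of matches and the number of suppressed lines can be illustrated with a small sketch. It uses Python's re module as a stand-in for PCRE2 and a made-up sample text. Both are assumptions for illustration only. It does not reproduce how pcre2grep scans its input.

```python
import re

# Hypothetical sample input: two occurrences of the phrase, each split over two lines.
text = (
    "a regular\n"
    "expression here\n"
    "plain line\n"
    "another regular\n"
    "expression\n"
)

pattern = re.compile(r"regular\s+expression")

match_count = 0      # what a count of matches would report
lines_covered = 0    # lines that would have been shown for those matches

for m in pattern.finditer(text):
    match_count += 1
    # A match covers one line plus one more for each newline inside it.
    # A trailing newline does not extend the match onto a further line.
    lines_covered += m.group(0).rstrip("\n").count("\n") + 1

print(match_count, lines_covered)
```

In this sample each match spans two lines, so the number of lines covered is larger than the number of matches.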
|
||||||
|
.sp
|
||||||
|
If no lines are selected, the number zero is output. If several files are
|
||||||
|
being scanned, a count is output for each of them. However, if the
|
||||||
\fB--files-with-matches\fP option is also used, only those files whose counts
|
\fB--files-with-matches\fP option is also used, only those files whose counts
|
||||||
are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP,
|
are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP,
|
||||||
\fB-B\fP, and \fB-C\fP options are ignored.
|
\fB-B\fP, and \fB-C\fP options are ignored.
|
||||||
|
@ -229,10 +235,10 @@ of the line that matched.
|
||||||
Files (but not directories) whose names match the pattern are skipped without
|
Files (but not directories) whose names match the pattern are skipped without
|
||||||
being processed. This applies to all files, whether listed on the command line,
|
being processed. This applies to all files, whether listed on the command line,
|
||||||
obtained from \fB--file-list\fP, or by scanning a directory. The pattern is a
|
obtained from \fB--file-list\fP, or by scanning a directory. The pattern is a
|
||||||
PCRE2 regular expression, and is matched against the final component of the file
|
PCRE2 regular expression, and is matched against the final component of the
|
||||||
name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not
|
file name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do
|
||||||
apply to this pattern. The option may be given any number of times in order to
|
not apply to this pattern. The option may be given any number of times in order
|
||||||
specify multiple patterns. If a file name matches both an \fB--include\fP
|
to specify multiple patterns. If a file name matches both an \fB--include\fP
|
||||||
and an \fB--exclude\fP pattern, it is excluded. There is no short form for this
|
and an \fB--exclude\fP pattern, it is excluded. There is no short form for this
|
||||||
option.
|
option.
|
||||||
.TP
|
.TP
|
||||||
|
@ -276,7 +282,7 @@ also the comments about multiple patterns versus a single pattern with
|
||||||
alternatives in the description of \fB-e\fP above.
|
alternatives in the description of \fB-e\fP above.
|
||||||
.sp
|
.sp
|
||||||
If this option is given more than once, all the specified files are
|
If this option is given more than once, all the specified files are
|
||||||
read. A data line is output if any of the patterns match it. A filename can
|
read. A data line is output if any of the patterns match it. A file name can
|
||||||
be given as "-" to refer to the standard input. When \fB-f\fP is used, patterns
|
be given as "-" to refer to the standard input. When \fB-f\fP is used, patterns
|
||||||
specified on the command line using \fB-e\fP may also be present; they are
|
specified on the command line using \fB-e\fP may also be present; they are
|
||||||
tested before the file's patterns. However, no other pattern is taken from the
|
tested before the file's patterns. However, no other pattern is taken from the
|
||||||
|
@ -286,7 +292,7 @@ command line; all arguments are treated as the names of paths to be searched.
|
||||||
Read a list of files and/or directories that are to be scanned from the given
|
Read a list of files and/or directories that are to be scanned from the given
|
||||||
file, one per line. Trailing white space is removed from each line, and blank
|
file, one per line. Trailing white space is removed from each line, and blank
|
||||||
lines are ignored. These paths are processed before any that are listed on the
|
lines are ignored. These paths are processed before any that are listed on the
|
||||||
command line. The filename can be given as "-" to refer to the standard input.
|
command line. The file name can be given as "-" to refer to the standard input.
|
||||||
If \fB--file\fP and \fB--file-list\fP are both specified as "-", patterns are
|
If \fB--file\fP and \fB--file-list\fP are both specified as "-", patterns are
|
||||||
read first. This is useful only when the standard input is a terminal, from
|
read first. This is useful only when the standard input is a terminal, from
|
||||||
which further lines (the list of files) can be read after an end-of-file
|
which further lines (the list of files) can be read after an end-of-file
|
||||||
|
@ -302,16 +308,17 @@ shown separately. This option is mutually exclusive with \fB--line-offsets\fP
|
||||||
and \fB--only-matching\fP.
|
and \fB--only-matching\fP.
|
||||||
.TP
|
.TP
|
||||||
\fB-H\fP, \fB--with-filename\fP
|
\fB-H\fP, \fB--with-filename\fP
|
||||||
Force the inclusion of the filename at the start of output lines when searching
|
Force the inclusion of the file name at the start of output lines when
|
||||||
a single file. By default, the filename is not shown in this case. For matching
|
searching a single file. By default, the file name is not shown in this case.
|
||||||
lines, the filename is followed by a colon; for context lines, a hyphen
|
For matching lines, the file name is followed by a colon; for context lines, a
|
||||||
separator is used. If a line number is also being output, it follows the file
|
hyphen separator is used. If a line number is also being output, it follows the
|
||||||
name.
|
file name. When the \fB-M\fP option causes a pattern to match more than one
|
||||||
|
line, only the first is preceded by the file name.
|
||||||
.TP
|
.TP
|
||||||
\fB-h\fP, \fB--no-filename\fP
|
\fB-h\fP, \fB--no-filename\fP
|
||||||
Suppress the output filenames when searching multiple files. By default,
|
Suppress the output file names when searching multiple files. By default,
|
||||||
filenames are shown when multiple files are searched. For matching lines, the
|
file names are shown when multiple files are searched. For matching lines, the
|
||||||
filename is followed by a colon; for context lines, a hyphen separator is used.
|
file name is followed by a colon; for context lines, a hyphen separator is used.
|
||||||
If a line number is also being output, it follows the file name.
|
If a line number is also being output, it follows the file name.
|
||||||
.TP
|
.TP
|
||||||
\fB--help\fP
|
\fB--help\fP
|
||||||
|
@ -320,7 +327,7 @@ type support, and then exit. Anything else on the command line is
|
||||||
ignored.
|
ignored.
|
||||||
.TP
|
.TP
|
||||||
\fB-I\fP
|
\fB-I\fP
|
||||||
Treat binary files as never matching. This is equivalent to
|
Ignore binary files. This is equivalent to
|
||||||
\fB--binary-files\fP=\fIwithout-match\fP.
|
\fB--binary-files\fP=\fIwithout-match\fP.
|
||||||
.TP
|
.TP
|
||||||
\fB-i\fP, \fB--ignore-case\fP
|
\fB-i\fP, \fB--ignore-case\fP
|
||||||
|
@ -349,8 +356,8 @@ If any \fB--include-dir\fP patterns are specified, the only directories that
|
||||||
are processed are those that match one of the patterns (and do not match an
|
are processed are those that match one of the patterns (and do not match an
|
||||||
\fB--exclude-dir\fP pattern). This applies to all directories, whether listed
|
\fB--exclude-dir\fP pattern). This applies to all directories, whether listed
|
||||||
on the command line, obtained from \fB--file-list\fP, or by scanning a parent
|
on the command line, obtained from \fB--file-list\fP, or by scanning a parent
|
||||||
directory. The pattern is a PCRE2 regular expression, and is matched against the
|
directory. The pattern is a PCRE2 regular expression, and is matched against
|
||||||
final component of the directory name, not the entire path. The \fB-F\fP,
|
the final component of the directory name, not the entire path. The \fB-F\fP,
|
||||||
\fB-w\fP, and \fB-x\fP options do not apply to this pattern. The option may be
|
\fB-w\fP, and \fB-x\fP options do not apply to this pattern. The option may be
|
||||||
given any number of times. If a directory matches both \fB--include-dir\fP and
|
given any number of times. If a directory matches both \fB--include-dir\fP and
|
||||||
\fB--exclude-dir\fP, it is excluded. There is no short form for this option.
|
\fB--exclude-dir\fP, it is excluded. There is no short form for this option.
|
||||||
|
@ -381,8 +388,8 @@ unless \fBpcre2grep\fP can determine that it is reading from a terminal (which
|
||||||
is currently possible only in Unix-like environments). Output to terminal is
|
is currently possible only in Unix-like environments). Output to terminal is
|
||||||
normally automatically flushed by the operating system. This option can be
|
normally automatically flushed by the operating system. This option can be
|
||||||
useful when the input or output is attached to a pipe and you do not want
|
useful when the input or output is attached to a pipe and you do not want
|
||||||
\fBpcre2grep\fP to buffer up large amounts of data. However, its use will affect
|
\fBpcre2grep\fP to buffer up large amounts of data. However, its use will
|
||||||
performance, and the \fB-M\fP (multiline) option ceases to work.
|
affect performance, and the \fB-M\fP (multiline) option ceases to work.
|
||||||
.TP
|
.TP
|
||||||
\fB--line-offsets\fP
|
\fB--line-offsets\fP
|
||||||
Instead of showing lines or parts of lines that match, show each match as a
|
Instead of showing lines or parts of lines that match, show each match as a
|
||||||
|
@ -429,17 +436,31 @@ when the PCRE2 library is compiled, with the default default being 10 million.
|
||||||
Allow patterns to match more than one line. When this option is given, patterns
|
Allow patterns to match more than one line. When this option is given, patterns
|
||||||
may usefully contain literal newline characters and internal occurrences of ^
|
may usefully contain literal newline characters and internal occurrences of ^
|
||||||
and $ characters. The output for a successful match may consist of more than
|
and $ characters. The output for a successful match may consist of more than
|
||||||
one line, the last of which is the one in which the match ended. If the matched
|
one line. The first is the line in which the match started, and the last is the
|
||||||
string ends with a newline sequence the output ends at the end of that line.
|
line in which the match ended. If the matched string ends with a newline
|
||||||
|
sequence the output ends at the end of that line.
|
||||||
.sp
|
.sp
|
||||||
When this option is set, the PCRE2 library is called in "multiline" mode.
|
When this option is set, the PCRE2 library is called in "multiline" mode.
|
||||||
|
However, \fBpcre2grep\fP still processes the input line by line. The difference
|
||||||
|
is that a matched string may extend past the end of a line and continue on
|
||||||
|
one or more subsequent lines. The newline sequence must be matched as part of
|
||||||
|
the pattern. For example, to find the phrase "regular expression" in a file
|
||||||
|
where "regular" might be at the end of a line and "expression" at the start of
|
||||||
|
the next line, you could use this command:
|
||||||
|
.sp
|
||||||
|
pcre2grep -M 'regular\es+expression' <file>
|
||||||
|
.sp
|
||||||
|
The \es escape sequence matches any white space character, including newlines,
|
||||||
|
and is followed by + so as to match trailing white space on the first line as
|
||||||
|
well as possibly handling a two-character newline sequence.
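As a rough illustration of why the + quantifier matters here, the sketch below uses Python's re module as a stand-in for PCRE2. The sample text is invented, and the code only demonstrates the behaviour of the white space escape. It is not a model of pcre2grep itself.

```python
import re

# Hypothetical sample input: "regular" ends one line and "expression" starts the
# next, separated by a trailing space and a two-character (CRLF) newline.
text = "PCRE2 is a regular \r\nexpression library.\nNothing here.\n"

# \s+ consumes the trailing space, the CR, and the LF in a single run.
pattern = re.compile(r"regular\s+expression")
match = pattern.search(text)
matched = match.group(0) if match else None
print(repr(matched))

# A single literal space cannot bridge the line break, so this finds nothing.
single_space = re.search(r"regular expression", text)
print(single_space)
```

A single \s without the quantifier would also fail on this input, because three white space characters separate the two words.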
|
||||||
|
.sp
|
||||||
There is a limit to the number of lines that can be matched, imposed by the way
|
There is a limit to the number of lines that can be matched, imposed by the way
|
||||||
that \fBpcre2grep\fP buffers the input file as it scans it. However,
|
that \fBpcre2grep\fP buffers the input file as it scans it. However,
|
||||||
\fBpcre2grep\fP ensures that at least 8K characters or the rest of the document
|
\fBpcre2grep\fP ensures that at least 8K characters or the rest of the file
|
||||||
(whichever is the shorter) are available for forward matching, and similarly
|
(whichever is the shorter) are available for forward matching, and similarly
|
||||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||||
are guaranteed to be available for lookbehind assertions. This option does not
|
are guaranteed to be available for lookbehind assertions. The \fB-M\fP option
|
||||||
work when input is read line by line (see \fB--line-buffered\fP.)
|
does not work when input is read line by line (see \fB--line-buffered\fP.)
|
||||||
.TP
|
.TP
|
||||||
\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
|
\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
|
||||||
The PCRE2 library supports five different conventions for indicating
|
The PCRE2 library supports five different conventions for indicating
|
||||||
|
@ -455,9 +476,9 @@ When the PCRE2 library is built, a default line-ending sequence is specified.
|
||||||
This is normally the standard sequence for the operating system. Unless
|
This is normally the standard sequence for the operating system. Unless
|
||||||
otherwise specified by this option, \fBpcre2grep\fP uses the library's default.
|
otherwise specified by this option, \fBpcre2grep\fP uses the library's default.
|
||||||
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
||||||
makes it possible to use \fBpcre2grep\fP to scan files that have come from other
|
makes it possible to use \fBpcre2grep\fP to scan files that have come from
|
||||||
environments without having to modify their line endings. If the data that is
|
other environments without having to modify their line endings. If the data
|
||||||
being scanned does not agree with the convention set by this option,
|
that is being scanned does not agree with the convention set by this option,
|
||||||
\fBpcre2grep\fP may behave in strange ways. Note that this option does not
|
\fBpcre2grep\fP may behave in strange ways. Note that this option does not
|
||||||
apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or
|
apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or
|
||||||
\fB--include-from\fP options, which are expected to use the operating system's
|
\fB--include-from\fP options, which are expected to use the operating system's
|
||||||
|
@ -465,9 +486,10 @@ standard newline sequence.
|
||||||
.TP
|
.TP
|
||||||
\fB-n\fP, \fB--line-number\fP
|
\fB-n\fP, \fB--line-number\fP
|
||||||
Precede each output line by its line number in the file, followed by a colon
|
Precede each output line by its line number in the file, followed by a colon
|
||||||
for matching lines or a hyphen for context lines. If the filename is also being
|
for matching lines or a hyphen for context lines. If the file name is also
|
||||||
output, it precedes the line number. This option is forced if
|
being output, it precedes the line number. When the \fB-M\fP option causes a
|
||||||
\fB--line-offsets\fP is used.
|
pattern to match more than one line, only the first is preceded by its line
|
||||||
|
number. This option is forced if \fB--line-offsets\fP is used.
|
||||||
.TP
|
.TP
|
||||||
\fB--no-jit\fP
|
\fB--no-jit\fP
|
||||||
If the PCRE2 library is built with support for just-in-time compiling (which
|
If the PCRE2 library is built with support for just-in-time compiling (which
|
||||||
|
@ -495,7 +517,7 @@ without an argument (see above), if an argument is present, it must be given in
|
||||||
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
||||||
for the non-argument case above also apply to this case. If the specified
|
for the non-argument case above also apply to this case. If the specified
|
||||||
capturing parentheses do not exist in the pattern, or were not set in the
|
capturing parentheses do not exist in the pattern, or were not set in the
|
||||||
match, nothing is output unless the file name or line number are being printed.
|
match, nothing is output unless the file name or line number are being output.
|
||||||
.sp
|
.sp
|
||||||
If this option is given multiple times, multiple substrings are output, in the
|
If this option is given multiple times, multiple substrings are output, in the
|
||||||
order the options are given. For example, -o3 -o1 -o3 causes the substrings
|
order the options are given. For example, -o3 -o1 -o3 causes the substrings
|
||||||
|
@ -549,10 +571,10 @@ specified by any of the \fB--include\fP or \fB--exclude\fP options.
|
||||||
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
|
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
|
||||||
Force the patterns to be anchored (each must start matching at the beginning of
|
Force the patterns to be anchored (each must start matching at the beginning of
|
||||||
a line) and in addition, require them to match entire lines. This is equivalent
|
a line) and in addition, require them to match entire lines. This is equivalent
|
||||||
to having ^ and $ characters at the start and end of each alternative branch in
|
to having ^ and $ characters at the start and end of each alternative top-level
|
||||||
every pattern. This option applies only to the patterns that are matched
|
branch in every pattern. This option applies only to the patterns that are
|
||||||
against the contents of files; it does not apply to patterns specified by any
|
matched against the contents of files; it does not apply to patterns specified
|
||||||
of the \fB--include\fP or \fB--exclude\fP options.
|
by any of the \fB--include\fP or \fB--exclude\fP options.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "ENVIRONMENT VARIABLES"
|
.SH "ENVIRONMENT VARIABLES"
|
||||||
|
@ -596,7 +618,7 @@ Although most of the common options work the same way, a few are different in
|
||||||
\fBpcre2grep\fP. For example, the \fB--include\fP option's argument is a glob
|
\fBpcre2grep\fP. For example, the \fB--include\fP option's argument is a glob
|
||||||
for GNU \fBgrep\fP, but a regular expression for \fBpcre2grep\fP. If both the
|
for GNU \fBgrep\fP, but a regular expression for \fBpcre2grep\fP. If both the
|
||||||
\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names,
|
\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names,
|
||||||
without counts, but \fBpcre2grep\fP gives the counts.
|
without counts, but \fBpcre2grep\fP gives the counts as well.
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH "OPTIONS WITH DATA"
|
.SH "OPTIONS WITH DATA"
|
||||||
|
@ -642,9 +664,9 @@ in these circumstances. If this happens, \fBpcre2grep\fP outputs an error
|
||||||
message and the line that caused the problem to the standard error stream. If
|
message and the line that caused the problem to the standard error stream. If
|
||||||
there are more than 20 such errors, \fBpcre2grep\fP gives up.
|
there are more than 20 such errors, \fBpcre2grep\fP gives up.
|
||||||
.P
|
.P
|
||||||
The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the overall
|
The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the
|
||||||
resource limit; there is a second option called \fB--recursion-limit\fP that
|
overall resource limit; there is a second option called \fB--recursion-limit\fP
|
||||||
sets a limit on the amount of memory (usually stack) that is used (see the
|
that sets a limit on the amount of memory (usually stack) that is used (see the
|
||||||
discussion of these options above).
|
discussion of these options above).
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
|
@ -661,7 +683,7 @@ affect the return code.
|
||||||
.SH "SEE ALSO"
|
.SH "SEE ALSO"
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2test\fP(1).
|
\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3).
|
||||||
.
|
.
|
||||||
.
|
.
|
||||||
.SH AUTHOR
|
.SH AUTHOR
|
||||||
|
@ -678,6 +700,6 @@ Cambridge, England.
|
||||||
.rs
|
.rs
|
||||||
.sp
|
.sp
|
||||||
.nf
|
.nf
|
||||||
Last updated: 23 November 2014
|
Last updated: 03 January 2015
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
.fi
|
.fi
|
||||||
|
|
|
@ -40,15 +40,15 @@ DESCRIPTION
|
||||||
standard input can also be referenced by a name consisting of a single
|
standard input can also be referenced by a name consisting of a single
|
||||||
hyphen. For example:
|
hyphen. For example:
|
||||||
|
|
||||||
pcre2grep some-pattern /file1 - /file3
|
pcre2grep some-pattern file1 - file3
|
||||||
|
|
||||||
By default, each line that matches a pattern is copied to the standard
|
Input files are searched line by line. By default, each line that
|
||||||
output, and if there is more than one file, the file name is output at
|
matches a pattern is copied to the standard output, and if there is
|
||||||
the start of each line, followed by a colon. However, there are options
|
more than one file, the file name is output at the start of each line,
|
||||||
that can change how pcre2grep behaves. In particular, the -M option
|
followed by a colon. However, there are options that can change how
|
||||||
makes it possible to search for patterns that span line boundaries.
|
pcre2grep behaves. In particular, the -M option makes it possible to
|
||||||
What defines a line boundary is controlled by the -N (--newline)
|
search for strings that span line boundaries. What defines a line
|
||||||
option.
|
boundary is controlled by the -N (--newline) option.
|
||||||
|
|
||||||
The amount of memory used for buffering files that are being scanned is
|
The amount of memory used for buffering files that are being scanned is
|
||||||
controlled by a parameter that can be set by the --buffer-size option.
|
controlled by a parameter that can be set by the --buffer-size option.
|
||||||
|
@ -122,13 +122,13 @@ OPTIONS
|
||||||
|
|
||||||
-- This terminates the list of options. It is useful if the next
|
-- This terminates the list of options. It is useful if the next
|
||||||
item on the command line starts with a hyphen but is not an
|
item on the command line starts with a hyphen but is not an
|
||||||
option. This allows for the processing of patterns and file-
|
option. This allows for the processing of patterns and file
|
||||||
names that start with hyphens.
|
names that start with hyphens.
|
||||||
|
|
||||||
-A number, --after-context=number
|
-A number, --after-context=number
|
||||||
Output number lines of context after each matching line. If
|
Output number lines of context after each matching line. If
|
||||||
filenames and/or line numbers are being output, a hyphen sep-
|
file names and/or line numbers are being output, a hyphen
|
||||||
arator is used instead of a colon for the context lines. A
|
separator is used instead of a colon for the context lines. A
|
||||||
line containing "--" is output between each group of lines,
|
line containing "--" is output between each group of lines,
|
||||||
unless they are in fact contiguous in the input file. The
|
unless they are in fact contiguous in the input file. The
|
||||||
value of number is expected to be relatively small. However,
|
value of number is expected to be relatively small. However,
|
||||||
|
@ -141,8 +141,8 @@ OPTIONS
|
||||||
|
|
||||||
-B number, --before-context=number
|
-B number, --before-context=number
|
||||||
Output number lines of context before each matching line. If
|
Output number lines of context before each matching line. If
|
||||||
filenames and/or line numbers are being output, a hyphen sep-
|
file names and/or line numbers are being output, a hyphen
|
||||||
arator is used instead of a colon for the context lines. A
|
separator is used instead of a colon for the context lines. A
|
||||||
line containing "--" is output between each group of lines,
|
line containing "--" is output between each group of lines,
|
||||||
unless they are in fact contiguous in the input file. The
|
unless they are in fact contiguous in the input file. The
|
||||||
value of number is expected to be relatively small. However,
|
value of number is expected to be relatively small. However,
|
||||||
|
@ -160,7 +160,8 @@ OPTIONS
|
||||||
which can have nasty effects if sent to a terminal. If the
|
which can have nasty effects if sent to a terminal. If the
|
||||||
word is "without-match", which is equivalent to the -I
|
word is "without-match", which is equivalent to the -I
|
||||||
option, binary files are not processed at all; they are
|
option, binary files are not processed at all; they are
|
||||||
assumed not to be of interest.
|
assumed not to be of interest and are skipped without causing
|
||||||
|
any output or affecting the return code.
|
||||||
|
|
||||||
--buffer-size=number
|
--buffer-size=number
|
||||||
Set the parameter that controls how much memory is used for
|
Set the parameter that controls how much memory is used for
|
||||||
|
@ -172,14 +173,20 @@ OPTIONS
|
||||||
to the same value.
|
to the same value.
|
||||||
|
|
||||||
-c, --count
|
-c, --count
|
||||||
Do not output individual lines from the files that are being
|
Do not output lines from the files that are being scanned;
|
||||||
scanned; instead output the number of lines that would other-
|
instead output the number of matches (or non-matches if -v is
|
||||||
wise have been shown. If no lines are selected, the number
|
used) that would otherwise have caused lines to be shown. By
|
||||||
zero is output. If several files are being scanned, a
|
default, this count is the same as the number of suppressed
|
||||||
count is output for each of them. However, if the --files-
|
lines, but if the -M (multiline) option is used (without -v),
|
||||||
with-matches option is also used, only those files whose
|
there may be more suppressed lines than the number of
|
||||||
counts are greater than zero are listed. When -c is used, the
|
matches.
|
||||||
-A, -B, and -C options are ignored.
|
|
||||||
|
If no lines are selected, the number zero is output. If sev-
|
||||||
|
eral files are being scanned, a count is output for each
|
||||||
|
of them. However, if the --files-with-matches option is also
|
||||||
|
used, only those files whose counts are greater than zero are
|
||||||
|
listed. When -c is used, the -A, -B, and -C options are
|
||||||
|
ignored.
|
||||||
|
|
||||||
--colour, --color
|
--colour, --color
|
||||||
If this option is given without any data, it is equivalent to
|
If this option is given without any data, it is equivalent to
|
||||||
|
@ -301,7 +308,7 @@ OPTIONS
|
||||||
|
|
||||||
If this option is given more than once, all the specified
|
If this option is given more than once, all the specified
|
||||||
files are read. A data line is output if any of the patterns
|
files are read. A data line is output if any of the patterns
|
||||||
match it. A filename can be given as "-" to refer to the
|
match it. A file name can be given as "-" to refer to the
|
||||||
standard input. When -f is used, patterns specified on the
|
standard input. When -f is used, patterns specified on the
|
||||||
command line using -e may also be present; they are tested
|
command line using -e may also be present; they are tested
|
||||||
before the file's patterns. However, no other pattern is
|
before the file's patterns. However, no other pattern is
|
||||||
|
@ -313,7 +320,7 @@ OPTIONS
|
||||||
scanned from the given file, one per line. Trailing white
|
scanned from the given file, one per line. Trailing white
|
||||||
space is removed from each line, and blank lines are ignored.
|
space is removed from each line, and blank lines are ignored.
|
||||||
These paths are processed before any that are listed on the
|
These paths are processed before any that are listed on the
|
||||||
command line. The filename can be given as "-" to refer to
|
command line. The file name can be given as "-" to refer to
|
||||||
the standard input. If --file and --file-list are both spec-
|
the standard input. If --file and --file-list are both spec-
|
||||||
ified as "-", patterns are read first. This is useful only
|
ified as "-", patterns are read first. This is useful only
|
||||||
when the standard input is a terminal, from which further
|
when the standard input is a terminal, from which further
|
||||||
|
@ -331,17 +338,19 @@ OPTIONS
|
||||||
offsets and --only-matching.
|
offsets and --only-matching.
|
||||||
|
|
||||||
-H, --with-filename
|
-H, --with-filename
|
||||||
Force the inclusion of the filename at the start of output
|
Force the inclusion of the file name at the start of output
|
||||||
lines when searching a single file. By default, the filename
|
lines when searching a single file. By default, the file name
|
||||||
is not shown in this case. For matching lines, the filename
|
is not shown in this case. For matching lines, the file name
|
||||||
is followed by a colon; for context lines, a hyphen separator
|
is followed by a colon; for context lines, a hyphen separator
|
||||||
is used. If a line number is also being output, it follows
|
is used. If a line number is also being output, it follows
|
||||||
the file name.
|
the file name. When the -M option causes a pattern to match
|
||||||
|
more than one line, only the first is preceded by the file
|
||||||
|
name.
|
||||||
|
|
||||||
-h, --no-filename
|
-h, --no-filename
|
||||||
Suppress the output filenames when searching multiple files.
|
Suppress the output file names when searching multiple files.
|
||||||
By default, filenames are shown when multiple files are
|
By default, file names are shown when multiple files are
|
||||||
searched. For matching lines, the filename is followed by a
|
searched. For matching lines, the file name is followed by a
|
||||||
colon; for context lines, a hyphen separator is used. If a
|
colon; for context lines, a hyphen separator is used. If a
|
||||||
line number is also being output, it follows the file name.
|
line number is also being output, it follows the file name.
|
||||||
|
|
||||||
|
@ -349,8 +358,8 @@ OPTIONS
|
||||||
options and file type support, and then exit. Anything else
|
options and file type support, and then exit. Anything else
|
||||||
on the command line is ignored.
|
on the command line is ignored.
|
||||||
|
|
||||||
-I Treat binary files as never matching. This is equivalent to
|
-I Ignore binary files. This is equivalent to --binary-
|
||||||
--binary-files=without-match.
|
files=without-match.
|
||||||
|
|
||||||
-i, --ignore-case
|
-i, --ignore-case
|
||||||
Ignore upper/lower case distinctions during comparisons.
|
Ignore upper/lower case distinctions during comparisons.
|
||||||
|
@ -478,20 +487,37 @@ OPTIONS
|
||||||
is given, patterns may usefully contain literal newline char-
|
is given, patterns may usefully contain literal newline char-
|
||||||
acters and internal occurrences of ^ and $ characters. The
|
acters and internal occurrences of ^ and $ characters. The
|
||||||
output for a successful match may consist of more than one
|
output for a successful match may consist of more than one
|
||||||
line, the last of which is the one in which the match ended.
|
line. The first is the line in which the match started, and
|
||||||
If the matched string ends with a newline sequence the output
|
the last is the line in which the match ended. If the matched
|
||||||
ends at the end of that line.
|
string ends with a newline sequence the output ends at the
|
||||||
|
end of that line.
|
||||||
|
|
||||||
When this option is set, the PCRE2 library is called in "mul-
|
When this option is set, the PCRE2 library is called in "mul-
|
||||||
tiline" mode. There is a limit to the number of lines that
|
tiline" mode. However, pcre2grep still processes the input
|
||||||
can be matched, imposed by the way that pcre2grep buffers the
|
line by line. The difference is that a matched string may
|
||||||
input file as it scans it. However, pcre2grep ensures that at
|
extend past the end of a line and continue on one or more
|
||||||
least 8K characters or the rest of the document (whichever is
|
subsequent lines. The newline sequence must be matched as
|
||||||
the shorter) are available for forward matching, and simi-
|
part of the pattern. For example, to find the phrase "regular
|
||||||
larly the previous 8K characters (or all the previous charac-
|
expression" in a file where "regular" might be at the end of
|
||||||
ters, if fewer than 8K) are guaranteed to be available for
|
a line and "expression" at the start of the next line, you
|
||||||
lookbehind assertions. This option does not work when input
|
could use this command:
|
||||||
is read line by line (see --line-buffered.)
|
|
||||||
|
pcre2grep -M 'regular\s+expression' <file>
|
||||||
|
|
||||||
|
The \s escape sequence matches any white space character,
|
||||||
|
including newlines, and is followed by + so as to match
|
||||||
|
trailing white space on the first line as well as possibly
|
||||||
|
handling a two-character newline sequence.
|
||||||
|
|
||||||
|
There is a limit to the number of lines that can be matched,
|
||||||
|
imposed by the way that pcre2grep buffers the input file as
|
||||||
|
it scans it. However, pcre2grep ensures that at least 8K
|
||||||
|
characters or the rest of the file (whichever is the shorter)
|
||||||
|
are available for forward matching, and similarly the previ-
|
||||||
|
ous 8K characters (or all the previous characters, if fewer
|
||||||
|
than 8K) are guaranteed to be available for lookbehind asser-
|
||||||
|
tions. The -M option does not work when input is read line by
|
||||||
|
line (see --line-buffered.)
|
||||||
|
|
||||||
-N newline-type, --newline=newline-type
|
-N newline-type, --newline=newline-type
|
||||||
The PCRE2 library supports five different conventions for
|
The PCRE2 library supports five different conventions for
|
||||||
|
@ -522,8 +548,10 @@ OPTIONS
|
||||||
-n, --line-number
|
-n, --line-number
|
||||||
Precede each output line by its line number in the file, fol-
|
Precede each output line by its line number in the file, fol-
|
||||||
lowed by a colon for matching lines or a hyphen for context
|
lowed by a colon for matching lines or a hyphen for context
|
||||||
lines. If the filename is also being output, it precedes the
|
lines. If the file name is also being output, it precedes the
|
||||||
line number. This option is forced if --line-offsets is used.
|
line number. When the -M option causes a pattern to match
|
||||||
|
more than one line, only the first is preceded by its line
|
||||||
|
number. This option is forced if --line-offsets is used.
|
||||||
|
|
||||||
--no-jit If the PCRE2 library is built with support for just-in-time
|
--no-jit If the PCRE2 library is built with support for just-in-time
|
||||||
compiling (which speeds up matching), pcre2grep automatically
|
compiling (which speeds up matching), pcre2grep automatically
|
||||||
|
@ -555,8 +583,8 @@ OPTIONS
|
||||||
The comments given for the non-argument case above also apply
|
The comments given for the non-argument case above also apply
|
||||||
to this case. If the specified capturing parentheses do not
|
to this case. If the specified capturing parentheses do not
|
||||||
exist in the pattern, or were not set in the match, nothing
|
exist in the pattern, or were not set in the match, nothing
|
||||||
is output unless the file name or line number are being
|
is output unless the file name or line number are being out-
|
||||||
printed.
|
put.
|
||||||
|
|
||||||
If this option is given multiple times, multiple substrings
|
If this option is given multiple times, multiple substrings
|
||||||
are output, in the order the options are given. For example,
|
are output, in the order the options are given. For example,
|
||||||
|
@ -617,11 +645,11 @@ OPTIONS
|
||||||
Force the patterns to be anchored (each must start matching
|
Force the patterns to be anchored (each must start matching
|
||||||
at the beginning of a line) and in addition, require them to
|
at the beginning of a line) and in addition, require them to
|
||||||
match entire lines. This is equivalent to having ^ and $
|
match entire lines. This is equivalent to having ^ and $
|
||||||
characters at the start and end of each alternative branch in
|
characters at the start and end of each alternative top-level
|
||||||
every pattern. This option applies only to the patterns that
|
branch in every pattern. This option applies only to the pat-
|
||||||
are matched against the contents of files; it does not apply
|
terns that are matched against the contents of files; it does
|
||||||
to patterns specified by any of the --include or --exclude
|
not apply to patterns specified by any of the --include or
|
||||||
options.
|
--exclude options.
|
||||||
|
|
||||||
|
|
||||||
ENVIRONMENT VARIABLES
|
ENVIRONMENT VARIABLES
|
||||||
|
@ -662,7 +690,7 @@ OPTIONS COMPATIBILITY
|
||||||
ferent in pcre2grep. For example, the --include option's argument is a
|
ferent in pcre2grep. For example, the --include option's argument is a
|
||||||
glob for GNU grep, but a regular expression for pcre2grep. If both the
|
glob for GNU grep, but a regular expression for pcre2grep. If both the
|
||||||
-c and -l options are given, GNU grep lists only file names, without
|
-c and -l options are given, GNU grep lists only file names, without
|
||||||
counts, but pcre2grep gives the counts.
|
counts, but pcre2grep gives the counts as well.
|
||||||
|
|
||||||
|
|
||||||
OPTIONS WITH DATA
|
OPTIONS WITH DATA
|
||||||
|
@ -725,7 +753,7 @@ DIAGNOSTICS
|
||||||
|
|
||||||
SEE ALSO
|
SEE ALSO
|
||||||
|
|
||||||
pcre2pattern(3), pcre2syntax(3), pcre2test(1).
|
pcre2pattern(3), pcre2syntax(3).
|
||||||
|
|
||||||
|
|
||||||
AUTHOR
|
AUTHOR
|
||||||
|
@ -737,5 +765,5 @@ AUTHOR
|
||||||
|
|
||||||
REVISION
|
REVISION
|
||||||
|
|
||||||
Last updated: 23 November 2014
|
Last updated: 03 January 2015
|
||||||
Copyright (c) 1997-2014 University of Cambridge.
|
Copyright (c) 1997-2015 University of Cambridge.
|
||||||
|
|
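Note on the -M hunk above: the new text says that the newline sequence must be matched as part of the pattern, and that \s+ covers both trailing white space and a two-character newline. The sketch below illustrates that point using Python's re module as a stand-in. It is not pcre2grep and does not reproduce its buffering or output format; the sample text and the helper name find_phrases are made up for illustration.

```python
import re

# Hypothetical sample: "regular" ends the first line (followed by trailing
# spaces and a CRLF newline) and "expression" starts the second line.
SAMPLE = (
    "A regular  \r\n"
    "expression spans two lines.\n"
    "A regular expression on one line.\n"
)

# \s matches spaces, CR and LF, so \s+ absorbs the trailing white space
# and the two-character newline sequence in a single run.
PATTERN = re.compile(r"regular\s+expression")


def find_phrases(text):
    """Return (line_number, matched_text) pairs; line numbers are 1-based
    and refer to the line in which the match starts."""
    results = []
    for m in PATTERN.finditer(text):
        line_no = text.count("\n", 0, m.start()) + 1
        results.append((line_no, m.group(0)))
    return results


if __name__ == "__main__":
    for line_no, matched in find_phrases(SAMPLE):
        print(line_no, repr(matched))
```

A pattern containing a literal single space, such as 'regular expression', would not match the first occurrence, because the newline between the two words is not a space; that is the reason the documented command uses \s+.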