Update pcre2grep documentation to give more details of -M matching.
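The -M behaviour described by this change can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses Python's re module rather than the PCRE2 library, it assumes LF line endings, and the names sample_text, multiline_blocks and per_line_hits are hypothetical, not part of pcre2grep. It shows that a pattern which consumes the newline (here via \s+) finds a match that a strictly line-by-line search misses, and that the reported block runs from the line where the match starts to the line where it ends.

```python
import re

# Hypothetical sample input: "regular" ends one line (with trailing spaces)
# and "expression" starts the next.
sample_text = "PCRE2 supports regular  \nexpression matching.\nAnother line.\n"

# \s+ matches the trailing spaces and the newline between the two words.
pattern = re.compile(r"regular\s+expression")


def multiline_blocks(pattern, text):
    """For each match, return the whole lines from the line in which the
    match starts to the line in which it ends (LF line endings assumed)."""
    blocks = []
    for m in pattern.finditer(text):
        # Start of the line containing the first matched character.
        start = text.rfind("\n", 0, m.start()) + 1
        if m.end() > m.start() and text[m.end() - 1] == "\n":
            # The match itself ends with a newline: stop at that line's end.
            end = m.end() - 1
        else:
            end = text.find("\n", m.end())
            if end == -1:
                end = len(text)
        blocks.append(text[start:end])
    return blocks


# A strictly per-line search never sees both words on the same line.
per_line_hits = sum(1 for line in sample_text.splitlines() if pattern.search(line))

blocks = multiline_blocks(pattern, sample_text)
print(per_line_hits)  # number of single lines that match on their own
print(len(blocks))    # number of matches when the newline may be consumed
print(blocks[0])      # the two-line block covered by the match
```

In this sketch the single match covers two lines, which is also why a match count and a count of output lines can differ once matches are allowed to span line boundaries.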
parent 5a18651441
commit 4819827879
|
@ -67,22 +67,23 @@ If no files are specified, <b>pcre2grep</b> reads the standard input. The
|
|||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
<pre>
|
||||
pcre2grep some-pattern /file1 - /file3
|
||||
pcre2grep some-pattern file1 - file3
|
||||
</pre>
|
||||
By default, each line that matches a pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line, followed by a colon. However, there are options that can
|
||||
change how <b>pcre2grep</b> behaves. In particular, the <b>-M</b> option makes it
|
||||
possible to search for patterns that span line boundaries. What defines a line
|
||||
boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
|
||||
Input files are searched line by line. By default, each line that matches a
|
||||
pattern is copied to the standard output, and if there is more than one file,
|
||||
the file name is output at the start of each line, followed by a colon.
|
||||
However, there are options that can change how <b>pcre2grep</b> behaves. In
|
||||
particular, the <b>-M</b> option makes it possible to search for strings that
|
||||
span line boundaries. What defines a line boundary is controlled by the
|
||||
<b>-N</b> (<b>--newline</b>) option.
|
||||
</P>
|
||||
<P>
|
||||
The amount of memory used for buffering files that are being scanned is
|
||||
controlled by a parameter that can be set by the <b>--buffer-size</b> option.
|
||||
The default value for this parameter is specified when <b>pcre2grep</b> is built,
|
||||
with the default default being 20K. A block of memory three times this size is
|
||||
used (to allow for buffering "before" and "after" lines). An error occurs if a
|
||||
line overflows the buffer.
|
||||
The default value for this parameter is specified when <b>pcre2grep</b> is
|
||||
built, with the default default being 20K. A block of memory three times this
|
||||
size is used (to allow for buffering "before" and "after" lines). An error
|
||||
occurs if a line overflows the buffer.
|
||||
</P>
|
||||
<P>
|
||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
||||
|
@ -184,7 +185,8 @@ processed in the same way as any other file. In this case, when a match
|
|||
succeeds, the output may be binary garbage, which can have nasty effects if
|
||||
sent to a terminal. If the word is "without-match", which is equivalent to the
|
||||
<b>-I</b> option, binary files are not processed at all; they are assumed not to
|
||||
be of interest.
|
||||
be of interest and are skipped without causing any output or affecting the
|
||||
return code.
|
||||
</P>
|
||||
<P>
|
||||
<b>--buffer-size=</b><i>number</i>
|
||||
|
@ -198,10 +200,15 @@ This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
|
|||
</P>
|
||||
<P>
|
||||
<b>-c</b>, <b>--count</b>
|
||||
Do not output individual lines from the files that are being scanned; instead
|
||||
output the number of lines that would otherwise have been shown. If no lines
|
||||
are selected, the number zero is output. If several files are being
|
||||
scanned, a count is output for each of them. However, if the
|
||||
Do not output lines from the files that are being scanned; instead output the
|
||||
number of matches (or non-matches if <b>-v</b> is used) that would otherwise
|
||||
have caused lines to be shown. By default, this count is the same as the number
|
||||
of suppressed lines, but if the <b>-M</b> (multiline) option is used (without
|
||||
<b>-v</b>), there may be more suppressed lines than the number of matches.
|
||||
<br>
|
||||
<br>
|
||||
If no lines are selected, the number zero is output. If several files are
|
||||
being scanned, a count is output for each of them. However, if the
|
||||
<b>--files-with-matches</b> option is also used, only those files whose counts
|
||||
are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
|
||||
<b>-B</b>, and <b>-C</b> options are ignored.
|
||||
|
@ -271,10 +278,10 @@ of the line that matched.
|
|||
Files (but not directories) whose names match the pattern are skipped without
|
||||
being processed. This applies to all files, whether listed on the command line,
|
||||
obtained from <b>--file-list</b>, or by scanning a directory. The pattern is a
|
||||
PCRE2 regular expression, and is matched against the final component of the file
|
||||
name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not
|
||||
apply to this pattern. The option may be given any number of times in order to
|
||||
specify multiple patterns. If a file name matches both an <b>--include</b>
|
||||
PCRE2 regular expression, and is matched against the final component of the
|
||||
file name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do
|
||||
not apply to this pattern. The option may be given any number of times in order
|
||||
to specify multiple patterns. If a file name matches both an <b>--include</b>
|
||||
and an <b>--exclude</b> pattern, it is excluded. There is no short form for this
|
||||
option.
|
||||
</P>
|
||||
|
@ -352,11 +359,12 @@ and <b>--only-matching</b>.
|
|||
</P>
|
||||
<P>
|
||||
<b>-H</b>, <b>--with-filename</b>
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name.
|
||||
Force the inclusion of the file name at the start of output lines when
|
||||
searching a single file. By default, the file name is not shown in this case.
|
||||
For matching lines, the file name is followed by a colon; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name. When the <b>-M</b> option causes a pattern to match more than one
|
||||
line, only the first is preceded by the file name.
|
||||
</P>
|
||||
<P>
|
||||
<b>-h</b>, <b>--no-filename</b>
|
||||
|
@ -373,7 +381,7 @@ ignored.
|
|||
</P>
|
||||
<P>
|
||||
<b>-I</b>
|
||||
Treat binary files as never matching. This is equivalent to
|
||||
Ignore binary files. This is equivalent to
|
||||
<b>--binary-files</b>=<i>without-match</i>.
|
||||
</P>
|
||||
<P>
|
||||
|
@ -406,8 +414,8 @@ If any <b>--include-dir</b> patterns are specified, the only directories that
|
|||
are processed are those that match one of the patterns (and do not match an
|
||||
<b>--exclude-dir</b> pattern). This applies to all directories, whether listed
|
||||
on the command line, obtained from <b>--file-list</b>, or by scanning a parent
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against the
|
||||
final component of the directory name, not the entire path. The <b>-F</b>,
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against
|
||||
the final component of the directory name, not the entire path. The <b>-F</b>,
|
||||
<b>-w</b>, and <b>-x</b> options do not apply to this pattern. The option may be
|
||||
given any number of times. If a directory matches both <b>--include-dir</b> and
|
||||
<b>--exclude-dir</b>, it is excluded. There is no short form for this option.
|
||||
|
@ -442,8 +450,8 @@ unless <b>pcre2grep</b> can determine that it is reading from a terminal (which
|
|||
is currently possible only in Unix-like environments). Output to terminal is
|
||||
normally automatically flushed by the operating system. This option can be
|
||||
useful when the input or output is attached to a pipe and you do not want
|
||||
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will affect
|
||||
performance, and the <b>-M</b> (multiline) option ceases to work.
|
||||
<b>pcre2grep</b> to buffer up large amounts of data. However, its use will
|
||||
affect performance, and the <b>-M</b> (multiline) option ceases to work.
|
||||
</P>
|
||||
<P>
|
||||
<b>--line-offsets</b>
|
||||
|
@ -497,18 +505,33 @@ when the PCRE2 library is compiled, with the default default being 10 million.
|
|||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for a successful match may consist of more than
|
||||
one line, the last of which is the one in which the match ended. If the matched
|
||||
string ends with a newline sequence the output ends at the end of that line.
|
||||
one line. The first is the line in which the match started, and the last is the
|
||||
line in which the match ended. If the matched string ends with a newline
|
||||
sequence the output ends at the end of that line.
|
||||
<br>
|
||||
<br>
|
||||
When this option is set, the PCRE2 library is called in "multiline" mode.
|
||||
However, <b>pcre2grep</b> still processes the input line by line. The difference
|
||||
is that a matched string may extend past the end of a line and continue on
|
||||
one or more subsequent lines. The newline sequence must be matched as part of
|
||||
the pattern. For example, to find the phrase "regular expression" in a file
|
||||
where "regular" might be at the end of a line and "expression" at the start of
|
||||
the next line, you could use this command:
|
||||
<pre>
|
||||
pcre2grep -M 'regular\s+expression' <file>
|
||||
</pre>
|
||||
The \s escape sequence matches any white space character, including newlines,
|
||||
and is followed by + so as to match trailing white space on the first line as
|
||||
well as possibly handling a two-character newline sequence.
|
||||
<br>
|
||||
<br>
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that <b>pcre2grep</b> buffers the input file as it scans it. However,
|
||||
<b>pcre2grep</b> ensures that at least 8K characters or the rest of the document
|
||||
<b>pcre2grep</b> ensures that at least 8K characters or the rest of the file
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions. This option does not
|
||||
work when input is read line by line (see <b>--line-buffered</b>).
|
||||
are guaranteed to be available for lookbehind assertions. The <b>-M</b> option
|
||||
does not work when input is read line by line (see <b>--line-buffered</b>).
|
||||
</P>
|
||||
<P>
|
||||
<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
|
||||
|
@ -526,9 +549,9 @@ When the PCRE2 library is built, a default line-ending sequence is specified.
|
|||
This is normally the standard sequence for the operating system. Unless
|
||||
otherwise specified by this option, <b>pcre2grep</b> uses the library's default.
|
||||
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
||||
makes it possible to use <b>pcre2grep</b> to scan files that have come from other
|
||||
environments without having to modify their line endings. If the data that is
|
||||
being scanned does not agree with the convention set by this option,
|
||||
makes it possible to use <b>pcre2grep</b> to scan files that have come from
|
||||
other environments without having to modify their line endings. If the data
|
||||
that is being scanned does not agree with the convention set by this option,
|
||||
<b>pcre2grep</b> may behave in strange ways. Note that this option does not
|
||||
apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
|
||||
<b>--include-from</b> options, which are expected to use the operating system's
|
||||
|
@ -537,9 +560,10 @@ standard newline sequence.
|
|||
<P>
|
||||
<b>-n</b>, <b>--line-number</b>
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
for matching lines or a hyphen for context lines. If the filename is also being
|
||||
output, it precedes the line number. This option is forced if
|
||||
<b>--line-offsets</b> is used.
|
||||
for matching lines or a hyphen for context lines. If the file name is also
|
||||
being output, it precedes the line number. When the <b>-M</b> option causes a
|
||||
pattern to match more than one line, only the first is preceded by its line
|
||||
number. This option is forced if <b>--line-offsets</b> is used.
|
||||
</P>
|
||||
<P>
|
||||
<b>--no-jit</b>
|
||||
|
@ -570,7 +594,7 @@ without an argument (see above), if an argument is present, it must be given in
|
|||
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
||||
for the non-argument case above also apply to this case. If the specified
|
||||
capturing parentheses do not exist in the pattern, or were not set in the
|
||||
match, nothing is output unless the file name or line number are being printed.
|
||||
match, nothing is output unless the file name or line number are being output.
|
||||
<br>
|
||||
<br>
|
||||
If this option is given multiple times, multiple substrings are output, in the
|
||||
|
@ -635,10 +659,10 @@ specified by any of the <b>--include</b> or <b>--exclude</b> options.
|
|||
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is equivalent
|
||||
to having ^ and $ characters at the start and end of each alternative branch in
|
||||
every pattern. This option applies only to the patterns that are matched
|
||||
against the contents of files; it does not apply to patterns specified by any
|
||||
of the <b>--include</b> or <b>--exclude</b> options.
|
||||
to having ^ and $ characters at the start and end of each alternative top-level
|
||||
branch in every pattern. This option applies only to the patterns that are
|
||||
matched against the contents of files; it does not apply to patterns specified
|
||||
by any of the <b>--include</b> or <b>--exclude</b> options.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
||||
<P>
|
||||
|
@ -677,7 +701,7 @@ Although most of the common options work the same way, a few are different in
|
|||
<b>pcre2grep</b>. For example, the <b>--include</b> option's argument is a glob
|
||||
for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
|
||||
<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
|
||||
without counts, but <b>pcre2grep</b> gives the counts.
|
||||
without counts, but <b>pcre2grep</b> gives the counts as well.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
|
||||
<P>
|
||||
|
@ -722,9 +746,9 @@ message and the line that caused the problem to the standard error stream. If
|
|||
there are more than 20 such errors, <b>pcre2grep</b> gives up.
|
||||
</P>
|
||||
<P>
|
||||
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the overall
|
||||
resource limit; there is a second option called <b>--recursion-limit</b> that
|
||||
sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
The <b>--match-limit</b> option of <b>pcre2grep</b> can be used to set the
|
||||
overall resource limit; there is a second option called <b>--recursion-limit</b>
|
||||
that sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
discussion of these options above).
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
|
||||
|
@ -737,7 +761,7 @@ affect the return code.
|
|||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
|
||||
<P>
|
||||
<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2test</b>(1).
|
||||
<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3).
|
||||
</P>
|
||||
<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
|
@ -750,9 +774,9 @@ Cambridge, England.
|
|||
</P>
|
||||
<br><a name="SEC14" href="#TOC1">REVISION</a><br>
|
||||
<P>
|
||||
Last updated: 23 November 2014
|
||||
Last updated: 03 January 2015
|
||||
<br>
|
||||
Copyright © 1997-2014 University of Cambridge.
|
||||
Copyright © 1997-2015 University of Cambridge.
|
||||
<br>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE2 index page</a>.
|
||||
|
|
130
doc/pcre2grep.1
|
@ -1,4 +1,4 @@
|
|||
.TH PCRE2GREP 1 "23 November 2014" "PCRE2 10.00"
|
||||
.TH PCRE2GREP 1 "03 January 2015" "PCRE2 10.00"
|
||||
.SH NAME
|
||||
pcre2grep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
|
@ -41,21 +41,22 @@ If no files are specified, \fBpcre2grep\fP reads the standard input. The
|
|||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
.sp
|
||||
pcre2grep some-pattern /file1 - /file3
|
||||
pcre2grep some-pattern file1 - file3
|
||||
.sp
|
||||
By default, each line that matches a pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line, followed by a colon. However, there are options that can
|
||||
change how \fBpcre2grep\fP behaves. In particular, the \fB-M\fP option makes it
|
||||
possible to search for patterns that span line boundaries. What defines a line
|
||||
boundary is controlled by the \fB-N\fP (\fB--newline\fP) option.
|
||||
Input files are searched line by line. By default, each line that matches a
|
||||
pattern is copied to the standard output, and if there is more than one file,
|
||||
the file name is output at the start of each line, followed by a colon.
|
||||
However, there are options that can change how \fBpcre2grep\fP behaves. In
|
||||
particular, the \fB-M\fP option makes it possible to search for strings that
|
||||
span line boundaries. What defines a line boundary is controlled by the
|
||||
\fB-N\fP (\fB--newline\fP) option.
|
||||
.P
|
||||
The amount of memory used for buffering files that are being scanned is
|
||||
controlled by a parameter that can be set by the \fB--buffer-size\fP option.
|
||||
The default value for this parameter is specified when \fBpcre2grep\fP is built,
|
||||
with the default default being 20K. A block of memory three times this size is
|
||||
used (to allow for buffering "before" and "after" lines). An error occurs if a
|
||||
line overflows the buffer.
|
||||
The default value for this parameter is specified when \fBpcre2grep\fP is
|
||||
built, with the default default being 20K. A block of memory three times this
|
||||
size is used (to allow for buffering "before" and "after" lines). An error
|
||||
occurs if a line overflows the buffer.
|
||||
.P
|
||||
Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
|
||||
BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
|
||||
|
@ -153,7 +154,8 @@ processed in the same way as any other file. In this case, when a match
|
|||
succeeds, the output may be binary garbage, which can have nasty effects if
|
||||
sent to a terminal. If the word is "without-match", which is equivalent to the
|
||||
\fB-I\fP option, binary files are not processed at all; they are assumed not to
|
||||
be of interest.
|
||||
be of interest and are skipped without causing any output or affecting the
|
||||
return code.
|
||||
.TP
|
||||
\fB--buffer-size=\fP\fInumber\fP
|
||||
Set the parameter that controls how much memory is used for buffering files
|
||||
|
@ -164,10 +166,14 @@ Output \fInumber\fP lines of context both before and after each matching line.
|
|||
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
|
||||
.TP
|
||||
\fB-c\fP, \fB--count\fP
|
||||
Do not output individual lines from the files that are being scanned; instead
|
||||
output the number of lines that would otherwise have been shown. If no lines
|
||||
are selected, the number zero is output. If several files are being
|
||||
scanned, a count is output for each of them. However, if the
|
||||
Do not output lines from the files that are being scanned; instead output the
|
||||
number of matches (or non-matches if \fB-v\fP is used) that would otherwise
|
||||
have caused lines to be shown. By default, this count is the same as the number
|
||||
of suppressed lines, but if the \fB-M\fP (multiline) option is used (without
|
||||
\fB-v\fP), there may be more suppressed lines than the number of matches.
|
||||
.sp
|
||||
If no lines are selected, the number zero is output. If several files are
|
||||
being scanned, a count is output for each of them. However, if the
|
||||
\fB--files-with-matches\fP option is also used, only those files whose counts
|
||||
are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP,
|
||||
\fB-B\fP, and \fB-C\fP options are ignored.
|
||||
|
@ -229,10 +235,10 @@ of the line that matched.
|
|||
Files (but not directories) whose names match the pattern are skipped without
|
||||
being processed. This applies to all files, whether listed on the command line,
|
||||
obtained from \fB--file-list\fP, or by scanning a directory. The pattern is a
|
||||
PCRE2 regular expression, and is matched against the final component of the file
|
||||
name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not
|
||||
apply to this pattern. The option may be given any number of times in order to
|
||||
specify multiple patterns. If a file name matches both an \fB--include\fP
|
||||
PCRE2 regular expression, and is matched against the final component of the
|
||||
file name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do
|
||||
not apply to this pattern. The option may be given any number of times in order
|
||||
to specify multiple patterns. If a file name matches both an \fB--include\fP
|
||||
and an \fB--exclude\fP pattern, it is excluded. There is no short form for this
|
||||
option.
|
||||
.TP
|
||||
|
@ -302,11 +308,12 @@ shown separately. This option is mutually exclusive with \fB--line-offsets\fP
|
|||
and \fB--only-matching\fP.
|
||||
.TP
|
||||
\fB-H\fP, \fB--with-filename\fP
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name.
|
||||
Force the inclusion of the file name at the start of output lines when
|
||||
searching a single file. By default, the file name is not shown in this case.
|
||||
For matching lines, the file name is followed by a colon; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name. When the \fB-M\fP option causes a pattern to match more than one
|
||||
line, only the first is preceded by the file name.
|
||||
.TP
|
||||
\fB-h\fP, \fB--no-filename\fP
|
||||
Suppress the output file names when searching multiple files. By default,
|
||||
|
@ -320,7 +327,7 @@ type support, and then exit. Anything else on the command line is
|
|||
ignored.
|
||||
.TP
|
||||
\fB-I\fP
|
||||
Treat binary files as never matching. This is equivalent to
|
||||
Ignore binary files. This is equivalent to
|
||||
\fB--binary-files\fP=\fIwithout-match\fP.
|
||||
.TP
|
||||
\fB-i\fP, \fB--ignore-case\fP
|
||||
|
@ -349,8 +356,8 @@ If any \fB--include-dir\fP patterns are specified, the only directories that
|
|||
are processed are those that match one of the patterns (and do not match an
|
||||
\fB--exclude-dir\fP pattern). This applies to all directories, whether listed
|
||||
on the command line, obtained from \fB--file-list\fP, or by scanning a parent
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against the
|
||||
final component of the directory name, not the entire path. The \fB-F\fP,
|
||||
directory. The pattern is a PCRE2 regular expression, and is matched against
|
||||
the final component of the directory name, not the entire path. The \fB-F\fP,
|
||||
\fB-w\fP, and \fB-x\fP options do not apply to this pattern. The option may be
|
||||
given any number of times. If a directory matches both \fB--include-dir\fP and
|
||||
\fB--exclude-dir\fP, it is excluded. There is no short form for this option.
|
||||
|
@ -381,8 +388,8 @@ unless \fBpcre2grep\fP can determine that it is reading from a terminal (which
|
|||
is currently possible only in Unix-like environments). Output to terminal is
|
||||
normally automatically flushed by the operating system. This option can be
|
||||
useful when the input or output is attached to a pipe and you do not want
|
||||
\fBpcre2grep\fP to buffer up large amounts of data. However, its use will affect
|
||||
performance, and the \fB-M\fP (multiline) option ceases to work.
|
||||
\fBpcre2grep\fP to buffer up large amounts of data. However, its use will
|
||||
affect performance, and the \fB-M\fP (multiline) option ceases to work.
|
||||
.TP
|
||||
\fB--line-offsets\fP
|
||||
Instead of showing lines or parts of lines that match, show each match as a
|
||||
|
@ -429,17 +436,31 @@ when the PCRE2 library is compiled, with the default default being 10 million.
|
|||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for a successful match may consist of more than
|
||||
one line, the last of which is the one in which the match ended. If the matched
|
||||
string ends with a newline sequence the output ends at the end of that line.
|
||||
one line. The first is the line in which the match started, and the last is the
|
||||
line in which the match ended. If the matched string ends with a newline
|
||||
sequence the output ends at the end of that line.
|
||||
.sp
|
||||
When this option is set, the PCRE2 library is called in "multiline" mode.
|
||||
However, \fBpcre2grep\fP still processes the input line by line. The difference
|
||||
is that a matched string may extend past the end of a line and continue on
|
||||
one or more subsequent lines. The newline sequence must be matched as part of
|
||||
the pattern. For example, to find the phrase "regular expression" in a file
|
||||
where "regular" might be at the end of a line and "expression" at the start of
|
||||
the next line, you could use this command:
|
||||
.sp
|
||||
pcre2grep -M 'regular\es+expression' <file>
|
||||
.sp
|
||||
The \es escape sequence matches any white space character, including newlines,
|
||||
and is followed by + so as to match trailing white space on the first line as
|
||||
well as possibly handling a two-character newline sequence.
|
||||
.sp
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that \fBpcre2grep\fP buffers the input file as it scans it. However,
|
||||
\fBpcre2grep\fP ensures that at least 8K characters or the rest of the document
|
||||
\fBpcre2grep\fP ensures that at least 8K characters or the rest of the file
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions. This option does not
|
||||
work when input is read line by line (see \fB--line-buffered\fP).
|
||||
are guaranteed to be available for lookbehind assertions. The \fB-M\fP option
|
||||
does not work when input is read line by line (see \fB--line-buffered\fP).
|
||||
.TP
|
||||
\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
|
||||
The PCRE2 library supports five different conventions for indicating
|
||||
|
@ -455,9 +476,9 @@ When the PCRE2 library is built, a default line-ending sequence is specified.
|
|||
This is normally the standard sequence for the operating system. Unless
|
||||
otherwise specified by this option, \fBpcre2grep\fP uses the library's default.
|
||||
The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
|
||||
makes it possible to use \fBpcre2grep\fP to scan files that have come from other
|
||||
environments without having to modify their line endings. If the data that is
|
||||
being scanned does not agree with the convention set by this option,
|
||||
makes it possible to use \fBpcre2grep\fP to scan files that have come from
|
||||
other environments without having to modify their line endings. If the data
|
||||
that is being scanned does not agree with the convention set by this option,
|
||||
\fBpcre2grep\fP may behave in strange ways. Note that this option does not
|
||||
apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or
|
||||
\fB--include-from\fP options, which are expected to use the operating system's
|
||||
|
@ -465,9 +486,10 @@ standard newline sequence.
|
|||
.TP
|
||||
\fB-n\fP, \fB--line-number\fP
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
for matching lines or a hyphen for context lines. If the filename is also being
|
||||
output, it precedes the line number. This option is forced if
|
||||
\fB--line-offsets\fP is used.
|
||||
for matching lines or a hyphen for context lines. If the file name is also
|
||||
being output, it precedes the line number. When the \fB-M\fP option causes a
|
||||
pattern to match more than one line, only the first is preceded by its line
|
||||
number. This option is forced if \fB--line-offsets\fP is used.
|
||||
.TP
|
||||
\fB--no-jit\fP
|
||||
If the PCRE2 library is built with support for just-in-time compiling (which
|
||||
|
@ -495,7 +517,7 @@ without an argument (see above), if an argument is present, it must be given in
|
|||
the same shell item, for example, -o3 or --only-matching=2. The comments given
|
||||
for the non-argument case above also apply to this case. If the specified
|
||||
capturing parentheses do not exist in the pattern, or were not set in the
|
||||
match, nothing is output unless the file name or line number are being printed.
|
||||
match, nothing is output unless the file name or line number are being output.
|
||||
.sp
|
||||
If this option is given multiple times, multiple substrings are output, in the
|
||||
order the options are given. For example, -o3 -o1 -o3 causes the substrings
|
||||
|
@ -549,10 +571,10 @@ specified by any of the \fB--include\fP or \fB--exclude\fP options.
|
|||
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is equivalent
|
||||
to having ^ and $ characters at the start and end of each alternative branch in
|
||||
every pattern. This option applies only to the patterns that are matched
|
||||
against the contents of files; it does not apply to patterns specified by any
|
||||
of the \fB--include\fP or \fB--exclude\fP options.
|
||||
to having ^ and $ characters at the start and end of each alternative top-level
|
||||
branch in every pattern. This option applies only to the patterns that are
|
||||
matched against the contents of files; it does not apply to patterns specified
|
||||
by any of the \fB--include\fP or \fB--exclude\fP options.
|
||||
.
|
||||
.
|
||||
.SH "ENVIRONMENT VARIABLES"
|
||||
|
@ -596,7 +618,7 @@ Although most of the common options work the same way, a few are different in
|
|||
\fBpcre2grep\fP. For example, the \fB--include\fP option's argument is a glob
|
||||
for GNU \fBgrep\fP, but a regular expression for \fBpcre2grep\fP. If both the
|
||||
\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names,
|
||||
without counts, but \fBpcre2grep\fP gives the counts.
|
||||
without counts, but \fBpcre2grep\fP gives the counts as well.
|
||||
.
|
||||
.
|
||||
.SH "OPTIONS WITH DATA"
|
||||
|
@ -642,9 +664,9 @@ in these circumstances. If this happens, \fBpcre2grep\fP outputs an error
|
|||
message and the line that caused the problem to the standard error stream. If
|
||||
there are more than 20 such errors, \fBpcre2grep\fP gives up.
|
||||
.P
|
||||
The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the overall
|
||||
resource limit; there is a second option called \fB--recursion-limit\fP that
|
||||
sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the
|
||||
overall resource limit; there is a second option called \fB--recursion-limit\fP
|
||||
that sets a limit on the amount of memory (usually stack) that is used (see the
|
||||
discussion of these options above).
|
||||
.
|
||||
.
|
||||
|
@ -661,7 +683,7 @@ affect the return code.
|
|||
.SH "SEE ALSO"
|
||||
.rs
|
||||
.sp
|
||||
\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2test\fP(1).
|
||||
\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3).
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
|
@ -678,6 +700,6 @@ Cambridge, England.
|
|||
.rs
|
||||
.sp
|
||||
.nf
|
||||
Last updated: 23 November 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
Last updated: 03 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
.fi
|
||||
|
|
|
@ -40,15 +40,15 @@ DESCRIPTION
|
|||
standard input can also be referenced by a name consisting of a single
|
||||
hyphen. For example:
|
||||
|
||||
pcre2grep some-pattern /file1 - /file3
|
||||
pcre2grep some-pattern file1 - file3
|
||||
|
||||
By default, each line that matches a pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at
|
||||
the start of each line, followed by a colon. However, there are options
|
||||
that can change how pcre2grep behaves. In particular, the -M option
|
||||
makes it possible to search for patterns that span line boundaries.
|
||||
What defines a line boundary is controlled by the -N (--newline)
|
||||
option.
|
||||
Input files are searched line by line. By default, each line that
|
||||
matches a pattern is copied to the standard output, and if there is
|
||||
more than one file, the file name is output at the start of each line,
|
||||
followed by a colon. However, there are options that can change how
|
||||
pcre2grep behaves. In particular, the -M option makes it possible to
|
||||
search for strings that span line boundaries. What defines a line
|
||||
boundary is controlled by the -N (--newline) option.
|
||||
|
||||
The amount of memory used for buffering files that are being scanned is
|
||||
controlled by a parameter that can be set by the --buffer-size option.
|
||||
|
@ -122,13 +122,13 @@ OPTIONS
|
|||
|
||||
-- This terminates the list of options. It is useful if the next
|
||||
item on the command line starts with a hyphen but is not an
|
||||
option. This allows for the processing of patterns and file-
|
||||
option. This allows for the processing of patterns and file
|
||||
names that start with hyphens.
|
||||
|
||||
-A number, --after-context=number
|
||||
Output number lines of context after each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
file names and/or line numbers are being output, a hyphen
|
||||
separator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
|
@ -141,8 +141,8 @@ OPTIONS
|
|||
|
||||
-B number, --before-context=number
|
||||
Output number lines of context before each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
file names and/or line numbers are being output, a hyphen
|
||||
separator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
|
@ -160,7 +160,8 @@ OPTIONS
|
|||
which can have nasty effects if sent to a terminal. If the
|
||||
word is "without-match", which is equivalent to the -I
|
||||
option, binary files are not processed at all; they are
|
||||
assumed not to be of interest.
|
||||
assumed not to be of interest and are skipped without causing
|
||||
any output or affecting the return code.
|
||||
|
||||
--buffer-size=number
|
||||
Set the parameter that controls how much memory is used for
|
||||
|
@ -172,14 +173,20 @@ OPTIONS
|
|||
to the same value.
|
||||
|
||||
-c, --count
|
||||
Do not output individual lines from the files that are being
|
||||
scanned; instead output the number of lines that would other-
|
||||
wise have been shown. If no lines are selected, the number
|
||||
zero is output. If several files are being scanned, a
|
||||
count is output for each of them. However, if the --files-
|
||||
with-matches option is also used, only those files whose
|
||||
counts are greater than zero are listed. When -c is used, the
|
||||
-A, -B, and -C options are ignored.
|
||||
Do not output lines from the files that are being scanned;
|
||||
instead output the number of matches (or non-matches if -v is
|
||||
used) that would otherwise have caused lines to be shown. By
|
||||
default, this count is the same as the number of suppressed
|
||||
lines, but if the -M (multiline) option is used (without -v),
|
||||
there may be more suppressed lines than the number of
|
||||
matches.
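The difference between the number of matches and the number of suppressed lines can be sketched as follows. This is only an illustration under stated assumptions: Python's re module stands in for PCRE2, the sample text is invented, and the helper name lines_spanned is hypothetical. It is not how pcre2grep itself counts.

```python
import re

# Invented sample input; Python's re is used here as a stand-in for PCRE2.
text = "a regular\nexpression\nplain line\nregular expression\n"
pattern = re.compile(r"regular\s+expression")

matches = list(pattern.finditer(text))


def lines_spanned(m):
    """Number of input lines touched by one match (hypothetical helper)."""
    # Each newline inside the matched string moves the match onto another line.
    return text.count("\n", m.start(), m.end()) + 1


match_count = len(matches)                               # what -c would report
suppressed_lines = sum(lines_spanned(m) for m in matches)  # lines not shown

print(match_count, suppressed_lines)
```

In this sample the first match covers two lines and the second covers one, so the count of matches is smaller than the count of lines that would otherwise have been output.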
|
||||
|
||||
If no lines are selected, the number zero is output. If sev-
|
||||
eral files are being scanned, a count is output for each
|
||||
of them. However, if the --files-with-matches option is also
|
||||
used, only those files whose counts are greater than zero are
|
||||
listed. When -c is used, the -A, -B, and -C options are
|
||||
ignored.
|
||||
|
||||
--colour, --color
|
||||
If this option is given without any data, it is equivalent to
|
||||
|
@ -336,7 +343,9 @@ OPTIONS
|
|||
is not shown in this case. For matching lines, the file name
|
||||
is followed by a colon; for context lines, a hyphen separator
|
||||
is used. If a line number is also being output, it follows
|
||||
the file name.
|
||||
the file name. When the -M option causes a pattern to match
|
||||
more than one line, only the first is preceded by the file
|
||||
name.
|
||||
|
||||
-h, --no-filename
|
||||
Suppress the output file names when searching multiple files.
|
||||
|
@ -349,8 +358,8 @@ OPTIONS
|
|||
options and file type support, and then exit. Anything else
|
||||
on the command line is ignored.
|
||||
|
||||
-I Treat binary files as never matching. This is equivalent to
|
||||
--binary-files=without-match.
|
||||
-I Ignore binary files. This is equivalent to --binary-
|
||||
files=without-match.
|
||||
|
||||
-i, --ignore-case
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
|
@ -478,20 +487,37 @@ OPTIONS
|
|||
is given, patterns may usefully contain literal newline char-
|
||||
acters and internal occurrences of ^ and $ characters. The
|
||||
output for a successful match may consist of more than one
|
||||
line, the last of which is the one in which the match ended.
|
||||
If the matched string ends with a newline sequence the output
|
||||
ends at the end of that line.
|
||||
line. The first is the line in which the match started, and
|
||||
the last is the line in which the match ended. If the matched
|
||||
string ends with a newline sequence the output ends at the
|
||||
end of that line.
|
||||
|
||||
When this option is set, the PCRE2 library is called in "mul-
|
||||
tiline" mode. There is a limit to the number of lines that
|
||||
can be matched, imposed by the way that pcre2grep buffers the
|
||||
input file as it scans it. However, pcre2grep ensures that at
|
||||
least 8K characters or the rest of the document (whichever is
|
||||
the shorter) are available for forward matching, and simi-
|
||||
larly the previous 8K characters (or all the previous charac-
|
||||
ters, if fewer than 8K) are guaranteed to be available for
|
||||
lookbehind assertions. This option does not work when input
|
||||
is read line by line (see --line-buffered).
|
||||
tiline" mode. However, pcre2grep still processes the input
|
||||
line by line. The difference is that a matched string may
|
||||
extend past the end of a line and continue on one or more
|
||||
subsequent lines. The newline sequence must be matched as
|
||||
part of the pattern. For example, to find the phrase "regular
|
||||
expression" in a file where "regular" might be at the end of
|
||||
a line and "expression" at the start of the next line, you
|
||||
could use this command:
|
||||
|
||||
pcre2grep -M 'regular\s+expression' <file>
|
||||
|
||||
The \s escape sequence matches any white space character,
|
||||
including newlines, and is followed by + so as to match
|
||||
trailing white space on the first line as well as possibly
|
||||
handling a two-character newline sequence.
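As a rough illustration of why \s+ is used in the command above, the sketch below tries the same pattern on a few invented inputs. Python's re module is used as a stand-in for PCRE2 (an assumption: the two engines agree on \s for these ASCII samples); the sample strings and dictionary keys are made up.

```python
import re

# Python's re stands in for PCRE2; \s covers space, CR and LF in both.
PATTERN = re.compile(r"regular\s+expression")

# Invented sample inputs, one per situation discussed in the text.
samples = {
    "same line": "a regular expression here\n",
    "LF break": "a regular\nexpression here\n",
    "trailing space + CRLF": "a regular \r\nexpression here\r\n",
    "no white space": "a regularexpression here\n",
}

results = {name: bool(PATTERN.search(text)) for name, text in samples.items()}
for name, matched in results.items():
    print(f"{name}: {matched}")
```

The third sample needs three white space characters between the two words (a space, CR and LF), which is why a single \s would not be enough.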
|
||||
|
||||
There is a limit to the number of lines that can be matched,
|
||||
imposed by the way that pcre2grep buffers the input file as
|
||||
it scans it. However, pcre2grep ensures that at least 8K
|
||||
characters or the rest of the file (whichever is the shorter)
|
||||
are available for forward matching, and similarly the previ-
|
||||
ous 8K characters (or all the previous characters, if fewer
|
||||
than 8K) are guaranteed to be available for lookbehind asser-
|
||||
tions. The -M option does not work when input is read line by
|
||||
line (see --line-buffered).
|
||||
|
||||
-N newline-type, --newline=newline-type
|
||||
The PCRE2 library supports five different conventions for
|
||||
|
@ -523,7 +549,9 @@ OPTIONS
|
|||
Precede each output line by its line number in the file, fol-
|
||||
lowed by a colon for matching lines or a hyphen for context
|
||||
lines. If the file name is also being output, it precedes the
|
||||
line number. This option is forced if --line-offsets is used.
|
||||
line number. When the -M option causes a pattern to match
|
||||
more than one line, only the first is preceded by its line
|
||||
number. This option is forced if --line-offsets is used.
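The rule for multi-line matches can be pictured with a small sketch. The function name format_match and the sample lines are hypothetical, and the code only mimics the described output layout; it is not taken from pcre2grep.

```python
def format_match(first_line_number, matched_lines):
    """Prefix only the first line of a multi-line match with its line number."""
    out = []
    for i, line in enumerate(matched_lines):
        if i == 0:
            # The line where the match starts gets "number:" in front.
            out.append(f"{first_line_number}:{line}")
        else:
            # Continuation lines of the same match are output unprefixed.
            out.append(line)
    return out


# Invented example: a match starting on line 7 and ending on line 8.
formatted = format_match(7, ["a regular", "expression here"])
print("\n".join(formatted))
```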
|
||||
|
||||
--no-jit If the PCRE2 library is built with support for just-in-time
|
||||
compiling (which speeds up matching), pcre2grep automatically
|
||||
|
@ -555,8 +583,8 @@ OPTIONS
|
|||
The comments given for the non-argument case above also apply
|
||||
to this case. If the specified capturing parentheses do not
|
||||
exist in the pattern, or were not set in the match, nothing
|
||||
is output unless the file name or line number are being
|
||||
printed.
|
||||
is output unless the file name or line number are being out-
|
||||
put.
|
||||
|
||||
If this option is given multiple times, multiple substrings
|
||||
are output, in the order the options are given. For example,
|
||||
|
@ -617,11 +645,11 @@ OPTIONS
|
|||
Force the patterns to be anchored (each must start matching
|
||||
at the beginning of a line) and in addition, require them to
|
||||
match entire lines. This is equivalent to having ^ and $
|
||||
characters at the start and end of each alternative branch in
|
||||
every pattern. This option applies only to the patterns that
|
||||
are matched against the contents of files; it does not apply
|
||||
to patterns specified by any of the --include or --exclude
|
||||
options.
|
||||
characters at the start and end of each alternative top-level
|
||||
branch in every pattern. This option applies only to the pat-
|
||||
terns that are matched against the contents of files; it does
|
||||
not apply to patterns specified by any of the --include or
|
||||
--exclude options.
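Why the anchors have to apply to each top-level branch can be seen in a short sketch. Python's re module is used as a stand-in for PCRE2 (an assumption), and the helper line_regex and the sample lines are hypothetical.

```python
import re


def line_regex(pattern):
    """Anchor a pattern so that every top-level branch must match a whole line."""
    # The non-capturing group keeps ^ and $ outside the alternation.
    return re.compile(rf"^(?:{pattern})$")


naive = re.compile(r"^cat|dog$")        # anchors bind to one branch each
per_branch = re.compile(r"^cat$|^dog$")  # anchors written on every branch
wrapped = line_regex("cat|dog")          # same effect via grouping

lines = ["cat", "dog", "catalog", "hotdog"]
naive_hits = [s for s in lines if naive.search(s)]
per_branch_hits = [s for s in lines if per_branch.search(s)]
wrapped_hits = [s for s in lines if wrapped.search(s)]

print(naive_hits, per_branch_hits, wrapped_hits)
```

Simply putting ^ in front of the pattern and $ after it would anchor only the first and last branches, which is why "catalog" and "hotdog" slip through in the naive version.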
|
||||
|
||||
|
||||
ENVIRONMENT VARIABLES
|
||||
|
@ -662,7 +690,7 @@ OPTIONS COMPATIBILITY
|
|||
ferent in pcre2grep. For example, the --include option's argument is a
|
||||
glob for GNU grep, but a regular expression for pcre2grep. If both the
|
||||
-c and -l options are given, GNU grep lists only file names, without
|
||||
counts, but pcre2grep gives the counts.
|
||||
counts, but pcre2grep gives the counts as well.
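The glob versus regular expression difference for --include can be illustrated with the sketch below. It uses Python's fnmatch and re modules as stand-ins for GNU grep's glob matching and PCRE2 respectively (both assumptions), and the file names are invented.

```python
import fnmatch
import re

# Invented file names for illustration.
names = ["main.c", "util.h", "notes.c.bak"]

# Glob style (as GNU grep treats the argument): "*.c" must cover the whole name.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.c")]

# Regular expression style: the dot must be escaped and the end anchored
# to select the same files.
regex_hits = [n for n in names if re.search(r"\.c$", n)]

# Without the $ anchor the regular expression also accepts "notes.c.bak".
unanchored_hits = [n for n in names if re.search(r"\.c", n)]

print(glob_hits, regex_hits, unanchored_hits)
```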
|
||||
|
||||
|
||||
OPTIONS WITH DATA
|
||||
|
@ -725,7 +753,7 @@ DIAGNOSTICS
|
|||
|
||||
SEE ALSO
|
||||
|
||||
pcre2pattern(3), pcre2syntax(3), pcre2test(1).
|
||||
pcre2pattern(3), pcre2syntax(3).
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
@ -737,5 +765,5 @@ AUTHOR
|
|||
|
||||
REVISION
|
||||
|
||||
Last updated: 23 November 2014
|
||||
Copyright (c) 1997-2014 University of Cambridge.
|
||||
Last updated: 03 January 2015
|
||||
Copyright (c) 1997-2015 University of Cambridge.
|
||||
|
|